How Do You Create a Frequency Polygon?

Frequency polygons are a fantastic way to visualize data, especially when you want to see the shape of a distribution and compare it with other datasets. Think of them as a line-based alternative to a histogram: where a histogram uses bars to represent frequencies, a frequency polygon uses points connected by lines. This makes it easier to spot trends, peaks, and dips in your data.

Understanding the Building Blocks: Frequency Tables and Midpoints

Before you can draw a frequency polygon, you need to lay the groundwork. This involves two key steps:

Creating a Frequency Table: This is where you organize your raw data into classes or bins. For each class, you count how many data points fall within its range. This count is the frequency for that class.
Calculating Class Midpoints: For each class in your frequency table, you need to find its midpoint. This is simply the average of the lower and upper limits of the class. For example, if a class runs from 10 to 20, its midpoint is (10 + 20) / 2 = 15. The midpoint represents the center of that class.
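The two groundwork steps can be sketched in a few lines of Python. This is a minimal illustration, not a standard routine: `frequency_table` is a hypothetical helper, and it assumes integer data grouped into equal-width classes with inclusive limits such as 10-19, so each midpoint is the average of the lower and upper limits.

```python
def frequency_table(data, start, width, num_classes):
    """Hypothetical helper: group integer data into equal-width classes.

    Each class covers lower..upper inclusive, where upper = lower + width - 1
    (for example 10-19 when width is 10). Returns one
    (lower, upper, frequency, midpoint) tuple per class.
    """
    table = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width - 1
        # Count the data points that fall inside this class.
        frequency = sum(1 for x in data if lower <= x <= upper)
        # Midpoint = average of the lower and upper class limits.
        midpoint = (lower + upper) / 2
        table.append((lower, upper, frequency, midpoint))
    return table


print(frequency_table([3, 7, 12, 15, 18, 21], start=0, width=10, num_classes=3))
# [(0, 9, 2, 4.5), (10, 19, 3, 14.5), (20, 29, 1, 24.5)]
```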

The Step-by-Step Guide to Creating a Frequency Polygon

Once you have your frequency table with class midpoints, you're ready to create the polygon. Here's how:

Step 1: Set Up Your Axes

You'll need two axes for your graph:

The Horizontal Axis (X-axis): This axis will represent the class midpoints, so label each midpoint along it. Extend the axis by one class width beyond the first and last midpoints; this leaves room to "anchor" the polygon to the baseline, as described in Step 3.
The Vertical Axis (Y-axis): This axis will represent the frequency. It should be scaled appropriately to accommodate the highest frequency in your data.
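The axis ranges from Step 1 follow directly from the table. The sketch below is illustrative only: `axis_ranges` is a hypothetical name, and it assumes the midpoints are sorted in ascending order and evenly spaced by `width`. In practice you may want a little headroom above the highest frequency.

```python
def axis_ranges(midpoints, frequencies, width):
    """Hypothetical helper: suggested axis ranges for a frequency polygon.

    The x-range runs one class width past the first and last midpoints,
    leaving room for the zero-frequency anchor points. The y-range starts
    at 0 and reaches the highest frequency.
    """
    x_range = (midpoints[0] - width, midpoints[-1] + width)
    y_range = (0, max(frequencies))
    return x_range, y_range


print(axis_ranges([14.5, 24.5, 34.5], [2, 5, 3], 10))
# ((4.5, 44.5), (0, 5))
```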

Step 2: Plot Your Points

Now, you'll plot points on your graph. For each class in your frequency table:

The X-coordinate of your point will be the class midpoint.
The Y-coordinate of your point will be the frequency for that class.

So, if your first class has a midpoint of 5 and a frequency of 12, you'll plot a point at (5, 12).
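In code, plotting the points amounts to pairing each midpoint (x) with its frequency (y). A minimal sketch, where `polygon_points` is a hypothetical helper rather than a library function:

```python
def polygon_points(midpoints, frequencies):
    """Hypothetical helper: pair each class midpoint (x) with its frequency (y)."""
    if len(midpoints) != len(frequencies):
        raise ValueError("need exactly one frequency per class midpoint")
    return list(zip(midpoints, frequencies))


print(polygon_points([5, 15, 25], [12, 7, 3]))
# [(5, 12), (15, 7), (25, 3)]
```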

Step 3: Connect the Points

This is where the "polygon" part comes in. You'll connect the plotted points with straight lines. To make the polygon start and end at zero frequency, closing the shape against the X-axis, add one extra point at each end:

Anchoring the Beginning: Before your first actual class, imagine a class whose midpoint is one class width *before* your first class's midpoint. Its frequency is zero. Plot a point at this midpoint with a frequency of zero and connect it to the point for your first actual class.
Anchoring the End: Similarly, after your last actual class, imagine a class whose midpoint is one class width *after* your last class's midpoint, also with a frequency of zero. Plot this point and connect the point for your last actual class to it.

By anchoring the polygon to the X-axis at both ends, you create a complete shape that accurately represents the distribution of your data.
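Anchoring adds exactly one zero-frequency point at each end. The sketch below assumes the points are already sorted by midpoint and that all classes share the same width; `anchor` is a hypothetical helper name.

```python
def anchor(points, width):
    """Hypothetical helper: add zero-frequency points one class width before
    the first midpoint and one class width after the last midpoint."""
    first_x = points[0][0]
    last_x = points[-1][0]
    return [(first_x - width, 0)] + list(points) + [(last_x + width, 0)]


print(anchor([(14.5, 2), (24.5, 3)], 10))
# [(4.5, 0), (14.5, 2), (24.5, 3), (34.5, 0)]
```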

Step 4: Label Your Graph

A well-labeled graph is essential for understanding. Make sure to include:

A clear title for the frequency polygon.
Labels for both the X-axis (e.g., "Class Midpoints" or the specific variable being measured) and the Y-axis (e.g., "Frequency" or "Number of Observations").

Example Scenario

Let's say we have the following data on the ages of people at a community event:

15, 18, 22, 25, 28, 30, 32, 35, 38, 40, 42, 45, 48, 50, 52, 55, 58, 60

We can group this data into classes of 10-year intervals. Because the oldest person is 60, six classes are needed to cover every age:

10-19: Frequency = 2 (15, 18)
20-29: Frequency = 3 (22, 25, 28)
30-39: Frequency = 4 (30, 32, 35, 38)
40-49: Frequency = 4 (40, 42, 45, 48)
50-59: Frequency = 4 (50, 52, 55, 58)
60-69: Frequency = 1 (60)

The frequencies add up to 18, matching the number of ages in the data.

Now, calculate the midpoints:

10-19: Midpoint = (10 + 19) / 2 = 14.5
20-29: Midpoint = (20 + 29) / 2 = 24.5
30-39: Midpoint = (30 + 39) / 2 = 34.5
40-49: Midpoint = (40 + 49) / 2 = 44.5
50-59: Midpoint = (50 + 59) / 2 = 54.5
60-69: Midpoint = (60 + 69) / 2 = 64.5

To anchor the polygon, use the class width of 10:

Before the first class (10-19), we imagine a class (0-9) with a midpoint of 4.5 (14.5 - 10). Its frequency is 0.
After the last class (60-69), we imagine a class (70-79) with a midpoint of 74.5 (64.5 + 10). Its frequency is 0.

We would then plot points for (4.5, 0), (14.5, 2), (24.5, 3), (34.5, 4), (44.5, 4), (54.5, 4), (64.5, 1), and (74.5, 0), and connect them with lines.
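Tallying the ages in code is a useful check on a hand count. This sketch is illustrative, not a prescribed method: it assumes six 10-year classes with inclusive limits (10-19 through 60-69) so that the age 60 is included, then anchors the polygon one class width beyond each end.

```python
ages = [15, 18, 22, 25, 28, 30, 32, 35, 38, 40, 42, 45, 48, 50, 52, 55, 58, 60]

width = 10
lowers = range(10, 70, width)  # class lower limits: 10, 20, ..., 60

points = []
for lower in lowers:
    upper = lower + width - 1  # inclusive upper limit, e.g. 19 for 10-19
    frequency = sum(1 for age in ages if lower <= age <= upper)
    midpoint = (lower + upper) / 2
    points.append((midpoint, frequency))

# Every age should land in exactly one class.
assert sum(f for _, f in points) == len(ages)

# Anchor with zero-frequency classes one width before and after.
points = [(points[0][0] - width, 0)] + points + [(points[-1][0] + width, 0)]

print(points)
# [(4.5, 0), (14.5, 2), (24.5, 3), (34.5, 4), (44.5, 4), (54.5, 4), (64.5, 1), (74.5, 0)]
```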

Why Use a Frequency Polygon?

Frequency polygons are particularly useful when:

You want to compare the shapes of two or more distributions on the same graph. Overlaying multiple frequency polygons allows for direct visual comparison of their forms.
You have a large dataset and want an uncluttered view of its distribution. A single line is often easier to follow than a row of bars, which makes overall trends more apparent.
You are interested in identifying the central tendency, spread, and skewness of your data.
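Overlaid polygons are only directly comparable when both datasets are binned with the same classes, so their points share x-coordinates. The sketch below illustrates this; `polygon` is a hypothetical helper, and `group_a` and `group_b` are made-up integer scores used purely for illustration.

```python
def polygon(data, start, width, num_classes):
    """Hypothetical helper: anchored frequency-polygon points for integer data,
    using classes start..start+width-1, start+width..start+2*width-1, etc."""
    points = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width - 1
        frequency = sum(1 for x in data if lower <= x <= upper)
        points.append(((lower + upper) / 2, frequency))
    # Zero-frequency anchors one class width beyond each end.
    return [(points[0][0] - width, 0)] + points + [(points[-1][0] + width, 0)]


# Made-up scores for two groups, binned with the SAME classes.
group_a = [12, 15, 21, 24, 27, 33]
group_b = [18, 25, 31, 34, 36, 38]
poly_a = polygon(group_a, start=10, width=10, num_classes=3)
poly_b = polygon(group_b, start=10, width=10, num_classes=3)

print(poly_a)  # [(4.5, 0), (14.5, 2), (24.5, 3), (34.5, 1), (44.5, 0)]
print(poly_b)  # [(4.5, 0), (14.5, 1), (24.5, 1), (34.5, 4), (44.5, 0)]
```

Because the x-coordinates match point for point, the two lines can be drawn on one set of axes and compared class by class.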

Frequently Asked Questions (FAQ)

How do you choose the number of classes for a frequency polygon?

The number of classes, or bins, for a frequency polygon is a choice that can affect the appearance of the polygon. Generally, you want enough classes to show the shape of the distribution without having too many classes with very low or zero frequencies, which can make the graph look sparse. A common guideline is to have between 5 and 20 classes, but this can vary depending on the size of your dataset. Sometimes, statistical software or specific formulas (like Sturges' Rule) can help suggest an appropriate number of classes.
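One common form of Sturges' Rule suggests k = ⌈log2 n⌉ + 1 classes for n observations. A minimal sketch using only the standard library; `sturges_classes` is a hypothetical name, and the result is a starting point rather than a requirement.

```python
import math


def sturges_classes(n):
    """Hypothetical helper: suggested number of classes for n observations,
    using Sturges' Rule in the form ceil(log2(n)) + 1."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n)) + 1


print(sturges_classes(18))    # 6
print(sturges_classes(1000))  # 11
```

For the 18 ages in the example, the rule suggests 6 classes.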

Why do we anchor the frequency polygon to the x-axis at both ends?

Anchoring the frequency polygon to the x-axis at both ends is crucial for creating a closed shape that accurately represents the entire distribution of the data. By adding imaginary classes with zero frequency before the first actual class and after the last actual class, we ensure that the polygon starts and ends at zero. This makes it clear that there are no data points outside this range, and it allows for accurate comparison with other frequency polygons that are similarly anchored.

What's the difference between a frequency polygon and a histogram?

The main difference lies in how the frequencies are represented. A histogram uses adjacent bars to represent the frequencies of data within specific intervals. The width of the bars represents the class interval, and the height represents the frequency. A frequency polygon, on the other hand, uses points plotted at the midpoint of each class interval, with the height of the point corresponding to the frequency. These points are then connected by straight lines. This line-based representation makes frequency polygons particularly good for comparing multiple distributions on the same graph.