Explore histograms

Applies to
Microsoft Office Excel 2003
Microsoft Excel 2002
Book cover This article was adapted from Microsoft Office Excel 2003 Inside Out by Craig Stinson and Mark Dodge. Visit Microsoft Learning to buy this book.

So what exactly is a histogram and how is it useful? A histogram is a chart (usually a simple column chart) that takes a collection of measurements and plots the number of measurements (called the frequency) that fall within each of several intervals (called bins). For example, a histogram can show you exactly how the SAT test scores of all students who took the test measure up—an example we'll further explore in this article.

You can create a histogram in Excel by using the Histogram tool of the Analysis ToolPak, an add-in module that provides a collection of functions and tools to augment the built-in analytic capabilities of Excel. This data analysis add-in is available in Excel, but may not be automatically loaded.

 Note   If you don't see Data Analysis on the Tools menu, you will need to load the add-in. For more information, see Load or unload add-in programs.

Using the Histogram tool

Using a sample table of students and their SAT test scores, we'll explore how the Histogram tool works in Excel. It's important that the input range for a histogram contains quantitative numeric data (such as the test scores); the Histogram tool doesn't work with qualitative numeric data (such as the student ID numbers).

Also, to see the output of the Histogram tool, for example a breakdown of the total scores at 50-point intervals, bins must be set up for distribution. These bins don't have to be equally spaced, as they are in column F in the following picture, but they must be in ascending order.

Column F contains the distribution bins.  

With all input data in the right place, you are ready to start the Histogram tool. Simply click Data Analysis on the Tools menu, and then select Histogram. The Histogram dialog box will be displayed.

This dialog box appears after you select the Histogram tool in the Data Analysis dialog box.  

In the Histogram dialog box, you need to supply the following information:

  • Input range    This is the location of the input data in your worksheet. The input range in our example is cells D2:D1001.
  • Bin range    This is the location of the bins in your worksheet. The bin range in our example is cells F2:F23.

 Note   If you want, you can leave the Bin Range box blank. The Histogram tool then automatically creates evenly distributed bin intervals using the minimum and maximum values in the input range as beginning and end points. The number of intervals is equal to the square root of the number of input values (rounded down). However, because these bin intervals may have little relation to your data, or the type of analysis you are looking for, it is recommended that you create your own bins.

  • Output location    This would be the upper left cell of the range where you want the analysis to appear. In our example, we'll start the output range in cell G1. You can also choose to place the output in another worksheet or workbook.
  • Display of the output     In our example, Chart Output is selected. You can also sort the output or display it in cumulative percentages.

The Histogram tool then analyzes all of the input information, calculates the output, and displays it in columns G and H, as shown in the following picture. This analysis tells us that three scores were at least 900 but less than 950, 48 are at least 950 but less than 1000, and so on.

This analysis shows us the scores.  

When you review the output of the Histogram tool, you should keep the following in mind:

  • The histogram reports the number of input values that are equal to or greater than the bin value but less than the next bin value and displays it in the Frequency column.
  • The last value in the table reports the number of input values equal to or greater than the last bin value.
  • If you specified the Pareto option in the Histogram dialog box, the output values are sorted in descending order.
  • If you specified the Cumulative Percentage option in the Histogram dialog box, a table that lists the cumulative percentages of each bin level is created.
  • If you specified the Chart Output option in the Histogram dialog box (like we did in our example), the Histogram tool simultaneously generates a chart along with the frequency distribution table.

The Histogram tool can automatically create a column chart like this one.

Creating histograms with PivotTables

You can create a histogram from data in a PivotTable, which allows you to quickly establish useful bin patterns. Unfortunately, the histogram tool will skip any zeros in the data. Therefore, you may want to avoid using data with zeros if you plan to create a histogram.

Beware of formulas in bins

The Histogram tool duplicates your column of bin values in the Bin column, which is convenient if you place the output somewhere else in your workbook. But because the Histogram tool copies the bin values, it's best if the bin range contains numeric constants rather than formulas. If you do use formulas, be sure they don't include relative references; otherwise, when the Histogram tool copies the range, the formulas might produce unwanted results.

Analyzing distribution with the FREQUENCY function

The Histogram tool generates a set of numeric constants. If you'd rather create formulas linked to the input values, you can use the built-in FREQUENCY array function, which calculates the number of times specified values occur in a population. This function has two arguments, and the syntax is as follows:

FREQUENCY(data_array,bins_array)

To use the FREQUENCY function, you must set up a column of bin values, just as you would with the Histogram tool. Then you can select the entire range where you want the output to appear, which in our example would be G2:G20 (the cells directly adjacent to the bin values in column F). The range you select must be within a column, because the FREQUENCY function cannot use a row or multicolumn range as its output range. The following picture shows the FREQUENCY function applied to the data.

Use the FREQUENCY function to link the distribution analysis to the input data.  

 
 
Applies to:
Excel 2003