Typically, statisticians are going to use software to help them look at data using a box plot. However, when you are first learning about box plots, it can be helpful to learn how to sketch them by hand. This way, you will be very comfortable with understanding the output from a computer or your calculator.
Remember, the goal of any graph is to summarize a data set. There are many possible graphs that one can use to do this. One of the more common options is the histogram, but there are also dotplots, stem and leaf plots, and as I will show you here – boxplots (which are sometimes called box and whisker plots). Like a histogram, box plots ignore information about each individual data value and instead show the overall pattern.
In this example, I will use the data set below. Let’s suppose this data set represents the salaries (in thousands) of a random sample of employees at a small company.
Steps to Making Your Box plot
- Calculate the five number summary for your data set.
- Identify outliers.
- Sketch the box plot using the model below.
The five number summary consists of the minimum value, the first quartile (Q1), the median, the third quartile (Q3), and the maximum value. While these numbers can also be calculated by hand (here is how to calculate the median by hand for instance), they can quickly be found on a TI83 or 84 calculator using 1-Var Stats, which is under the STAT, CALC menu. The video below shows you how to get to that menu on the TI84:
For this data set, I got the following five number summary from the output: minimum = 7, Q1 = 20, median = 31, Q3 = 40, and maximum = 65.
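If you would rather check a five number summary with a few lines of code, here is a minimal Python sketch. It assumes the "median of halves" rule for quartiles (Q1 is the median of the lower half, Q3 is the median of the upper half, and the middle value is left out of both halves when the count is odd). Other tools may use a different quartile rule and give slightly different answers. The function name `five_number_summary` and the salary values are hypothetical: they are placeholder numbers I made up so the result matches the summary quoted in this post, not the original data set.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd count, the middle value is left out of both halves.
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Hypothetical salaries (in thousands), NOT the data set from this post.
salaries = [7, 15, 25, 28, 31, 35, 38, 42, 65]
print(five_number_summary(salaries))  # (7, 20.0, 31, 40.0, 65)
```
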
Beyond the informal idea of “an unusual value”, there is no single definition of an outlier that is used across statistics. As you study statistics, you will see that different settings use different techniques to flag a potential outlier. With boxplots, this is done using something called “fences”. The idea is that anything outside the fences is a potential outlier and shouldn’t be included in the main group that we graph.
The lower fence is defined by the formula Q1 - 1.5 × IQR, where the IQR, or interquartile range, is Q3 - Q1. Any value in the data set that is less than this number will be treated as an outlier and marked with a star on the graph. Let’s do the calculation: the IQR is 40 - 20 = 20, so the lower fence is 20 - 1.5 × 20 = -10.
Since there are no values in the data set that are less than -10, there are no lower outliers.
The upper fence is defined by the formula Q3 + 1.5 × IQR, and anything greater than this number will be considered an outlier. As before, if a number is identified as an outlier, it will be marked with a star on the graph. Here the calculation would be (remember the IQR was 20): 40 + 1.5 × 20 = 70.
The largest value in the data set is 65, which is below the upper fence of 70, so this means there is no upper outlier either!
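The fence arithmetic above can be double-checked with a short Python sketch. It only uses numbers quoted in this post (Q1 = 20, Q3 = 40, smallest value 7, largest value 65). The function names `fences` and `find_outliers` are my own, and the multiplier of 1.5 is the usual boxplot convention.

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the given quartiles."""
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, lower, upper):
    """Return the values that fall outside the fences."""
    return [v for v in values if v < lower or v > upper]

lower, upper = fences(20, 40)
print(lower, upper)  # -10.0 70.0

# Checking the smallest and largest values is enough here: if neither
# is outside the fences, nothing in between can be.
print(find_outliers([7, 65], lower, upper))  # []
```
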
The main part of the box plot will be a line from the smallest number that is not an outlier to the largest number in our data set that is not an outlier. If a data set doesn’t have any outliers (like this one), then this will just be a line from the smallest value to the largest value. The rest of the plot is made by drawing a box from Q1 to Q3 with a line inside the box at the median. As a general example:
Additionally, if you are drawing your box plot by hand, you must think about scale. In this data set, the smallest value is 7 and the largest is 65, so starting my scale at 5 and counting by 5 up to 65 or 70 would probably give a nice picture. Then, since none of the values are outliers, I will draw a line from 7 to 65 as the main part of the graph. Finally, I will add a box between our quartiles (20 and 40) and a line at the median of 31. All together:
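To make the construction concrete, here is a rough text-only sketch of the same drawing in Python. It is not a substitute for a real plotting tool. It assumes every value in the summary is a whole number and uses one character per unit on a scale from 5 to 70. The function names `ascii_boxplot` and `ascii_axis` are hypothetical, and outliers are not drawn since this data set has none.

```python
def ascii_boxplot(low, q1, med, q3, high, lo, hi):
    """Draw a one-line box plot on an integer scale from lo to hi.

    low and high are the whisker ends (the smallest and largest
    values that are not outliers).
    """
    cells = []
    for x in range(lo, hi + 1):
        if x < low or x > high:
            cells.append(" ")   # outside the whiskers
        elif x == med:
            cells.append("|")   # median line
        elif x == q1:
            cells.append("[")   # left edge of the box
        elif x == q3:
            cells.append("]")   # right edge of the box
        elif q1 < x < q3:
            cells.append("=")   # inside the box
        else:
            cells.append("-")   # whisker
    return "".join(cells)

def ascii_axis(lo, hi, step=5):
    """Draw an axis with a '+' tick at every multiple of step."""
    return "".join("+" if x % step == 0 else "-" for x in range(lo, hi + 1))

print(ascii_boxplot(7, 20, 31, 40, 65, lo=5, hi=70))
print(ascii_axis(5, 70))  # ticks at 5, 10, 15, ..., 70
```

Reading the output from left to right, the first `-` sits at 7, the `[` at 20, the `|` at 31, the `]` at 40, and the last `-` at 65.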
Of course, a software version will look quite a bit better. Also note that boxplots can be drawn horizontally or vertically and you may run across either as you continue your studies. As an example, here is the same boxplot done with R (a statistical software program) instead:
Remember – pay attention to how these box plots are put together in order to do a better job at reading the information they provide. Since you now know that middle line is the median, you can just look at the box plot and know that 50% of the salaries were less than $31,000 or so. As you can see, a box plot can not only show you the overall pattern but also contains a lot of information about the data set!