Confidence Intervals for Proportions – By Hand

Before you ever even thought about statistics, you had seen a confidence interval for a proportion. Don’t believe me? Well, think back – does this sound familiar?

“A new poll shows that 45% of Americans believe something interesting about some idea…” (on the screen is a note: *margin of error 4%)

Yup, that exciting piece of news (I really should be a news writer come to think of it) is really representing a confidence interval for some population proportion. We use these types of confidence intervals anytime we want to understand the percentage or proportion of a population that has a certain property. The idea is to take a sample, see what percentage of our sample has the property, and finally use these calculations to estimate how well that translates to the whole group or population.


Important symbols:

  • P-hat or \hat{p} is the sample proportion. It is found by dividing the number of successes in the sample, x (whatever we are counting), by the sample size n, so \hat{p} = x/n.
  • p will represent the population proportion. In some textbooks, \pi is used instead.

The Assumptions

In order to use these procedures, we rely on the central limit theorem for sample proportions (the big ideas of this theorem are for another article and day). This means that its initial assumptions must hold. Those are:

  • We have a random sample from the population.
  • The population size is much larger than the sample size
  • We have a “big enough” sample. In this case we will need \hat{p}n \geq 10 AND (1 - \hat{p})n \geq 10

For these examples, we will assume that the first two statements are true, but we will check the last condition. Remember that if you are working with real data in a research situation, you must be careful to make sure the first two conditions hold as well.
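The “big enough” check is easy to sketch in Python. This is only a minimal illustration: the function name `large_enough` is made up for this article, and the counts passed to it are invented numbers, not data from any study.

```python
def large_enough(successes, n, threshold=10):
    """Check the sample-size condition for a proportion interval.

    n * p_hat is simply the number of successes, and n * (1 - p_hat)
    is the number of failures, so we compare both counts to the threshold.
    """
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Invented counts, purely for illustration.
print(large_enough(30, 200))  # 30 successes and 170 failures
print(large_enough(4, 50))    # only 4 successes
```

The first call passes the check and the second fails it, since 4 successes is fewer than 10.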

The Calculation – By Hand

The general formula for this type of confidence interval is:


\hat{p} \pm z_{c}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}

where z_{c} is a critical value from the normal distribution (see below) and n is the sample size.

Common values of z_{c} are:

Confidence Level    Critical Value
90%                 1.645
95%                 1.96
99%                 2.575
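If you would rather compute a critical value than look it up, here is one possible sketch using `NormalDist` from Python's standard `statistics` module. The helper name `critical_value` is hypothetical, and it assumes a two-sided interval, so half of the leftover probability goes in each tail.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Two-sided critical value z_c for a confidence level such as 0.95."""
    # Leave (1 - confidence) / 2 in each tail of the standard normal curve.
    upper_tail_start = 1 - (1 - confidence) / 2
    return NormalDist().inv_cdf(upper_tail_start)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {critical_value(level):.3f}")
```

The 90% and 95% values agree with the table. The 99% value comes out closer to 2.576; the 2.575 shown in the table is a common textbook rounding.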

Now that we have that out of the way, let’s try it on an example!

In a sample of 680 young adults (ages 18 – 25) residing in a large city, 471 stated that they regularly use public transportation. Use this information to calculate a 95% confidence interval for the proportion of all young adults in this city that regularly use public transportation.

Since we are estimating a population proportion, we know that we will use the formula above. Before we can use that formula though, we must check assumptions. Here \hat{p}= 471/680 = 0.6926 and \hat{p}n = 471 while (1-\hat{p})n = 209. Both of these numbers are larger than 10 so we can continue.

The formula:


\hat{p} \pm z_{c}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}

Plugging values in:

0.6926 \pm 1.96\sqrt{\dfrac{0.6926(1-0.6926)}{680}}

Simplifying:


0.6926 \pm 0.0347

Note that in the last step, I did every calculation in my calculator and avoided rounding until the end. To see this “in action,” please check the video of this example on the right hand side of your screen. At this stage, you can convert to percentages and use this as your final answer.


Final Answer: 69.26\% \pm 3.47\%

You can also actually perform the addition and subtraction and write the final answer using left and right endpoints.

Left endpoint:
0.6926 – 0.0347 = 0.6579

Right endpoint:
0.6926 + 0.0347 = 0.7273

Final Answer: (65.79\% , 72.73\%)
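To double-check the hand calculation, the same steps can be sketched in Python. The function name `proportion_ci` is made up for this article; the inputs (471 successes, sample size 680, critical value 1.96) come straight from the example above.

```python
from math import sqrt


def proportion_ci(successes, n, z):
    """Return (p_hat, margin, left, right) for a proportion interval."""
    p_hat = successes / n                       # sample proportion
    margin = z * sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return p_hat, margin, p_hat - margin, p_hat + margin


p_hat, margin, left, right = proportion_ci(471, 680, 1.96)
print(f"{p_hat:.4f} +/- {margin:.4f}")
print(f"({left:.2%}, {right:.2%})")
```

Because the code carries full precision all the way through, the left endpoint rounds to 65.80% rather than 65.79%. The tiny difference appears because the hand calculation subtracts two numbers that were already rounded.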

A Note About Interpretation

With all confidence intervals, it is easy to get caught up in the calculations and forget that they have a real world meaning. Not only that, but the real world meaning is often confused or misunderstood. Make sure you read about how to interpret confidence intervals and take note of the common misinterpretations.

In this example, we could say: “We are 95% confident that between 65.79% and 72.73% of all young adults in this city regularly use public transportation.” This means that about 95% of the time, an interval produced this way will work as intended. In other words, about 95% of the intervals calculated this way will contain the actual percentage. (Still confused? Read the article linked above!)
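One way to see what “95% of the time” means is a small simulation. The sketch below is only an illustration under assumptions: it pretends the true proportion is 0.7 (an invented value, since in real life we never know it), draws many samples of size 680, and counts how often the calculated interval captures that true value. The function name `coverage`, the number of trials, and the seed are all arbitrary choices.

```python
import random
from math import sqrt


def coverage(p, n, z=1.96, trials=2000, seed=1):
    """Fraction of simulated intervals that contain the true proportion p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one sample: each person is a "success" with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / trials


# Assumed true proportion of 0.7, chosen only for this demonstration.
print(coverage(p=0.7, n=680))
```

The printed fraction should land close to 0.95, though it will wobble a little from one seed to the next.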