# Math 225

## Introduction to Biostatistics

### Notes from Lecture #12

#### Confidence Intervals for a Single Proportion

1. A motivating problem. What is the proportion of red beads in the bucket?

A bucket contains several thousand colored beads, some of which are red. We wish to estimate the proportion of red beads in the bucket. If a random sample of 100 beads contains 25 red ones, how can we use this information to estimate the proportion of red beads in the entire bucket, and how confident can we be in our estimate? This setup models many problems that arise in practice. For example, in political polls, we could think of the red beads as representing the individuals in favor of a specific candidate. In the health sciences, we could think of the red beads as representing individuals with high blood pressure. In a biology setting, we could think of the red beads as representing individuals that had been previously captured and tagged.

2. Confidence Intervals. A confidence interval is an estimate of an unknown parameter along with a margin of error and a level of confidence. A confidence interval for a parameter has the basic form

(estimate) ± (margin of error)

or

(estimate) ± (multiplier)(standard error)

The standard error is the standard deviation of the sampling distribution of the estimate.
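The definition of the standard error can be checked by simulation. The sketch below is an illustration only: it assumes a hypothetical bucket whose true proportion of red beads is p = 0.25 (in the real problem p is unknown), draws many random samples of n = 100 beads, and compares the standard deviation of the resulting sample proportions with the formula sqrt( p(1-p)/n ) used in the next section. Sampling is done with replacement, which is a reasonable approximation when the bucket holds thousands of beads.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` random samples of size n from a population with true
    proportion p and return the list of sample proportions (p-hat values)."""
    rng = random.Random(seed)
    props = []
    for _ in range(reps):
        # Each sampled bead is red with probability p (sampling with replacement).
        count = sum(1 for _ in range(n) if rng.random() < p)
        props.append(count / n)
    return props


# Hypothetical true proportion and sample size, chosen to match the bead example.
p, n = 0.25, 100
props = simulate_sample_proportions(p, n, reps=10_000)

simulated_se = statistics.stdev(props)      # spread of the sampling distribution
theoretical_se = math.sqrt(p * (1 - p) / n)  # formula from the notes

print(f"mean of p-hat:   {statistics.mean(props):.4f}")
print(f"simulated SE:    {simulated_se:.4f}")
print(f"sqrt(p(1-p)/n):  {theoretical_se:.4f}")
```

The simulated standard deviation should land very close to sqrt(0.25 × 0.75 / 100) ≈ 0.0433, and the mean of the sample proportions should be close to p.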

3. Confidence Intervals for p. For proportions, the sample proportion is an obvious estimate of the population proportion. We will use the notation p for the population proportion, n for the sample size, X for the count of individuals in the sample, and p-hat = X/n for the sample proportion. Provided that the sample size is sufficiently large (observing at least five individuals of each type in the sample is a common rule of thumb), the Central Limit Theorem tells us that the sampling distribution of the sample proportion is approximately normal with mean p and standard deviation sqrt( p(1-p)/n ). To calculate a 95% confidence interval, for example, we use the fact that approximately 95% of all sample proportions will be within 1.96 standard errors of the true population proportion. Therefore, we can be 95% confident that the true population proportion is within 1.96 standard errors of the actual observed sample proportion. More formally, a confidence interval for a proportion takes the form

p-hat ± z* sqrt ( p-hat (1 - p-hat) / n )

Notice that we use the estimate p-hat instead of the true p in the standard error because we do not know p! The value of z* is chosen so that the area between -z* and z* is the desired confidence level. Some common choices are:

| Confidence Level | z*    |
|------------------|-------|
| 90%              | 1.645 |
| 95%              | 1.960 |
| 99%              | 2.576 |
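The values of z* in the table come from the standard normal distribution: the two tails outside -z* and z* together hold 1 minus the confidence level, so each tail holds half of that. A minimal sketch using Python's standard-library `statistics.NormalDist` (the helper name `z_star` is just for illustration):

```python
from statistics import NormalDist


def z_star(confidence):
    """Return z* such that the area under the standard normal curve
    between -z* and z* equals `confidence` (for example 0.95)."""
    # Each tail outside [-z*, z*] holds half of the leftover area.
    upper_tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - upper_tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}  {z_star(level):.3f}")
```

Rounded to three decimals, this should reproduce the table: 1.645, 1.960 and 2.576.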

In the example data, p-hat = 25/100 = 0.25. A 95% confidence interval for the proportion of red beads in the bucket is

0.25 ± 1.96 sqrt((0.25)(0.75)/100)

or

0.250 ± 0.085

We can be 95% confident that the proportion of red beads in the bucket is between 16.5% and 33.5%.

It is good practice to round the margin of error to two significant digits and to round the estimate to the same accuracy.
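The bead calculation can be reproduced with a short function. This is a sketch of the large-sample interval p-hat ± z* sqrt( p-hat (1 - p-hat) / n ) from the notes; the function name `proportion_ci` and its arguments are hypothetical, and it computes z* exactly from the normal distribution rather than using the rounded table value 1.96, which changes the result only in the fourth decimal place. It also applies the rule of thumb of at least five individuals of each type.

```python
import math
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Large-sample confidence interval for a single proportion.

    Returns (p_hat, margin) for the interval p_hat ± margin, where
    margin = z* * sqrt(p_hat * (1 - p_hat) / n).
    """
    # Rule of thumb: at least five individuals of each type in the sample.
    if x < 5 or n - x < 5:
        raise ValueError("need at least 5 of each type for the normal approximation")
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin


# Bead example: 25 red beads in a sample of 100.
p_hat, margin = proportion_ci(25, 100)
print(f"{p_hat:.3f} ± {margin:.3f}")                    # 0.250 ± 0.085
print(f"({p_hat - margin:.3f}, {p_hat + margin:.3f})")  # (0.165, 0.335)
```

Printing with three decimals happens to give two significant digits for this margin of error, matching the rounding advice above.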

4. The logic of confidence intervals. Confidence intervals rest on the following logical sequence, stated here for 95% confidence. Similar logic holds for other confidence levels.
   1. The sampling distribution of an estimate is approximately normal and is centered at the value of the parameter we wish to estimate.
   2. Therefore, about 95% of all possible estimates from random samples are within 1.96 standard errors of the unknown parameter value.
   3. Thus, we can be 95% confident that the particular estimate from our actual sample is within 1.96 standard errors of the parameter value.
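The logic above says that, over many repeated samples, roughly 95% of the intervals built this way should contain the true parameter. The simulation below sketches that idea under an assumed, hypothetical true proportion p = 0.25 with samples of n = 100: it builds an interval from each simulated sample and records the fraction that capture p.

```python
import math
import random
from statistics import NormalDist


def coverage(p, n, confidence=0.95, reps=10_000, seed=1):
    """Fraction of simulated intervals p-hat ± z* SE that contain the true p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(reps):
        # One random sample of n beads, each red with probability p.
        x = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = x / n
        # Standard error estimated from p-hat, as in the notes.
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / reps


rate = coverage(p=0.25, n=100)
print(rate)
```

The printed fraction should be close to 0.95. Because the normal approximation and the use of p-hat in the standard error are only approximate, the observed rate can fall a little short of the nominal level.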

5. Interpreting confidence intervals. A confidence interval is a statement about the location of an unknown parameter. It is not a statement about individuals in the population; for example, it does not say that 95% of individuals fall inside the interval. The width of a confidence interval is determined by the sampling distribution of the estimate, through its standard error.
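Because the width depends on the standard error sqrt( p-hat (1 - p-hat) / n ), it shrinks in proportion to 1/sqrt(n). The sketch below holds a hypothetical sample proportion fixed at 0.25 and shows that quadrupling the sample size halves the margin of error; the helper name `margin_of_error` is illustrative only.

```python
import math


def margin_of_error(p_hat, n, z_star=1.96):
    """Half-width of the interval p-hat ± z* sqrt(p-hat (1 - p-hat) / n)."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)


# Same sample proportion, increasingly large samples.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.25, n), 4))
```

In practice p-hat would change from sample to sample, so the halving is exact only in this idealized comparison.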