One experimental design is either to take two independent samples, one from each population, or to randomly assign subjects to one of two treatment groups. In these situations, the number of individuals in each sample or treatment group may differ. There is no particular reason why any single individual should be matched or paired with a specific individual from the other group. These are examples of independent samples.
In contrast, an alternative design is to take a single sample of pairs of measurements. Perhaps each subject is measured twice, once under each set of treatment conditions. Or each experimental unit may be a pair of individuals, such as siblings or twins. Notice that in these situations, there is always perfect balance between the two groups. These are examples of matched pair designs.
In both cases, we will construct confidence intervals with the basic form
(sample mean 1) - (sample mean 2) +/- (multiplier)(SE)
The multiplier will usually come from the t distribution (unless we know the population standard deviations), but the formula for the SE differs in the two cases. Also, the test statistic for hypothesis tests will be:
t = ((sample mean 1) - (sample mean 2)) / SE
For independent samples of size n_1 and n_2, the exact standard error is:
SE = sqrt( (sigma_1)^2 / n_1 + (sigma_2)^2 / n_2 )
If we assume that the two population standard deviations are equal, we can estimate their common value best as
s_p = sqrt( ( (n_1 - 1)*(s_1)^2 + (n_2 - 1)*(s_2)^2 ) / (n_1 + n_2 - 2) )
Notice that this is the square root of the weighted average of the sample variances (s_1)^2 and (s_2)^2. The SE is approximated by:
SE = s_p * sqrt( 1/(n_1) + 1/(n_2) )
which follows algebraically if you substitute in s_p for (sigma_1) and (sigma_2). The t distribution has n_1 + n_2 - 2 degrees of freedom in this case. (n_1 - 1 + n_2 - 1 = n_1 + n_2 - 2.)
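The pooled calculation for independent samples can be sketched in a few lines of Python. This is only an illustration of the formulas for s_p, SE, and t given here, not the lab's menu command; the function names and the numbers in the usage note are hypothetical and are not taken from the HARVEST data.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled estimate s_p of a common population standard deviation."""
    # Weighted average of the two sample variances, weights n_i - 1.
    weighted_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(weighted_variance)


def pooled_standard_error(s1, n1, s2, n2):
    """SE of (sample mean 1) - (sample mean 2) under the equal-SD assumption."""
    return pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)


def pooled_t_statistic(mean1, mean2, s1, n1, s2, n2):
    """Return the t statistic and its degrees of freedom, n_1 + n_2 - 2."""
    se = pooled_standard_error(s1, n1, s2, n2)
    degrees_of_freedom = n1 + n2 - 2
    return (mean1 - mean2) / se, degrees_of_freedom
```

For example, with made-up summaries of mean 75 and 73, standard deviations of 2 in both groups, and 8 subjects per group, pooled_t_statistic(75, 73, 2, 8, 2, 8) gives SE = 1.0, so t = 2.0 with 14 degrees of freedom.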
For the matched pair design case, the idea is to take all of the individual differences first, and to consider your data to be a single sample of n differences. There are then n-1 degrees of freedom and you treat the sampled differences as a sample from a single population.
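The matched pair calculation can be sketched the same way. This assumes the usual one-sample standard error for the differences, SE = (standard deviation of the differences) / sqrt(n), which is the standard result for a single sample; the function name and the example measurements are hypothetical.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return the t statistic and degrees of freedom for matched pairs.

    first[i] and second[i] must be the two measurements on the same pair.
    """
    if len(first) != len(second):
        raise ValueError("matched pairs require equally long lists")
    # Take the individual differences first, then treat them as one sample.
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    mean_difference = statistics.mean(differences)
    sd_difference = statistics.stdev(differences)  # divides by n - 1
    se = sd_difference / math.sqrt(n)
    return mean_difference / se, n - 1
```

With the made-up pairs (10, 9), (12, 10), (14, 13), (16, 12), the differences are 1, 2, 1, 4, so the mean difference is 2, the SE is sqrt(2)/2, and t is about 2.83 with 3 degrees of freedom.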
This lab will ask questions about the HARVEST trial data set.
Notice: The linked data set is a modified version of the original data set in which subjects with missing values in the SMOKEYES and HRCB variables have been removed. The menu command to compare two samples will not work if there is missing data. If you use the original data set, you will have trouble, so make sure you use the data set harvestTwo.
We will make several comparisons between smokers and nonsmokers. The grouping variable for these comparisons is SMOKEYES. Treat the smokers and nonsmokers in the HARVEST data set as independent random samples from some larger populations. Other questions will make comparisons between variables measured in the clinic (C) and at home (A for ambulatory). Because these measurements are made on the same individuals, these comparisons are an example where matched pair methods are appropriate.
A. There is convincing evidence
that the mean diastolic blood pressure of smokers
is different than that of nonsmokers.
B. There is fairly strong evidence
that the mean diastolic blood pressure of smokers
is different than that of nonsmokers.
C. There is fairly strong evidence
that the mean diastolic blood pressure of smokers
is exactly equal to that of nonsmokers.
D. The data is consistent with no difference
in the mean diastolic blood pressure of smokers and nonsmokers.
The observed difference in sample means can be explained
by sample variation.
A. There is convincing evidence
that the mean heart rate of smokers
is different than that of nonsmokers.
B. There is fairly strong evidence
that the mean heart rate of smokers
is different than that of nonsmokers.
C. There is fairly strong evidence
that the mean heart rate of smokers
is exactly equal to that of nonsmokers.
D. The data is consistent with no difference
in the mean heart rate of smokers and nonsmokers.
The observed difference in sample means can be explained
by sample variation.
A. There is convincing evidence
that the measurements of diastolic blood pressure
at home and at the clinic are different.
B. There is fairly strong evidence
that the measurements of diastolic blood pressure
at home and at the clinic are different.
C. There is fairly strong evidence
that the measurements of diastolic blood pressure
at home and at the clinic are exactly the same.
D. The data is consistent with no difference
in the measurements of diastolic blood pressure
at home and at the clinic.
The observed differences can be explained
by sample variation.
A. There is convincing evidence
that the measurements of heart rate
at home and at the clinic are different.
B. There is fairly strong evidence
that the measurements of heart rate
at home and at the clinic are different.
C. There is fairly strong evidence
that the measurements of heart rate
at home and at the clinic are exactly the same.
D. The data is consistent with no difference
in the measurements of heart rate
at home and at the clinic.
The observed differences can be explained
by sample variation.
Bret Larget, larget@mathcs.duq.edu