Preface
These questions are meant to let you exercise your new skills in data analysis and programming. There is a series of questions for each dataset. You are required to collaborate with the classmates assigned to the same dataset and to submit a report describing your findings by 5:00 pm on Friday, 22nd July 2005. Graphics will probably form a major part of your analysis.
We will read your report and answer any questions you raise in it; assessment will be very lenient. Feel free to explore the data beyond the scope of the questions, and include in your report any questions you have about the data (along with your email addresses, so that we can contact you after the course).
Please contact Deepayan Sarkar with questions.
Enjoy the rest of your summer. We enjoyed having you here.
Questions
VA Lung Cancer trial
You can address the following questions with survival analysis methods. Does the treatment appear to be effective? Does the effectiveness, such as it may be, appear to vary with
- tumor cytology (cell type)?
- prior therapy?
- Karnofsky score at treatment time?
- patient age?
The Karnofsky score is a measure of a patient's general health, ranging from 0 (dead) to 100 (unimpaired). For further details, see http://virtualtrials.com/karnofsky.cfm.
COAST
For the COAST study, consider the following questions. Did cytokine levels (IL-5, IL-10, IL-13, IFN-gamma) change from cord blood to age 1 year? Did the change in cytokine levels appear to vary according to
- parental (maternal or paternal) history of asthma?
- pet (cat or dog) ownership at birth?
- the child's sex (boy or girl)?
- whether or not the child wheezed in the first year of life?
DIG
You can address the following questions with survival analysis methods. Does the treatment appear to be effective? Use death as the endpoint, but treat deaths from non-cardiac, non-vascular causes as censored. Does the treatment appear to be effective in reducing the time to first hospitalization (for any reason)? Does the effectiveness, such as it may be, appear to vary with prior history of hypertension?
VEST
You can address the following questions with survival analysis methods. Does either the high-dose (60 mg) or the low-dose (30 mg) treatment appear to be effective? Does the effectiveness, such as it may be, appear to vary with New York Heart Association class (a scale reflecting severity of symptoms)?
PROMISE
You can address the following questions with survival analysis methods. Does the treatment appear to be effective? Does the effectiveness, such as it may be, appear to vary with
- baseline left ventricular ejection fraction?
- baseline New York Heart Association class (a scale reflecting severity of symptoms)?
PRAISE
You can address the following questions with survival analysis methods. Does the treatment appear to be effective? Does the effectiveness, such as it may be, appear to vary with
- sex?
- baseline New York Heart Association class (a scale reflecting severity of symptoms)?
- etiology of heart failure?
PRAISE 2
You can address the following questions with survival analysis methods. Does the treatment appear to be effective? Does the effectiveness, such as it may be, appear to vary with
- sex?
- baseline left ventricular ejection fraction?
The PRAISE and PRAISE 2 studies had the same treatment regimens and nearly identical eligibility requirements. PRAISE enrolled patients with both ischemic and non-ischemic etiology of heart failure, while PRAISE 2 enrolled only patients with non-ischemic etiology. Looking only at patients with non-ischemic etiology, compare the survival of the placebo groups in the two trials. Do any of the baseline characteristics available in your datasets differ between these two groups of subjects?
FRAMINGHAM
We are interested in studying whether survival differs between men and women and between smokers and non-smokers. The event times recorded in the study are relative to the date of the first clinic exam, which has no particular biological meaning. A more reasonable baseline is the patient's birth. Unfortunately, this makes it impossible to apply the survival analysis techniques we have learnt naively. Those techniques assume that every patient has been followed since the baseline, whereas here we only observe patients who were still alive at their first clinic exam (the data are left-truncated).
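To make the truncation problem concrete, here is a minimal Python sketch of one standard remedy: a product-limit (Kaplan-Meier) estimate on the age scale with delayed entry, in which a patient counts as at risk at age t only if they entered the study before t and had not yet died or been censored. The function name `km_delayed_entry` and its list-based inputs are hypothetical, and the sketch assumes unrounded ages with each entry age strictly below the exit age. In R, the survival package expresses the same idea with the counting-process form `Surv(time, time2, event)`.

```python
def km_delayed_entry(entry_age, exit_age, died):
    """Kaplan-Meier survival on the age scale with left truncation (delayed entry).

    entry_age[i]: age at first exam; exit_age[i]: age at death or censoring;
    died[i]: True if exit_age[i] is a death. Subject i is in the risk set at
    age t only when entry_age[i] < t <= exit_age[i].
    """
    if any(e >= x for e, x in zip(entry_age, exit_age)):
        raise ValueError("each entry age must be strictly less than its exit age")
    death_ages = sorted({x for x, d in zip(exit_age, died) if d})
    surv, curve = 1.0, []
    for t in death_ages:
        # Delayed entry: people who had not yet reached their first exam are excluded.
        n_risk = sum(1 for e, x in zip(entry_age, exit_age) if e < t <= x)
        n_dead = sum(1 for x, d in zip(exit_age, died) if d and x == t)
        surv *= 1 - n_dead / n_risk  # conditional probability of surviving past t
        curve.append((t, surv))
    return curve


if __name__ == "__main__":
    # Made-up toy data: the second patient enters at age 1.5.
    print(km_delayed_entry([0, 1.5, 0], [1, 2, 3], [True, True, False]))
    # prints [(1, 0.5), (2, 0.25)]
```

In the toy example the second patient is not in the risk set at age 1, because they had not yet been seen; a naive analysis that treated everyone as followed from birth would have counted them there.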
To answer the questions below, you will first need to extract some information from the dataset. For each of the 4434 patients, obtain their
- sex
- smoking status at first exam
- age at first exam
- time to death / censoring (in days since first exam)
- status (dead / censored)
From this information, derive each patient's age (in years) at death or censoring: age at first exam plus time to death/censoring divided by 365.25 (days per year), rounded to the nearest whole year.
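A minimal Python sketch of this derivation, assuming each patient's five extracted values have already been collected into one record. The `Patient` class, its field names and the `age_at_exit` helper are hypothetical and do not correspond to the column names in the Framingham file. Note that Python's built-in `round` sends exact halves to the nearest even integer.

```python
from dataclasses import dataclass

DAYS_PER_YEAR = 365.25  # convention stated in the assignment


@dataclass
class Patient:
    # Hypothetical record layout; real column names in the dataset will differ.
    sex: str              # e.g. "male" / "female"
    smoker: bool          # smoking status at first exam
    age_first_exam: int   # years
    days_followed: int    # days from first exam to death or censoring
    died: bool            # True = dead, False = censored


def age_at_exit(p: Patient) -> int:
    """Age at death/censoring in whole years: baseline age plus follow-up, rounded."""
    return round(p.age_first_exam + p.days_followed / DAYS_PER_YEAR)


# 7305 days is exactly 20 years of 365.25 days.
print(age_at_exit(Patient("female", False, 52, 7305, True)))  # prints 72
```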
Look at only the subset of 1550 patients who died during follow-up. Construct a two-way frequency table of age at death against sex. Plot this table as a bar chart and conduct a chi-square test of whether sex is independent of age at death (you can pool age groups if you feel that the assumptions of the chi-square test are being violated). Repeat the analysis with smoking status in place of sex.
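The table and test can be sketched with the Python standard library alone. `crosstab`, `chi2_sf` and `chi_square_test` are hypothetical helper names; the p-value uses the closed-form chi-square tail probability for an integer number of degrees of freedom. The test also returns the smallest expected count, so you can check the common rule of thumb that expected counts should be at least about 5 before deciding whether to pool age groups. The bar chart is not shown.

```python
import math
from collections import Counter


def crosstab(rows, cols):
    """Two-way frequency table: (row levels, column levels, nested count lists)."""
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    cells = Counter(zip(rows, cols))
    counts = [[cells[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, counts


def chi2_sf(x, k):
    """P(X > x) for a chi-square variable with integer df k >= 1 (closed form)."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        # Even df: exp(-x/2) * sum over i < k/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction sum.
    term, total = math.sqrt(x), 0.0
    for i in range(1, (k - 1) // 2 + 1):
        total += term                 # x^((2i-1)/2) / (1*3*...*(2i-1))
        term *= x / (2 * i + 1)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def chi_square_test(counts):
    """Pearson chi-square test of independence for an r x c table (no empty rows/columns)."""
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    if df == 0:
        raise ValueError("need at least two rows and two columns")
    n = sum(row_tot)
    stat, min_expected = 0.0, math.inf
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            min_expected = min(min_expected, expected)
            stat += (observed - expected) ** 2 / expected
    return stat, df, chi2_sf(stat, df), min_expected


# Made-up illustration: two age-at-death groups (rows) by sex (columns).
stat, df, p, min_exp = chi_square_test([[10, 20], [30, 40]])
print(round(stat, 3), df, round(p, 3), min_exp)
```

For the real data you would build the table with something like `crosstab(ages_at_death, sexes)` (hypothetical list names) and pass its third element to `chi_square_test`.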
In a study like this, one interesting question is whether survival patterns change over time (due to improvements in medical care, etc.). Since age at the first exam serves as a rough proxy for birth cohort, artificially group the patients who died according to their age at the first exam (say 34-48, 49-55, 56-61 and 62-70); the cut function may be helpful here. Repeat the plots and tests above for each subgroup and comment on the results.
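In R, `cut(age, breaks = c(33, 48, 55, 61, 70))` would produce four right-closed intervals, which for whole-year ages are exactly 34-48, 49-55, 56-61 and 62-70. A rough Python equivalent using the standard `bisect` module is sketched below; `age_band` and `group_by_band` are hypothetical helper names, and the sketch assumes integer ages at first exam between 34 and 70.

```python
import bisect
from collections import defaultdict

# Upper (inclusive) limits of the suggested bands and their labels.
UPPER = [48, 55, 61, 70]
LABELS = ["34-48", "49-55", "56-61", "62-70"]
LOWEST = 34


def age_band(age):
    """Label of the band containing a whole-year age at first exam."""
    if not LOWEST <= age <= UPPER[-1]:
        raise ValueError(f"age {age} is outside {LOWEST}-{UPPER[-1]}")
    # bisect_left finds the first upper limit >= age, i.e. right-closed bands.
    return LABELS[bisect.bisect_left(UPPER, age)]


def group_by_band(records):
    """Split (age_at_first_exam, payload) pairs into {band label: [payload, ...]}."""
    groups = defaultdict(list)
    for age, payload in records:
        groups[age_band(age)].append(payload)
    return dict(groups)


print(group_by_band([(40, "a"), (55, "b"), (56, "c"), (47, "d")]))
# prints {'34-48': ['a', 'd'], '49-55': ['b'], '56-61': ['c']}
```

Each group's payloads (for example, the age at death, sex and smoking status of the patients in that band) can then be fed to the tables and tests described above, one band at a time.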
Spellman
The questions for this dataset are available as a separate PDF file.
Summer Institute for Training in Biostatistics (SIBS)