Introduction to Biostatistics
Graphical Exploratory Data Analysis
- Attach the data frame so that we may refer to variables by name.
- Look at the data in the spreadsheet-like window.
Identify the measurement scale
(nominal, ordinal, interval, or ratio)
of each variable.
- Examine histograms of several quantitative variables.
- Display the variable
in a box-and-whisker plot (boxplot).
- Find the five-number summary
(minimum, lower quartile, median, upper quartile, maximum)
You can also find the minimum, median, mean, or maximum individually
with the commands:
Try each of these.
- Examine box-and-whisker plots of several quantitative variables.
Do any have potential outliers?
Identify any strong skewness.
- You can examine the relationship between a categorical (qualitative) variable
and a quantitative variable with a side-by-side box-and-whisker plot.
(The data in each category are displayed in their own box-and-whisker plot,
and all of the plots share the same vertical scale.)
Is there a difference in the distribution of cholesterol level
between those who smoke and those who do not?
Open the ``Plots2D'' palette,
select the data frame,
put the categorical variable in for ``x Column(s)''
and the quantitative variable in for ``y Column(s)''.
Make a side-by-side box-and-whisker plot of
- Summarize all variables numerically in a single report.
- From the ``Statistics'' menu,
select ``Data Summaries'' and then ``Summary Statistics...''.
- Choose the data frame and click on or off any statistics you wish to see.
You can summarize all variables simultaneously
or one or several at a time.
Click on an individual variable to summarize only that one.
- Clicking on a second variable while holding the shift key
allows you to see summary statistics for all variables between the two.
A report sheet opens with the summary statistics.
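The five-number summary and the flagging of potential outliers in a boxplot can also be checked by hand. The sketch below is written in Python rather than S-PLUS, purely as an illustration. The function names and the sample values are invented for this example. It assumes quartiles are found by linear interpolation, which may differ slightly from what S-PLUS reports, and it assumes the common boxplot rule that values more than 1.5 interquartile ranges beyond the quartiles are potential outliers.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between data points;
    # other packages may use a slightly different quartile rule.
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]


def potential_outliers(values):
    """Return values more than 1.5 IQRs below Q1 or above Q3 (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements, invented for illustration only.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
print(five_number_summary(sample))
print(potential_outliers(sample))
```

A value returned by potential_outliers would appear in a box-and-whisker plot as an isolated point beyond the end of a whisker.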
Load the cereal data set into S-PLUS
and answer the questions below.
You should write your answers on
and turn it in to your lab instructor by the due date.
Further S-PLUS help is available in this
- Find a variable that is skewed to the right.
Plot its histogram.
Estimate by eye the mean and median from the graph.
- Find the precise mean and median of the variable
you found in the previous problem.
- Explain the difference in how the mean
and median measure the ``center'' of a distribution.
For data that is skewed to the right,
which measure of ``center'' will be larger?
- Make a box-and-whisker plot of the variable
(mg potassium per serving).
Describe how this plot indicates the presence of potential outliers.
Identify the brands of cereal which are outliers.
In addition to high potassium levels,
what else do they have in common?
- Construct a side-by-side box-and-whisker plot of
Does the distribution of sugar content vary by grocery shelf?
Give a hypothetical explanation for the pattern you see in the plot.
Last modified: September 9, 1999