- Attach the data frame so that we may refer to variables by name.
- Look at the data in the spreadsheet-like window. Identify the measurement scale (nominal, ordinal, interval, or ratio) of each variable.
- Examine histograms of several quantitative variables.
- Display the variable `cholesterol` in a box-and-whisker plot (boxplot).
- Find the five-number summary (minimum, lower quartile, median, upper quartile, maximum) of `cholesterol` numerically by typing `quantile(cholesterol)`.
- You can also find the minimum, median, mean, or maximum individually with the commands `min(cholesterol)`, `median(cholesterol)`, `mean(cholesterol)`, and `max(cholesterol)`. Try each of these.
- Examine box-and-whisker plots of several quantitative variables. Do any have potential outliers? Identify any strong skewness.
- You can examine the relationship between a categorical (qualitative) variable and a quantitative variable with a side-by-side box-and-whisker plot. (The data in each category is displayed in its own box-and-whisker plot, and all of the plots share the same vertical scale.) For example, is there a difference in the distribution of cholesterol level between those who smoke and those who do not? Open the "Plots2D" palette, select the data frame, put the categorical variable in for "x Column(s)" and the quantitative variable in for "y Column(s)". Make a side-by-side box-and-whisker plot of `weight` versus `activity`.
- Summarize all variables numerically in a single report.
- From the "Statistics" menu, select "Data Summaries" and then "Summary Statistics...".
- Choose the data frame and click on or off any statistics you wish to see. You can summarize all variables simultaneously, or one or several at a time. Click on an individual variable to summarize only that one.
- Clicking on a second variable while holding the Shift key allows you to see summary statistics for all variables between the two. A report sheet opens with the summary statistics.
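
The menu steps above also have command-line counterparts. The following is a minimal sketch rather than the lab's prescribed method. The data frame name `heart` and all of its values are invented for illustration; only the variable names `cholesterol`, `weight`, and `activity` come from the text. The syntax is intended to be valid in both S-PLUS and R.

```r
# Hypothetical stand-in for the lab's data frame; the values are made up.
heart <- data.frame(
    cholesterol = c(180, 195, 210, 225, 240, 260, 310),
    weight      = c(150, 162, 175, 181, 190, 204, 230),
    activity    = c(1, 2, 2, 3, 1, 3, 2)
)

attach(heart)                  # refer to the columns by name

hist(cholesterol)              # histogram of one quantitative variable
boxplot(cholesterol)           # box-and-whisker plot of the same variable

five <- quantile(cholesterol)  # minimum, quartiles, and maximum
print(five)

# Side-by-side boxplots: one box per activity level on a shared vertical scale
boxplot(split(weight, activity))

print(summary(heart))          # numerical summary of every variable

detach("heart")                # undo the attach when finished
```

Here `split(weight, activity)` groups the weights by activity level, so `boxplot` draws one box per group. Detaching at the end keeps the made-up columns from masking other objects later in the session.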

Further S-PLUS help is available in this on-line guide.

- Find a variable that is skewed to the right. Plot its histogram. Estimate by eye the mean and median from the graph.
- Find the precise mean and median of the variable you found in the previous problem.
- Explain the difference in how the mean and median measure the "center" of a distribution. For data that is skewed to the right, which measure of "center" will be larger?
- Make a box-and-whisker plot of the variable `potass` (mg potassium per serving). Describe how this plot indicates the presence of potential outliers. Identify the brands of cereal which are outliers. In addition to high potassium levels, what else do they have in common?
- Construct a side-by-side box-and-whisker plot of `sugar` versus `shelf`. Does the distribution of sugar content vary by grocery shelf? Give a hypothetical explanation for the pattern you see in the plot.
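
One way to pick out the rows behind flagged points is to compute the fences directly. The sketch below assumes the common convention that points more than 1.5 times the interquartile range beyond the quartiles count as potential outliers; the boxplot routine's own cutoffs may differ slightly. The data frame `cereal`, the column name `brand`, and every value are made up for illustration; only `potass` is a name from the exercises.

```r
# Hypothetical cereal data; the brand labels and potassium values are made up.
cereal <- data.frame(
    brand  = c("A", "B", "C", "D", "E", "F", "G", "H"),
    potass = c(35, 40, 45, 60, 90, 110, 120, 330)
)

# Lower and upper quartiles, and the interquartile range between them
q <- quantile(cereal$potass, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Assumed convention: flag points beyond 1.5 * IQR from the quartiles
lower.fence <- q[[1]] - 1.5 * iqr
upper.fence <- q[[2]] + 1.5 * iqr

is.outlier <- cereal$potass < lower.fence | cereal$potass > upper.fence
outlier.brands <- as.character(cereal$brand[is.outlier])
print(outlier.brands)
```

The same logical vector `is.outlier` can index any other column of the data frame, which helps when looking for what the flagged rows have in common.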

Last modified: September 9, 1999

Bret Larget, larget@mathcs.duq.edu