Math 325W

Introduction to Biostatistics

Loading Data

Basic Calculations

Graphical Exploratory Data Analysis

  1. [How?]
  2. Attach the data frame so that we may refer to variables by name. [How?]
  3. Look at the data in the spreadsheet-like window. Identify the measurement scale (nominal, ordinal, interval, or ratio) of each variable.
  4. Examine histograms of several quantitative variables. [How?]
  5. Display the variable cholesterol in a box-and-whisker plot (boxplot). [How?]
  6. Find the five-number summary (minimum, lower quartile, median, upper quartile, maximum) of cholesterol numerically by typing quantile(cholesterol). [How?]
  7. You can also find the minimum, median, mean, or maximum individually with the commands:
    > min(cholesterol)
    > median(cholesterol)
    > mean(cholesterol)
    > max(cholesterol)
    Try each of these. [How?]
  8. Examine box-and-whisker plots of several quantitative variables. Do any have potential outliers? Identify any strong skewness.
  9. You can examine the relationship between a categorical (qualitative) variable and a quantitative variable with a side-by-side box-and-whisker plot. (The data in each category are displayed in their own box-and-whisker plot, and all of the plots share the same vertical scale.) For example, is there a difference in the distribution of cholesterol level between those who smoke and those who do not? Open the "Plots2D" palette, select the data frame, put the categorical variable in for "x Column(s)" and the quantitative variable in for "y Column(s)". [How?] Make a side-by-side box-and-whisker plot of weight versus activity.
  10. Summarize all variables numerically in a single report.

    1. From the "Statistics" menu, select "Data Summaries" and then "Summary Statistics...".
    2. Choose the data frame and switch on or off any statistics you wish to see. You can summarize all variables at once, or only one or several at a time. Click on an individual variable to summarize only that one.
    3. Clicking on a second variable while holding the shift key allows you to see summary statistics for all variables between the two. A report sheet opens with the summary statistics.
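
The typed commands behind steps 2 through 10 can be sketched as follows. This is only an illustration: the data frame name heart, the column smoke, and every value in it are made up for the example and are not the lab's data set. The code is written in R-style S syntax; it is assumed, not verified, to run unchanged in S-PLUS.

```r
# Made-up stand-in data; the name "heart", the column "smoke",
# and all values are invented for illustration only.
heart <- data.frame(
  cholesterol = c(180, 195, 210, 225, 240, 260, 310),
  weight      = c(140, 155, 150, 170, 185, 200, 190),
  smoke       = factor(c("no", "no", "yes", "no", "yes", "yes", "no")),
  activity    = factor(c("high", "low", "high", "low", "low", "high", "low"))
)

attach(heart)                    # step 2: refer to variables by name

hist(cholesterol)                # step 4: histogram
boxplot(cholesterol)             # step 5: box-and-whisker plot

five <- quantile(cholesterol)    # step 6: five-number summary
print(five)

# step 7: individual summary values
print(min(cholesterol))
print(median(cholesterol))
print(mean(cholesterol))
print(max(cholesterol))

# step 9: side-by-side plots, one box per category on a shared vertical scale
boxplot(split(cholesterol, smoke))
boxplot(split(weight, activity))

# step 10: numerical summary of every variable at once
print(summary(heart))

detach("heart")                  # undo the attach when finished
```

Here split() breaks the quantitative variable into one group per category, and boxplot() then draws one box for each group, which plays the same role as the "x Column(s)" and "y Column(s)" choices in the palette.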

Homework Assignment

Load the cereal data set into S-PLUS and answer the questions below. You should write your answers on this form and turn it in to your lab instructor by the due date.

Further S-PLUS help is available in this on-line guide.

  1. Find a variable that is skewed to the right. Plot its histogram. Estimate by eye the mean and median from the graph.
  2. Find the precise mean and median of the variable you found in the previous problem.
  3. Explain the difference in how the mean and median measure the "center" of a distribution. For data that is skewed to the right, which measure of "center" will be larger?
  4. Make a box-and-whisker plot of the variable potass (mg potassium per serving). Describe how this plot indicates the presence of potential outliers. Identify the brands of cereal which are outliers. In addition to high potassium levels, what else do they have in common?
  5. Construct a side-by-side box-and-whisker plot of sugar versus shelf. Does the distribution of sugar content vary by grocery shelf? Give a hypothetical explanation for the pattern you see in the plot.
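
For problems 4 and 5, one way to work at the command line is sketched below. The data frame toy, its brand names, and all of its values are invented for illustration and are not the cereal data. The 1.5 x IQR cutoff used here is a common rule of thumb for flagging potential outliers; the software's own box-and-whisker plot may place its whiskers slightly differently.

```r
# Made-up stand-in for the cereal data; brands and values are invented.
toy <- data.frame(
  brand  = c("Brand A", "Brand B", "Brand C", "Brand D",
             "Brand E", "Brand F", "Brand G", "Brand H"),
  potass = c(35, 40, 45, 60, 90, 95, 110, 330),
  sugar  = c(3, 6, 12, 9, 13, 5, 7, 0),
  shelf  = c(1, 1, 2, 2, 2, 3, 3, 3)
)

boxplot(toy$potass)              # box-and-whisker plot of one variable

# Flag values lying more than 1.5 * IQR beyond the quartiles
# (a rule-of-thumb cutoff, assumed here for the example).
q       <- as.vector(quantile(toy$potass, c(0.25, 0.75)))
cutoff  <- 1.5 * (q[2] - q[1])
flagged <- toy$potass < q[1] - cutoff | toy$potass > q[2] + cutoff

# List the rows that were flagged, by brand
print(as.character(toy$brand[flagged]))

# Side-by-side plots of sugar for each shelf
boxplot(split(toy$sugar, toy$shelf))
```

With the real data, replace toy with the cereal data frame and check the flagged brands against the points drawn beyond the whiskers in the plot.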

Last modified: September 9, 1999

Bret Larget,