Learning a Statistical Language

This Special Topics course is the prototype for a new statistics course suite, Stat 327, "Learning a Statistical Language", similar to Computer Science 368, "Learning a Programming Language" (see UW-Madison Course Guide Fall 2011 entry). That is, we will focus on learning statistical languages, with instruction at the beginning, intermediate, or advanced level. Each 1-credit module will run for half a semester. The first two modules, to be taught in Spring 2012, will be "Data Analysis with R" at a beginning (first half) or intermediate (second half) level. Below are some preliminary details.

  • Beginning Data Analysis with R: Undergraduates at an intermediate level, including a growing number of statistics majors as well as students in many other disciplines, require more specialized programming skills for inferring data relationships with models, visualizing raw data and results, and interpreting data. These undergraduates need an introduction to the way we do data science with computers. That is, they must learn how to use the R language to load data, form basic summaries, and create graphics for homework or reports. This module will include tutorial introductions to complement other statistics courses. Prerequisite is an introductory statistics course.
  • Intermediate Data Analysis with R: Many researchers use linear models and more advanced statistical methods, which require a deep understanding of computer tools. Efficient programming skills are essential to delve deeply into novel data problems. A new 1-credit course will focus on adapting data analysis tools, annotating graphics, and documenting work. Tutorials will cover material useful in many advanced undergraduate and graduate statistics courses. This module will assume previous exposure to R and will address subjects including matrix operations, functions, (avoiding) loops, and conditional expressions. Additional topics will involve programming in other languages (e.g. Python) for project management. This module is aimed at undergraduate statistics majors, students taking a second course in statistics numbered below 600, and new graduate students in statistics taking our core linear models series, Stat 849-850.
  • Advanced Data Analysis with R: Graduate students, including our MS and PhD students as well as quantitatively skilled graduate and undergraduate students in other programs, will benefit from advanced training in reproducible research methods. Graduate students engaged in large analysis projects must learn how to recreate all research results quickly and accurately should data or analysis tools change in subtle ways. Skills include source code management, package development, and command-level integration with other computational resources. The primary focus will be on open-source tools that complement R, with considerable attention to web-based information technologies: version control (SVN, Git), dynamic documentation (Sweave, AsciiDoc), and project management using web interfaces (Redmine). Prior experience with low-level (C) and high-level (Python) programming languages is highly recommended.