Statistical Computing

UW Madison Statistics 771 Spring 2011

Prof Michael Newton

Overview

This course introduces graduate students to some basic elements of statistical computing, and computational statistics, more broadly defined. There are no graduate level prerequisites, although students are expected to be familiar with essential features of probability and statistical inference as usually covered in an intermediate undergraduate course.

Statistical computing concerns the relation between computers and statistical analysis.  Early problems were how to accurately, stably, and efficiently calculate simple statistics, such as sample moments, in the presence of inexact computer arithmetic.  Numerical analysis issues continue to arise in other important computational problems: function evaluation, optimization, integration, and the solution of equations.  One goal of this course is to study algorithms that effectively solve these problems.

There is some truth to the statement that statistical theory regarding what should be calculated in a data analysis is intimately related to computational concerns about what can be calculated.  A second goal of this course is to present a number of modern, compute-intensive methods for data analysis.

A third goal of this course is to have students practice statistical computing by coding algorithms and studying their properties, and to have students become familiar with the R environment for statistical computing.

Format

Lectures TR 2:30-3:45

Fortnightly homework (25 pts)

Quizes (50 pts)

Project (build R package) (25 pts)

Office hours: as requested

Textbooks are not required, but optional texts are:

Computational Statistics, G Givens and J Hoeting, Wiley, 2005

A first course in statistical programming with R, W.J. Braun and D.J. Murdoch,

The R Book, M. J. Crawley

Elements of Statistical Computing, R.A. Thisted, Chapman and Hall, 1988

Syllabus

Numerical methods (basics; floating point issues, representation error, other errors, log-sum-exp trick; power-series and continued fraction approximations to probabilities; pnorm)

Numerical linear algebra (basics; decompositions (Cholesky, QR, SVD); solving linear systems (qr, chol, svd, backsolve) )

Nonlinear optimization (selected topics): from maximum likelihood, Nelder-Mead, coordinate descent (nlminb, optim)

Root finding (selected topics): Newton-Raphson, iteratively reweighted least squares for generalized linear models (glm)

Expectation/Maximization (convex analysis, optimization transfer, mixture models, clustering/nonparametrics)

Probabilistic graphical models (selected topics): hidden Markov models, Baum-Welch algorithm, Viterbi algorithm, graphical representations, message-passing algorithms

Constrained optimization (selected topics): pool-adjacent-violators algorithm

Monte Carlo methods: static/dynamic, non-uniform sampling, multivariate sampling, Markov chain sampling, Gibbs sampling, Metropolis Hastings, adaptive sampling

Building R packages (selected topics)

Other topics (dictated by student interest)

Homework

Homework 1. Due in class Feb 3/2011

Homework 2 (Due Feb 10).   R data archive for homework (bir.RData)

Homework 3 (Due March 10)

Homework 4 (Due May 3)

Project information, part 2: By Thursday April 7 please send me further information on your R-package project.  Specifically, I want to see some functionality demonstrated in R language code.  It may be the illustration of a function that will be in your package, or some source code that does elements of interest. It should include an R implementation of some data set that will be used by the package.

Lectures