Spring 2008. Statistics 992-2 (877)


Statistical methods for molecular biology


When/where: MW 2:25-3:40, 5295 MSC


Instructors: Michael Newton (lead), Bret Larget, Cecile Ane, Sunduz Keles, Christina Kendziorski, Karl Broman, Brian Yandell


Description: The course will provide a statistical perspective on some current biological problems, with an introduction to statistical analysis in genomics, phylogenetics, gene regulation, gene expression, gene mapping by linkage or association, and related areas. Statistical concepts will include: stochastic modeling, hierarchical modeling, likelihood methods, Bayesian methods, multivariate analysis methods, model selection, high-dimensional parameters, experimental design strategies, and multiple testing. Biological concepts will include: microarray and related measurement of DNA, RNA, and protein; genomic resources; the relationship between genotype and phenotype; breeding designs; pedigrees; and phylogenies. Specific content may vary by lead instructor, with a core of agreed-upon material. Statistics graduate students should gain useful background for their own research at the interface of statistics and molecular biology.


Outline: [29 lecture periods] [approximate date assignment]


1. Elements of statistics and molecular biology [1 lecture] [MAN] [1/23]


2. Sequence analysis I:  [2 lectures, MAN] [1/28 – 1/30]


   2.1   Statistics of sequencing and assembly

   2.2   Statistics of alignment


3. Sequence analysis II:  Comparative genomics [4 lectures, BL,CA] [2/4 – 2/13]


  3.1  Introduction to phylogenetics and molecular evolution.

  3.2  Models of molecular evolution, maximum likelihood estimation.

  3.3  Bootstrapping phylogenies and statistical tests of monophyly.

  3.4  Bayesian phylogenetics.


4.  Transcription I: Regulation [4 lectures, SK] [2/18 – 2/27]


    4.1-2 Background; motif finding problem

    4.2-3 Tiling array technologies

    4.4     Beyond independent site models for motif finding


5. Transcription II: Expression  [7 lectures; CK, MAN] [3/3 – 3/31 incl break]


   5.1 Microarray data generation [guest SS]

   5.2 Preprocessing:  background correction; normalization; summarization

   5.3 Multivariate methods 1: hierarchical clustering; dimension reduction

   5.4 Differential expression 1: fold, t, multiple comparison issues

   5.5 Differential expression 2: mixture Empirical Bayes methods; q-values

   5.6 Multivariate methods 2:  network inference [Schäfer-Strimmer/Ledoit-Wolf]

   5.7 Data integration: gene set analysis, Gene Ontology, enrichment


6. Linkage analysis [6 lectures, KB/BY] [4/2 – 4/21]


   6.1    Meiosis and recombination

   6.2-3 QTL mapping in experimental crosses

   6.4    Parametric linkage in humans

   6.5    Allele sharing methods

   6.6    QTL mapping in humans


7.  Association studies [3 lectures, MAN] [?4/30-5/7]


    7.1  Population genetics; Kingman's coalescent; linkage disequilibrium

    7.2  Study designs, confounding; TDT

    7.3  Genome-wide association [Balding paper]



+ 2 spare lectures TBA


Evaluation:  1 homework set per instructor; 1 class project presented as a poster

                    6 homework sets at 12 pts/set; project at 28 pts (100 pts total).