Spring 2008. Statistics 992-2 (877)
Statistical methods for molecular biology
When/where: MW 2:25-3:40, 5295 MSC
Instructors: Michael Newton (lead), Bret Larget, Cecile Ane, Sunduz
Keles, Christina Kendziorski, Karl Broman, Brian Yandell
Description: The course will provide a statistical perspective on some current
biological problems, with an introduction to statistical analysis in genomics,
phylogenetics, gene regulation, gene expression, gene mapping by linkage or
association, and related areas. Statistical concepts will include: stochastic
modeling, hierarchical modeling, likelihood methods, Bayesian methods,
multivariate analysis methods, model selection, high-dimensional parameters,
experimental design strategies, and multiple testing. Biological concepts will
include: microarray and related measurement of DNA, RNA, and protein; genomic
resources; the relationship between genotype and phenotype; breeding designs;
pedigrees; and phylogenies. Specific content may vary by lead instructor, with
a core of agreed-upon material. Statistics graduate students should gain useful
background for their own research at the interface of statistics and molecular
biology.
Outline:
[29 lecture periods] [approximate date assignment]
1. Elements of statistics and molecular biology [1 lecture, MAN] [1/23]
2. Sequence analysis I [2 lectures, MAN] [1/28 – 1/30]
2.1 Statistics of sequencing and assembly
2.2 Statistics of alignment
3. Sequence analysis II: Comparative genomics [4 lectures, BL, CA] [2/4 – 2/13]
3.1 Introduction to phylogenetics and molecular evolution
3.2 Models of molecular evolution, maximum likelihood estimation
3.3 Bootstrapping phylogenies and statistical tests of monophyly
3.4 Bayesian phylogenetics
4. Transcription I: Regulation [4 lectures, SK] [2/18 – 2/27]
4.1-2 Background; motif finding problem
4.2-3 Tiling array technologies
4.4 Beyond independent site models for motif finding
5. Transcription II: Expression [7 lectures, CK, MAN] [3/3 – 3/31 incl. break]
5.1 Microarray data generation [guest SS]
5.2 Preprocessing: background correction; normalization; summarization
5.3 Multivariate methods 1: hierarchical clustering; dimension reduction
5.4 Differential expression 1: fold, t, multiple comparison issues
5.5 Differential expression 2: mixture Empirical Bayes methods; q-values
5.6 Multivariate methods 2: network inference [Schäfer-Strimmer/Ledoit-Wolf]
5.7 Data integration: gene set analysis, Gene Ontology, enrichment
6. Linkage analysis [6 lectures, KB, BY] [4/2 – 4/21]
6.1 Meiosis and recombination
6.2-3 QTL mapping in experimental crosses
6.4 Parametric linkage in humans
6.5 Allele sharing methods
6.6 QTL mapping in humans
7. Association studies [3 lectures, MAN] [?4/30 – 5/7]
7.1 Population genetics; Kingman's coalescent; linkage disequilibrium
7.2 Study designs, confounding; TDT
7.3 Genome-wide association [Balding paper]
+ 2 spare lectures TBA
Evaluation: 1 homework set per instructor; 1 class project presented as a poster.
Grading: 6 homework sets at 12 pts per set; project at 28 pts (100 pts total).