|
Week
|
Topic
|
Assigned
Reading
|
Additional
Reading
|
Websites
|
Lecture Notes
|
Hmw
|
|
Jan 18
|
Course logistics
Introduction to the field of
computationalbiology
Introduction to Genome Biology
|
Introduction
to molecular biology
|
|
|
[1] [4]
|
|
|
Jan 23
|
Introduction to sequence similarity, pairwise alignment
|
Durbin et al.
Chapter 2
|
Gibbs, A.J. and McIntyre, G.A. (1970)
The diagram, a method for comparing sequences. Its use with amino acid
and nucleotide sequences. European Journal of Biochemistry.
Sep;16(1):1-11. [DOT
plots]
|
|
[4]
|
|
|
Jan 25,30
Feb 1, 6, 8
|
Sequence alignment: dynamic programming, local versus
global alignment
|
Needleman, S. B. and Wunsch, C. D. (1970). A
general method applicable to the search for similarities in the amino
acid sequence of two proteins. Journal of Molecular Biology
48, 443-453. [Global
alignment]
Temple F. Smith and Michael S. Waterman
(1981). Identification of Common Molecular Subsequences. Journal
of Molecular Biology, 147:195-197. [Local alignment]
|
Substitution
matrices:
Dayhoff, M. O., Schwartz, R. M. and
Orcutt, B. C. (1978). A model of evolutionary change in
proteins. matrices for detecting distant relationships. In Atlas of protein sequence and
structure, (Dayhoff, M. O., ed.), vol. 5, pp. 345-358.
National biomedical research foundation Washington DC. [PAM derivation]
Henikoff, S. & Henikoff,
J. G. (1992). Amino acid substitution matrices from protein
blocks. PNAS, 89, 10915-10919. [BLOSUM derivation]
Henikoff, S. and Henikoff, J.
G. (1993). Performance evaluation of amino acid substitution matrices. Proteins: Struct., Funct.,
Genet.
17, 49-61.
Altschul, S.F. (1991). Aminoacid substitution
matrices from an information theoretic perspective. Journal of
Molecular Biology 219:555-565. [Comparing
substitutions matrices]
|
|
[1] (Jan 30)
Notes on the Jukes-Cantor model (Feb 1)
[1] [4] (Feb 1, 6)
[1] [4] (Feb 8)
|
|
|
Feb 13, 15
|
Heuristic alignment algorithms, significance of scores,
extreme value distribution
|
Karlin, S and Altschul, SF (1990) Methods for
Assessing the Statistical Significance of Molecular Sequence Features
by Using General Scoring Schemes. PNAS, Vol 87, 2264-2268. [Derivation of the
distribution of S under random matching model]
Altschul, S.F., Gish, W., Miller, W., Myers,
E.W. & Lipman, D.J. (1990) Basic local alignment search tool.
Journal of Molecular Biology, 215:403-410. [BLASTP]
Pearson, W. R. and Lipman, D.
J. (1988) Improved tools for biological sequence comparison Proc. Natl.
Acad. Sci. US 85:2444-2448. [FASTA]
Mott , R. (1992) Maximum Likelihood
estimation of the statistical distribution of Smith-Waterman local
sequence similarity scores Bull. Math. Biol 54: 59-75.[MLE
for fitting extreme value distribution to gapped local alignment scores]
Pearson, W. R. (1998) Empirical statistical
estimates for sequence similarity searches. J. Mol. Biol. 276:71-84. [correcting length bias
in gapped local alignment scores]
|
Altschul, S.; Madden, T.; Schaffer, A.;
Zhang, J.; Zhang, Z.; Miller, W.; and Lipman, D. (1997). Gapped BLAST
and PSI-BLAST: a new generation of protein database search programs.
Nucleic Acids Research. 25, 3389-3402. [PSI-BLAST]
Zhang Z, Schaffer AA, Miller W, Madden TL,
Lipman DJ, Koonin EV, Altschul SF. (1998). Protein sequence similarity
searches using patterns as seeds. (PHI-BLAST) Nucleic Acids Res. 26,
3986-90. [PHI-BLAST]
Waterman M. (1994) Estimating statistical
significance of sequence alignments.
Philos Trans R Soc Lond B Biol Sci.344(1310):383-90. [Application
of the Azuma Hoeffding Lemma, includes a likelihood formulation]
S. Karlin, A. Dembo, and T.
Kawabata. Statistical composition of high-scoring segments from
molecular sequences. The Annals of Statistics, 2:571--581, 1990. [Derivation
of the distribution of S under random matching model]
|
BLAST server at NCBI
|
[1] [4] (Feb 13)
[1] [4] (Feb 15)
|
|
|
Feb 20
|
A brief introduction to Perl
|
|
|
Perl RegExp Quick Reference
Card
Perl tutorial WikiBooks
|
[1] [4] (Feb 20)
|
Hmw #1
Due March 8
blastp_example.txt
BLOSUM62
gap open = 11
gap extend = 1
seq1.txt
seq2.txt
|
|
Feb 22, 27
|
Hidden Markov Models: Estimation and Inference
|
Durbin et al.
Chapter 3
|
A tutorial on Hidden Markov
Models by L.R. Rabiner
|
|
|
|
|
Mar 1, 6, 8
|
Hidden Markov Models: Estimation and Inference
|
Durbin et al.
Chapter 3
Dempster, A.P., Laird,
N.M., Rubin, D. B. (1977) Maximum Likelihood from Incomplete Data via
the EM Algorithm. JRSSB, 39 (1): 1-38. [EM algorithm]
|
|
|
|
|
|
Mar 20
|
Hidden Markov Models for Sequence Alignment
Generalization of HMMs
|
A. Krogh, M. Brown, I.S. Mian, K. Sjolander,
D. Haussler. (1994) Hidden Markov models in computational biology:
applications to protein modeling. JMB 235:1501-1531.
Presenters: Hui Wang, Deyuan Jiang.
|
|
|
Slides (Mar 20)
|
|
|
Mar 22
|
Hidden Markov Models for Gene Prediction
Generalizations of HMMs
|
C. Burge, S. Karlin. (1997) Prediction of
complete gene structures in human genomic DNA. JMB 268:78-94.
Presenters: Hyonho Chun and Qin Ren .
|
|
|
|
|
|
Mar 27
|
Hidden Markov Models for gene prediction
|
Alexandersson, M., Cawley, S. and Pachter, L.
(2003)
SLAM: cross-species gene finding and alignment with a generalized pair
hidden Markov model.
Genome Research. 13(3):496-502.
|
Rogic, S., Mackworth, A.K.,
and Ouellette, F.B.F. (2001)
Evaluation of Gene-Finding Programs on Mammalian Sequences. Genome
Research Vol. 11, Issue 5, 817-832.
S.R. Eddy. (1998) Profile hidden Markov
models. Bioinformatics. 14, 755-763. |
|
Slides (Mar 27)
|
Hmw #1 solution
Hmw #2
Due
April 10
PWMmix_seq.txt (paremeters to
consider W = 6, 7, 8, K = 1, 2, 3)
|
Mar 29,
Apr 3 |
Motif finding based on single species sequence data
|
Lawrence, C. E.
and Reilly, A. A. (1990) An expectation maximization (EM) algorithm for
the identification and characterization of common sites in unaligned
biopolymer sequences. Proteins 7:41-51.
Bailey, T.L. and Elkan C. (1995)
Unsupervised Learning of Multiple Motifs in Biopolymers using EM.
Machine Learning, 21(1-2):51-80.
Lawrence, C.E., Altschul, S.F., Bogouski,
M.S., Liu, J.S., Neuwald, A.F., and Wooten, J.C. (1993) Detecting
Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple
Alignment. Science, 262, 208-214.
Presenter: Xiwen Ma
|
Liu, J.S., Neuwald, A.F.
and Lawrence, C. E. (1995) Bayesian Models for Multiple Local
Sequence Alignment and Gibbs Sampling Strategies. JASA. 90: 1156-1170.
Liu, J.S. and Lawrence,
C.E. (1999) Bayesian Inference on Biopolymer Models.
Bioinformatics,15(1): 38-52.
Boys, R.J. and Henderson, D.A. (2005) A
Bayesian Approach to DNA Sequence Segmentation. Biometrics. 61(2):
573:588.
|
MEME
Gibbs Motif Sample
|
Slides (Apr
3)
|
|
|
Mar 29,
Apr 3
|
Motif finding based on single species sequence data
|
S. Keles, M.J. van der Laan, S. Dudoit, B.
Xing, and M.B. Eisen (2003). Supervised detection of regulatory
motifs in DNA sequences. Statistical Applications in Genetics and
Molecular Biology, Vol. 2, No. 1, Article 5.
|
|
|
Slides
(Mar 29)
|
|
|
Apr 5
|
From motifs to modules
|
Zhou, Q. and Wong, W. H. (2004)
CisModule: De Novo discovery of cis-regulatory modules by hierarchical
mixture modeling. Proc. Natl. Acad. Sci. 101(33): 12114-12119
Presenters: Jee
Young Moon and Anglinia Washington
|
|
|
Slides (Apr
5)
|
|
|
Apr 10
|
Incorporating phylogenetic information into motif finding
|
Li, X. and Wong, W.H. (2005) Sampling motifs on
phylogenetic trees. PNAS. 102 (27): 9481-9486.
Moses, A.M., Chiang, D.Y., Eisen, M.B.
(2004). Phylogenetic Motif Detection by Expectation Maximization of
Evolutionary Mixtures. Pacific Symposium on Biocomputing
Li, X., Zhong, S. and Wong, W.H.
(2005) Reliable prediction of transcription factor
binding sites by phylogenetic verification PNAS. 102 (47): 16945-16950.
Presenters: Tim
Grant & Oscar Perez |
|
|
Slides (Apr 10)
|
|
|
Apr 12
|
Incorporating phylogenetic information into motif finding
|
Siepel, A., Bejerano, G., Pedersen, J.S.,
Hinrichs, A., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J.,
Hillier, L.W., Richards, S., Weinstock, G.M., Wilson, R.K., Gibbs,
R.A., Kent, W.J., Miller, W. and Haussler, D. (2005) Evolutionarily
conserved elements in vertebrate, insect, worm, and yeast genomes.
Genome Res 15:1034-1050.
Siepel, A. and Haussler, D. (2004) Combining
phylogenetic and hidden Markov models in biosequence analysis. Journal
of Comput Biolology 11:413-428.
|
Ludwig, M.Z., Palsson, A., Alekseeva, E.,
Bergman, C.M., Nathan, J., Kreitman, M. (2005) Functional evolution of
a cis-regulatory module. PLoS Biology. 3(4):e93.
|
|
|
|
| Apr 17 |
Dictionary models for motif finding
|
Sabatti, C., Rohlin, R., Lange, K., Liao,
J.C. (2005) Vocabulon: a dictionary model approach for reconstruction
and localization of transcription factor binding sites. Bioinformatics
21(7): 922-931.
Presenters:
Yuan Jiang & Tao Yu
|
|
|
Slides (Apr 17)
|
|
| Apr 19 |
Beyond independent site models for motif
finding |
Qing
Zhou,
Jun S. Liu:
Modeling within-motif dependence for transcription factor binding site
predictions.
Bioinformatics 20(6): 909-916 (2004)
Barash, Y,
Elidan, G, Friedman, N, and Kaplan T (2003). Modeling
Dependencies in Protein-DNA Binding Sites. RECOMB 2003.
OD
King and FP
Roth. A non-parametric model for transcription factor binding
sites. Nucleic Acids Research
31(19):e116 (2003).
Irad
E.
Ben-Gal,
Ayala Shani,
Andre Gohr,
Jan Grau,
S. Arviv,
Armin Shmilovici,
Stefan Posch,
Ivo Grosse:
Identification of transcription factor binding sites with
variable-order Bayesian networks.
Bioinformatics 21(11): 2657-2666 (2005) |
|
|
Slides (Apr 17)
|
|
|
Apr 24
|
Introduction to tiling arrays - Analysis of ChIP-chip
experiments
|
Buck, M. J. and Lieb, J.D. (2004) ChIP-chip:
Considerations for the Design, Analysis, and Application of Genome-wide
Chromatin Immunoprecipitation Experiments. Genomics. 83(3):349-360.
Cawley, S., Bekiranov, S., Ng, H.H.,
Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko,
V.I., Cheng, J., Williams, A.J., Wheeler, R., Wong, B., Drenkow, J.,
Yamanaka, M., Patel, S., Brubaker, S., Tammana, H., Helt, G., Struhl,
K. and Gingeras, T.R. (2004) Unbiased Mapping of Transcription Factor
Binding Sites Along Human Chromosomes 21 and 22 Points to Widespread
Regulation of Non-Coding RNAs ,Cell 116(4):499-511.
|
S. Keles, M. J. van der Laan, Sandrine
Dudoit, and Simon E. Cawley (2004). Multiple Testing Methods for
ChIP-Chip High Density Oligonucleotide Array Data. To appear in the
Journal of Computational Biology.
|
|
Slides (Apr 24)
|
|
|
Apr 24, 26
|
Analysis of ChIP-chip experiments
|
Ji, H. and Wong, W.H. (2005) TileMap: create
chromosomal map of tiling array hybridizations. Bioinformatics. 21(18):
3629-3636.
Keles, S. (2005)
Mixture modeling for genome-wide localization of transcription factors. |
|
|
|
|
| May 1 May 3 |
Final project presentations |
|
|
|
|
|