Research Interests

My main research interests are in statistical inference for molecular evolution and for trait evolution. This work involves using stochastic processes (discrete Markov processes or continuous diffusions) and developing tools for model selection and Bayesian inference, as well as new models that capture enough realism yet remain computationally feasible in our Big Data era.

I have also become interested in a variety of biological applications through statistical consulting across campus, in areas such as food science and veterinary science.

Molecular Evolution

One of my aims is to detect which groups of genes share the same genealogy, to draw inference on the distribution of genealogies across the genome, and then to reconstruct phylogenetic networks when the relationships are best depicted by a network. This area involves statistical issues of model selection and hierarchical modelling of species genealogies and gene genealogies, as well as computational challenges: molecular data are becoming available faster than appropriate methods of analysis. The development of these methods is currently funded by the NSF to study reticulate evolution and species delimitation in baobabs. See also these earlier awards, on the tree of Enterobacteriaceae, discordance patterns, and monocot AToL.

Trait Evolution

More recently, I have been interested in using phylogenetic trees to analyze trait evolution with phylogenetic 'comparative methods'. Data collected on species (or related individuals) do not form a random sample because they lack independence: sister species are expected to have similar traits. Such samples can show a high level of dependence, so adapted statistical methods of analysis are needed. I am interested in the statistical properties of estimation methods, in the effective degrees of freedom for parameters in these models, and in adapted model selection procedures, for example to discover abrupt shifts in trait evolution. I am also extending these phylogenetic comparative methods to accommodate reticulate evolution, when the phylogenetic relationships are best depicted by a network. See this NSF project.
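
To illustrate this lack of independence, here is a minimal sketch in Python. It assumes a Brownian motion model of trait evolution on a small made-up tree, under which the covariance between two species is proportional to the evolutionary time they share from the root. The tree, its branch lengths, and all function names are hypothetical and chosen only for illustration; this is not the implementation used in the software mentioned on this page.

```python
# Hypothetical toy tree ((A:1, B:1):2, C:3), stored as
# child -> (parent, length of the branch leading to the child).
TREE = {
    "A": ("AB", 1.0),
    "B": ("AB", 1.0),
    "AB": ("root", 2.0),
    "C": ("root", 3.0),
}


def edges_to_root(tree, node):
    """Return the nodes whose parent branch lies on the path from `node` to the root."""
    edges = set()
    while node in tree:  # the root has no entry, so the walk stops there
        edges.add(node)
        node = tree[node][0]
    return edges


def shared_time(tree, tip1, tip2):
    """Total branch length shared by the root-to-tip paths of two tips."""
    common = edges_to_root(tree, tip1) & edges_to_root(tree, tip2)
    return sum(tree[child][1] for child in common)


def bm_covariance(tree, tips, sigma2=1.0):
    """Trait covariance matrix among tips, assuming Brownian motion with rate sigma2."""
    return [[sigma2 * shared_time(tree, a, b) for b in tips] for a in tips]


if __name__ == "__main__":
    for row in bm_covariance(TREE, ["A", "B", "C"]):
        print(row)
```

On this toy tree, the sister species A and B share 2 of their 3 units of evolutionary time, so their traits are strongly correlated, whereas C shares no history with them after the root and is uncorrelated with both. Methods such as generalized least squares use this covariance matrix in place of the independence assumption.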

Software Development

Methods are good for nothing if they are not implemented and user-friendly! I have become more and more involved in software development (see here), first using C/C++ and R, and now using Julia because it combines speed (like C) with interactivity (like R).

Students

  • Sabrina Yu (BS Stat major), Nan Ji (BS Stat major)
  • Mohammad Khabbazian (ECE PhD, 2016; primary advisor: Karl Rohe)
  • Mengyao Yang (BS Stat major, 2016)
  • Claudia Solis-Lemus (Stat PhD, 2015) - now postdoc at Emory University
  • Lam Ho (Stat PhD, 2014) - now Assistant Professor at Dalhousie University
  • Yicheng Li (BS Stat major, 2014)
  • Charles-Elie Rabier (postdoc)
  • Yujin Chung (Stat PhD, 2012) - now Assistant Professor at Auburn University
  • Satish Kumar (Computer Science MS, 2010)