Hey Treethinkers!
Just a quick update on some recent work by the marvelous Mike May and sensational Sebastian Höhna that we’re very excited about in the Moore lab.

First, we have a paper in review that describes a novel method for detecting mass-extinction events from phylogenies estimated from molecular sequence data. We develop our approach in a Bayesian statistical framework, which enables us to harness prior information on the frequency and magnitude of mass-extinction events. The approach is based on an episodic stochastic-branching process model in which rates of speciation and extinction are constant between rate-shift events. We model three types of events: (1) instantaneous tree-wide shifts in speciation rate; (2) instantaneous tree-wide shifts in extinction rate; and (3) instantaneous tree-wide mass-extinction events.

Each type of event is described by a separate compound Poisson process (CPP) model, where the waiting times between events are exponentially distributed with event-specific rate parameters. The magnitude of each event is drawn from an event-specific prior distribution. Parameters of the model are then estimated using a reversible-jump Markov chain Monte Carlo (rjMCMC) algorithm. We demonstrate via simulation that this method has substantial power to detect the number of mass-extinction events and provides unbiased estimates of their timing, while exhibiting an appropriate (i.e., below 5%) false discovery rate even when background diversification rates vary. Finally, we provide an empirical application of this approach to conifers, which reveals that this group has experienced two major episodes of mass extinction. This new approach, the CPP on Mass Extinction Times (CoMET) model, provides an effective tool for identifying mass-extinction events from molecular phylogenies, even when the history of those groups includes more prosaic temporal variation in diversification rate.
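
To make the CPP idea concrete, here is a minimal Python sketch of how event times and magnitudes could be drawn from such a prior. It is not the CoMET or TESS implementation: the function name, the tree age, the event rates, and the lognormal and beta magnitude distributions are all hypothetical choices made only for illustration.

```python
import random


def simulate_cpp_events(rate, tree_age, draw_magnitude, rng):
    """Draw one realization of a compound Poisson process on [0, tree_age).

    Waiting times between successive events are exponential with the given
    rate, and each event receives a magnitude from draw_magnitude(rng).
    Returns a list of (time, magnitude) tuples in increasing time order.
    """
    events = []
    t = rng.expovariate(rate)
    while t < tree_age:
        events.append((t, draw_magnitude(rng)))
        t += rng.expovariate(rate)
    return events


rng = random.Random(42)
tree_age = 100.0  # hypothetical tree age, arbitrary time units

# One independent CPP per event type. All rates and priors are assumptions.
speciation_shifts = simulate_cpp_events(
    0.02, tree_age, lambda r: r.lognormvariate(0.0, 0.5), rng)  # new rate
extinction_shifts = simulate_cpp_events(
    0.02, tree_age, lambda r: r.lognormvariate(-1.0, 0.5), rng)  # new rate
mass_extinctions = simulate_cpp_events(
    0.01, tree_age, lambda r: r.betavariate(2.0, 18.0), rng)  # survival prob.

for name, events in [("speciation shifts", speciation_shifts),
                     ("extinction shifts", extinction_shifts),
                     ("mass extinctions", mass_extinctions)]:
    print(name, [(round(t, 1), round(m, 3)) for t, m in events])
```

Because the number of events differs between draws, the dimension of the model changes from one sample to the next, which is why a reversible-jump sampler is needed for inference.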

This paper is available from the bioRxiv here.

We’ve also submitted an application note for our new R package, TESS 2.0, a Bayesian software package implementing the CoMET model and many other tasty methods for inferring rates of lineage diversification. Briefly, TESS implements statistical approaches for estimating rates of lineage diversification (speciation minus extinction) from phylogenetic trees. The program provides a flexible Bayesian framework for specifying an effectively infinite array of diversification models (where diversification rates are constant, vary continuously, or change episodically through time) and implements numerical methods to estimate parameters of these models from molecular phylogenies.

We provide robust Bayesian methods for assessing the relative fit of these models of lineage diversification to a given study tree, e.g., where stepping-stone simulation is used to estimate the marginal likelihoods of competing models, which can then be compared using Bayes factors. We also provide Bayesian methods for evaluating the absolute fit of these branching-process models to a given study tree, i.e., where posterior-predictive simulation is used to assess the ability of a candidate model to generate the observed phylogenetic data.
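
As a rough illustration of the model-comparison step, the Python sketch below turns two log marginal likelihoods into a Bayes factor. It does not perform stepping-stone simulation and does not use the TESS API; the model names and the marginal-likelihood values are made up for the example.

```python
def log_bayes_factor(log_ml_model_1, log_ml_model_0):
    """Natural-log Bayes factor of model 1 over model 0.

    Both arguments are log marginal likelihoods, so the log Bayes factor
    is simply their difference.
    """
    return log_ml_model_1 - log_ml_model_0


# Hypothetical log marginal likelihoods, e.g. as estimated by
# stepping-stone simulation for two competing diversification models.
log_marginal_likelihoods = {
    "constant_rate": -352.4,
    "episodic_rate": -347.9,
}

ln_bf = log_bayes_factor(log_marginal_likelihoods["episodic_rate"],
                         log_marginal_likelihoods["constant_rate"])
two_ln_bf = 2.0 * ln_bf  # a commonly reported scale for Bayes factors

print(f"ln BF (episodic vs. constant) = {ln_bf:.1f}")
print(f"2 ln BF = {two_ln_bf:.1f}")
```

A positive value favors the first model and a negative value favors the second. Working on the log scale avoids numerical underflow, since marginal likelihoods for real trees are usually far too small to represent directly.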

This paper is available from the bioRxiv here.

Finally, all this good stuff is implemented in the newly released TESS 2.0 R package, which (including the source code, comprehensive user manual, and example files) is available from CRAN here.

UPDATE: Must read papers for graduate students

Following up on my previous post, here is the list of ‘Must Read’ papers in phylogenetics that were suggested on Twitter. I think that this is a great start, even though it is missing some classics and some important topics (divergence time estimation, for example). Thanks to everyone for chipping in with their thoughts and thanks again to Matt Hahn and Matt Pennell for getting the conversation started.

I apologize if I missed anyone’s contributions. Feel free to suggest additions, either here in the comments or on Twitter with the hashtag #mustreadphylo.

Bull, J. J., Huelsenbeck, J. P., Cunningham, C. W., Swofford, D. L., & Waddell, P. J. (1993). Partitioning and combining data in phylogenetic analysis. Systematic Biology, 42(3), 384–397.

Cavalli-Sforza, L. L., & Edwards, A. W. F. (1967). Phylogenetic analysis. Models and estimation procedures. The American Journal of Human Genetics, 19, 233–257.

Edwards, S. V. (2009). Is a new and general theory of molecular systematics emerging? Evolution, 63, 1–19.

Felsenstein, J. (1973). Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Systematic Zoology, 22, 240–249.

Felsenstein, J. (1978). Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology, 27, 401–410.

Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17, 368–376.

Felsenstein, J. (1985). Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39, 783–791.

Felsenstein, J. (1985). Phylogenies and the comparative method. American Naturalist, 125, 1–15.

Goldman, N. (1993). Statistical tests of models of DNA substitution. Journal of Molecular Evolution, 36, 182–198.

Hillis, D. M., & Bull, J. J. (1993). An Empirical Test of Bootstrapping as a Method for Assessing Confidence in Phylogenetic Analysis. Systematic Biology, 42, 182–192.

Holder, M., & Lewis, P. O. (2003). Phylogeny estimation: traditional and Bayesian approaches. Nature Reviews Genetics, 4, 275–284.

Kumar, S., Filipski, A. J., Battistuzzi, F. U., Kosakovsky Pond, S. L., & Tamura, K. (2012). Statistics and truth in phylogenomics. Molecular Biology and Evolution, 29, 457–472.

Maddison, W. P. (1997). Gene Trees in Species Trees. Systematic Biology, 46, 523–536.

Pauling, L., & Zuckerkandl, E. (1963). Chemical paleogenetics. Acta Chemica Scandinavica, 17, S9–S16.

Sullivan, J., & Swofford, D. (1997). Are Guinea Pigs Rodents? The Importance of Adequate Models in Molecular Phylogenetics. Journal of Mammalian Evolution, 4, 77–86.

Must read papers for graduate students

This post was sparked by an ongoing conversation on Twitter that was kicked off when Matthew Hahn and Matt Pennell got to talking about developing a list of papers that should be required reading for graduate students with an interest in phylogenetics. This is a good question, and I can’t recall seeing such a list. I start teaching phylogenetics in our graduate core course here at UH next week and the 2015 Bodega workshop is only a few weeks away, so I’m finding this to be a timely and useful conversation.

There are already several good suggestions from folks on Twitter, including, well... most of Joe Felsenstein’s early phylogenetics papers and his book, Maddison’s 1997 paper and Edwards’s 2009 paper on gene tree conflicts, and Sullivan and Swofford’s 1997 paper on the importance of adequate models (of course, guinea pigs are also noble beasts deserving of study in their own right).

Please jump into the conversation on Twitter with your suggestions, or leave them here in the comments. I’ll post an update with a bibliography in a few days. Thanks to Matt and Matt for bringing this up!

On building a small cluster

Treethinkers reader Nick left a comment on one of my earlier posts asking for some details about the cluster that I built for my lab. I’ll do that with this post. I’ll start by outlining some information about the cluster, list the specific parts I used (although note that this was two years ago, so good choices would likely be different today), and then give a couple of general thoughts on building and maintaining your own cluster.

Our cluster is a small machine intended to crunch through moderate numbers of phylogenetic analyses and to serve as a resource for projects where it’s convenient to have more administrative access than you typically have on large shared clusters. It comprises 4 compute machines and a head node. Each compute machine has two 6-core Xeons, 500 GB of storage, and 24 GB of memory. Because these processors run two threads per core, each chip with 6 physical cores has 12 threads available, meaning the 4 compute machines have 96 threads available in total. I built it using pretty standard commodity parts available from your favorite internet-based vendor. Many of these parts are tailored to the gaming market, which is actually a little annoying: lots of fancy LEDs lighting everything up. I built the head node from a cheap barebones PC that I bought from Newegg. It provides a lot of storage and has plenty of power for compiling, transfers, and other maintenance tasks. This cluster is far from being blazing fast, but it’s a good workhorse for us that is roughly on par with 4 high-end Mac Pros from a couple of years ago. It’s small enough to not cause any problems with cooling and it can run on a single 20 amp breaker. In short, I built it trying to find a balance between processing power and difficulty in setup and maintenance.
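
For anyone sizing a similar machine, the thread arithmetic above can be written out as a few lines of Python. The variable names are just for illustration, and the two-threads-per-core figure is inferred from the 6-core, 12-thread chips described here.

```python
compute_nodes = 4        # compute machines, not counting the head node
cpus_per_node = 2        # two Xeons per machine
cores_per_cpu = 6        # physical cores per chip
threads_per_core = 2     # assumption: 12 threads per 6-core chip
memory_per_node_gb = 24  # memory per compute machine

physical_cores = compute_nodes * cpus_per_node * cores_per_cpu
hardware_threads = physical_cores * threads_per_core
threads_per_node = cpus_per_node * cores_per_cpu * threads_per_core
memory_per_thread_gb = memory_per_node_gb / threads_per_node

print(f"physical cores:    {physical_cores}")
print(f"hardware threads:  {hardware_threads}")
print(f"memory per thread: {memory_per_thread_gb:.1f} GB")
```

The last number is worth checking before you buy parts: if every thread is busy, each analysis gets about 1 GB of memory, which is fine for many phylogenetic jobs but tight for large data sets.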


Two new workshops on phylogenetics and macroevolution

NESCent Academy will be hosting two workshops this summer that may be of interest to folks reading this blog. The deadline for applications is 1st May 2014.

Paleobiological and Phylogenetic Approaches to Macroevolution, July 22-29 

This course will teach participants to use fossil and phylogenetic data to analyze macroevolutionary patterns using traditional paleobiological stratigraphic methods, phylogenetic comparative methods, and combined fossil and tree approaches. Macroevolutionary research is currently split into two quite isolated branches, one based on fossils and the other on extant taxa and phylogenies. Increasingly, evolutionary biologists in both camps are realizing that only by combining neontological and paleontological data and approaches can a new and more powerful integrative macroevolution emerge. Unfortunately, these two disciplines utilize very different data and quantitative methods. Therefore, to truly initiate a synthesis of these two approaches, we need to train students and researchers to understand the intricacies of both fossil and phylogenetic data, and the methods necessary to integrate them. APPLY HERE. More information can be found here.

Roger Benson Dept. of Earth Sciences, University of Oxford
Samantha Hopkins Clark Honors College and the Department of Geological Sciences, University of Oregon
Gene Hunt Dept. of Paleobiology, National Museum of Natural History, The Smithsonian Institution, Washington DC 20013-7012, USA.
Samantha Price Dept. Evolution & Ecology, University of California Davis
Daniel Rabosky Dept. of Ecology and Evolutionary Biology, University of Michigan
Lars Schmitz Keck Science Department, Claremont McKenna, Pitzer, and Scripps Colleges
Graham Slater Dept. of Paleobiology, National Museum of Natural History, The Smithsonian Institution

Phylogenetic Analysis Using RevBayes, August 25-31

The Bayesian statistical framework for phylogeny estimation has facilitated the development of models that better capture biological complexity. This course is built around the use of the new, open-source program RevBayes. RevBayes implements an R-like language (complete with control statements, user-defined functions, and loops) that enables the user to build up phylogenetic models from simple parts. This course covers the basics of probability theory, graphical models, and phylogenetics. Then, building on these concepts, we will provide lectures on statistical methods for phylogenetic inference, macroevolution, and epidemiology. APPLY HERE. More information can be found here.

Bastien Boussau, LBBE, Lyon, France
Tracy Heath, UC Berkeley & U Kansas
Sebastian Höhna, UC Davis & UC Berkeley
John Huelsenbeck, UC Berkeley
Michael Landis, UC Berkeley
Nicolas Lartillot, LBBE, Lyon, France
Brian Moore, UC Davis
Fredrik Ronquist, NRM Stockholm
Tanja Stadler, ETH Zürich

A new phylogeneticist blogger

I’d like to advertise a newcomer among bloggers in phylogenetics: Nicolas Lartillot, now a researcher in Lyon. Nicolas just started blogging a couple of weeks ago but, judging from the number of posts he has already contributed, he seems bound to become a very prolific blogger.

Nicolas has made several noteworthy contributions to the field of phylogenetics, in particular Bayesian phylogenetics. For instance, he developed the CAT model of protein evolution, which seems to be more resilient against the long-branch attraction artifact; he proposed thermodynamic integration for computing Bayes factors; he developed a model for investigating correlations between continuous traits and rates of molecular evolution along a phylogeny; and he maintains the PhyloBayes package.

His blog is called “The Bayesian kitchen”, which I believe means that, underneath the nice theoretical properties of Bayesian inference, a fair amount of cooking is sometimes necessary to get things to work. So far his posts have been about the Bayesian/frequentist divide, about the philosophy of Bayesian inference, or about the interpretation of posterior probabilities, among other things. He uses examples from phylogenetics (e.g. dating, diversification models, comparative methods, or gene tree-species tree methods) or population genetics to help make his points. I’m certain I’m going to learn a lot from his posts, and I believe some of the readers of this blog will enjoy them too!

Is There Life After Graduate School?

In an earlier post, I discussed the decision about attending graduate school in the sciences. I argued that graduate school is certainly not the right choice for everyone. For people of a certain mind-set, though, it is the perfect choice. And even if you have all the right attributes for graduate school, you can still be miserable if you pick the wrong advisor or graduate program, so that choice is also important. But let’s assume that you decided that graduate school was the right choice for you, you did the research, found the perfect advisor, happily toiled away long hours discovering things about the natural world that no one else in the world knew about, published lots of exciting papers about those results, finished a dissertation, and successfully completed a Ph.D. Now you have to address the question that friends and family have been asking you for years: What will you do for the rest of your life, and how will you make a living doing it? How can you make a living doing something as specialized and arcane as phylogenetics, for example?

I’ll Admit It: I Loved Graduate School

At least once a month, I see blog posts from disgruntled current or former graduate students about “The Terrible Experience of Graduate School.” I advise a group of extremely bright undergraduates who are interested in research careers in the sciences, and they get scared to death by all these internet horror stories. The problem is, almost the only people who blog about their graduate school experience are the people who are (or were) extremely unhappy. There are certainly unhappy graduate students, but the truth is that many graduate students love the experience. But no one seems to want to write or read a blog post about the writer’s wonderful experience in graduate school. It sounds like gloating or bragging, and happy people usually are just content to be happy.
