UPDATE: Must read papers for graduate students

Following up on my previous post, here is the list of ‘Must Read’ papers in phylogenetics that were suggested on Twitter. I think that this is a great start, even though it is missing some classics and some important topics (divergence time estimation, for example). Thanks to everyone for chipping in with their thoughts and thanks again to Matt Hahn and Matt Pennell for getting the conversation started.

I apologize if I missed anyone’s contributions. Feel free to suggest additions, either here in the comments or on Twitter with the hashtag #mustreadphylo.

Bull, J. J., Huelsenbeck, J. P., Cunningham, C. W., Swofford, D. L., & Waddell, P. J. (1993). Partitioning and combining data in phylogenetic analysis. Systematic Biology, 42(3), 384–397.

Cavalli-Sforza, L. L., & Edwards, A. W. F. (1967). Phylogenetic analysis. Models and estimation procedures. The American Journal of Human Genetics, 19, 233–257.

Edwards, S. V. (2009). Is a new and general theory of molecular systematics emerging? Evolution, 63, 1–19.

Felsenstein, J. (1973). Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Systematic Zoology, 22, 240–249.

Felsenstein, J. (1978). Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology, 27, 401–410.

Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17, 368–376.

Felsenstein, J. (1985). Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39, 783–791.

Felsenstein, J. (1985). Phylogenies and the comparative method. American Naturalist, 125, 1–15.

Goldman, N. (1993). Statistical tests of models of DNA substitution. Journal of Molecular Evolution, 36, 182–198.

Hillis, D. M., & Bull, J. J. (1993). An Empirical Test of Bootstrapping as a Method for Assessing Confidence in Phylogenetic Analysis. Systematic Biology, 42, 182–192.

Holder, M., & Lewis, P. O. (2003). Phylogeny estimation: traditional and Bayesian approaches. Nature Reviews Genetics, 4, 275–284.

Kumar, S., Filipski, A. J., Battistuzzi, F. U., Kosakovsky Pond, S. L., & Tamura, K. (2012). Statistics and truth in phylogenomics. Molecular Biology and Evolution, 29, 457–472.

Maddison, W. P. (1997). Gene Trees in Species Trees. Systematic Biology, 46, 523–536.

Pauling, L., & Zuckerkandl, E. (1963). Chemical paleogenetics. Acta Chemica Scandinavica, 17, S9–S16.

Sullivan, J., & Swofford, D. (1997). Are Guinea Pigs Rodents? The Importance of Adequate Models in Molecular Phylogenetics. Journal of Mammalian Evolution, 4, 77–86.

Must read papers for graduate students

This post is sparked by an ongoing conversation on Twitter that was kicked off when Matthew Hahn and Matt Pennell got to talking about developing a list of papers that should be required reading for graduate students with an interest in phylogenetics. This is a good question, and I can’t recall seeing such a list. I start teaching on phylogenetics in our graduate core course here at UH next week and the 2015 Bodega workshop is only a few weeks away, so I’m finding this to be a timely and useful conversation.

There are already several good suggestions from folks on Twitter, including, well… most of Joe Felsenstein’s early phylogenetics papers and his book, Maddison’s 1997 paper and Edwards’s 2009 paper on gene tree conflicts, and Sullivan and Swofford’s 1997 paper on the importance of adequate models (of course, guinea pigs are also noble beasts deserving of study in their own right).

Please jump into the conversation on Twitter with your suggestions, or leave them here in the comments. I’ll post an update with a bibliography in a few days. Thanks to Matt and Matt for bringing this up!

On building a small cluster

Treethinkers reader Nick left a comment on one of my earlier posts asking for some details about the cluster that I built for my lab. I’ll do that with this post. I’ll start by outlining some information about the cluster, list the specific parts I used (although note that this was two years ago, so good choices would likely be different today), and then give a couple of general thoughts on building and maintaining your own cluster.

Our cluster is a small machine intended to crunch through moderate numbers of phylogenetic analyses and to serve as a resource for projects where it’s convenient to have more administrative access than you typically have on large shared clusters. It comprises 4 compute machines and a head node. Each compute machine has two 6-core Xeons, 500 GB of storage, and 24 GB of memory. Because these processors are hyper-threaded, each chip with 6 physical cores has 12 threads available, meaning the 4 compute machines have 96 threads available in total. I built it using pretty standard commodity parts available from your favorite internet-based vendor. Many of these parts are tailored to the gaming market, which is actually a little annoying…lots of fancy LEDs lighting everything up. I built the head node from a cheap barebones PC that I bought from Newegg. It provides a lot of storage and has plenty of power for compiling, transfers, and other maintenance tasks. This cluster is far from being blazing fast, but it’s a good workhorse for us that is roughly on par with 4 high-end Mac Pros from a couple of years ago. It’s small enough to not cause any problems with cooling and it can run on a single 20 amp breaker. In short, I built it trying to find a balance between processing power and difficulty in setup and maintenance.
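
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the core and thread counts described above. The variable names are purely illustrative, and the two-threads-per-core figure just restates the 6 cores / 12 threads per chip mentioned in the text.

```python
# Cluster layout as described in the post (head node not counted).
compute_nodes = 4        # compute machines
sockets_per_node = 2     # two Xeons per machine
cores_per_socket = 6     # 6 physical cores per Xeon
threads_per_core = 2     # 12 threads per 6-core chip, i.e. 2 per core
memory_per_node_gb = 24  # memory per compute machine

# Totals across the compute machines.
physical_cores = compute_nodes * sockets_per_node * cores_per_socket
hardware_threads = physical_cores * threads_per_core
total_memory_gb = compute_nodes * memory_per_node_gb

print(f"physical cores:   {physical_cores}")
print(f"hardware threads: {hardware_threads}")
print(f"total memory:     {total_memory_gb} GB")
```

Swapping in your own node count or processor specs gives a quick estimate of how many analyses a similar build could run side by side.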

[Image: Thomson Lab cluster]


Two new workshops on phylogenetics and macroevolution

NESCent Academy will be hosting two workshops this summer that may be of interest to folks reading this blog. The deadline for applications is 1st May 2014.

Paleobiological and Phylogenetic Approaches to Macroevolution, July 22-29 

This course will teach participants to use fossil and phylogenetic data to analyze macroevolutionary patterns using traditional paleobiological stratigraphic methods, phylogenetic comparative methods, and combined fossil and tree approaches. Macroevolutionary research is currently split into two quite isolated branches, one based on fossils and the other on extant taxa and phylogenies. Increasingly, evolutionary biologists in both camps are realizing that only by combining neontological and paleontological data and approaches can a new and more powerful integrative macroevolution emerge. Unfortunately, these two disciplines utilize very different data and quantitative methods. Therefore, to truly initiate a synthesis of these two approaches, we need to train students and researchers to understand the intricacies of both fossil and phylogenetic data, and the methods necessary to integrate them. APPLY HERE. More information can be found here.

Instructors
Roger Benson Dept. of Earth Sciences, University of Oxford
Samantha Hopkins Clark Honors College and the Department of Geological Sciences, University of Oregon
Gene Hunt Dept. of Paleobiology, National Museum of Natural History, The Smithsonian Institution, Washington DC 20013-7012, USA.
Samantha Price Dept. Evolution & Ecology, University of California Davis
Daniel Rabosky Dept. of Ecology and Evolutionary Biology, University of Michigan
Lars Schmitz Keck Science Department, Claremont McKenna, Pitzer, and Scripps Colleges
Graham Slater Dept. of Paleobiology, National Museum of Natural History, The Smithsonian Institution

Phylogenetic Analysis Using RevBayes, August 25-31

The Bayesian statistical framework for phylogeny estimation has facilitated the development of models that better capture biological complexity. This course is built around the use of the new, open-source program RevBayes (http://sourceforge.net/projects/revbayes/). RevBayes implements an R-like language (complete with control statements, user-defined functions, and loops) that enables the user to build up phylogenetic models from simple parts. This course will cover the basics of probability theory, graphical models, and phylogenetics. Then, building on these concepts, we will provide lectures on statistical methods for phylogenetic inference, macroevolution, and epidemiology. APPLY HERE. More information can be found here.

Instructors
Bastien Boussau, LBBE, Lyon, France
Tracy Heath, UC Berkeley & U Kansas
Sebastian Höhna, UC Davis & UC Berkeley
John Huelsenbeck, UC Berkeley
Michael Landis, UC Berkeley
Nicolas Lartillot, LBBE, Lyon, France
Brian Moore, UC Davis
Fredrik Ronquist, NRM Stockholm
Tanja Stadler, ETH Zürich

A new phylogeneticist blogger

I’d like to advertise a newcomer among bloggers in phylogenetics: Nicolas Lartillot, now a researcher in Lyon. Nicolas just started blogging a couple of weeks ago but, judging from the number of posts he has already contributed, he seems bound to become a very prolific blogger.

Nicolas has made several noteworthy contributions to the field of phylogenetics, in particular Bayesian phylogenetics. For instance, he has developed the CAT model of protein evolution, which seems to be more resilient against the long-branch attraction artifact; he has proposed thermodynamic integration for computing Bayes factors; he has developed a model for investigating correlations between continuous traits and rates of molecular evolution along a phylogeny; and he maintains the PhyloBayes package.

His blog is called “The Bayesian kitchen”, which I believe means that, underneath the nice theoretical properties of Bayesian inference, a fair amount of cooking is sometimes necessary to get things to work. So far his posts have been about the Bayesian/frequentist divide, about the philosophy of Bayesian inference, or about the interpretation of posterior probabilities, among other things. He uses examples from phylogenetics (e.g. dating, diversification models, comparative methods, or gene tree-species tree methods) or population genetics to help make his points. I’m certain I’m going to learn a lot from his posts, and I believe some of the readers of this blog will enjoy them too!

Is There Life After Graduate School?

In an earlier post, I discussed the decision about attending graduate school in the sciences. I argued that graduate school is certainly not the right choice for everyone. For people of a certain mind-set, though, it is the perfect choice. And even if you have all the right attributes for graduate school, you can still be miserable if you pick the wrong advisor or graduate program, so that choice is also important. But let’s assume that you decided that graduate school was the right choice for you, you did the research, found the perfect advisor, happily toiled away long hours discovering things about the natural world that no one else in the world knew about, published lots of exciting papers about those results, finished a dissertation, and successfully completed a Ph.D. Now you have to address the question that friends and family have been asking you for years: What will you do for the rest of your life, and how will you make a living doing it? How can you make a living doing something as specialized and arcane as phylogenetics, for example?


I’ll Admit It: I Loved Graduate School

At least once a month, I see blog posts from disgruntled current or former graduate students about “The Terrible Experience of Graduate School.” I advise a group of extremely bright undergraduates who are interested in research careers in the sciences, and they get scared to death by all these internet horror stories. The problem is, almost the only people who blog about their graduate school experience are the people who are (or were) extremely unhappy. There are certainly unhappy graduate students, but the truth is that many graduate students love the experience. But no one seems to want to write or read a blog post about the writer’s wonderful experience in graduate school. It sounds like gloating or bragging, and happy people usually are just content to be happy.


Workshop on Integrating Molecular Phylogenies and the Fossil Record

Last week I attended a workshop organized by Hélène Morlon, Tiago Quental, and Charles Marshall on integrating data from the fossil record into phylogenetic methods. This three-day workshop was sponsored by the France-Berkeley Fund, a cool program that provides seed grants to build partnerships between UC Berkeley researchers and French collaborators. All of the events took place at the UCMP on the UC Berkeley campus.

Hélène, Charles, and Tiago recognized the increasing interest in methods and analyses that incorporate data from fossil taxa. Since several of us are working in this area, particularly in methods development, building a collaborative network is critical. Furthermore, as methods become more and more reliant on data from the fossil record, connections between neontologists and paleontologists must be formed. Notably, a similar working group, organized by Sam Price and Lars Schmitz, was held at NESCent this past spring and was made up of an overlapping set of researchers. One result of the NESCent catalysis meeting will be an SSE Symposium at Evolution 2014 on “Reuniting fossil and extant approaches to macroevolution”.