Evolution 2013: The Good, the Better, and the Future

The 2013 Evolution meetings (joint meetings of the Society for the Study of Evolution, Society of Systematic Biologists, and the American Society of Naturalists) were held in Snowbird, Utah, from 21 to 25 June 2013. The meetings were a great success and, as usual, featured many packed sessions on phylogenetic methods, theory, and applications. These meetings were also held in Snowbird twenty years ago (1993), but much has changed since then. As I flew home from Utah this week, I contemplated a few of the things that made the meetings successful, and I compiled this list of thoughts and recommendations for future meetings.

Things that made #Evol2013 a success:

1. The presence of outstanding undergraduates who are working on research. Their presence was stronger than at any meeting I can remember. In addition to fostering science careers for undergrads, this makes the meeting much more attractive to faculty who are interested in recruiting outstanding graduate students. It gives undergraduates exposure to professional scientific communities, gives them a chance to practice presenting research papers in public, and allows them to explore opportunities for graduate school. I hope all three societies will continue, and even ramp up, their efforts to attract research-oriented undergraduates to the meetings.

Motivating/rewarding reviewers

Among other things, researchers are expected to do research, publish the results of this research, and review the research of others. It is this reviewing part that I want to talk about today.

Reviewing is obviously one of the most important responsibilities of a researcher, one that can take a significant amount of time, but one that brings little reward, as it’s usually done for free. All a reviewer can show for it is a line on a CV saying “Reviewer for [put your favorite journal here]”. The purpose of this post is to propose a way to reward researchers who are good reviewers and spend a significant amount of time improving the work of others, often anonymously.

What if journals had awards for “Best reviewer of the year”? The laureate could then add this award to her CV, showing that she is doing a huge amount of service to her field. The award could be based on objective measures, such as the number of reviews returned on time or the number of reviews that concurred with the Editor-in-Chief’s decision, or it could be more subjective, based on the Associate Editors’ and Editor-in-Chief’s assessments of the quality of the reviews they received. The award could be presented with much ceremony at conference banquets, like awards for the best student paper, perhaps with some money attached. Anonymity would not be broken, because all we would know about the laureate is that she reviewed N papers for journal X, not that she reviewed my paper submitted to journal X.

One could also imagine a wall of fame, where reviewers would compete for, say, the largest number of reviews returned on time. Or, to preserve a higher level of anonymity, journals could give each reviewer a way to see how her reviewing work compares to that of others: have I reviewed more papers than 1%, 50%, or 80% of this journal’s reviewers? If I see that I review fewer papers than my fellow researchers, perhaps I’ll be more willing to accept the next invitation to review. If I see that I review way more than my fellow researchers, perhaps I’ll want to put that on my CV to show how altruistic I am.
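The percentile comparison is simple arithmetic that a submission system could run on its own statistics. Here is a minimal Python sketch, assuming the journal can export a count of on-time reviews per reviewer; the reviewer names and counts below are hypothetical, not real data.

```python
def percentile_rank(my_count, all_counts):
    """Percentage of reviewers whose count is strictly below my_count."""
    if not all_counts:
        raise ValueError("need at least one reviewer count")
    below = sum(1 for c in all_counts if c < my_count)
    return 100.0 * below / len(all_counts)


# Hypothetical counts of reviews returned on time, one entry per reviewer.
on_time_reviews = {
    "reviewer_a": 2,
    "reviewer_b": 5,
    "reviewer_c": 0,
    "reviewer_d": 9,
    "reviewer_e": 5,
}

counts = list(on_time_reviews.values())
for name, n in sorted(on_time_reviews.items()):
    share = percentile_rank(n, counts)
    print(f"{name}: {n} reviews, more than {share:.0f}% of reviewers")
```

Counting only strictly lower counts means tied reviewers get the same rank; a journal could just as well choose another percentile definition. Each reviewer would see only her own line, so anonymity is preserved.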

Short of paying reviewers, which might be too expensive for the smallest scientific societies, I think some kind of reward or award system could help recognize the time some researchers spend reviewing and improving the work of others. Given that systems for handling submissions and revisions such as “Manuscript Central” already have all the stats available, that’s probably not very hard to do.

Bodega @ Evol2013

The schedule for the 2013 Evolution meeting was just released. I’m sure there will be many Bodega instructors and former students (and future, too!) giving fascinating talks and posters. Below are a few presentations that I found for instructors and students from Bodega2013. Please add your own presentation time in the comments so that we can all work these into our meeting schedules.

Gideon Bradburd – Sun. at 5:00 – Disentangling the effects of geographic and ecological isolation on genetic differentiation

Jeremy Brown – Tues. at 4:45 – Variable phylogenetic signal in a forensically important HIV-1 transmission cluster

Tracy Heath – Sun. at 4:45 – The Fossilized Birth-Death Process: A Coherent Model of Fossil Calibration for Bayesian Divergence Time Estimation

Hannah Marx – Tues. at 2:30 – Alien encounters of the floral kind: interpreting patterns of community assembly on the San Juan Islands

Mike May – Sun. at 8:30 (am) – MCDUSA: A Monte Carlo method for more reliable detection of lineage-specific rates of diversification

Matt McGee – Sun. at 2:00 – The evolution of cichlid craniofacial diversity

Brian Moore – Tues. at 3:45 – Bayesian inference of phylogeny from partitioned data

Sam Price – Sat. at 9:30 – Assembly & early diversification of modern reef fishes

Glenn Seeholzer – Mon. at 7:00 (pm) – The phylogenetic signal and mode of climate-niche evolution in the Neotropical bird family Furnariidae

Bob Thomson – Tues. at 11:15 – Estimating Phylogeny from microRNA Data: A Critical Appraisal

Discussion About Object-Oriented Design At Carl Boettiger’s Blog, Plus Notes On “Broken” Tutorials

A blessing and a curse of the R programming interface is that methods development occurs at a community scale. Progress tends to occur in a piecemeal fashion as developers make small advances within a massive, highly integrated, but often chaotically organized community. The great thing about R is that so many people are doing so many things in a modular framework that nearly any sort of analysis is possible. The drawback is that anytime there’s a flaw or a fundamental change in a given module (e.g., an R package, or a particular function), it can have widespread and unanticipated effects on any package that depends on it.

Over at his website/blog, Carl Boettiger points out that this problem is exacerbated by the fact that object-oriented design is rare in R, and as a result, changes in the underlying mechanics of functions tend to result in critical changes in the nature of their output, which spells trouble for dependencies. Carl does a nice job explaining how the use of object-oriented design can prevent such problems.
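Carl’s post is about R, but the idea carries over to other languages. As a hypothetical Python sketch (the class, method, and function names are illustrative only, not taken from geiger, pmc, or Carl’s post), a fitting function returns an object, and downstream code reads results only through accessor methods instead of reaching into the raw output structure:

```python
class TraitModelFit:
    """Result of a hypothetical trait-model fit.

    Dependent code should call the accessor methods rather than read the
    private attributes, so the internal layout can change without
    breaking anyone downstream.
    """

    def __init__(self, params, log_likelihood):
        # Internal storage; the leading underscore marks it as private.
        self._params = dict(params)
        self._log_likelihood = float(log_likelihood)

    def log_likelihood(self):
        return self._log_likelihood

    def parameter(self, name):
        return self._params[name]

    def aic(self):
        # Akaike information criterion: 2k - 2 ln L, with k free parameters.
        k = len(self._params)
        return 2 * k - 2 * self._log_likelihood


def fit_brownian_motion(sigma_sq, log_lik):
    """Hypothetical stand-in for a real fitting routine; it just wraps values."""
    return TraitModelFit({"sigma_sq": sigma_sq, "z0": 0.0}, log_lik)


# A dependent package touches only the public interface.
fit = fit_brownian_motion(sigma_sq=0.5, log_lik=-12.3)
print(fit.parameter("sigma_sq"), fit.aic())
```

If the package author later stores parameters differently (say, as a vector instead of a dictionary), only the class internals change; code that calls `parameter()` or `aic()` keeps working.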

The context of Carl’s post is that the authors of the popular “geiger” R package have just released a major update that changes many functions, function names, and outputs. According to Carl, most of these changes are great, but the rub is that several cause failures in packages, scripts, or tutorials that depend on “geiger”. These include Carl’s excellent “pmc” package, Travis Ingram’s “surface” package, and perhaps most relevant to the Bodega crowd… several of our R phylogenetics tutorials! I’ve gotten several emails over the past week or so mentioning that these tutorials no longer work due to the “geiger” change.

If you’re interested in this sort of thing, check out Carl’s post, as well as the lengthy discussion it’s generated in his comments section. As for our tutorials, I’ll try to post new versions of the ones I wrote for continuous traits shortly. Most of the updates will be straightforward (thanks to Jeremy Brown for highlighting the necessary changes when he first wrote me about it), but those involving “pmc” will likely have to wait until Carl releases an update that’s compatible with the new “geiger”. Of course, in the meantime, it should be possible to run the tutorials using older versions of “geiger”.

Evolution Deadline

A quick note that the early registration deadline for this year’s joint annual meeting of SSE, SSB, and ASN is fast approaching. Early registration ends on Friday, April 19th. I’m really looking forward to this one. It’s in a beautiful location and should nicely avoid that convention center wasteland feel that many meetings unfortunately have these days. I’m sure that many people associated with the Bodega workshop will be there. Who’s going?

Bodega, 1 Month Anniversary

I wanted to take advantage of the 1 month anniversary of the 2013 Bodega Bay applied phylogenetics workshop to do a quick debrief of the week, now that we’re all (hopefully) rehydrated and caught up on sleep.

So first, a HUGE thank you to the instructors, the staff at Bodega Bay marine labs (especially our principal contact, Lisa Valentine), the UC Davis EVE administrative staff, and our sponsors, Sierra Nevada Brewing Company and Lagunitas Brewing Company. And of course, a HUGER thank you to the students of 2013, probably the best class that has ever passed through the course, with the notable exception of the class of 2011, which, incidentally, is when I was a student in the workshop. Seriously, one sentence can’t hold enough superlatives to describe how exciting your research is.

Second, as part of a continuing effort (the new website, the blog, the tweeting!) to maintain a more cohesive community outside the one-week-a-year workshop, we wanted to encourage this year’s students, and those of workshops past, to stay in touch. Specifically, it’d be great if you could keep us apprised of any papers you publish that use skills you picked up at the Bodega Bay workshop. We’d love to profile your work on the blog, both to give you the Treethinkers bump and to advertise the value of the course for future students. So please, keep us abreast of the latest and greatest in phylogen* research, and send us your papers!

Third, because they aren’t going to thank themselves, I’d like to lead everyone in a round of applause for this year’s faculty: {Peter, Brian, Bob, Luke, Sam, Rich, John, Bruce, Jonathan, Tracy, Jeremy}. Based on all the comments on the course evaluations, the students really benefitted from all the mental horsepower you brought to the room!

Finally, what I’m sure has been on all of our minds: the Bodega Shake video. I’m not going to post it, because, you know, some of us probably want to be hired somewhere at some point, but if you need to see it, send me an email or find me at a conference (I’ll be at Snowbird this summer!).

Thanks again to everybody for a great week! Already excited for #Bodega2014…

Trevor Bedford’s BEAST Tutorial on Viral Phylodynamics

Many participants of the molecular evolution workshops I attend are very interested in methods for estimating the evolutionary dynamics of serially sampled pathogens. Recent versions of BEAST and BEAST2 include some of the most exciting and cutting-edge models for understanding evolutionary processes in these data. Because of this, I wanted to call your attention to a new tutorial on this subject:

Trevor Bedford has posted a tutorial entitled: Inferring spatiotemporal dynamics of the H1N1 influenza pandemic from sequence data.

He provides several detailed exercises that will surely help anyone new to these methods understand how to analyze their own data in BEAST. Interestingly, Trevor is hosting his tutorial on GitHub, which I think is a great idea.

Updated BEAST Tutorial

I really enjoy teaching and participating in phylogenetics workshops. Currently, I’m preparing my teaching materials for the Wellcome Trust-EMBL-EBI Advanced Course on Computational Molecular Evolution, where I have the awesome opportunity to teach a section on divergence time estimation with Jeff Thorne. Since I’ve made some minor updates to the BEAST tutorial that I’ve given at recent workshops, I wanted to create a more permanent page to host the document and data files. So, for those interested, you can find the updated tutorial here. I will try to keep this tutorial as up-to-date as possible.

Trees R Us: Introduction

“What I cannot create, I do not understand.” Richard Feynman

This series of posts is intended to be a hands-on R-based companion to some of the other things our contributors discuss. We might delve deeper into the behavior of the gamma distribution (or any of the many probability distributions popular in phylogenetics), code up an MCMC algorithm, or work through Felsenstein’s pruning algorithm, to name a few exercises. Playing around with these things in R, even in a simple way, can bring understanding that reading the primary literature or staring at Wikipedia cannot.

I hope this series sheds light on some of the more black-boxy aspects of statistical phylogenetics, and also helps beginning R users develop good programming habits. I invite others to contribute to the series as much as they’d like. I assume that most readers have a rudimentary understanding of R, that is, that they can open their R GUI (or favorite IDE), write a script, and execute it.

As an initial post, I will first provide a very rough sketch of some of the salient features of the R language (with a small dose of personal opinion), introduce some good practices for writing in R, and then make sure readers are up to speed on writing functions, using for loops and apply-like functions, and the supremely important concept of vectorization. A basic understanding of these topics will help you navigate the code that I (and others) write and should form a solid foundation for writing your own scripts.

What’s the deal with R anyways?

R is a flexible, extensible programming language with a relatively gentle learning curve. These days, it seems to be the go-to language for young biologists with little background in computer science (like me, for certain values of young) who are trying to put together their own analyses. R code can be executed line-by-line, which makes writing software much easier for people who are not used to assembling a (buggy) program from scratch.


Suggestions for a Gentle Bayesian Statistics Tutorial

Last week I was hosted by Mike Palopoli and the Bowdoin College Biology Department, where I gave a departmental seminar on my current work on Bayesian divergence time estimation methods. Bowdoin College is a 4-year liberal arts college with some very bright undergraduates. Several of the honors biology majors attended my talk, and after the seminar Mike and I led a discussion of my work and of computational evolutionary biology in general.

During this discussion I got stumped by a question from one of the students. He asked if I could recommend a basic and gentle primer on Bayesian statistics for someone with very little statistics training. (This particular student is currently involved in a population genetics project using Bayesian analysis.) I recommended online teaching materials from the various phylogenetics/molecular evolution workshops, including Bodega and this blog, but I couldn’t point him toward a book that I felt would be suitable for someone without a strong understanding of probability theory. As a graduate student, I took Bill Jefferys’ Bayesian Inference course at the University of Texas, where we used Bayesian Data Analysis by Gelman et al. This book is a great introduction (as was Bill’s awesome course), but might be too advanced even for a bright and motivated undergraduate biology major.

So my goal for my first post on the Treethinkers’ blog is to seek out suggestions from readers: What is a good, introductory primer on Bayesian inference that is suitable for the undergraduate level?

Please add your suggestions in the comments below.