# 2014 Bodega Bay Workshop – Apply now

Applications are now being accepted for the 2014 Workshop in Applied Phylogenetics. This year’s workshop will run from March 8 to 15 at the Bodega Bay Marine Lab on the northern California Coast. The application deadline is January 3rd. See the 2014 workshop page for more information and instructions to apply.

# Workshop on Integrating Molecular Phylogenies and the Fossil Record

Last week I attended a workshop organized by Hélène Morlon, Tiago Quental, and Charles Marshall on integrating data from the fossil record into phylogenetic methods. This three-day workshop was sponsored by the France-Berkeley Fund, a cool program that provides seed grants to build partnerships between UC Berkeley researchers and French collaborators. All of the events took place at the UCMP on the UC Berkeley campus.

Hélène, Charles, and Tiago recognized the increasing interest in methods and analyses that incorporate data from fossil taxa, and since several of us work in this area, particularly in methods development, the need for building a collaborative network is critical. Furthermore, as methods become more and more reliant on data from the fossil record, connections between neontologists and paleontologists must be formed. Notably, a similar working group, organized by Sam Price and Lars Schmitz, was held at NESCent this past spring and was made up of an overlapping set of researchers. One result of the NESCent catalysis meeting will be an SSE Symposium at Evolution 2014 on “Reuniting fossil and extant approaches to macroevolution”.

# Jukes-Cantor Model of DNA Substitution

We have had a series of posts introducing several foundational tools in phylogenetic inference, including Bayesian reasoning, Markov chain Monte Carlo, and the gamma distribution’s many uses in phylogenetics. Today, we’ll continue with this theme in a crosspost from my UH colleague Floyd Reed’s laboratory blog, in which Floyd gives a simple derivation of the Jukes-Cantor model of DNA substitution. Here it is in lightly edited form:

In previous posts I talked about irreversible and reversible mutations between two states or alleles.  However, there are four nucleotides, A, C, G, and T.  How can we model mutations among these four states at a single nucleotide site?  It turns out that this is important to consider for things like making gene trees to represent species relationships.  If we just use the raw number of differences between two species’ DNA sequences we can get misleading results.  It is actually better to estimate and correct for the total number of changes that have occurred, some fraction of which may not be visible to us.  The simplest way to do this is the Jukes-Cantor (1969) model.

Imagine a nucleotide can mutate with the same probability to any other nucleotide, so that the mutation rates in all directions are equal and symbolized by $\mu$.
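This all-rates-equal setup is conventionally summarized (this notation is my addition, not part of Floyd’s original post) as an instantaneous rate matrix, with rows and columns ordered A, C, G, T; every off-diagonal entry is $\mu$, and the diagonal entries make each row sum to zero:

```latex
Q = \begin{pmatrix}
-3\mu & \mu   & \mu   & \mu   \\
\mu   & -3\mu & \mu   & \mu   \\
\mu   & \mu   & -3\mu & \mu   \\
\mu   & \mu   & \mu   & -3\mu
\end{pmatrix}
```

The diagonal entry $-3\mu$ is the total rate of leaving each state, which matches the “mutate away with probability $3\mu$” intuition below.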

So from the point of view of the “A” state you can mutate away with a probability of $3\mu$ (lower left above).  However, another state will only mutate to an “A” with a probability of $\mu$ (lower right above); the “T” could have just as easily mutated to a “G” or “C” instead of an “A”.
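As a small supplement to Floyd’s derivation, here is a minimal Python sketch of the standard Jukes-Cantor distance correction, $d = -\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$, which converts the observed proportion of differing sites $p$ into an estimate of the total number of substitutions per site, including the changes we cannot see. The function name and error handling are my own choices, not from the original post.

```python
import math

def jc_distance(p):
    """Jukes-Cantor corrected distance from the observed proportion
    of differing sites p. Returns the expected number of substitutions
    per site; p must be below 3/4 for the correction to be defined."""
    if not 0.0 <= p < 0.75:
        raise ValueError("observed proportion must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)

# An observed 30% difference implies more actual change than we see:
print(round(jc_distance(0.30), 4))  # prints 0.3831
```

Note that the corrected distance (0.383) exceeds the raw difference (0.30), reflecting the multiple hits hidden from direct comparison.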

# Q&A: Excluding Character Sets with Partitions in MrBayes

Bodega Workshop alum Christoph Hiedtke has the following question regarding excluded character sets when setting up partitions in MrBayes. With his permission I’m posting it here. I’ve run into this exact problem before and I’m sure many others have also.

Christoph writes:

Hey gang, how is everybody doing?

I am going crazy over what initially seemed to be a rather trivial MrBayes operation. Initially I had set up a MrBayes file dividing my alignment into 3 partitions, and it executes perfectly. I then wanted to re-run the same file, but this time excluding one partition from my analysis with the designated “exclude” command, but for some reason I am getting an error I cannot get around. Does anyone know what’s going on?

Here is part of my command block:

begin MrBayes;
charset p1 = 1-370 371-844 845-1124 2159-2395 3018-3328;
charset p2 = 1125-1404 1685-1921 1922-2158 2396-2706 2707-3017;
charset p3 = 1405-1684;
partition parts = 3: p1, p2, p3;
exclude p3;
set partition = parts;
prset applyto=(all) ratepr=variable;
end;

MrBayes gets stuck on the “set partition = parts” line with the following error:

Defining charset called p1
Defining charset called p2
Defining charset called p3
Defining partition called parts
Excluding character(s)
Setting parts as the partition, dividing characters into 3 parts.
Setting model defaults
Seed (for generating default start values) = 1507443219
You must have at least one site in a partition. Partition 3
has 0 site patterns.
Error when setting parameter “Partition” (2)

HELP!!

OK, before getting to the answer, I’ll just point out that the obvious alternative approach (excluding p3 and THEN defining a partition that contains only the p1 and p2 character sets) will also give an error, this time for not assigning all sites to a partition. All sites need to be assigned to a partition and all partitions need to have at least one site, so how can we get away with excluding anything?

The solution is to change the command block as follows:

begin MrBayes;
charset p1 = 1-370 371-844 845-1124 2159-2395 3018-3328;
charset p2 = 1125-1404 1685-1921 1922-2158 2396-2706 2707-3017;
charset p3 = 1405-1684;
partition parts = 2: p1, p2 p3;
exclude p3;
set partition = parts;
prset applyto=(all) ratepr=variable;
end;

We’ve defined 2 partitions (instead of 3) assigning the ‘extra’ character set to the second partition (note the missing comma). Now we have all sites assigned to a partition and no partitions are empty, so we’re free to exclude the character set.
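For alignments with many charsets, it’s easy to mistype a range and hit this same error. Here is a quick sanity check in Python (a hypothetical helper of my own, not a MrBayes feature) that verifies every site is assigned to a partition and that no partition is emptied by the exclusion, using the ranges from Christoph’s file:

```python
def expand(ranges):
    """Expand a list of (start, end) site ranges into a set of site indices."""
    sites = set()
    for start, end in ranges:
        sites.update(range(start, end + 1))
    return sites

p1 = expand([(1, 370), (371, 844), (845, 1124), (2159, 2395), (3018, 3328)])
p2 = expand([(1125, 1404), (1685, 1921), (1922, 2158), (2396, 2706), (2707, 3017)])
p3 = expand([(1405, 1684)])

excluded = p3
partitions = [p1, p2 | p3]  # the two-partition scheme from the fix above

# Every site 1..3328 is assigned to some partition...
assert set.union(*partitions) == set(range(1, 3329))
# ...and no partition is empty once the excluded sites are removed.
assert all(part - excluded for part in partitions)
print("partition scheme OK")
```

Running the same check against the original three-partition scheme would fail the second assertion, which is exactly what MrBayes is complaining about.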

# Phylogenetic Computing: What’s Your Solution?

Grant deadlines for DEB are coming up and this has me thinking about the best way to go about actually doing the computation that I’m proposing to do. Since my lab is still in its early “get up and running” phase, I’m also in a position to invest in new resources and set up some standard operating procedures for the future. This is an issue that all phylogeneticists struggle with at one point or another, so I thought it would be useful to poll the community. What do you use for big analysis jobs in your lab?

Like many people in my generation of phylogenetics, I started out in the days of taping warning notes to monitors (Figure 1), i.e., cobbling together whatever desktop machines one could get one’s hands on…and then jealously guarding them from enemy lab-mates for the months it took an analysis to finish. Times have obviously changed since then, and we aren’t, as a field, nearly as computationally limited as we were 10 or even 5 years ago. Many free or easily accessible computing options are now available: CIPRES, the iPLANT discovery environment, Amazon’s EC2, XSEDE (formerly TeraGrid), and any number of university/college/departmental clusters…and that’s not to mention the homebuilt clusters and trusty (dusty) desktops sitting in the corners of our labs. The workhorse software of our field is also faster than it used to be, allowing us to get more done in the same amount of time, irrespective of the hardware being used.

Figure 1 – The classic PAUP* warning note (note: I stole this from the ad for Brian O’Meara’s “Fast Free Phylogenies” HPC workshop at NIMBioS)

For the last few years, I’ve enjoyed the benefit of having my own small but speedy cluster (built cheaply using commodity parts), as well as a TeraGrid allocation. These have worked well for my needs: they’ve allowed me to get analyses finished in a timely fashion; run many tests and toy analyses without feeling limited; lend time to coworkers in a pinch…and aside from all that, the cluster allows for lots of satisfying tinkering during off hours. All that said, the TeraGrid allocation is now finished, the large cluster on Maui that I’d been hoping to get an allocation on is no longer available, and I’m already seeing that the tropical climate here on Oahu is hell on hardware (e.g. my monitor fills up with condensation anytime I leave it off for more than a day or two). I’m thinking about eventually moving completely into EC2 and XSEDE and not having to worry about hardware at all.

I’d appreciate learning about the experiences that others have had. What is your preferred solution for phylogenetic computing?

# Evolution 2013: The Good, the Better, and the Future

The 2013 Evolution meetings (joint meetings of the Society for the Study of Evolution, Society of Systematic Biologists, and the American Society of Naturalists) were held in Snowbird, Utah, from 21-25 June 2013. The meetings were a great success and, as usual, featured many packed sessions on phylogenetic methods, theory, and applications. These meetings were held in Snowbird twenty years ago (1993) as well, but much has changed since then. As I flew home from Utah this week, I contemplated a few of the things that made the meetings successful, and I compiled this list of thoughts and recommendations for future meetings.

Things that made #Evol2013 a success:

1. The presence of outstanding undergraduates who are working on research. This was better than I ever remember from past meetings. In addition to fostering science careers for undergrads, it makes the meeting much more attractive to faculty who are interested in recruiting outstanding graduate students. It gives undergraduates exposure to professional scientific communities, gives them a chance to practice presenting research in public, and allows them to explore opportunities for graduate school. I hope all three societies will continue and even ramp up efforts to attract research-oriented undergraduates to the meetings.

# Motivating/rewarding reviewers

Among other things, researchers are expected to do research, publish the results of this research, and review the research of others. It is this reviewing part that I want to talk about today.

Reviewing is obviously one of the most important responsibilities of a researcher, one that can take a significant amount of time, but one that brings little reward, as it’s usually done for free. All a reviewer can show for it is a line on a CV saying “Reviewer for [put your favorite journal here]”. The purpose of this post is to propose a way to reward researchers who are good reviewers and spend a significant amount of time improving the work of others, often anonymously.

What if journals had awards for “Best reviewer of the year”? The laureate could then add this award to her CV, showing that she is doing a huge amount of service to her field. The award could be based on objective measures, such as the number of reviews returned on time or the number of reviews that concurred with the Editor-in-Chief’s decision, or could be more subjective, based on the Associate Editors’ and Editor-in-Chief’s assessments of the quality of the reviews they received. The award could be given with much ceremony at conference banquets, like awards for the best student paper, and perhaps with some money attached to it. Anonymity would not be broken, because all we would know about the laureate is that she reviewed N papers for journal X, not that she reviewed my paper submitted to journal X.

One could also think of a wall-of-fame type of thing, where reviewers would compete for the largest number of reviews returned on time, for instance. Or, to keep high levels of anonymity, give a way for a reviewer to know how her reviewing work compares to others: have I been reviewing more papers than 1%, 50%, 80% of the reviewers of this journal? If I see that I review less than my fellow researchers, perhaps I’ll be willing to accept the next invitation to review a paper. If I see that I review way more than my fellow researchers, perhaps I want to put that on my CV to show how altruistic I am.

Short of paying reviewers for their reviews, which would perhaps be expensive for smaller scientific societies, I think some type of reward/award system could be useful to recognize the amount of time some researchers spend reviewing and improving the work of others. Given that systems for handling submissions and revisions such as “Manuscript Central” have all the stats available, that’s probably not very hard to do.

# Bodega @ Evol2013

The schedule for the 2013 Evolution meeting was just released.  I’m sure there will be many Bodega instructors and former students (and future, too!) giving fascinating talks and posters.  Below are a few presentations that I found for instructors and students from Bodega2013.  Please add your own presentation time in the comments so that we can all work these into our meeting schedules.

Gideon Bradburd – Sun. at 5:00 - Disentangling the effects of geographic and ecological isolation on genetic differentiation

Jeremy Brown – Tues. at 4:45 - Variable phylogenetic signal in a forensically important HIV-1 transmission cluster

Tracy Heath – Sun. at 4:45 - The Fossilized Birth-Death Process: A Coherent Model of Fossil Calibration for Bayesian Divergence Time Estimation

Hannah Marx – Tues. at 2:30 - Alien encounters of the floral kind: interpreting patterns of community assembly on the San Juan Islands

Mike May – Sun. at 8:30 (am) - MCDUSA: A Monte Carlo method for more reliable detection of lineage-specific rates of diversification

Matt McGee – Sun. at 2:00 - The evolution of cichlid craniofacial diversity

Brian Moore – Tues. at 3:45 - Bayesian inference of phylogeny from partitioned data

Sam Price – Sat. at 9:30 - Assembly & early diversification of modern reef fishes

Glenn Seeholzer – Mon. at 7:00 (pm) - The phylogenetic signal and mode of climate-niche evolution in the Neotropical bird family Furnariidae

Bob Thomson – Tues. at 11:15 - Estimating Phylogeny from microRNA Data: A Critical Appraisal