July | 2013 | Workshop in Applied Phylogenetics

Bodega Workshop alum Christoph Hiedtke has the following question about excluding character sets when setting up partitions in MrBayes. With his permission, I’m posting it here. I’ve run into this exact problem before, and I’m sure many others have too.

Christoph writes:

Hey gang, how is everybody doing?

I am going crazy over what initially seemed to be a rather trivial MrBayes operation. Initially I had set up a MrBayes file dividing my alignment into 3 partitions and it executes perfectly. I then wanted to re-run the same file, but this time excluding one partition from my analysis with the designated “exclude” command, and for some reason I am getting an error I cannot get around. Does anyone know what’s going on?

Here is part of my command block:

begin MrBayes;
charset p1 = 1-370 371-844 845-1124 2159-2395 3018-3328;
charset p2 = 1125-1404 1685-1921 1922-2158 2396-2706 2707-3017;
charset p3 = 1405-1684;
partition parts = 3: p1, p2, p3;
exclude p3;
set partition = parts;
unlink statefreq=(all) revmat=(all) shape=(all) pinvar=(all);
prset applyto=(all) ratepr=variable;
end;

MrBayes gets stuck on the “set partition = parts” line with the following error:

Defining charset called p1
Defining charset called p2
Defining charset called p3
Defining partition called parts
Excluding character(s)
Setting parts as the partition, dividing characters into 3 parts.
Setting model defaults
Seed (for generating default start values) = 1507443219
You must have at least one site in a partition. Partition 3
has 0 site patterns.
Error when setting parameter “Partition” (2)

HELP!!

Ok, before getting to the answer, I’ll just point out that the obvious alternative approach of excluding p3 and THEN defining a partition that contains only the p1 and p2 character sets will also give an error, this time because not every site has been assigned to a partition. All sites need to be assigned to a partition and all partitions need to have at least one site, so how can we get away with excluding anything?

The solution is to change the command block as follows:

begin MrBayes;
charset p1 = 1-370 371-844 845-1124 2159-2395 3018-3328;
charset p2 = 1125-1404 1685-1921 1922-2158 2396-2706 2707-3017;
charset p3 = 1405-1684;
partition parts = 2: p1, p2 p3;
exclude p3;
set partition = parts;
unlink statefreq=(all) revmat=(all) shape=(all) pinvar=(all);
prset applyto=(all) ratepr=variable;
end;

We’ve defined 2 partitions (instead of 3), assigning the ‘extra’ character set to the second partition (note that there is no comma between p2 and p3). Now all sites are assigned to a partition and, once p3 is excluded, neither partition is empty, so we’re free to exclude the character set.
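To make the two rules concrete, here is a small Python sketch that checks the three setups discussed above against them. It is only a toy model of the rules as described in this post, not how MrBayes implements the check. The helper names (`parse_ranges`, `check_partition`) are made up for illustration, and the alignment length of 3328 is an assumption taken from the highest site number in the character sets.

```python
def parse_ranges(spec):
    """Expand a charset spec such as '1-370 371-844' into a set of site numbers."""
    sites = set()
    for token in spec.split():
        start, _, end = token.partition("-")
        sites.update(range(int(start), int(end or start) + 1))
    return sites


def check_partition(n_sites, parts, excluded):
    """Return a list of rule violations; an empty list means the setup passes.

    Rule 1: every site (excluded or not) must be assigned to some partition.
    Rule 2: every partition must keep at least one site after exclusion.
    """
    problems = []
    assigned = set().union(*parts)
    unassigned = set(range(1, n_sites + 1)) - assigned
    if unassigned:
        problems.append(f"{len(unassigned)} sites are not assigned to any partition")
    for i, part in enumerate(parts, start=1):
        if not part - excluded:
            problems.append(f"partition {i} has no sites left after exclusion")
    return problems


# Character sets copied from the command block above.
p1 = parse_ranges("1-370 371-844 845-1124 2159-2395 3018-3328")
p2 = parse_ranges("1125-1404 1685-1921 1922-2158 2396-2706 2707-3017")
p3 = parse_ranges("1405-1684")
N_SITES = 3328  # assumption: the alignment ends at the last site named in a charset

original = check_partition(N_SITES, [p1, p2, p3], excluded=p3)     # 3: p1, p2, p3
alternative = check_partition(N_SITES, [p1, p2], excluded=p3)      # 2: p1, p2
fixed = check_partition(N_SITES, [p1, p2 | p3], excluded=p3)       # 2: p1, p2 p3

print("original:   ", original)
print("alternative:", alternative)
print("fixed:      ", fixed)
```

In this model the original setup breaks the second rule, the alternative breaks the first, and only the setup that folds p3 into the second partition satisfies both.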

Grant deadlines for DEB are coming up and this has me thinking about the best way to go about actually doing the computation that I’m proposing to do. Since my lab is still in its early “get up and running” phase, I’m also in a position to invest in new resources and set up some standard operating procedures for the future. This is an issue that all phylogeneticists struggle with at one point or another, so I thought it would be useful to poll the community. What do you use for big analysis jobs in your lab?

Like many people in my generation of phylogenetics, I started out in the days of taping warning notes to monitors (Figure 1): cobbling together whatever desktop machines one could get one’s hands on…and then jealously guarding them from ~~the enemy~~ lab-mates for the months it took an analysis to finish. Times have obviously changed since then and we aren’t, as a field, nearly as computationally limited as we were 10 or even 5 years ago. Many free or easily accessible computing options are now available: CIPRES, the iPlant Discovery Environment, Amazon’s EC2, XSEDE (formerly TeraGrid), and any number of university/college/departmental clusters…and that’s not to mention the homebuilt clusters and trusty (dusty) desktops sitting in the corners of our labs. The workhorse software of our field is also faster than it used to be, allowing us to get more done in the same amount of time, irrespective of the hardware being used.

Figure 1 – The classic PAUP* warning note (note: I stole this from the ad for Brian O’Meara’s “Fast Free Phylogenies” HPC workshop at NIMBioS)

For the last few years, I’ve enjoyed the benefit of having my own small but speedy cluster (built cheaply using commodity parts), as well as a TeraGrid allocation. These have worked well for my needs: they’ve allowed me to get analyses finished in a timely fashion; run many tests and toy analyses without feeling limited; lend time to coworkers in a pinch…and aside from all that, the cluster allows for lots of satisfying tinkering during off hours. All that said, the TeraGrid allocation is now finished, the large cluster on Maui that I’d been hoping to get an allocation on is no longer available, and I’m already seeing that the tropical climate here on Oahu is hell on hardware (e.g. my monitor fills up with condensation anytime I leave it off for more than a day or two). I’m thinking about eventually moving completely into EC2 and XSEDE and not having to worry about hardware at all.

I’d appreciate learning about the experiences that others have had. What is your preferred solution for phylogenetic computing?


Bodega Bay, California


Q&A: Excluding Character Sets with Partitions in MrBayes

Phylogenetic Computing: What’s Your Solution?