Phylogenetic Inference using MrBayes v3.2

A guided tutorial

Instructions are provided in a step-by-step tutorial.

The exercises require that you have the following software programs:

Source URL: http://treethinkers.org/tutorials/phylogenetic-inference-using-mrbayes-v3-2/.

5 thoughts on “Phylogenetic Inference using MrBayes v3.2”

1. Fátima

I have been running some analyses for different markers, following the recommendation not to define a model a priori (e.g. by incorporating the best-model estimates from jModelTest) but to use nst=mixed and average over the GTR submodels. However, when I compare the marginal prior and posterior for my data, the k_revmat distributions are not that distinct and their ranges overlap (convergence, mixing and sampling intensity were properly assessed and reached). The gtrsubmodel distributions are not uniform like the prior, but they are not as different from it as in the example presented in the tutorial. I understand that a strong departure of the marginal posterior from the prior is always good and that the opposite may not be bad news, but what does it mean in this case? When is it bad news?

2. coyk

I have a question about interpreting my results from running just my priors. Is it ok to post my question here?

1. Coyk

OK. Here goes.
Here’s what my infile looks like
begin data;
dimensions ntax=53 nchar=2024;
format datatype=mixed(rna:1-1031,dna:1032-1995,Restriction:1996-2024) gap=- missing=? interleave=yes;

With data

MrBayes parameters

partition by_nine = 9:stems,notstems,COI1st,COI2nd,COI3rd,ND41st,ND42nd,ND43rd,28Sgaps;
set partition = by_nine;
lset applyto=(1) nucmodel=doublet;
lset applyto=(2,3,4,5,6,7,8) nucmodel=4by4;
lset applyto=(1,2,3,4,5,6,7,8) nst=mixed;
lset applyto=(1,2,3,4,5,6,7,8) rates=invgamma;
lset applyto=(9) rates=gamma;
lset applyto=(9) coding=variable;
prset applyto=(all) ratepr=variable;
mcmc ngen=5000000 printfreq=5000 samplefreq=4000 nchains=8 savebrlens=yes nruns=2 temp=0.01 diagnfreq=5000 Nswaps=10 stoprule=YES stopval=0.009 diagnstat=Avgstddev filename=cipres.out checkpoint=yes;
log start;
end;

Note: Other runs have been done with different mcmc parameters not on Cipres. I had a very hard time with convergence. The list above yielded better results, but ESS values are still low for TL and LnPr for both runs. Should run longer, but wanted to see if it was at least going in a good direction with the number of chains, low temp, and nswaps. Seems to be. This data has 9 partitions so there are lots and lots of parameters. The Tracer plots looked OK for most of them except for TL and LnPr as mentioned and k_revmats (there are 8 of them) and gtr_submodels (8 of these too). These look especially terrible. I have pictures but I don’t know how to attach.
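For reference, the incremental heating scheme described in the MrBayes manual gives chain i (with i = 0 for the cold chain) a heat of 1/(1 + temp * i). The sketch below, written in Python rather than as MrBayes commands, applies that formula to the nchains=8, temp=0.01 settings above. The function name chain_heats is made up for illustration and is not part of MrBayes.

```python
def chain_heats(nchains: int, temp: float) -> list[float]:
    """Heat of each Metropolis-coupled chain; index 0 is the cold chain.

    Uses the incremental heating formula heat_i = 1 / (1 + temp * i).
    """
    return [1.0 / (1.0 + temp * i) for i in range(nchains)]


# Settings taken from the mcmc line above: nchains=8 temp=0.01
heats = chain_heats(nchains=8, temp=0.01)
for i, heat in enumerate(heats):
    print(f"chain {i}: heat = {heat:.4f}")

# For comparison only: the same number of chains with a larger temp value
wider = chain_heats(nchains=8, temp=0.1)
print(f"hottest chain at temp=0.01: {heats[-1]:.4f}")
print(f"hottest chain at temp=0.1:  {wider[-1]:.4f}")
```

With temp=0.01 the hottest of the eight chains has a heat of roughly 0.93, so every heated chain sees a surface very close to the cold chain's. That would tend to raise swap acceptance, at the cost of heated chains that explore less freely.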

Perusing various resources, I came across a tutorial from treethinkers.org. It talked about running priors without data to make sure they are valid. Guess I should’ve done that first. Better late than never. I was especially intrigued by this tutorial because it highlights both gtrsubmodel and k_revmat in particular.

The tutorial says “When using MCMC to sample without data, the k_revmat prior distribution in Tracer should (approximately) match the expected distribution.” If I go back and look at the histograms of all my k_revmats from the runs that used my data, they don’t always look right. The average of my 8 k_revmat values is 3.7361, with a range from 3.254 to 4.732. So, if I am to interpret the histograms I get from running MrBayes with my data as the expected distributions, most of them don’t match my prior distribution. The same goes for gtrsubmodel.
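As a rough check on what the "expected distribution" of k_revmat looks like, here is a minimal Python sketch. It assumes the prior under nst=mixed is uniform over all 203 time-reversible submodels of a 4by4 GTR matrix, so that the prior probability of k distinct rate classes is proportional to the number of ways of grouping the six exchangeability rates into k classes (a Stirling number of the second kind). The uniform-prior assumption and all names in the code are illustrative and should be checked against the MrBayes documentation.

```python
from fractions import Fraction


def stirling2(n: int, k: int) -> int:
    """Number of ways to partition n items into k non-empty groups."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


N_RATES = 6  # exchangeability rates in a 4by4 GTR matrix

# Number of submodels with exactly k distinct rate classes
counts = {k: stirling2(N_RATES, k) for k in range(1, N_RATES + 1)}
total = sum(counts.values())

# Assumed uniform prior over submodels, so P(k) = counts[k] / total
prior = {k: Fraction(c, total) for k, c in counts.items()}
prior_mean = sum(k * p for k, p in prior.items())

for k, p in prior.items():
    print(f"k_revmat = {k}: {counts[k]:3d} submodels, prior = {float(p):.4f}")
print(f"total submodels = {total}, prior mean of k_revmat = {float(prior_mean):.4f}")
```

Under this assumption most of the prior mass sits on k = 3 and k = 4, and the prior mean is about 3.32. A posterior k_revmat histogram can therefore overlap the prior heavily without that alone indicating a problem.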

I keep reading not to specify prior distributions – to let MrBayes work it out or you’ll bias your results by not taking into account the uncertainty in the process. For all parameters I assumed default/uninformative priors. Do these results mean I should tone down the partitioning? Or let these runs that I’ve finally managed to get to mix better, go way longer and/or combine them with a couple more independent runs? Or should I be changing something about my priors?

3. Coyk

I have a question about the expected behavior of the Splitmerge Revmat parameters. I have a dataset with 2 genes and 6 partitions: codon positions 1-3 for each gene. There are 53 taxa in the dataset. Here are the parameter settings
lset applyto=(all) nucmodel=4by4;
lset applyto=(all) nst=mixed;
lset applyto=(all) rates=invgamma;
prset applyto=(all) ratepr=variable;
mcmcp ngen=100000000 printfreq=1000 samplefreq=1000 nchains=4 savebrlens=yes nswaps=10 temp=0.01 nruns=4;
log start;
end;

As you can see, it’s a lot of generations, a lot of swaps, and a really low temp. I used these parameters because I was getting very low acceptance rates for the revmat moves.

When I look at the results in Tracer, all of the gtrsubmodel parameters and k_revmat parameters are horizontal lines instead of scattered points. Here is an example of the output

Parameter      Mean      Variance  Lower     Upper     Median    min ESS*  avg ESS   PSRF+
------------------------------------------------------------------------------------------
TL{all}        4.304288  0.062514  3.821510  4.795940  4.291185  34707.27  36098.96  1.000
k_revmat{1}    3.753870  0.601820  3.000000  5.000000  4.000000  63548.51  64651.03  1.000
k_revmat{2}    3.262546  0.720611  2.000000  5.000000  3.000000  73722.77  74282.21  1.000
k_revmat{3}    4.625825  0.619045  3.000000  6.000000  5.000000  24736.48  25921.39  1.000
k_revmat{4}    3.881612  0.480283  3.000000  5.000000  4.000000  60089.20  61970.38  1.000
k_revmat{5}    3.308066  0.698417  2.000000  5.000000  3.000000  53645.31  54841.48  1.000
k_revmat{6}    3.627462  0.556238  3.000000  5.000000  4.000000  49802.24  50543.54  1.000

Here are the acceptance rates
run1   run2   run3   run4   move
26.0   25.9   25.9   25.9   Dirichlet(Revmat{1})
10.6   10.6   10.6   10.6   Splitmerge1(Revmat{1})
25.7   25.7   25.7   25.8   Splitmerge2(Revmat{1})
27.0   26.8   26.8   26.8   Dirichlet(Revmat{2})
43.0   43.1   43.1   43.1   Splitmerge1(Revmat{2})
43.7   43.6   43.5   43.5   Splitmerge2(Revmat{2})
25.8   25.5   25.6   25.7   Dirichlet(Revmat{3})
 4.9    4.9    4.9    4.9   Splitmerge1(Revmat{3})
 5.7    5.7    5.7    5.7   Splitmerge2(Revmat{3})
25.9   25.7   25.9   25.7   Dirichlet(Revmat{4})
 9.3    9.3    9.2    9.2   Splitmerge1(Revmat{4})
17.1   17.0   17.0   17.0   Splitmerge2(Revmat{4})
26.2   26.2   26.0   26.3   Dirichlet(Revmat{5})
12.7   12.7   12.7   12.7   Splitmerge1(Revmat{5})
14.6   14.7   14.8   14.7   Splitmerge2(Revmat{5})
25.5   25.5   25.7   25.6   Dirichlet(Revmat{6})
 7.9    7.9    7.9    7.9   Splitmerge1(Revmat{6})
10.2   10.2   10.2   10.3   Splitmerge2(Revmat{6})

Some of the Splitmerge acceptance rates are OK, but some are still low.
I would like to stop torturing this dataset. Is it expected that the gtrsubmodel and k_revmat parameters give essentially horizontal lines in Tracer instead of scattered points?
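One hedged observation on the question above: k_revmat can only take integer values from 1 to 6 (the summary table shows integer bounds and medians), and gtrsubmodel is likewise an integer model index. Sampled values of such parameters fall on a few horizontal levels in a trace plot, so a frequency table may be more informative than the trace. The Python sketch below tabulates a discrete column from the text of a .p file. It assumes the usual layout of a bracketed ID line followed by a tab-delimited header and rows. The function name, the 25% burn-in fraction and the sample data are all made up for illustration.

```python
import csv
from collections import Counter


def tabulate_discrete(p_file_text: str, column: str,
                      burnin_frac: float = 0.25) -> dict[int, float]:
    """Relative frequency of each integer value in one column of a .p file."""
    # Skip blank lines and the bracketed "[ID: ...]" line before the header
    lines = [ln for ln in p_file_text.splitlines()
             if ln.strip() and not ln.startswith("[")]
    rows = list(csv.DictReader(lines, delimiter="\t"))
    kept = rows[int(len(rows) * burnin_frac):]  # discard the burn-in samples
    counts = Counter(int(float(row[column])) for row in kept)
    n = sum(counts.values())
    return {value: counts[value] / n for value in sorted(counts)}


# Made-up sample, NOT real output: eight samples of a single k_revmat column
sample = "\n".join([
    "[ID: 0000000000]",
    "Gen\tLnL\tk_revmat{1}",
    "0\t-9000.0\t6",
    "1000\t-8500.0\t5",
    "2000\t-8200.0\t4",
    "3000\t-8190.0\t4",
    "4000\t-8195.0\t3",
    "5000\t-8188.0\t4",
    "6000\t-8192.0\t4",
    "7000\t-8191.0\t3",
])

freqs = tabulate_discrete(sample, "k_revmat{1}")
for value, freq in freqs.items():
    print(f"k_revmat{{1}} = {value}: {freq:.3f}")
```

To use it on a real run, read the .p file into a string first (for example with pathlib.Path(...).read_text()) and pass the exact column name from the file's header row.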