Phylogenetic Inference using MrBayes v3.2

A guided tutorial

Instructions are provided in a step-by-step tutorial.

Download the output files for the exercises:

  1. Model Selection & Partitioning using Bayes Factors
  2. Averaging Over the GTR Family of Models

The exercises require that you have the following software programs:

This tutorial was written by Tracy Heath, Conor Meehan, and Brian Moore for workshops on applied phylogenetics and molecular evolution and is licensed under a Creative Commons Attribution 4.0 International License.
Source URL: http://treethinkers.org/tutorials/phylogenetic-inference-using-mrbayes-v3-2/.

4 thoughts on “Phylogenetic Inference using MrBayes v3.2”

  1. Fátima

    I have been running analyses for different markers, following the recommendation not to define a model a priori (e.g., by plugging in the best model estimated by jModelTest), but instead using nst=mixed to average over the GTR submodels. However, when I compare the marginal prior and posterior for my data, the k_revmat distributions are not that distinct and their ranges overlap (convergence, mixing, and sampling intensity were properly assessed and reached). The gtrsubmodel frequencies are not uniform, as they are under the prior, but they are also not as different as in the example presented in the tutorial. I understand that a strong departure of the posterior from the marginal prior is always good, and that the opposite is not necessarily bad news, but what does it mean in this case? When is it bad news?
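    For reference, nst=mixed places a uniform prior over all 203 GTR submodels (the set partitions of the six exchangeability rates), so the expected prior distribution of k_revmat can be computed directly from Stirling numbers of the second kind. A minimal Python sketch (independent of MrBayes) of that expected distribution:

    ```python
    from functools import lru_cache

    # Stirling numbers of the second kind: the number of ways to split
    # n exchangeability rates into k nonempty rate classes.
    @lru_cache(maxsize=None)
    def stirling2(n, k):
        if n == k:
            return 1
        if k == 0 or k > n:
            return 0
        return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

    # Under nst=mixed, MrBayes assigns equal prior probability to each of
    # the Bell(6) = 203 GTR submodels, so the induced prior on k_revmat
    # (the number of rate classes K) is P(K = k) = S(6, k) / 203.
    bell6 = sum(stirling2(6, k) for k in range(1, 7))       # 203
    prior = {k: stirling2(6, k) / bell6 for k in range(1, 7)}
    mean_k = sum(k * p for k, p in prior.items())           # 674/203

    print(bell6)                 # 203
    print(prior)
    print(round(mean_k, 3))      # 3.32
    ```

    Under this prior the mean number of rate classes is 674/203 ≈ 3.32, so a marginal prior sample of k_revmat in Tracer should be centred roughly there; a posterior that barely moves away from it indicates the data carry little information about how the exchangeability rates should be grouped.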

      1. Coyk

        OK. Here goes.
        Here’s what my infile looks like
        begin data;
        dimensions ntax=53 nchar=2024;
        format datatype=mixed(rna:1-1031,dna:1032-1995,Restriction:1996-2024) gap=- missing=? interleave=yes;

        With data

        MrBayes parameters

        partition by_nine = 9:stems,notstems,COI1st,COI2nd,COI3rd,ND41st,ND42nd,ND43rd,28Sgaps;
        set partition = by_nine;
        lset applyto=(1) nucmodel=doublet;
        lset applyto=(2,3,4,5,6,7,8) nucmodel=4by4;
        lset applyto=(1,2,3,4,5,6,7,8) nst=mixed;
        lset applyto=(1,2,3,4,5,6,7,8) rates=invgamma;
        lset applyto=(9) rates=gamma;
        lset applyto=(9) coding=variable;
        prset applyto=(all) ratepr=variable;
        unlink Pinvar=(all);
        unlink shape=(all);
        unlink statefreq=(all);
        unlink revmat=(all);
        mcmc ngen=5000000 printfreq=5000 samplefreq=4000 nchains=8 savebrlens=yes nruns=2 temp=0.01 diagnfreq=5000 Nswaps=10 stoprule=YES stopval=0.009 diagnstat=Avgstddev filename=cipres.out checkpoint=yes;
        log start;
        end;

        Note: Other runs have been done with different mcmc parameters, not on CIPRES. I had a very hard time with convergence. The settings above yielded better results, but ESS values are still low for TL and LnPr in both runs. I should run longer, but I wanted to see if it was at least going in a good direction with this number of chains, low temperature, and nswaps; it seems to be. This dataset has 9 partitions, so there are lots and lots of parameters. The Tracer plots looked OK for most of them, except for TL and LnPr as mentioned, and for the k_revmats (there are 8 of them) and gtrsubmodels (8 of these too). These look especially terrible. I have pictures but I don’t know how to attach them.

        Perusing various resources, I came across a tutorial from treethinkers.org. It talked about running priors without data to make sure they are valid. Guess I should’ve done that first. Better late than never. I was especially intrigued by this tutorial because it highlights both gtrsubmodel and k_revmat in particular.

        The tutorial says, “When using MCMC to sample without data, the k_revmat prior distribution in Tracer should (approximately) match the expected distribution.” If I go back and look at the histograms of all my k_revmats from the runs of MrBayes with my data, the histograms don’t always look right. The average of my 8 k_revmat values is 3.7361, with a range from 3.254 to 4.732. So, comparing the histograms I get from running MrBayes with my data against the expected prior distribution, most of them don’t match. Same for gtrsubmodel.

        I keep reading that one should not fix the substitution model by hand, but let MrBayes work it out, or you’ll bias your results by not accounting for the uncertainty in the process. For all parameters I assumed default/uninformative priors. Do these results mean I should tone down the partitioning? Or should I let these runs, which I’ve finally managed to get mixing better, go much longer and/or combine them with a couple more independent runs? Or should I be changing something about my priors?
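        One way to put a number on a prior-versus-posterior comparison is a histogram overlap coefficient between the two marginal samples, e.g. the k_revmat trace from a prior-only run versus the trace from the run with data. A small self-contained sketch (a hypothetical helper with made-up sample values, not part of MrBayes or Tracer):

        ```python
        def overlap_coefficient(a, b, bins=20):
            """Shared-grid histogram overlap of two samples: 1.0 means the
            marginal distributions coincide, 0.0 means they are disjoint."""
            lo, hi = min(min(a), min(b)), max(max(a), max(b))
            width = (hi - lo) / bins or 1.0        # guard against zero range

            def hist(xs):
                h = [0] * bins
                for x in xs:
                    h[min(int((x - lo) / width), bins - 1)] += 1
                return [c / len(xs) for c in h]    # normalise to proportions

            return sum(min(p, q) for p, q in zip(hist(a), hist(b)))

        # Illustrative stand-ins for sampled k_revmat traces:
        prior_k = [3, 3, 4, 2, 3, 4, 5, 3, 2, 4]       # sampled without data
        posterior_k = [4, 4, 5, 4, 3, 4, 5, 4, 4, 5]   # sampled with data
        print(overlap_coefficient(prior_k, posterior_k))
        ```

        An overlap near 1 means the data are not pulling that parameter away from its prior (the situation described in the comment above), while a low overlap indicates a strongly informative posterior. This is a rough diagnostic, not a substitute for the Bayes-factor comparisons the tutorial covers.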

