MCMC robot needs love!
“It is the obvious things that are so difficult to see most of the time. People say ‘It’s as plain as the nose on your face.’ But how much of the nose on your face can you see, unless someone holds a mirror up to you?” Isaac Asimov
The ability to rigorously diagnose MCMC performance requires familiarity with some basic concepts from probability theory (discussed last time) and a strong intuitive understanding of the underlying mechanics—we need to know how the algorithms work in order to understand when they are not working. In this installment we’ll briefly cover the mechanics of the Metropolis Hastings MCMC algorithm.
Recall that Bayesian inference is focused on the posterior probability density of parameters. The posterior probability of the parameters can, in principle, be solved using Bayes’ theorem. However, (most) phylogenetic problems cannot be solved analytically, owing mainly to the denominator of Bayes’ theorem—the marginal likelihood requires solving multiple integrals (for all of the continuous parameters, such as branch lengths, substitution rates, stationary frequencies, etc.) for each tree, and summing over all trees.
Accordingly, Bayesian inference of phylogeny typically resorts to numerical methods that approximate the posterior probability density. There are many flavors of Markov chain Monte Carlo (MCMC) algorithms—Gibbs samplers, Metropolis-coupled and reversible-jump MCMC, etc.—we will consider the Metropolis Hastings (MH) algorithm because it is commonly used for phylogenetic problems, and because it is similar to many other variants (which we will cover elsewhere). Note that MCMC and Bayesian inference are distinct animals: they have a relationship similar to that between ‘optimization algorithms’ and ‘maximum-likelihood estimation.’ Some Bayesian inference can be accomplished without MCMC algorithms, and MCMC algorithms can be used to solve problems in non-Bayesian statistical frameworks.
MCMC robot needs love!
“Probability theory is nothing but common sense reduced to calculation.” Pierre Laplace
The ability to rigorously diagnose MCMC performance is founded on a solid understanding of how the algorithms work. Fortunately, this does not necessarily entail a lot of complex probability theory—knowledge of some basic probability concepts and a strong intuitive understanding of the mechanics is sufficient. We’ll dedicate this installment to a brief introduction to Bayesian inference and some related concepts from probability theory. Next time we’ll get into the mechanics of MCMC.
First, let’s consider what the MCMC is trying to approximate. Bayesian inference is focussed on posterior probabilities, P(|X), the probability of the parameter, , given the data, X, which is inferred using Bayes’ Theorem:
where P(X|) is the likelihood function (the probability of observing the data given the parameter value), P() is the prior probability for the parameter, and P(X) is the marginal likelihood of observing the data under the model.
MCMC robot needs love!
“You can never be absolutely certain that the MCMC is reliable, you can only identify when something has gone wrong.” Andrew Gelman
Model-based inference is, after all, based on the model. Careful research means being vigilant both regarding the choice of model and rigorously assessing our ability to estimate under the chosen model. These two concerns pertain both to model-based inference of phylogeny—using programs such as RaXML or MrBayes—and to inferences based on phylogeny—such as the study of character evolution, lineage diversification—and indeed to all model-based inference.
The first issue—model specification, which entails three closely related issues—is critically important for the simple reason that unbiased estimates can only be obtained under a model that provides a reasonable description of the process that gave rise to our data. Model selection entails assessing the relative fit of our dataset to a pool of candidate models. Rankings are based on model-selection methods that compare the relative fit of candidate modes based either on their maximum-likelihood estimates (which measures the fit of the data to the model at a single point in parameter space), or on the marginal likelihood of the candidate models (which measures the average fit of the candidate models to the data). Model adequacy—an equally important but relatively neglected issue—assesses the absolute fit of the data to the model. Model uncertainty is related to the common (and commonly ignored) scenario in which multiple candidate models provide a similar fit to the data: in this scenario, conditioning on any single model (even the best) will lead to biased estimates, and so model averaging is required to accommodate uncertainty in the choice of model.