We have had a series of posts introducing several foundational tools in phylogenetic inference, including Bayesian reasoning, Markov chain Monte Carlo, and the gamma distribution’s many uses in phylogenetics. Today, we’ll continue with this theme in a crosspost from my UH colleague Floyd Reed’s laboratory blog. Here, Floyd gives a simple derivation of the Jukes-Cantor model of DNA substitution. Here it is in lightly edited form:
In previous posts I talked about irreversible and reversible mutations between two states or alleles. However, there are four nucleotides, A, C, G, and T. How can we model mutations among these four states at a single nucleotide site? It turns out that this is important to consider for things like making gene trees to represent species relationships. If we just use the raw number of differences between two species’ DNA sequences we can get misleading results. It is actually better to estimate and correct for the total number of changes that have occurred, some fraction of which may not be visible to us. The simplest way to do this is the Jukes-Cantor (1969) model.
Imagine a nucleotide can mutate with the same probability to any other nucleotide, so that the mutation rates in all directions are equal and symbolized by μ.
So from the point of view of the “A” state you can mutate away with a probability of 3μ (lower left above), because there are three other nucleotides to mutate to. However, another state will only mutate to an “A” with a probability of μ (lower right above); the “T” could have just as easily mutated to a “G” or “C” instead of an “A”.
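To make the bookkeeping concrete, here is a minimal sketch of the one-step mutation probabilities just described. It assumes the per-direction mutation probability is written as `mu`, and the value 0.01 is purely illustrative; neither the number nor the helper name `transition_matrix` comes from the original post.

```python
# Hypothetical illustrative value; the post does not assign a number to mu.
MU = 0.01
NUCLEOTIDES = "ACGT"


def transition_matrix(mu):
    """One-step probabilities: mu to each other base, 1 - 3*mu to stay put."""
    return {
        src: {dst: (1 - 3 * mu if src == dst else mu) for dst in NUCLEOTIDES}
        for src in NUCLEOTIDES
    }


P = transition_matrix(MU)

# Probability that an "A" mutates away: the sum over the three other bases.
prob_leave_A = sum(P["A"][dst] for dst in "CGT")

# Probability that one specific other base, here "T", mutates to "A".
prob_T_to_A = P["T"]["A"]

print(f"P(A -> not A) = {prob_leave_A:.4f}")
print(f"P(T -> A)     = {prob_T_to_A:.4f}")
```

Each row of `P` sums to one, which is a quick check that the "stay" probability of 1 - 3μ accounts for all three ways of leaving a state.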