The Gamma Distribution

The gamma distribution is a continuous probability distribution that is popular for a range of phylogenetic applications. The gamma distribution is popular in part because its a bit of a shape shifter that can assume a range of shapes, from exponential to normal. This flexibility results from the fact that gamma distribution has two parameters. In most phylogenetic applications, these parameters are referred to as the shape parameter, \alpha, and the rate parameter, \beta, (note that some applications of the rate parameter, \beta, parameter is replaced by the “scale parameter,” which is simply the inverse of the rate parameter).

You can explore the impact of these two parameters on the gamma distribution using the following R scripts (note that this script focuses on plotting values that flank the default parameter settings for the gamma distribution in the program SIMMAP 1.5):

Using the R script below, you can generate probability densities for the gamma distribution that vary the value of the shape parameter (\alpha) (Fig. 1). As you can see, varying \alpha has a strong impact on the shape of the gamma distribution. The gamma distribution is the sum of \alpha independent and identically distributed (i.i.d.) exponential distributions (i.e., that have the same \beta rate parameter). Accordingly, when \alpha = 1, the gamma collapses to an exponential distribution, when \alpha >> 1, the gamma distribution increasingly resembles a normal distribution.
Figure 1: The impact of varying the shape parameter (alpha) on the gamma distribution.

Figure 1: The impact of varying the shape parameter (alpha) on the gamma distribution.

#Generate a plot of gamma distributions that vary the shape parameter (alpha).
x <- seq(0, 100, length=200)
simmapDefaultGamma <- dgamma(x, shape=1.25, scale=1/0.25) #Make probability density function for SIMMAP default gamma distribution
plot(x, simmapDefaultGamma, type="l", yaxs="i", xaxs="i", ylim=c(0,0.16), xlim=c(0,100), xlab="x value", ylab="Density", main="Probability density for gamma distribution with variable alpha and beta=0.25", lwd=0)
colors <- c("red", "black", "blue", "darkgreen", "purple", "orange")
alphas <- c(0.1, 1.25, 2, 4, 8, 10)
labels <- c("alpha=0.1", "alpha=1.25 (SIMMAP default)", "alpha=2", "alpha=4", "alpha=8", "alpha=10")
for(i in 1:length(alphas)) {
hx <- dgamma(x, shape=alphas[i], rate=1, scale=1/0.25)
lines(x, hx, lwd=3, col=colors[i])
}
legend("topright", inset=.05, title="Probability densities",
labels, lwd=3, col=colors)
Using the R script below, you can visualize the impact of varying the rate parameter (\beta) while keeping the shape parameter (\alpha) constant (Fig. 2). \beta has a strong impact on the shape of the gamma distribution. When \beta is set to less than 1, we tend to observe relatively broad distributions with long tails. As we increase the value of \beta, we observe increasingly tight distributions. This effect stems from the fact that the variance of the gamma is \alpha/\beta^2.
Figure 2: The impact of varying the rate parameter (beta) on the gamma distribution.

Figure 2: The impact of varying the rate parameter (beta) on the gamma distribution.

#Generate a plot of gamma distributions that vary the rate parameter (beta) as in Figure 2 below.
x <- seq(0, 100, length=200)
simmapDefaultGamma <- dgamma(x, shape=1.25, scale=1/0.25) #Make probability density function for SIMMAP default gamma distribution
#plot(x, simmapDefaultGamma, type="l")
plot(x, simmapDefaultGamma, type="l", yaxs="i", xaxs="i", ylim=c(0,0.9), xlim=c(0,70), xlab="x value", ylab="Density", main="Probability density for gamma distribution with a fixed alpha=1.25 and variable beta", lwd=2)
colors <- c("red", "black", "blue", "darkgreen", "purple", "orange")
betas <- c(0.1, 0.25, 2, 4, 8, 10)
labels <- c("beta=0.1", "beta=0.25 (SIMMAP default)", "beta=2", "beta=4", "beta=8", "beta=10")
for(i in 1:length(betas)) {
hx <- dgamma(x, shape=1.25, rate=1, scale=1/betas[i])
lines(x, hx, lwd=2, col=colors[i])
}
legend("topright", inset=.05, title="Probability densities",labels, lwd=2, col=colors)
For many phylogenetic applications of the gamma distribution -- e.g, to accommodate variation in substitution rate across sites (ASRV) -- the \alpha and \beta parameters are constrained to be equal. You can visualize the gamma distributions generated under these conditions using the R script below (Fig. 3). Because the mean of the gamma distribution is \alpha / \beta, this constraint ensures that the gamma distribution has a mean of one. This is important when the gamma distribution is used as a prior probability density on ASRV, as it retains the ability to interpret branch lengths as the expected (mean) number of substitutions per site.
Figure 3: Gamma distributions with alpha and beta set equal to one another.

Figure 3: Gamma distributions with alpha and beta set equal to one another.

#Generate a plot of gamma distributions with alpha and beta equal to one another as in Figure 3 below.
x <- seq(0, 100, length=200)
simmapDefaultGamma <- dgamma(x, shape=1.25, scale=1/0.25) #Make probability density function for SIMMAP default gamma distribution
#plot(x, simmapDefaultGamma, type="l")
plot(x, simmapDefaultGamma, type="l", yaxs="i", xaxs="i", ylim=c(0,2), xlim=c(0,30), xlab="x value", ylab="Density", main="Probability density for gamma distribution with a alpha and beta equal to one another", lwd=2)
colors <- c("red", "black", "blue", "darkgreen", "purple", "orange")
alphas <- c(0.1, 0.5, 1, 5, 20)
betas <- c(0.1, 0.5, 1, 5, 20)
labels <- c("alpha=0.1, beta=0.1", "alpha=1.25, beta=0.25 (SIMMAP default)", "alpha=0.5, beta=0.5", "alpha=1, beta=1", "alpha=5, beta=5", "alpha=20, beta=20")
for(i in 1:length(betas)) {
hx <- dgamma(x, shape=alphas[i], rate=1, scale=1/betas[i])
lines(x, hx, lwd=2, col=colors[i])
}
legend("topright", inset=.05, title="Probability densities",labels, lwd=2, col=colors)

6 thoughts on “The Gamma Distribution

  1. Mike May

    Great idea for a post. Personally, I find that playing around with probability functions in R is the best way to develop an intuition for them.

    This bit of code should drive home the point that the sum of k exponential random variable with rate &lambda is gamma distributed with &alpha = k and &beta = &lambda.


    k = 1000
    lambda = 1

    # simulate the sum of k exponential(1) random variables 10000 times.
    x = sapply(1:10000,function(x) sum(rexp(n=k,rate=lambda)))

    # plot the histogram of the simulated sums.
    hist(x,freq=FALSE,breaks=50)

    # plot the probability density of the gamma distribution with alpha = k and beta = lambda.
    curve(dgamma(x=x,shape=k,rate=lambda),from=min(x),to=max(x),n=1000,add=TRUE)

    (Apologies in advance if my tags are messed up, there is no preview!)

    1. Bob Thomson

      Timely post! I was just explaining this the other day and wishing I had some R code handy to demonstrate. Thanks guys.

      Mike, sorry about the tags. We have convenient latex support, although this is the first time I’ve tried it in a comment. Let’s see how it looks.

      \alpha \beta \lambda
      \LaTeX

      To use it, use latex tags as follows $latex your-code-here$

  2. Brian Moore

    I agree that this could be the start of a nice `probability distributions in R’ series (so feel free to dive in!). This would be useful in general, and a nice primer for the Bodega workshop.

    As an (obscure) aside about the mean-one gamma: one of its many uses is as a prior probability density for rate multipliers in the compound Poisson process (CPP) relaxed-clock model proposed by Huelsenbeck et al. (2000)—substitution-rate shifts occur along the tree as Poisson events (with exponentially distributed waiting times, with rate \lambda), and when they occur, the current rate is multiplied by a gamma-distributed random variable. When I originally read the CPP paper, I thought that (in keeping with the logic of the use of the mean-one gamma for accommodating ASRV), that the rate multipliers were also mean-one gamma distributed random variable m_i\sim \mbox{Gamma}(\alpha,\alpha).

    However, it turns out that the product of mean-one gamma-distributed multipliers converges to zero as K gets large. Accordingly, Huelsenbeck et al. (2000) actually parameterize the gamma prior on rate multipliers so that the mean of the log of the m_i rate multipliers is zero: \mathbb{E}[\log(m_i)] = 0. The rate multipliers are therefore approximately mean-one gamma-distributed random variables: m_i\sim \mbox{Gamma}(\alpha, e^{\psi(\alpha)}), where \psi(\cdot) is the derivative of the log of the gamma function.—Brian

  3. Samantha Price

    I think it is worth reminding people that in SIMMAP the gamma distribution is only used for the overall substitution rate of a character while a symmetric beta prior is used for the morphological state frequencies.

  4. Pingback: Jukes Cantor Model of DNA substitution | Workshop in Applied Phylogenetics

  5. Stephen Martin

    In trying to understand the gamma distribution, interactive plots are really useful.

    Shiny lets you make fully interactive data reports in the browser.
    ggvis lets you make plots interactive using javascript, but is its own plotting paradigm (similar to ggplot2).

    One more is manipulate, which requires rstudio as far as I know.
    But in rstudio, you can run:

    library(manipulate)
    library(ggplot2)
    manipulate({qplot(x=x,y=dgamma(x,shape=s,rate=r),geom='line') + scale_y_continuous(limit=c(0,1)) + scale_x_continuous(limit=c(0,100)) + theme_classic()},s=slider(0,100,step=.1),r=slider(0,100,step=.1))

    At some point, I am going to work on shiny apps for visualizing distributions. I had a few, and even had a power analysis tool, but I just recreated my own webpage and those are not yet running.

Leave a Reply