# Prior sensitivity in BAMM

By Brian R. Moore, Sebastian Höhna, Michael R. May, Bruce Rannala, and John P. Huelsenbeck

## Synopsis

BAMM (Rabosky, 2014) is a popular Bayesian method intended to estimate: (1) the number and location of diversification-rate shifts across branches of a phylogeny, and; (2) the diversification-rate parameters for every branch of the study tree. As a Bayesian method, BAMM seeks to estimate the joint posterior probability density (the ‘posterior’) of the model parameters, which reflects our belief in the parameter values after evaluating the data at hand. The posterior is an updated version of the joint prior probability density (the ‘prior’) of the model parameters, which reflects our belief in the parameter values before evaluating the data at hand. In this post, we elaborate on our recent discovery (Moore et al., 2016) that BAMM exhibits prior sensitivity; i.e., where the inferred posterior is strongly influenced by the assumed prior.

To this end, we first discuss why prior sensitivity is a theoretical concern for BAMM. We then describe our efforts to explore the prior sensitivity of BAMM by means of simulation. We present results demonstrating that posterior estimates of the number of diversification-rate shifts are sensitive to the assumed prior on the number of diversification-rate shifts. Next, we describe how the original paper (Rabosky, 2014) concealed the strong prior sensitivity of BAMM. Finally, we describe how errors in a recent presponse to our paper (Rabosky, 2016) masked the prior sensitivity of BAMM. Importantly, both the results of our study (Moore et al., 2016) and those described in the presponse to our paper (Rabosky, 2016) used the current version of BAMM (v.2.5). However, the results of Rabosky (2016) masked the prior sensitivity of BAMM because those analyses used an option in BAMM v.2.5 that incorrectly computes extinction probabilities.

## 1. How did we reach the conclusion that BAMM exhibits prior sensitivity?

BAMM provides estimates of the number and location of diversification-rate shifts across the branches of a study tree, and also estimates the diversification-rate parameters—speciation, extinction, and time-dependence parameters—for every branch of the phylogeny. BAMM is implemented in a Bayesian statistical framework, which seeks to estimate the posterior probability distribution of the model parameters.
From Bayes theorem:

$\underbrace{\text{Probability}(\text{Parameter} \mid \text{Data})}_\text{posterior probability} \propto \underbrace{\text{Probability}(\text{Data} \mid \text{Parameter})}_\text {likelihood function} \times \underbrace{\text{Probability}(\text{Parameter})}_\text {prior probability}$

The posterior probability reflects our belief regarding the parameter values after evaluating the data at hand; it is an updated version of the prior probability, which reflects our belief about the parameter values before evaluating the data at hand. The likelihood function is the vehicle that extracts the information in the data to transform our prior beliefs into our posterior beliefs about the parameter values. BAMM uses the following priors:

• an exponential prior for the speciation rate
• an exponential prior for the extinction rate
• a normal prior for the time-dependence parameter
• a Compound Poisson Process (CPP) prior model on diversification-rate shifts

### 1.1. Why is prior sensitivity a concern for BAMM?

The CPP prior model describes the prior distribution of the number and location of diversification-rate shifts across branches of the tree. This is a concern because the CPP model is often non-identifiable: that is, the likelihood function returns an identical value for the observed data under an infinite number of parameter combinations (Rannala, 2002). Specifically, under the CPP, the observed data may be equally likely to be the outcome of relatively infrequent shifts of large magnitude, or relatively frequent shifts of small magnitude. If the likelihood function cannot identify (distinguish between) parameter values based on the data, the posterior probability distribution will closely resemble the assumed prior distribution. This is the issue of prior sensitivity: our posterior estimates of parameters (e.g., about the number and location of diversification-rate shifts) after looking at the data will essentially be identical to whatever we assumed about these parameters before looking at the data.

### 1.2. How we assessed the prior sensitivity of BAMM

It is straightforward to assess whether BAMM exhibits prior sensitivity: we simply need to compare the posterior distribution for the estimated number of diversification-rate shifts inferred under a variety of assumed prior distributions for the expected number of diversification-rate shifts. If BAMM is sensitive to the assumed prior, then the estimated posterior distribution will depend strongly on the assumed prior distribution (i.e., the estimated posterior distribution for the number of diversification-rate shifts will change to reflect corresponding changes in the assumed prior distribution for the number of diversification-rate shifts).

We explored the prior sensitivity of BAMM by simulating 100 trees under a constant-rate birth-death process (where the true number of diversification-rate shifts is zero). To ensure that these simulated trees were empirically realistic, we first estimated the speciation and extinction rates for the cetacean phylogeny (Steeman 2009; which was analyzed by Rabosky, 2014) under a constant-rate birth-death model using RevBayes (Höhna et al. 2016). We then simulated each tree under a constant-rate birth-death process by sampling values for the speciation and extinction rates from the corresponding posterior distributions estimated for the cetacean tree, where the age of each simulated tree is the same as that of the cetacean tree.

We analyzed each simulated tree using BAMM (v.2.5), centering the speciation- and extinction-rate prior distributions on the true value for the simulated phylogeny, and under a range of prior values on the expected number of diversification-rate shifts ($\gamma = 0.1, 0.5, 1, 2, 10$, where the expected number of rate shifts is $1 / \gamma$). For each prior value, we plotted the prior distribution and the posterior distribution for the number of diversification processes (a tree with a single process has zero diversification-rate shifts) averaged over the 100 trees (Figure 1).

We note that our simulation study exploring the prior sensitivity of BAMM represents a best case scenario: (1) a constant-rate birth-death process is the simplest model described by BAMM, which minimizes the number of parameters that must be estimated from a given amount of data, and; (2) we have centered the priors for the diversification-rate parameters (speciation and extinction rate) on their true values.

Figure 1. Prior sensitivity of BAMM.
We simulated 100 constant-rate birth-death trees similar to the cetacean phylogeny (Steeman 2009). We analyzed each tree under a range of prior values for the expected number of diversification-rate shifts, (columns; $\gamma$ = 0.1, 0.5, 1, 2, 10). Each plot summarizes the prior distribution (grey lines) and the corresponding posterior distribution (blue lines) for the number of diversification processes. Over the entire range of parameter space explored, the estimated posterior distribution is dominated by the assumed prior distribution Note that the mode of the prior distributions over the entire parameter space is always at zero (i.e., where the tree has a single process with zero diversification-rate shifts).

## 2. Why do our conclusions contradict Rabosky (2014)?

The results of our simulation study described above indicate that estimates of the number of diversification-rate shifts using BAMM are sensitive to the assumed prior. This conclusion contradicts the finding presented in the original study: Rabosky (2014) performed a simulation study that apparently demonstrated that BAMM is not overly sensitive to the prior. These contradictory conclusions are not due to differences in the two simulation studies; in fact, the design of the two simulation studies is very similar.

Similar to our study, Rabosky simulated 500 constant-rate trees (where the true number of diversification-rate shifts is zero). He then analyzed each tree with BAMM under a range of priors values for the expected number of diversification-rate shifts; $\gamma = 1, 5, 10$. For each tree, he recorded the mode of the posterior distribution for the number of diversification-rate shifts: the mode of the posterior is also known as the maximum a posteriori (or MAP) estimate. He repeated this process for each of the 500 simulated trees, recording a set of 500 MAP estimates for a given prior value, e.g., $\gamma = 1$. He then summarized the corresponding set of 500 MAP estimates as a histogram (a frequency distribution) for each prior value. Rabosky then repeated this procedure to create histograms for each of the three prior values for the expected number of diversification-rate shifts. Finally, he noted that the histograms of the posterior modes for the estimated number of diversification-rate shifts looked similar for all three values of the prior. Accordingly, Rabosky concluded that posterior estimates of the number of diversification-rate shifts inferred using BAMM are insensitive to the prior assumptions about the expected number of diversification-rate shifts. The results from his study (and our replicated results summarized in the same way) are depicted in Figure 2.

Figure 2. Prior sensitivity of BAMM à la mode.
The top row of panels are reproduced from Rabosky (2014); the lower row of panels are identical summaries of identical simulations replicated in our study. Rabosky (2014) explored the prior sensitivity of BAMM by means of a simulation study that is identical to our simulation study (upper row reproduced from Rabosky, 2014). Unfortunately, Rabosky chose to summarize the results of his simulation by creating a histogram of the posterior modes (MAP) for the set of 500 trees analyzed under each value of the $\gamma$ prior. Obviously, the most frequent MAP will have a value of one (simply because the prior distribution for all values of $\gamma$ always has a mode of one, and the estimated posterior distribution closely follows the assumed prior; c.f., Figure 1). Noting that the histograms of the MAPs are similar over the range of $\gamma$ priors, Rabosky (2014) concluded that BAMM is not sensitive to the prior. However, this erroneous conclusion is based on an error in summarizing the results that makes it impossible to detect the extreme prior sensitivity of BAMM.

As you can see in Figure 2, using the mode (MAP) to summarize the estimated posterior distribution on the number of events is an unfortunate choice if we are interested in assessing the prior sensitivity of BAMM. It is an error because it makes it impossible to assess whether the posterior is influenced by the prior. This is simply because the prior distribution for the number of diversification processes has a mode of one for all values of the $\gamma$ prior (see Moore et al., 2016 SI1.3 Joint prior distribution). Therefore, even if the estimated posterior was virtually identical to the assumed prior (which is the case here), the posterior will always have mode of one for constant-rate trees. Therefore, if we made a histogram of the posterior modes inferred for each tree, the most frequent value would always be one, regardless of the $\gamma$ prior. It cannot be otherwise.

In fact, this is exactly what we observe when we summarize the results of our simulation using the same protocol as that used by Rabosky (see the lower row of panels in Figure 2, above). The most frequent value for the posterior mode (MAP) is always one, because the prior mode is always one, and the estimated posterior always resembles the assumed prior. Accordingly, the MAP is incapable of detecting prior sensitivity on constant-rate trees; the histogram of the MAPs will always be similar regardless of the $\gamma$ prior on constant-rate trees, so the similarity of the MAP histograms for the various priors does not provide evidence that BAMM is insensitive to the choice of $\gamma$ prior. Rabosky’s (2014) conclusion that BAMM is insensitive to the prior is therefore misleading. Simple inspection of the posterior and prior distributions makes it immediately obvious that the number of diversification-rate shifts estimated by BAMM is extremely sensitive to the assumed prior.

There is a second curious aspect of the results presented by Rabosky (2014). The histograms of the MAPs presented in Rabosky (2014; Figure 2) start to move toward the right as the prior favors more diversification-rate shifts. Specifically, the frequency of the MAPs is concentrated on one process when $\gamma = 1$, is split between 1 and 2 processes when $\gamma = 5$, and is concentrated on 2 processes when $\gamma = 10$ (see Figure 2, top row). It is as though the posterior estimates of the number of processes are increasing slightly as the prior increasingly favors more diversification-rate shifts. Rabosky describes this observation as follows: “With increasing values of $\gamma$, the model with maximum a posteriori probability (MAP) was biased in favor of M1, a model with two processes.” Accordingly, these results create the impression that BAMM is only slightly sensitive to the assumed prior on the number of diversification-rate shifts. However, as we have shown, these results cannot be true. The prior mode for all values of $\gamma$ is one (see Figure 1 and Figure 2, bottom row). Therefore, there is no reason for the frequency distribution of the MAPs to shift rightward (from 1 to 2 processes as $\gamma$ ranges from 1 to 10) because the true number of processes for these constant-rate trees is always one. We are unable to reproduce these results.

## 3. How did Rabosky (2016) conclude that BAMM is not sensitive to the prior (again)?

Recently, Rabosky reported on the BAMM website that he was unable to reproduce our results showing prior sensitivity for the estimated number of diversification-rate shifts using the latest version of BAMM (v.2.5). In this section, we clarify the source of this discrepancy. Specifically, Rabosky’s results differ from our published results because he used an option (combineExtinctionAtNodes = if_different, rather than combineExtinctionAtNodes = random as we used) implemented in BAMM (v.2.5) that causes the estimated number of diversification-rate shifts to artifactually depart from the prior. This new option is essentially an implementation error (i.e., a “software bug”) because it derives from a miscalculation of extinction probabilities. Clearly, any effort to evaluate the statistical behavior of a given method should avoid the confounding effects of any errors in the implementation of that method. In other words, we want to understand whether the there are problems with the method itself, not whether there are problems with the implementation of the method. Accordingly, the analyses that we present in our PNAS paper did not use this invalid option.

### 3.1. Background on extinction probabilities

Birth-death processes (with non-zero extinction rates) give rise to lineages that may go extinct before the present. Accordingly, if we want to compute the probability of a phylogeny generated by a birth-death process, we need to accommodate the possibility of unobserved speciation events (where one of the daughter lineages has gone extinct before the present). Naturally, this requires the ability to compute the probability that a given lineage goes extinct, a quantity referred to as the extinction probability. The need to compute the extinction probability is by no means unique to BAMM: Kendall (1948) describes computing the “chances of extinction” for general birth-death processes, Nee et al. (1994) describe the probability of a phylogeny given that it survived (1 – the probability that it went extinct), etc. Perhaps of most direct relevance to BAMM, Maddison et al. (2007) explicitly define the extinction probability under the BiSSE model (from which BAMM draws heavily). Specifically, Maddison et al. (2007) define the extinction probability as E(t); the probability that a lineage that exists at some time in the past, t, goes extinct before the present time. We wrote a simple program, available here, that performs forward simulation under the described birth-death process to demonstrate what the extinction probability represents.

Under the birth-death process model (as also used in BAMM), the fate of a given lineage is independent of other contemporaneous lineages; Maddison et al. (2007) note, “the extinction probabilities do not depend on tree structure of the surviving lineages, only on time“. Indeed, all lineages in the same diversification-rate category at time t must have the same extinction probability, regardless of the shape of the tree and the presence of diversification-rate shifts elsewhere in the tree (Figure 3)!

Figure 3. Extinction probabilities are independent of the tree and diversification-rate shifts on observed lineages.
Let’s try to imagine the birth-death process that produced this observed tree. At time t1 we observe a speciation event resulting in node N. The speciation event produces a left and right daughter lineage. To compute the probability of observing this speciation event, we need to compute the probability of a speciation event at time t1 and the probability that both daughter lineages survive. Clearly, the survival probabilities (i.e., 1 – extinction probability) of both daughter lineages must be equivalent because both daughter lineages start in the blue rate category. At time t1 we do not know the fate of the daughter lineages (e.g., if diversification-rate shifts will occur on either lineage); thus, if we were to simulate a process for both daughter lineages, they would both have the same probability of ultimate extinction.

### 3.2. The source of the bug

BAMM computes extinction probabilities using a recursive algorithm that begins at the tips of the tree and computes the change in the extinction probability down each branch using a set of differential equations (see Moore et al., 2016; SI1.2 Likelihood function). A question arises when BAMM reaches an internal node: what should the initial extinction probability be for the branch immediately ancestral to the speciation event? Here is where the problem arises: BAMM (v.2.5) enables an option (combineExtinctionAtNodes = if_different) that compares the extinction probabilities for each of the descendant branches at time t; if these probabilities are different, BAMM takes the product of the extinction probabilities as the initial extinction probability for the immediately ancestral branch; otherwise (if the extinction probabilities are the same), BAMM uses either one of the extinction probabilities as the initial extinction probability (note: no product is taken). These extinction probabilities for internal nodes are then carried down their ancestral branches, continuing all the way to the root of the tree. The description of how BAMM handles extinction probabilities directly follows from the BAMM website which also resembles the current version of the code.

We emphasize that this option is not mathematically valid: the extinction probability for a lineage arising right before a node certainly does not depend in any sensible way on the product of extinction probabilities of two lineages that arise immediately after that node (c.f., Maddison et al., 2007, and Figure 3, above), especially only when the extinction probabilities for those two lineages are different! (e.g., why take the product in some situations and not others?). Taking the product of these extinction probabilities violates the statistical principles underpinning model-based inference. Thus, all benefits normally associated with model-based approaches (consistency, efficiency) do not apply. We note that the correct procedure, which is not implemented in BAMM, is to take the extinction probability from the lineage in the same diversification process as the ancestral node, since the extinction probability for a given diversification-rate category depends solely on time.

### 3.3. The consequences of the bug

Taking products of extinction probabilities (which are always smaller than 1) results in increasingly smaller extinction probabilities. Since BAMM only takes the product of these extinction probabilities when they are different, and these probabilities will only be different when there are rate shifts, this incorrect procedure imposes an unpredictable change to the probabilities of models with diversification-rate shifts. Thus, the new option if_different in BAMM (v.2.5) implicitly penalizes rate-shifts more heavily. For example, consider a model that has no rate shifts and the extinction probabilities for both lineages descending from a node are 0.2. In this case the extinction probability at the ancestral node should be 0.2 because the extinction probabilities for either lineage are equal. Now imagine that there is a rate shift on one of the daughter lineages so that the extinction probability for that lineage is now of 0.15. Under the new option in BAMM, we would multiply the extinction probabilities to compute the extinction probability at the ancestral node, which would be approximately 0.03 in this case (which is clearly different from 0.2).

To demonstrate the impact of this bug, we compare the results of analyses with (orange) and without (blue) the option on our constant-rate birth-death simulated trees below (Figure 4). As we describe in our PNAS paper, we centered the diversification-rate priors on their true values; therefore, these results represent a best-case scenario for inferring the correct number of diversification-rate shifts.

Figure 4. Comparison of combineExtinctionAtNodes options and the prior distribution on the number of rate shifts for simulated constant-rate trees.
BAMM (v.2.5) exhibits strong prior sensitivity under the “random” option which randomly picks the extinction probability of the left or right daughter lineages. This option corresponds to the behavior of earlier versions of BAMM and that used for our analyses in Moore et al. (2016). The prior sensitivity using the if_different option—which is theoretically invalid as we describe above—is artificually decreased.

This phenomenon is not restricted to simulated phylogenies; we analyzed the cetacean phylogeny with and without the if_different option and observe the same general pattern (Figure 5).

Figure 5. Comparison of combineExtinctionAtNodes options and the prior distribution on the number of rate shifts for the cetacean phylogeny.
These results correspond to our published results (blue lines; Moore et al., 2016; SI4.2.3 Cetaceans) and the results on the BAMM website (orange lines). As for the simulated constant-rate trees, the posterior number of diversification-rate shifts is more sensitive to the prior when the “random” option is used. Here we clearly see that the combineExtinctionAtNodes option has an impact on the posterior distribution of the number of diversification-rate shifts for empirical datasets.

We note that the results for our if_different (orange lines above) are effectively identical to those reported by Rabosky (BAMM website), while our random analyses (blue lines) correspond to results presented in our PNAS paper.

## Summary

In this post, we have described the analyses and main results presented in Moore et al. (2016) demonstrating that the posterior distribution of the number of diversification-rate shifts inferred by BAMM is extremely sensitive to the assumed prior distribution on the number of diversification-rate shifts. We demonstrate how Rabosky’s (2014) conclusion that BAMM is prior insensitive is an artifact of using the posterior mode (MAP) to summarize the inferred number of diversification-rate shifts. We clarify that our results (Moore et al., 2016) are based on the most recent version of BAMM (v.2.5) but avoided use of the combineExtinctionAtNodes = if_different option because we view this as a software bug, which would not provide a fair evaluation of the behavior of BAMM. This option for computing extinction probabilities at nodes (taking the product if the probabilities are different) is theoretically invalid, leads to pathological behavior, and masks prior sensitivity of BAMM. Furthermore, we show that the use of the if_different option caused the discrepancy between our results (Moore et al., 2016) and a recent blogpost by Rabosky (2016), which concealed the strong prior sensitivity of BAMM. Although the details are somewhat technical, we believe that it is important to our community to understand the limits of existing methods for inferring diversification-rate shifts, and we hope that you will join the discussion via the comments section below. In a subsequent post we plan to elaborate on the issue of diversification-rate parameter estimates under BAMM.

## Data availability

All of the phylogenetic data and BAMM config files used for this blog post are publicly available as a BitBucket repository.

## References

Höhna, S, MJ Landis, TA Heath, B Boussau, N Lartillot, BR Moore, JP Huelsenbeck, and F Ronquist. 2016. RevBayes: Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model Specification Language. Systematic Biology, 65:726–736.

Huelsenbeck JP, B Larget, and DL Swofford. 2000. A compound Poisson process for relaxing the molecular clock. Genetics, 154:1879–1892.

Kendall, DG. 1948. On the generalized “birth-and-death” process. The Annals of Mathematical Statistics, 19:1–15.

Maddison W, P Midford, and S Otto. 2007. Estimating a binary character’s effect on speciation and extinction. Systematic Biology, 56:701.

Moore, BR, S Höhna, MR May, B Rannala, and JP Huelsenbeck. 2016. Critically evaluating the theory and performance of Bayesian analysis of macroevolutionary mixtures. Proceedings of the National Academy of Sciences, USA, (in press).

Nee, S, RM May, and PH Harvey. 1994. The reconstructed evolutionary process. Philosophical Transactions: Biological Sciences, 344:305–311.

Rabosky DL. 2014. Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PLoS One, 9:e89543.

Rabosky DL. 2016. Is the posterior on the number of shifts overly sensitive to the prior? http://bamm-project.org/prior.html

Rabosky, DL, M Grundler, C Anderson, JJ Shi, JW Brown, H Huang, JG Larson, et al. 2014. BAMMtools: an R package for the analysis of evolutionary dynamics on phylogenetic trees. Methods in Ecology and Evolution, 5:701–707.

Rannala B. 2002. Identifiability of parameters in MCMC Bayesian inference of phylogeny. Systematic Biology, 51:754–760.

Steeman, ME, MB Hebsgaard, RE Fordyce, SY Ho, DL Rabosky, R Nielsen, C Rahbek, H Glenner, MV Sørensen, and E Willerslev. 2009. Radiation of extant cetaceans driven by restructuring of the oceans. Systematic Biology, 58:573–585.

## One thought on “Prior sensitivity in BAMM”

1. Brian Moore Post author