Prior sensitivity in BAMM
By Brian R. Moore, Sebastian Höhna, Michael R. May, Bruce Rannala, and John P. Huelsenbeck
BAMM (Rabosky, 2014) is a popular Bayesian method intended to estimate: (1) the number and location of diversification-rate shifts across branches of a phylogeny, and; (2) the diversification-rate parameters for every branch of the study tree. As a Bayesian method, BAMM seeks to estimate the joint posterior probability density (the ‘posterior’) of the model parameters, which reflects our belief in the parameter values after evaluating the data at hand. The posterior is an updated version of the joint prior probability density (the ‘prior’) of the model parameters, which reflects our belief in the parameter values before evaluating the data at hand. In this post, we elaborate on our recent discovery (Moore et al., 2016) that BAMM exhibits prior sensitivity; i.e., where the inferred posterior is strongly influenced by the assumed prior.
To this end, we first discuss why prior sensitivity is a theoretical concern for BAMM. We then describe our efforts to explore the prior sensitivity of BAMM by means of simulation. We present results demonstrating that posterior estimates of the number of diversification-rate shifts are sensitive to the assumed prior on the number of diversification-rate shifts. Next, we describe how the original paper (Rabosky, 2014) concealed the strong prior sensitivity of BAMM. Finally, we describe how errors in a recent presponse to our paper (Rabosky, 2016) masked the prior sensitivity of BAMM. Importantly, both the results of our study (Moore et al., 2016) and those described in the presponse to our paper (Rabosky, 2016) used the current version of BAMM (v.2.5). However, the results of Rabosky (2016) masked the prior sensitivity of BAMM because those analyses used an option in BAMM v.2.5 that incorrectly computes extinction probabilities.
1. How did we reach the conclusion that BAMM exhibits prior sensitivity?
BAMM provides estimates of the number and location of diversification-rate shifts across the branches of a study tree, and also estimates the diversification-rate parameters—speciation, extinction, and time-dependence parameters—for every branch of the phylogeny. BAMM is implemented in a Bayesian statistical framework, which seeks to estimate the posterior probability distribution of the model parameters.
From Bayes theorem:
The posterior probability reflects our belief regarding the parameter values after evaluating the data at hand; it is an updated version of the prior probability, which reflects our belief about the parameter values before evaluating the data at hand. The likelihood function is the vehicle that extracts the information in the data to transform our prior beliefs into our posterior beliefs about the parameter values. BAMM uses the following priors:
- an exponential prior for the speciation rate
- an exponential prior for the extinction rate
- a normal prior for the time-dependence parameter
- a Compound Poisson Process (CPP) prior model on diversification-rate shifts
1.1. Why is prior sensitivity a concern for BAMM?
The CPP prior model describes the prior distribution of the number and location of diversification-rate shifts across branches of the tree. This is a concern because the CPP model is often non-identifiable: that is, the likelihood function returns an identical value for the observed data under an infinite number of parameter combinations (Rannala, 2002). Specifically, under the CPP, the observed data may be equally likely to be the outcome of relatively infrequent shifts of large magnitude, or relatively frequent shifts of small magnitude. If the likelihood function cannot identify (distinguish between) parameter values based on the data, the posterior probability distribution will closely resemble the assumed prior distribution. This is the issue of prior sensitivity: our posterior estimates of parameters (e.g., about the number and location of diversification-rate shifts) after looking at the data will essentially be identical to whatever we assumed about these parameters before looking at the data.
1.2. How we assessed the prior sensitivity of BAMM
It is straightforward to assess whether BAMM exhibits prior sensitivity: we simply need to compare the posterior distribution for the estimated number of diversification-rate shifts inferred under a variety of assumed prior distributions for the expected number of diversification-rate shifts. If BAMM is sensitive to the assumed prior, then the estimated posterior distribution will depend strongly on the assumed prior distribution (i.e., the estimated posterior distribution for the number of diversification-rate shifts will change to reflect corresponding changes in the assumed prior distribution for the number of diversification-rate shifts).
We explored the prior sensitivity of BAMM by simulating 100 trees under a constant-rate birth-death process (where the true number of diversification-rate shifts is zero). To ensure that these simulated trees were empirically realistic, we first estimated the speciation and extinction rates for the cetacean phylogeny (Steeman 2009; which was analyzed by Rabosky, 2014) under a constant-rate birth-death model using RevBayes (Höhna et al. 2016). We then simulated each tree under a constant-rate birth-death process by sampling values for the speciation and extinction rates from the corresponding posterior distributions estimated for the cetacean tree, where the age of each simulated tree is the same as that of the cetacean tree.
We analyzed each simulated tree using BAMM (v.2.5), centering the speciation- and extinction-rate prior distributions on the true value for the simulated phylogeny, and under a range of prior values on the expected number of diversification-rate shifts (, where the expected number of rate shifts is ). For each prior value, we plotted the prior distribution and the posterior distribution for the number of diversification processes (a tree with a single process has zero diversification-rate shifts) averaged over the 100 trees (Figure 1).
We note that our simulation study exploring the prior sensitivity of BAMM represents a best case scenario: (1) a constant-rate birth-death process is the simplest model described by BAMM, which minimizes the number of parameters that must be estimated from a given amount of data, and; (2) we have centered the priors for the diversification-rate parameters (speciation and extinction rate) on their true values.
2. Why do our conclusions contradict Rabosky (2014)?
The results of our simulation study described above indicate that estimates of the number of diversification-rate shifts using BAMM are sensitive to the assumed prior. This conclusion contradicts the finding presented in the original study: Rabosky (2014) performed a simulation study that apparently demonstrated that BAMM is not overly sensitive to the prior. These contradictory conclusions are not due to differences in the two simulation studies; in fact, the design of the two simulation studies is very similar.
Similar to our study, Rabosky simulated 500 constant-rate trees (where the true number of diversification-rate shifts is zero). He then analyzed each tree with BAMM under a range of priors values for the expected number of diversification-rate shifts; . For each tree, he recorded the mode of the posterior distribution for the number of diversification-rate shifts: the mode of the posterior is also known as the maximum a posteriori (or MAP) estimate. He repeated this process for each of the 500 simulated trees, recording a set of 500 MAP estimates for a given prior value, e.g., . He then summarized the corresponding set of 500 MAP estimates as a histogram (a frequency distribution) for each prior value. Rabosky then repeated this procedure to create histograms for each of the three prior values for the expected number of diversification-rate shifts. Finally, he noted that the histograms of the posterior modes for the estimated number of diversification-rate shifts looked similar for all three values of the prior. Accordingly, Rabosky concluded that posterior estimates of the number of diversification-rate shifts inferred using BAMM are insensitive to the prior assumptions about the expected number of diversification-rate shifts. The results from his study (and our replicated results summarized in the same way) are depicted in Figure 2.
As you can see in Figure 2, using the mode (MAP) to summarize the estimated posterior distribution on the number of events is an unfortunate choice if we are interested in assessing the prior sensitivity of BAMM. It is an error because it makes it impossible to assess whether the posterior is influenced by the prior. This is simply because the prior distribution for the number of diversification processes has a mode of one for all values of the prior (see Moore et al., 2016 SI1.3 Joint prior distribution). Therefore, even if the estimated posterior was virtually identical to the assumed prior (which is the case here), the posterior will always have mode of one for constant-rate trees. Therefore, if we made a histogram of the posterior modes inferred for each tree, the most frequent value would always be one, regardless of the prior. It cannot be otherwise.
In fact, this is exactly what we observe when we summarize the results of our simulation using the same protocol as that used by Rabosky (see the lower row of panels in Figure 2, above). The most frequent value for the posterior mode (MAP) is always one, because the prior mode is always one, and the estimated posterior always resembles the assumed prior. Accordingly, the MAP is incapable of detecting prior sensitivity on constant-rate trees; the histogram of the MAPs will always be similar regardless of the prior on constant-rate trees, so the similarity of the MAP histograms for the various priors does not provide evidence that BAMM is insensitive to the choice of prior. Rabosky’s (2014) conclusion that BAMM is insensitive to the prior is therefore misleading. Simple inspection of the posterior and prior distributions makes it immediately obvious that the number of diversification-rate shifts estimated by BAMM is extremely sensitive to the assumed prior.
There is a second curious aspect of the results presented by Rabosky (2014). The histograms of the MAPs presented in Rabosky (2014; Figure 2) start to move toward the right as the prior favors more diversification-rate shifts. Specifically, the frequency of the MAPs is concentrated on one process when , is split between 1 and 2 processes when , and is concentrated on 2 processes when (see Figure 2, top row). It is as though the posterior estimates of the number of processes are increasing slightly as the prior increasingly favors more diversification-rate shifts. Rabosky describes this observation as follows: “With increasing values of , the model with maximum a posteriori probability (MAP) was biased in favor of M1, a model with two processes.” Accordingly, these results create the impression that BAMM is only slightly sensitive to the assumed prior on the number of diversification-rate shifts. However, as we have shown, these results cannot be true. The prior mode for all values of is one (see Figure 1 and Figure 2, bottom row). Therefore, there is no reason for the frequency distribution of the MAPs to shift rightward (from 1 to 2 processes as ranges from 1 to 10) because the true number of processes for these constant-rate trees is always one. We are unable to reproduce these results.
3. How did Rabosky (2016) conclude that BAMM is not sensitive to the prior (again)?
Recently, Rabosky reported on the BAMM website that he was unable to reproduce our results showing prior sensitivity for the estimated number of diversification-rate shifts using the latest version of BAMM (v.2.5). In this section, we clarify the source of this discrepancy. Specifically, Rabosky’s results differ from our published results because he used an option (combineExtinctionAtNodes = if_different, rather than combineExtinctionAtNodes = random as we used) implemented in BAMM (v.2.5) that causes the estimated number of diversification-rate shifts to artifactually depart from the prior. This new option is essentially an implementation error (i.e., a “software bug”) because it derives from a miscalculation of extinction probabilities. Clearly, any effort to evaluate the statistical behavior of a given method should avoid the confounding effects of any errors in the implementation of that method. In other words, we want to understand whether the there are problems with the method itself, not whether there are problems with the implementation of the method. Accordingly, the analyses that we present in our PNAS paper did not use this invalid option.
3.1. Background on extinction probabilities
Birth-death processes (with non-zero extinction rates) give rise to lineages that may go extinct before the present. Accordingly, if we want to compute the probability of a phylogeny generated by a birth-death process, we need to accommodate the possibility of unobserved speciation events (where one of the daughter lineages has gone extinct before the present). Naturally, this requires the ability to compute the probability that a given lineage goes extinct, a quantity referred to as the extinction probability. The need to compute the extinction probability is by no means unique to BAMM: Kendall (1948) describes computing the “chances of extinction” for general birth-death processes, Nee et al. (1994) describe the probability of a phylogeny given that it survived (1 – the probability that it went extinct), etc. Perhaps of most direct relevance to BAMM, Maddison et al. (2007) explicitly define the extinction probability under the BiSSE model (from which BAMM draws heavily). Specifically, Maddison et al. (2007) define the extinction probability as E(t); the probability that a lineage that exists at some time in the past, t, goes extinct before the present time. We wrote a simple program, available here, that performs forward simulation under the described birth-death process to demonstrate what the extinction probability represents.
Under the birth-death process model (as also used in BAMM), the fate of a given lineage is independent of other contemporaneous lineages; Maddison et al. (2007) note, “the extinction probabilities do not depend on tree structure of the surviving lineages, only on time“. Indeed, all lineages in the same diversification-rate category at time t must have the same extinction probability, regardless of the shape of the tree and the presence of diversification-rate shifts elsewhere in the tree (Figure 3)!
3.2. The source of the bug
BAMM computes extinction probabilities using a recursive algorithm that begins at the tips of the tree and computes the change in the extinction probability down each branch using a set of differential equations (see Moore et al., 2016; SI1.2 Likelihood function). A question arises when BAMM reaches an internal node: what should the initial extinction probability be for the branch immediately ancestral to the speciation event? Here is where the problem arises: BAMM (v.2.5) enables an option (combineExtinctionAtNodes = if_different) that compares the extinction probabilities for each of the descendant branches at time t; if these probabilities are different, BAMM takes the product of the extinction probabilities as the initial extinction probability for the immediately ancestral branch; otherwise (if the extinction probabilities are the same), BAMM uses either one of the extinction probabilities as the initial extinction probability (note: no product is taken). These extinction probabilities for internal nodes are then carried down their ancestral branches, continuing all the way to the root of the tree. The description of how BAMM handles extinction probabilities directly follows from the BAMM website which also resembles the current version of the code.
We emphasize that this option is not mathematically valid: the extinction probability for a lineage arising right before a node certainly does not depend in any sensible way on the product of extinction probabilities of two lineages that arise immediately after that node (c.f., Maddison et al., 2007, and Figure 3, above), especially only when the extinction probabilities for those two lineages are different! (e.g., why take the product in some situations and not others?). Taking the product of these extinction probabilities violates the statistical principles underpinning model-based inference. Thus, all benefits normally associated with model-based approaches (consistency, efficiency) do not apply. We note that the correct procedure, which is not implemented in BAMM, is to take the extinction probability from the lineage in the same diversification process as the ancestral node, since the extinction probability for a given diversification-rate category depends solely on time.
3.3. The consequences of the bug
Taking products of extinction probabilities (which are always smaller than 1) results in increasingly smaller extinction probabilities. Since BAMM only takes the product of these extinction probabilities when they are different, and these probabilities will only be different when there are rate shifts, this incorrect procedure imposes an unpredictable change to the probabilities of models with diversification-rate shifts. Thus, the new option if_different in BAMM (v.2.5) implicitly penalizes rate-shifts more heavily. For example, consider a model that has no rate shifts and the extinction probabilities for both lineages descending from a node are 0.2. In this case the extinction probability at the ancestral node should be 0.2 because the extinction probabilities for either lineage are equal. Now imagine that there is a rate shift on one of the daughter lineages so that the extinction probability for that lineage is now of 0.15. Under the new option in BAMM, we would multiply the extinction probabilities to compute the extinction probability at the ancestral node, which would be approximately 0.03 in this case (which is clearly different from 0.2).
To demonstrate the impact of this bug, we compare the results of analyses with (orange) and without (blue) the option on our constant-rate birth-death simulated trees below (Figure 4). As we describe in our PNAS paper, we centered the diversification-rate priors on their true values; therefore, these results represent a best-case scenario for inferring the correct number of diversification-rate shifts.
This phenomenon is not restricted to simulated phylogenies; we analyzed the cetacean phylogeny with and without the if_different option and observe the same general pattern (Figure 5).
We note that the results for our if_different (orange lines above) are effectively identical to those reported by Rabosky (BAMM website), while our random analyses (blue lines) correspond to results presented in our PNAS paper.
In this post, we have described the analyses and main results presented in Moore et al. (2016) demonstrating that the posterior distribution of the number of diversification-rate shifts inferred by BAMM is extremely sensitive to the assumed prior distribution on the number of diversification-rate shifts. We demonstrate how Rabosky’s (2014) conclusion that BAMM is prior insensitive is an artifact of using the posterior mode (MAP) to summarize the inferred number of diversification-rate shifts. We clarify that our results (Moore et al., 2016) are based on the most recent version of BAMM (v.2.5) but avoided use of the combineExtinctionAtNodes = if_different option because we view this as a software bug, which would not provide a fair evaluation of the behavior of BAMM. This option for computing extinction probabilities at nodes (taking the product if the probabilities are different) is theoretically invalid, leads to pathological behavior, and masks prior sensitivity of BAMM. Furthermore, we show that the use of the if_different option caused the discrepancy between our results (Moore et al., 2016) and a recent blogpost by Rabosky (2016), which concealed the strong prior sensitivity of BAMM. Although the details are somewhat technical, we believe that it is important to our community to understand the limits of existing methods for inferring diversification-rate shifts, and we hope that you will join the discussion via the comments section below. In a subsequent post we plan to elaborate on the issue of diversification-rate parameter estimates under BAMM.
All of the phylogenetic data and BAMM config files used for this blog post are publicly available as a BitBucket repository.
Höhna, S, MJ Landis, TA Heath, B Boussau, N Lartillot, BR Moore, JP Huelsenbeck, and F Ronquist. 2016. RevBayes: Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model Specification Language. Systematic Biology, 65:726–736.
Huelsenbeck JP, B Larget, and DL Swofford. 2000. A compound Poisson process for relaxing the molecular clock. Genetics, 154:1879–1892.
Kendall, DG. 1948. On the generalized “birth-and-death” process. The Annals of Mathematical Statistics, 19:1–15.
Maddison W, P Midford, and S Otto. 2007. Estimating a binary character’s effect on speciation and extinction. Systematic Biology, 56:701.
Moore, BR, S Höhna, MR May, B Rannala, and JP Huelsenbeck. 2016. Critically evaluating the theory and performance of Bayesian analysis of macroevolutionary mixtures. Proceedings of the National Academy of Sciences, USA, (in press).
Nee, S, RM May, and PH Harvey. 1994. The reconstructed evolutionary process. Philosophical Transactions: Biological Sciences, 344:305–311.
Rabosky DL. 2014. Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PLoS One, 9:e89543.
Rabosky DL. 2016. Is the posterior on the number of shifts overly sensitive to the prior? http://bamm-project.org/prior.html
Rabosky, DL, M Grundler, C Anderson, JJ Shi, JW Brown, H Huang, JG Larson, et al. 2014. BAMMtools: an R package for the analysis of evolutionary dynamics on phylogenetic trees. Methods in Ecology and Evolution, 5:701–707.
Rannala B. 2002. Identifiability of parameters in MCMC Bayesian inference of phylogeny. Systematic Biology, 51:754–760.
Steeman, ME, MB Hebsgaard, RE Fordyce, SY Ho, DL Rabosky, R Nielsen, C Rahbek, H Glenner, MV Sørensen, and E Willerslev. 2009. Radiation of extant cetaceans driven by restructuring of the oceans. Systematic Biology, 58:573–585.
There’s a nice conversation on twitter that is raising some great points that I thought may be worth including here (I’m not a big fan of 140 character limit and navigating byzantine twitter conversation threads).
One question that has come up is whether the option for computing extinction probabilities that we used in our paper (i.e., the ‘combineExtinctionAtNodes = random’ option) is implemented in BAMM. This is indeed one of the options implemented in BAMM, but it is not the default option in BAMM v.2.5—we note, however, that using this option does not require altering any source code. The default option (as of v.2.5) for computing extinction probabilities is the ‘combineExtinctionAtNodes = if_different’ option, which computes extinction probabilities at nodes by taking the product of extinction probabilities of descendant branches if they are different (as described above in the blog post).
A second question is “Why wouldn’t you use the default option (i.e., the combineExtinctionAtNodes = if_different’ option) for your analyses exploring the statistical behavior of BAMM in your paper?” This new option for computing extinction probabilities is an implementation error (a bug); it is an option for computing extinction probabilities that was implemented in BAMM v.2.5 that has not been published or subjected to peer review, and is obviously incorrect. The goal of our paper is to understand the statistical behavior of the BAMM method itself. If we were to explore the behavior of BAMM using the default ‘combineExtinctionAtNodes = if_different’ option/bug, then it would be unclear whether any problems with BAMM were due to problems that are inherent to the method, or simply reflected problems with the implementation of the method. In other words, it would make it difficult to interpret the conclusions of our study. Given that a more correct solution for computing extinction probabilities is implemented in BAMM, we used that option. This is also in line with our goal to evaluate the behavior of BAMM in the most charitable way possible (e.g., centering priors for analyses of simulated datasets to the true values used to generate those data, maximizing the information-to-free parameter ratio, etc.).
A third question is “Why didn’t you provide a discussion of the issues with the ‘combineExtinctionAtNodes’ options in your paper?” Up until the penultimate (fourth) revision of our manuscript, we actually included a detailed section of the Supporting Information document (‘Implementation Errors’) that both described the various bugs that we had discovered in BAMM over the course of our study, and we also presented results of reanalyses of the simulated and empirical datasets using various bugs (to demonstrate how these bugs impacted the results of our analyses and conclusions of our study). However, two of the reviewers and the associate editor requested that we remove this section from the SI. I think their reasons for asking us to remove this section are pretty reasonable. (1) First, it was a very long and detailed discussion of implementation errors that was clearly outside the focus of our paper; our paper is not a bug report, it is a study describing inherent problems with the method. If the only problems with BAMM were implementation problems (bugs), then we would not need to write a paper, we would just need to submit a bug report to the BAMM developers. Moreover, the editor and reviewers argued that no one besides the BAMM developers would be interested in our bug collection. (2) The second justification for omitting description of bugs is that it significantly contributed to the critical tone of our paper. Given that our paper is a critical evaluation of BAMM—it documents serious flaws with the method—it is to some extent inherently/unavoidably critical, but the editor and reviewers (and authors) wanted to avoid an unnecessarily critical tone. In other words, going into gory detail about the implementation errors that we discovered in BAMM just seemed like mean-spirited piling on. Although we generally agree with the arguments for excluding the discussion of bugs from our paper, in retrospect, I still feel that there is also a downside to this decision. Retaining the discussion of the combineExtinctionAtNodes bug would have allowed us to explicitly describe our choices and why we made them. So, I think the decision to remove the discussion of the bugs in BAMM is a mistake on our part. Hopefully this blog post will provide a compromise solution that will allow us to rectify this mistake on our part.
A final question/comment raised in the twitter conversation is “It seems kind of sketchy that you guys didn’t use the default option for your analyses; what gives?” I can see—from an outside perspective—how it might seem like there is something nefarious going on. In fact, our decision is motivated to avoid being sketchy. It seems that concerns regarding the fairness of our decision to use a non-default option implicitly assumes that “default option” equals “best option.” As described above, the default option for computing extinction probabilities in BAMM is demonstrably invalid (i.e., we know it’s wrong), so it would be sketchy to demonstrate problems with BAMM where we know that the results are likely to be adversely impacted by using an obvious bug for computing extinction probabilities. In fact, most of the problems that we identify are (predictably) much worse when using the the default ‘combineExtinctionAtNodes = if_different’ option/bug in BAMM. Specifically, use of this bug greatly exacerbates errors in the likelihoods computed by BAMM, and this bug also has a strong adverse impact on estimates of the diversification-rate parameters (speciation and extinction rates) inferred by BAMM. The only “upside” of the default ‘combineExtinctionAtNodes = if_different’ option/bug is that it partially masks the prior sensitivity issue in BAMM: that is, the default ‘combineExtinctionAtNodes = if_different’ option/bug happens to mask the sensitivity of the estimated number of diversification-rate shifts to the assumed prior on the expected number of diversification-rate shifts. But—as we have explained in this blog post—this masking of the prior sensitivity is just one of the spurious manifestations of this bug. In order to correctly explore the prior sensitivity issue, it is necessary to avoid the confounding effects of this bug.
It may be worth pointing out that prior sensitivity is a relatively minor issue with BAMM. This blog is motivated to address possible confusion stemming from recent contradictory claims about the prior sensitivity of BAMM. The other problems with BAMM that we demonstrate in our paper are far more fundamental: the likelihood function is incorrect (it cannot correctly compute the probability of the data when rates of diversification vary across branches), and the CPP prior model used to describe the prior distribution of diversification-rate shifts across branches is incoherent (it does not provide a valid probability distribution on the number and location of diversification-rate shifts). We may elaborate on these issues in future posts.