2019 Workshop – applications now open
Applications for the 2019 Workshop are open until December 12, 2018. Apply now.
Applications for the 2019 Workshop are open until December 12, 2018. Apply now.
Applications are now being accepted for the 2017 Workshop in Applied Phylogenetics. This year’s workshop will run from March 11 to 18 at the Bodega Bay Marine Lab on the northern California Coast. The application deadline is January 31st. See the 2017 workshop page for more information and instructions to apply.
By Brian R. Moore, Sebastian Höhna, Michael R. May, Bruce Rannala, and John P. Huelsenbeck
Our recent study (Moore et al., 2016) explored the statistical behavior of BAMM (Rabosky, 2014). One of the pathologies demonstrated in our paper is the extreme prior sensitivity of BAMM. That is, the estimated number of diversification-rate shifts inferred using BAMM is highly sensitive to the assumed prior on the expected number of diversification-rate shifts. In a recent post, Rabosky attempts to refute this conclusion by claiming that: (1) we used modified and/or undocumented features of BAMM normally “unavailable” to users, and; (2) the prior sensitivity we observe is an artifact of how we forced BAMM to compute extinction probabilities.
In this post, we address these claims by clarifying: (1) the version of the BAMM we used was unmodified, and all the settings we used are available to all BAMM users; (2) the provenance of the options for computing extinction probabilities, and; (3) the relationship between the prior sensitivity of BAMM and its options for computing extinction probabilities. Specifically, the options for computing extinction probabilities were implemented by the BAMM developers (claims that we modified the source code are simply untrue), and these options are freely available to any BAMM user. Critically, the default option for computing estimation probabilities—the “if_different” opion—does not solve the prior sensitivity issue with BAMM, although it uniquely conceals the prior-sensitivity problem. We emphatically agree with the criticism that the option used in our study (the “random” option) for computing extinction probabilities is undocumented. More troubling is the fact that these major changes have not been published or subjected to peer review, and the default setting—which substantially impacts the statistical behavior of BAMM—is clearly invalid (see our explanation below).
BAMM (Rabosky, 2014) attempts to estimate: (1) the number and location of diversification-rate shifts, and; (2) the diversification-rate parameters (speciation and extinction rates) of every branch in the study tree. Our recent paper (Moore et al., 2016) demonstrates several problems with BAMM: (1) the likelihood function is incorrect (it cannot correctly compute the probability of the data when rates of diversification vary across lineages); (2) the compound Poisson process (CPP) prior model is incoherent (it does not correctly specify the prior distribution for the number and location of diversification-rate shifts across the tree), and; (3) the estimated number of diversification-rate shifts is highly sensitive to the assumed number of diversification-rate shifts a priori (i.e., BAMM exhibits extreme prior sensitivity). Although each of these problems constitutes a serious flaw with BAMM, the prior-sensitivity issue is perhaps the least pernicious. This is because prior sensitivity is relatively straightforward to detect in empirical analyses; that is, simply by repeatedly inferring the posterior number of diversification-rate shifts under a range of prior values on the expected number of diversification-rate shifts. In fact, several careful empirical studies using BAMM have noted that the method exhibits extreme prior sensitivity using this strategy (e.g., Callaghan and McPeek, 2016).
This same strategy was used by Rabosky (2014) and Moore et al., (2016) to explore the prior sensitivity of BAMM under simulation (see post 1). These results are summarized in Figure 1 of our previous post 1; the estimated posterior distribution closely resembles the corresponding prior distribution for the expected number of diversification-rate shifts. Nevertheless, Rabosky (2014) used a procedure for summarizing these results that concealed the prior sensitivity of BAMM (see Figure 2 in our previous post 1). Clearly, if the intention is to assess the prior sensitivity of BAMM, Rabosky’s (2014) procedure for summarizing these results is invalid because it cannot detect prior sensitivity: for any value of the prior, the histogram of posterior modes will always have a value of one, simply because the prior distribution will always have a mode of 1, and the posterior always closely resembles the prior (Figure 1).
Rabosky has claimed that our demonstration of the prior sensitivity of BAMM is “scientifically unjustified” because we did not use the “if_different” option for computing extinction probabilities (introduced in BAMM v.2.5). This is a curious criticism of our results, as it applies equally to the results in his original study (Rabosky, 2014, which similarly did not use the “if_different” option). In any case, let’s explore Rabosky’s various claims to see if they have merit.
Rabosky has claimed that we have used a version of the BAMM source code that implements options not available to non-developers. However, this claim is entirely specious. As noted by careful readers, the version of BAMM we used differs only trivially from BAMM v.2.5 in that it writes extinction probabilities to a file; this modification was necessary to assess the degree of the error made by BAMM in computing the extinction probability, and does not in any way alter how the quantity itself is computed. For easy comparison, our (trivially) modified version of BAMM v.2.5. is available on a GitHub repository provided as supplementary material for Moore et al. (2016).
Further, we note that the various options for the combineExtinctionAtNodes setting (which we describe below) are available to all users of BAMM without modifying any source code—it is simply an option specified by the user in the BAMM input file. That’s how you use the program: if you don’t specify an option for a setting in the input file, the program uses the default option. As we explained in our previous post, we wished to evaluate the performance of BAMM in the most favorable possible circumstances, and so we chose to avoid using obviously incorrect features (i.e., “bugs”), such as the “if_different” option, because doing so would constitute evaluating implementation errors (a “bug report”) rather than identifying critical theoretical and statistical errors inherent to this popular method.
Recall (post 1) that birth-death processes result in lineages that may go extinction before the present. Consequently, computing the probability of an observed tree requires that we allow for the possibility of unobserved speciation events (i.e., where one of the daughter lineages has gone extinct before the present). The probability that a lineage goes extinct before the present is called the extinction probability, and depends only on the speciation- and extinction-rates under which the lineage arose and its age. BAMM computes extinction probabilities using a recursive algorithm that begins at the tips of the tree and moves down the branches in small time steps. At each time step, it computes the change in the extinction probability using a set of differential equations (see Moore et al., 2016; SI1.2 Likelihood function). When the algorithm reaches an internal node (in versions prior to BAMM v.2.5), the extinction probability for the node is set to the extinction probability of its left descendant lineage. More recent versions of BAMM (> v.2.5) implement five options for computing extinction probabilities at an internal node:
In Moore et al. (2016), we analyzed our simulated data using what seems to be the most sensible option available, “random”. However, we note that all of these options are wrong, and none of them have been subjected to peer review. How do these alternative options for computing extinction probabilities impact the primary symptom of prior sensitivity? We re-analyzed our constant-rate simulations (as described here) using each of these options, and compared the prior to each of the resulting posterior distributions (Figure 1, below).
Each of these settings demonstrate some degree for prior sensitivity; however, the “if_different” option is the only option that causes the estimated posterior to depart substantially from the corresponding prior distribution. Notably, the “left”, “right”, and “random” options generate effectively identical results (i.e., they exhibit extreme prior sensitivity). Of all available options, the one that exhibits the least prior sensitivity—the “if_different” option—is the default setting for computing extinction probabilities in BAMM. Of course, one might argue that, if extreme prior sensitivity is a problem, anything that reduces the apparent prior sensitivity is a good thing. However, this argument fails because the “if_different” option is an arbitrary and invalid change to the code that causes errors in the computed extinction probabilities, and these errors cause the estimated posterior to artifactually depart from the assumed prior. In other words, this is a bug that only conceals the symptom of prior sensitivity (i.e., by causing the posterior to depart from the prior).
Consider a tree with a single diversification-rate shift from a “blue” rate category to an “orange” rate category (Figure 2, below). For simplicity, we assume that—within each of these rate categories—speciation- and extinction-rates are constant (i.e., they do not depend on time). If we follow a path from a particular tip (in the orange rate category) to the root of the tree, we can examine how the computed extinction probabilities change through time for each of the five combineExtinctionAtNodes options (Figure 3, below; under a constant-rate birth-death process, we can compute the extinction probabilities analytically, which we show as the blue and orange lines in each panel). The correct behavior for this extinction probability (which is not available as an option in BAMM) is that, as soon as there is a rate shift, the extinction probability immediately becomes the extinction probability of the new rate category (Figure 5, left panel). (Rabosky calls this procedure the “recompute” method because it recomputes extinction probabilities for every diversification-rate regime; he further claims that the “recompute” equations are invalid, which we will refute in a forthcoming post.) Now let us consider how the “if_different” option computes the extinction probability.
Under the “if_different” option, the extinction probabilities at nodes 1 and 2 (Figure 2, above) are unaffected; this is because both descendant lineages belong to the same rate category, and therefore the extinction probabilities will be identical when they are evaluated at each node. However, when we reach node 3, the extinction probabilities from the left-descendant lineage (≈0.21, Figure 3, below) and the right-descendant lineage (≈0.19) are different, so the initial extinction probability for the branch subtending node 3 is set to the product of those numbers (≈0.04). These quantities imply that extinct lineages arising immediately after node 3 have approximately the same extinction probability (≈0.2), while extinct lineages arising immediately before node 3 have a very different (and much smaller) extinction probability (≈0.04), even though lineages arising immediately before and after the node are arising in the same (blue) rate category at essentially the same moment in time!
Extinction probabilities are small numbers (i.e., < 1), so invocation of the “if_different” option will always result in increasingly smaller extinction probabilities (i.e., taking the product of two small numbers gives a much smaller number). Accordingly, when “if_different” option is invoked at a given internal node (i.e., where the extinction probability is computed as the product of the extinction probabilities of its descendant lineages), this will cause a sharp decrease in the extinction probability. This sharp decrease in extinction probability will, in turn, trigger the “if_different” option at all more inclusive nodes along the path the root of the tree. For example, when we reach node 4, the extinction probability from the right lineage (≈0.04) is guaranteed to be different than from the left lineage (≈0.21, since the extinction probability of the right lineage is based on the product taken at node 3), resulting in a “cascade” of products of ever smaller extinction probabilities (second panel in Figure 3, below). The consequence of this procedure is that the extinction probabilities (and thus the likelihood) computed by the “if_different” option depart in a substantial and unpredictable way from the true extinction probabilities! By contrast, the remaining options never take products of extinction probabilities, so that, even they they are still not technically correct, they are at least less incorrect than the “if_different” procedure (Figure 3, remaining panels).
In this post, we have explored the relationship between prior sensitivity and the computation of extinction probabilities in BAMM. Prior sensitivity is generally diagnosed by comparing the estimated posterior distribution to the corresponding prior distribution; i.e., the similarity between the inferred posterior and the assumed prior distributions is the primary diagnostic symptom of prior sensitivity. This diagnostic strategy was used by Rabosky (2014) to assess the prior sensitivity of BAMM; the results of his study demonstrate that estimates of the number of diversification-rate shifts inferred using BAMM are extremely sensitive to the assumed prior on the expected number of diversification-rate shifts. Even a cursory inspection of the prior and posterior distributions makes it clear that BAMM exhibits extreme prior sensitivity (e.g., see Figure 1 from our previous post). However, as we discussed in our previous post, Rabosky (2014) summarized these results in a manner that concealed the extreme prior sensitivity of BAMM (e.g., see Figure 2 from our previous post).
On September 15, 2015, we submitted our initial manuscript to PNAS, which exposed the error in Rabosky’s (2014) summary procedure, and thereby revealed the extreme prior sensitivity of BAMM v.2.3. At our request, Rabosky reviewed the initial submission of our manuscript. On November 4, 2015, Rabosky released a new version of BAMM (v.2.5). This new version implemented four new options for computing extinction probabilities. Specifically, these options provide alternative ways to specify the extinction probability of an internal node based on the extinction probabilities of its two immediate descendant lineages. We have demonstrated in this post that the inferred posterior follows the assumed prior for four of these five options—including the previously hard-coded “left” option—for computing extinction probabilities. However, the “if_different” option uniquely causes the inferred posterior to depart from the assumed prior. Accordingly, the “if_different” option causes the symptom of prior sensitivity to disappear. The “if_different” option was made the default setting in BAMM v.2.5.
The results of our study (like those of his original study, Rabosky 2014) are not based on the new default “if_different” option for computing extinction probabilities. Rabosky has seized on this point in an attempt to dismiss our conclusion that BAMM is prior sensitive by claiming: (1) that we modified the BAMM source code [this statement is untrue]; (2) that we used a “developers-only” (e.g., a “hidden” and “undocumented”) option that is not available to other users [this statement is untrue]; (3) that prior sensitivity is not exhibited by BAMM when using the new, default “if_different” option for computing extinction probabilities. Does this last claim mean that the “if_different” option for computing extinction probabilities solves the underlying prior sensitivity afflicting BAMM?
To help understand the answer to this question, consider the following analogy. Imagine that your vehicle has a serious engine problem (e.g., it is out of engine oil because of a serious leak). You will be alerted to this fact by the “check-engine” light in your instrument panel. The check-engine light is the symptom of a serious underlying problem. A really, really bad mechanic might suggest “solving” this mechanical problem by cutting the power to your instrument panel. After all, this constitutes a “solution” to the extent that the check-engine light is no longer illuminated. But this clearly does not solve the underlying problem (the engine is still out of oil). Moreover, this “solution” is likely to cause other serious problems that will adversely impact the operation of the vehicle (e.g., you will not be able to monitor the vehicle speed, fuel level, engine temperature, etc.).
As we have demonstrated above (section 3.2), all of the options for computing extinction probabilities at internal nodes in BAMM are incorrect. However, the error in the computed extinction probabilities is most severe using the default “if_different” option (Figure 3). The errors in the extinction probabilities computed under the “if_different” option will, in turn, cause errors in the computed likelihoods. The errors in the computed likelihoods will, in turn, cause errors in the computed posterior probabilities. The errors in the computed posterior probabilities will, in turn, cause the estimated posterior probability distribution to depart from the assumed prior distribution. The “if_different” option, therefore, causes the diagnostic symptom of prior sensitivity—the similarity of the estimated posterior to the assumed prior—to disappear. However, like cutting power to the warning light, this “solution” only masks the underlying problem; i.e., the “if_different” option only conceals the prior sensitivity in BAMM, it does not solve the underlying problem. As in our automotive analogy, the “if_different” option also causes other adverse impacts on the behavior of BAMM; the increased error in the computed likelihoods will adversely impact parameter estimates, etc.
Accordingly, Rabosky’s claim that BAMM does not exhibit prior sensitivity using the default “if_different” option is technically true, but only to the extent that it conceals the symptom of prior sensitivity. In fact, the “if_different” option is a bug that causes extreme errors in the extinction probabilities (and the likelihoods and parameter estimates). The only “positive” aspect of this bug is that it conceals a pathology of BAMM. However, if we are interested in assessing the statistical behavior of BAMM, then the fact that it conceals problems with the method is clearly undesirable. For this reason, we explored the statistical behavior of BAMM (including the prior sensitivity) using the most correct option for computing extinction probabilities (the “random” option). It would be unfair and confusing to use the default “if_different” option because: (1) we know this is a serious bug, and; (2) this bug is likely to cause pathologies that complicate our attempts to characterize inherent problems with the method.
Finally, we concede that Rabosky has made one valid criticism: the “random” option for computing extinction probabilities is “hidden” and has not been “documented”. That is, this option for computing extinction probabilities in BAMM (like all of the other options for computing extinction probabilities) has not been published, has not been subjected to peer review, and is clearly incorrect. In fact, this is the case for much of the current code in BAMM: the method currently implemented in BAMM is not the method that is described in Rabosky (2014) or in any peer-reviewed paper. We encourage the BAMM developers to expose the major changes to the theory implemented in BAMM to a formal, external review process. More careful and transparent method development would hopefully help identify and avoid the most obvious errors.
All of the phylogenetic data and BAMM config files used for this blog post are publicly available as a BitBucket repository.
Callahan, MS, and McPeek, MA. 2016. Multi-locus phylogeny and divergence time estimates of Enallagma damselflies (Odonata: Coenagrionidae). Molecular Phylogenetics and Evolution 94: 182–195.
Moore, BR, S Höhna, MR May, B Rannala, and JP Huelsenbeck. 2016. Critically evaluating the theory and performance of Bayesian analysis of macroevolutionary mixtures. Proceedings of the National Academy of Sciences, USA, (in press).
Rabosky DL. 2014. Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PLoS One, 9:e89543.
Rabosky DL. 2016. Is the posterior on the number of shifts overly sensitive to the prior? http://bamm-project.org/prior.html
Steeman, ME, MB Hebsgaard, RE Fordyce, SY Ho, DL Rabosky, R Nielsen, C Rahbek, H Glenner, MV Sørensen, and E Willerslev. 2009. Radiation of extant cetaceans driven by restructuring of the oceans. Systematic Biology, 58:573–585.
By Brian R. Moore, Sebastian Höhna, Michael R. May, Bruce Rannala, and John P. Huelsenbeck
In our recent study (Moore et al., 2016), we explored the ability of BAMM to provide reliable estimates of branch-specific diversification-rate parameters (speciation and extinction rates). To make our results comparable to those presented in the original study (Rabosky, 2014), we summarized the results of our simulation using the identical procedure as that used in the original study. The results of our study—when summarized in a comparable and appropriate manner—reveal that variation in branch-specific estimates of diversification parameters (speciation and extinction rates) are uncorrelated with the true (simulated) values. In a recent post on the BAMM website, Rabosky attempts to dismiss this conclusion by claiming that: (1) he is unable to replicate our results; (2) our demonstration that BAMM cannot reliably estimate diversification-rate parameters is an artifact of our use of the “random” option for computing extinction probabilities, and; (3) finally, he claims that BAMM provides reliable parameter estimates when using the default “if_different” option for computing extinction probabilities. In this post, we discuss each of these claims, demonstrating that: (1) the results regarding parameter estimates presented in Moore et al. (2016) are summarized in the same way as those in the original study (Rabosky, 2014); (2) parameter estimates using BAMM are biased regardless of whether we use the random or if_different options for computing extinction probabilities, and; (3) Rabosky’s claim that BAMM provides reliable parameter estimates is an artifact of his adoption of a new (and entirely invalid) way of summarizing these results that happens to conceal the pathological behavior of BAMM.
BAMM attempts to estimate: (1) the number and location of diversification-rate shifts, and; (2) the diversification-rate parameters (speciation and extinction rates) of every branch in the study tree. Although we have previously demonstrated that BAMM cannot estimate the number and location of diversification-rate shifts (see post 1 and post 2), we might hope that BAMM may still be able to estimate the branch-specific diversification-rate parameters. If this were the case, it would be possible to informally infer the pattern of diversification-rate shifts by inspecting how the posterior estimates of diversification rates vary across branches. In our paper, we demonstrated that BAMM cannot reliably estimate diversification-rate parameters; in fact, we demonstrated under simulation that estimated variation in branch-specific speciation and extinction rates is generally uncorrelated with the true parameter values when we summarize the results following the procedure advocated by Rabosky (2014). Here, we show that our results are summarized identically to Rabosky (2014). Rabosky has responded to our paper claiming that he is unable to replicate the results of our study. We are at a loss to explain this claim, as we simply followed the summary procedure that he used in his study (Rabosky, 2014).
To assess the accuracy of parameter estimates obtained using BAMM, Rabosky (2014) simulated 500 trees under each of five alternative scenarios where diversification rates varied across branches. He then estimated branch-specific speciation- and extinction-rate parameters using BAMM, and computed the mean of the posterior distribution of the speciation and extinction rates for each branch of each tree. Next, for each tree, he computed the “correlation” between the true rates and estimated rates by fitting a linear regression, using the true rates as the predictor variable and the estimated rates as the response variable. What do the slopes of the correlation coefficients indicate? A positive regression slope indicates that the relative parameter estimates are reliable; the method is correctly inferring the difference between branches with relatively high rates and branches with relatively low rates in the corresponding simulated tree. Conversely, a negative regression slope indicates that the relative parameter estimates are negatively correlated; the method is incorrectly inferring that relatively high-rate branches have relatively low rates (and vice versa). Finally, a regression slope of zero indicates that the relative parameter estimates are uncorrelated; the method fails to correctly distinguish between branches with relatively high rates and branches with relatively low rates. In other words, positive slopes indicate that BAMM provides reliable parameter estimates (the accuracy of the relative-rate estimates increase as the slopes approach one), whereas negative and zero slopes indicate that BAMM provides unreliable parameter estimates. For each of the five rate-variable simulation scenarios, Rabosky (2014) generated 500 slopes (1 per tree).
Rabosky also assessed the performance of parameter estimates by computing the proportional error of the branch-specific parameter estimates, defined as:
where refers to a particular parameter (e.g., speciation or extinction rate) and refers to a particular branch; a proportional error near 1 indicates that the estimated value is close to the true value. Rabosky then computed the mean proportional error for each tree.
Finally, Rabosky generated histograms of the regression slopes (Figure 1, left column) and mean proportional errors (Figure 1, right column) for each of the five variable-rate scenarios (one for each of the rows in Figure 1). The summaries presented by Rabosky (2014) are tightly peaked around 1, which he argues indicates the ability of BAMM to provide accurate and unbiased estimates of diversification parameters.
We followed the above procedure to make our results comparable to those presented in the original study (Rabosky, 2014), with one small exception. Rather than computing the mean proportional error of all the branches per tree, we simply computed (and plotted) the proportional error across all branches (i.e., without taking the mean of the error per tree); this decision was motivated by the concern that taking the mean of the errors across a tree could mask systematic errors within individual trees, but it turns out to make very little difference (see Figure 3 below). Our results demonstrate that branch-specific variation in diversification parameters (speciation and extinction rates) estimated by BAMM is uncorrelated with the true (simulated) values (see Figure 2, which is Figure 6 from Moore et al., 2016) when data are simulated under the BAMM model. To increase the ease of comparison, we have also summarized our results as histograms, following Rabosky (2014) (see Figure 3, below).
Rabosky also attempts to dismiss the conclusion that BAMM does not provide reliable parameter estimates by claiming that our conclusions are an artifact of using the (more sensible) “random” option for computing extinction probabilities, and that BAMM provides reliable estimates of diversification-rate parameters when using the default “if_different” option for computing extinction probabilities. Fortunately, it is straightforward to evaluate this claim: to this end, we simply reanalyzed all of the rate-variable simulated trees using the “if_different” option for computing extinction probabilities, and summarized the resulting parameter estimates following the protocol used in Rabosky (2014) and replicated in Moore et al., 2016. These results reveal Rabosky’s claim to be patently false: our conclusions are demonstrably not an artifact of using the (more sensible) “random” option for computing extinction probabilities. Rather, these results confirm that variation in diversification-rate parameters estimated using BAMM is uncorrelated with the true variation in rates regardless of the option used to compute extinction probabilities (Figure 5). The estimated error in diversification rates is virtually identical using either the “random” or “if_different” option.
BAMM is intended to estimate diversification-rate parameters (speciation and extinction rates) of every branch in the study tree. Therefore, evaluating the ability of BAMM to accurately estimate rate parameters must focus on individual trees. Rabosky (2014) recognized this, and accordingly summarized the relationship between the true branch-specific rates and estimated branch-specific rates for each simulated tree. As we discussed above, we simply followed this procedure in our study (Moore et al., 2016). However, in attempting to invalidate the results presented in our PNAS paper, Rabosky chose to assess the relationship between true and estimated diversification rate parameters in a new—and misleading—way. Specifically, rather than assessing correlations between true and estimated rates for individual trees, Rabosky instead infers correlations by pooling the true and estimated branch-specific rates across all trees in the simulation. This new procedure has the effect of conflating between-tree rate variation and within-tree rate variation, thus confounding our ability to assess whether BAMM is correctly estimating rate variation within individual trees.
Consider two trees with among-lineage speciation-rate variation, such that some lineages within each tree are speciating twice as fast as the other lineages in the tree. Further, imagine that the “average” speciation rate of the first tree is twice that of the second tree. Now imagine we use BAMM to estimate branch-specific speciation rates for each of those trees, but that it is unable to accurately characterize how speciation rates vary across the branches of each tree (the quantity we actually care about!), but BAMM can nonetheless estimate the “average” speciation rate of each tree. Estimating the correlation between the true and estimated speciation rates for each tree separately (correctly) reveals that the correlation is zero. By contrast, estimating the correlation between the true and estimated speciation rates after pooling all of the branches from the two trees conceals the poor performance of BAMM; we observe a positive relationship in the pooled sample simply because BAMM estimates higher average rates for one tree compared to the other! Clearly, summarizing the relationship between true and estimated branch-specific diversification rates without controlling for among-tree differences in diversification rates induces a spurious and positively misleading impression of the BAMM’s performance. We demonstrate this phenomenon with real trees from our simulation study (Figure 6); pooling results among trees conceals the poor performance of rate-parameter estimates within individual trees (Figure 7).
The relevant aspect of the branch-specific diversification-rate parameter estimates is their relative values. This is because we are interested in changes in the relative magnitude of diversification rates across branches of the study tree. For example, we are interested to discover that rates of speciation in the dolphin clade is three times higher than the speciation rate in their cetacean relatives (Steeman et al.,2009); by contrast, we are generally uninterested in differences in the absolute values of the dolphin and non-dolphin speciation rates. These considerations inform how we should assess error in estimates of the diversification-rate parameters. Imagine, for example, that the estimated speciation rate for a branch differes from the true speciation rate by an absolute value of 0.03. If the true speciation rate was 0.01, the absolute error (i.e., 0.03 – 0.01 = 0.02.) may seem small. However, this degree of estimation error implies a three-fold difference in the speciation rate! Presumably, such considerations informed Rabosky’s decision to use proportional error to characterize the ability of BAMM to estimate diversification-rate parameters in his study (Rabosky, 2014). Proportional error measures the relative error of parameter estimates, and is simply computed as the estimated value divided by the true value:
where refers to a particular parameter (e.g., the speciation or extinction rate) and refers to a particular branch; a proportional error near 1 indicates that the estimated value is close to the true value.
In our study, we followed Rabosky (2014), and used proportional error to assess the reliability of diversification-rate parameters using BAMM. The results of our simulation study—using the same summary to quantify estimation error as that used by Rabosky (2014)—demonstrate that BAMM does not provide reliable parameter estimates (Figure 3). For reasons that are unknown to us, the recent post by Rabosky attempts to discredit this conclusion by using a different (and misleading) procedure to summarize the results of our simulation. Specifically, Rabosky uses the mean-squared error to summarize the error in diversification-rate parameters. Mean-squared error measures the squared absolute difference between the true and estimated diversification rates, which is computed as:
As discussed above, assessing error in speciation and extinction rates (which are small absolute values) is both deeply misleading and completely irrelevant. Moreover, because the speciation and extintion rates are small, taking the square of the difference has the effect of further shrinking the apparent error (Figures 8 and 9). Accordingly, Rabosky appears to prefer the (correct) summary measure—the proportional error—unless it reveals that BAMM does not provide reliable parameter estimates. When this relevant summary exposes the fact that BAMM does not provide reliable diversification-rate parameter estimates, Rabosky suddenly searches for alternative (and inappropriate) summaries that conceal the flaws in BAMM.
Finally, we also calculated the error in estimated diversification parameters using the “if_different” option for computing extinction probabilities (Figure 10). These results are qualitatively identical to those presented in our study (i.e., the results are the same for both the “random” and “if_different” options for computing extinction probabilities). Again, these results reveal that Rabosky’s claims are manifestly untrue: our main conclusion—BAMM does not provide reliable parameter estimates—is not an artifact option used to compute extinction probabilities.
In this post, we have demonstrated that BAMM does not provide reliable estimates of branch-specific diversification-rate parameters (speciation and extinction rates). The results of our simulation—when summarized correctly—confirm that the estimated variation in rates is uncorrelated with the true variation in rates. We have also demonstrated that Rabosky’s claim—that he is “unable” to reproduce our results—is specious; the procedures that we used to summarize the results of our simulation exploring the performance BAMM are identical to the procedures used in his original study (Rabosky, 2014). Moreover, we have demonstrated that Rabosky’s claim—that our conclusions are “scientifically unjustified” because of the option we used to compute extinction probabilities in BAMM—is demonstrably untrue: the estimated variation in rates inferred using BAMM is uncorrelated with the true variation in rates using both the default “if_different” option and “random” option for computing extinction probabilities in BAMM. Finally, we have exposed Rabosky’s attempt to conceal the pathological statistical behavior of BAMM; he has attempted to discredit the conclusions of our study by manipulating our results using new—and completely invalid and misleading—summary procedures that conceal the poor performance of BAMM.
The demonstration that BAMM does not provide reliable estimates of branch-specific diversification-rate parameters should come as no surprise. The inability of BAMM to provide reliable estimates of the posterior distribution of branch-specific rate parameters is unsurprising, given our formal proofs that: (1) the likelihood function used by BAMM is incorrect (it cannot correctly compute the probability of the data when rates of diversification vary), and; (2) the CPP prior model used by BAMM is incoherent (it cannot correctly specify the probability distribution of rates across branches of the tree). The posterior distribution is an updated version of the prior distribution, where the prior is updated by the information in the data via the likelihood function. Given our demonstration that the likelihood function in BAMM is invalid, and our demonstration that the prior model in BAMM is invalid, it is unreasonable to expect that, somehow, the posterior estimates of the parameters will be reliable.
We wrote our paper because we believe that it is in the best interest of our community to expose the serious problems with BAMM. Our significance statment makes this clear: “Exposing the problems with BAMM is important both to empiricists (to avoid making unreliable inferences using this method) and to theoreticians (to focus their efforts on solving the problems that we identify).” We are dismayed by Rabosky’s response to our paper. Rather than trying to correct the problems afflicting BAMM (i.e., to develop a correct likelihood function and a coherent prior model), he has resorted to making demonstrably false statements about our study and attempting to dismiss our conclusions by manipulating our results. These efforts will not resolve the problems with BAMM; they can only serve to cause confusion regarding the problems with BAMM. In this regard, we are troubled that Rabosky is putting his own interests above those of our community.
All of the phylogenetic data and BAMM config files used for this blog post are publicly available as a BitBucket repository.
Moore, BR, S Höhna, MR May, B Rannala, and JP Huelsenbeck. 2016. Critically evaluating the theory and performance of Bayesian analysis of macroevolutionary mixtures. Proceedings of the National Academy of Sciences, USA, (in press).
Rabosky DL. 2014. Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PLoS One, 9:e89543.
Rabosky DL. 2016. The Moore et al (2016) use of developer switch “random” was not scientifically justified. link
Steeman, ME, MB Hebsgaard, RE Fordyce, SY Ho, DL Rabosky, R Nielsen, C Rahbek, H Glenner, MV Sørensen, and E Willerslev. 2009. Radiation of extant cetaceans driven by restructuring of the oceans. Systematic Biology, 58:573–585.
By Brian R. Moore, Sebastian Höhna, Michael R. May, Bruce Rannala, and John P. Huelsenbeck
BAMM (Rabosky, 2014) is a popular Bayesian method intended to estimate: (1) the number and location of diversification-rate shifts across branches of a phylogeny, and; (2) the diversification-rate parameters for every branch of the study tree. As a Bayesian method, BAMM seeks to estimate the joint posterior probability density (the ‘posterior’) of the model parameters, which reflects our belief in the parameter values after evaluating the data at hand. The posterior is an updated version of the joint prior probability density (the ‘prior’) of the model parameters, which reflects our belief in the parameter values before evaluating the data at hand. In this post, we elaborate on our recent discovery (Moore et al., 2016) that BAMM exhibits prior sensitivity; i.e., where the inferred posterior is strongly influenced by the assumed prior.
To this end, we first discuss why prior sensitivity is a theoretical concern for BAMM. We then describe our efforts to explore the prior sensitivity of BAMM by means of simulation. We present results demonstrating that posterior estimates of the number of diversification-rate shifts are sensitive to the assumed prior on the number of diversification-rate shifts. Next, we describe how the original paper (Rabosky, 2014) concealed the strong prior sensitivity of BAMM. Finally, we describe how errors in a recent presponse to our paper (Rabosky, 2016) masked the prior sensitivity of BAMM. Importantly, both the results of our study (Moore et al., 2016) and those described in the presponse to our paper (Rabosky, 2016) used the current version of BAMM (v.2.5). However, the results of Rabosky (2016) masked the prior sensitivity of BAMM because those analyses used an option in BAMM v.2.5 that incorrectly computes extinction probabilities.
BAMM provides estimates of the number and location of diversification-rate shifts across the branches of a study tree, and also estimates the diversification-rate parameters—speciation, extinction, and time-dependence parameters—for every branch of the phylogeny. BAMM is implemented in a Bayesian statistical framework, which seeks to estimate the posterior probability distribution of the model parameters.
From Bayes theorem:
The posterior probability reflects our belief regarding the parameter values after evaluating the data at hand; it is an updated version of the prior probability, which reflects our belief about the parameter values before evaluating the data at hand. The likelihood function is the vehicle that extracts the information in the data to transform our prior beliefs into our posterior beliefs about the parameter values. BAMM uses the following priors:
The CPP prior model describes the prior distribution of the number and location of diversification-rate shifts across branches of the tree. This is a concern because the CPP model is often non-identifiable: that is, the likelihood function returns an identical value for the observed data under an infinite number of parameter combinations (Rannala, 2002). Specifically, under the CPP, the observed data may be equally likely to be the outcome of relatively infrequent shifts of large magnitude, or relatively frequent shifts of small magnitude. If the likelihood function cannot identify (distinguish between) parameter values based on the data, the posterior probability distribution will closely resemble the assumed prior distribution. This is the issue of prior sensitivity: our posterior estimates of parameters (e.g., about the number and location of diversification-rate shifts) after looking at the data will essentially be identical to whatever we assumed about these parameters before looking at the data.
It is straightforward to assess whether BAMM exhibits prior sensitivity: we simply need to compare the posterior distribution for the estimated number of diversification-rate shifts inferred under a variety of assumed prior distributions for the expected number of diversification-rate shifts. If BAMM is sensitive to the assumed prior, then the estimated posterior distribution will depend strongly on the assumed prior distribution (i.e., the estimated posterior distribution for the number of diversification-rate shifts will change to reflect corresponding changes in the assumed prior distribution for the number of diversification-rate shifts).
We explored the prior sensitivity of BAMM by simulating 100 trees under a constant-rate birth-death process (where the true number of diversification-rate shifts is zero). To ensure that these simulated trees were empirically realistic, we first estimated the speciation and extinction rates for the cetacean phylogeny (Steeman 2009; which was analyzed by Rabosky, 2014) under a constant-rate birth-death model using RevBayes (Höhna et al. 2016). We then simulated each tree under a constant-rate birth-death process by sampling values for the speciation and extinction rates from the corresponding posterior distributions estimated for the cetacean tree, where the age of each simulated tree is the same as that of the cetacean tree.
We analyzed each simulated tree using BAMM (v.2.5), centering the speciation- and extinction-rate prior distributions on the true value for the simulated phylogeny, and under a range of prior values on the expected number of diversification-rate shifts (, where the expected number of rate shifts is ). For each prior value, we plotted the prior distribution and the posterior distribution for the number of diversification processes (a tree with a single process has zero diversification-rate shifts) averaged over the 100 trees (Figure 1).
We note that our simulation study exploring the prior sensitivity of BAMM represents a best case scenario: (1) a constant-rate birth-death process is the simplest model described by BAMM, which minimizes the number of parameters that must be estimated from a given amount of data, and; (2) we have centered the priors for the diversification-rate parameters (speciation and extinction rate) on their true values.
The results of our simulation study described above indicate that estimates of the number of diversification-rate shifts using BAMM are sensitive to the assumed prior. This conclusion contradicts the finding presented in the original study: Rabosky (2014) performed a simulation study that apparently demonstrated that BAMM is not overly sensitive to the prior. These contradictory conclusions are not due to differences in the two simulation studies; in fact, the design of the two simulation studies is very similar.
Similar to our study, Rabosky simulated 500 constant-rate trees (where the true number of diversification-rate shifts is zero). He then analyzed each tree with BAMM under a range of priors values for the expected number of diversification-rate shifts; . For each tree, he recorded the mode of the posterior distribution for the number of diversification-rate shifts: the mode of the posterior is also known as the maximum a posteriori (or MAP) estimate. He repeated this process for each of the 500 simulated trees, recording a set of 500 MAP estimates for a given prior value, e.g., . He then summarized the corresponding set of 500 MAP estimates as a histogram (a frequency distribution) for each prior value. Rabosky then repeated this procedure to create histograms for each of the three prior values for the expected number of diversification-rate shifts. Finally, he noted that the histograms of the posterior modes for the estimated number of diversification-rate shifts looked similar for all three values of the prior. Accordingly, Rabosky concluded that posterior estimates of the number of diversification-rate shifts inferred using BAMM are insensitive to the prior assumptions about the expected number of diversification-rate shifts. The results from his study (and our replicated results summarized in the same way) are depicted in Figure 2.
As you can see in Figure 2, using the mode (MAP) to summarize the estimated posterior distribution on the number of events is an unfortunate choice if we are interested in assessing the prior sensitivity of BAMM. It is an error because it makes it impossible to assess whether the posterior is influenced by the prior. This is simply because the prior distribution for the number of diversification processes has a mode of one for all values of the prior (see Moore et al., 2016 SI1.3 Joint prior distribution). Therefore, even if the estimated posterior was virtually identical to the assumed prior (which is the case here), the posterior will always have mode of one for constant-rate trees. Therefore, if we made a histogram of the posterior modes inferred for each tree, the most frequent value would always be one, regardless of the prior. It cannot be otherwise.
In fact, this is exactly what we observe when we summarize the results of our simulation using the same protocol as that used by Rabosky (see the lower row of panels in Figure 2, above). The most frequent value for the posterior mode (MAP) is always one, because the prior mode is always one, and the estimated posterior always resembles the assumed prior. Accordingly, the MAP is incapable of detecting prior sensitivity on constant-rate trees; the histogram of the MAPs will always be similar regardless of the prior on constant-rate trees, so the similarity of the MAP histograms for the various priors does not provide evidence that BAMM is insensitive to the choice of prior. Rabosky’s (2014) conclusion that BAMM is insensitive to the prior is therefore misleading. Simple inspection of the posterior and prior distributions makes it immediately obvious that the number of diversification-rate shifts estimated by BAMM is extremely sensitive to the assumed prior.
There is a second curious aspect of the results presented by Rabosky (2014). The histograms of the MAPs presented in Rabosky (2014; Figure 2) start to move toward the right as the prior favors more diversification-rate shifts. Specifically, the frequency of the MAPs is concentrated on one process when , is split between 1 and 2 processes when , and is concentrated on 2 processes when (see Figure 2, top row). It is as though the posterior estimates of the number of processes are increasing slightly as the prior increasingly favors more diversification-rate shifts. Rabosky describes this observation as follows: “With increasing values of , the model with maximum a posteriori probability (MAP) was biased in favor of M1, a model with two processes.” Accordingly, these results create the impression that BAMM is only slightly sensitive to the assumed prior on the number of diversification-rate shifts. However, as we have shown, these results cannot be true. The prior mode for all values of is one (see Figure 1 and Figure 2, bottom row). Therefore, there is no reason for the frequency distribution of the MAPs to shift rightward (from 1 to 2 processes as ranges from 1 to 10) because the true number of processes for these constant-rate trees is always one. We are unable to reproduce these results.
Recently, Rabosky reported on the BAMM website that he was unable to reproduce our results showing prior sensitivity for the estimated number of diversification-rate shifts using the latest version of BAMM (v.2.5). In this section, we clarify the source of this discrepancy. Specifically, Rabosky’s results differ from our published results because he used an option (combineExtinctionAtNodes = if_different, rather than combineExtinctionAtNodes = random as we used) implemented in BAMM (v.2.5) that causes the estimated number of diversification-rate shifts to artifactually depart from the prior. This new option is essentially an implementation error (i.e., a “software bug”) because it derives from a miscalculation of extinction probabilities. Clearly, any effort to evaluate the statistical behavior of a given method should avoid the confounding effects of any errors in the implementation of that method. In other words, we want to understand whether the there are problems with the method itself, not whether there are problems with the implementation of the method. Accordingly, the analyses that we present in our PNAS paper did not use this invalid option.
Birth-death processes (with non-zero extinction rates) give rise to lineages that may go extinct before the present. Accordingly, if we want to compute the probability of a phylogeny generated by a birth-death process, we need to accommodate the possibility of unobserved speciation events (where one of the daughter lineages has gone extinct before the present). Naturally, this requires the ability to compute the probability that a given lineage goes extinct, a quantity referred to as the extinction probability. The need to compute the extinction probability is by no means unique to BAMM: Kendall (1948) describes computing the “chances of extinction” for general birth-death processes, Nee et al. (1994) describe the probability of a phylogeny given that it survived (1 – the probability that it went extinct), etc. Perhaps of most direct relevance to BAMM, Maddison et al. (2007) explicitly define the extinction probability under the BiSSE model (from which BAMM draws heavily). Specifically, Maddison et al. (2007) define the extinction probability as E(t); the probability that a lineage that exists at some time in the past, t, goes extinct before the present time. We wrote a simple program, available here, that performs forward simulation under the described birth-death process to demonstrate what the extinction probability represents.
Under the birth-death process model (as also used in BAMM), the fate of a given lineage is independent of other contemporaneous lineages; Maddison et al. (2007) note, “the extinction probabilities do not depend on tree structure of the surviving lineages, only on time“. Indeed, all lineages in the same diversification-rate category at time t must have the same extinction probability, regardless of the shape of the tree and the presence of diversification-rate shifts elsewhere in the tree (Figure 3)!
BAMM computes extinction probabilities using a recursive algorithm that begins at the tips of the tree and computes the change in the extinction probability down each branch using a set of differential equations (see Moore et al., 2016; SI1.2 Likelihood function). A question arises when BAMM reaches an internal node: what should the initial extinction probability be for the branch immediately ancestral to the speciation event? Here is where the problem arises: BAMM (v.2.5) enables an option (combineExtinctionAtNodes = if_different) that compares the extinction probabilities for each of the descendant branches at time t; if these probabilities are different, BAMM takes the product of the extinction probabilities as the initial extinction probability for the immediately ancestral branch; otherwise (if the extinction probabilities are the same), BAMM uses either one of the extinction probabilities as the initial extinction probability (note: no product is taken). These extinction probabilities for internal nodes are then carried down their ancestral branches, continuing all the way to the root of the tree. The description of how BAMM handles extinction probabilities directly follows from the BAMM website which also resembles the current version of the code.
We emphasize that this option is not mathematically valid: the extinction probability for a lineage arising right before a node certainly does not depend in any sensible way on the product of extinction probabilities of two lineages that arise immediately after that node (c.f., Maddison et al., 2007, and Figure 3, above), especially only when the extinction probabilities for those two lineages are different! (e.g., why take the product in some situations and not others?). Taking the product of these extinction probabilities violates the statistical principles underpinning model-based inference. Thus, all benefits normally associated with model-based approaches (consistency, efficiency) do not apply. We note that the correct procedure, which is not implemented in BAMM, is to take the extinction probability from the lineage in the same diversification process as the ancestral node, since the extinction probability for a given diversification-rate category depends solely on time.
Taking products of extinction probabilities (which are always smaller than 1) results in increasingly smaller extinction probabilities. Since BAMM only takes the product of these extinction probabilities when they are different, and these probabilities will only be different when there are rate shifts, this incorrect procedure imposes an unpredictable change to the probabilities of models with diversification-rate shifts. Thus, the new option if_different in BAMM (v.2.5) implicitly penalizes rate-shifts more heavily. For example, consider a model that has no rate shifts and the extinction probabilities for both lineages descending from a node are 0.2. In this case the extinction probability at the ancestral node should be 0.2 because the extinction probabilities for either lineage are equal. Now imagine that there is a rate shift on one of the daughter lineages so that the extinction probability for that lineage is now of 0.15. Under the new option in BAMM, we would multiply the extinction probabilities to compute the extinction probability at the ancestral node, which would be approximately 0.03 in this case (which is clearly different from 0.2).
To demonstrate the impact of this bug, we compare the results of analyses with (orange) and without (blue) the option on our constant-rate birth-death simulated trees below (Figure 4). As we describe in our PNAS paper, we centered the diversification-rate priors on their true values; therefore, these results represent a best-case scenario for inferring the correct number of diversification-rate shifts.
This phenomenon is not restricted to simulated phylogenies; we analyzed the cetacean phylogeny with and without the if_different option and observe the same general pattern (Figure 5).
We note that the results for our if_different (orange lines above) are effectively identical to those reported by Rabosky (BAMM website), while our random analyses (blue lines) correspond to results presented in our PNAS paper.
In this post, we have described the analyses and main results presented in Moore et al. (2016) demonstrating that the posterior distribution of the number of diversification-rate shifts inferred by BAMM is extremely sensitive to the assumed prior distribution on the number of diversification-rate shifts. We demonstrate how Rabosky’s (2014) conclusion that BAMM is prior insensitive is an artifact of using the posterior mode (MAP) to summarize the inferred number of diversification-rate shifts. We clarify that our results (Moore et al., 2016) are based on the most recent version of BAMM (v.2.5) but avoided use of the combineExtinctionAtNodes = if_different option because we view this as a software bug, which would not provide a fair evaluation of the behavior of BAMM. This option for computing extinction probabilities at nodes (taking the product if the probabilities are different) is theoretically invalid, leads to pathological behavior, and masks prior sensitivity of BAMM. Furthermore, we show that the use of the if_different option caused the discrepancy between our results (Moore et al., 2016) and a recent blogpost by Rabosky (2016), which concealed the strong prior sensitivity of BAMM. Although the details are somewhat technical, we believe that it is important to our community to understand the limits of existing methods for inferring diversification-rate shifts, and we hope that you will join the discussion via the comments section below. In a subsequent post we plan to elaborate on the issue of diversification-rate parameter estimates under BAMM.
All of the phylogenetic data and BAMM config files used for this blog post are publicly available as a BitBucket repository.
Höhna, S, MJ Landis, TA Heath, B Boussau, N Lartillot, BR Moore, JP Huelsenbeck, and F Ronquist. 2016. RevBayes: Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model Specification Language. Systematic Biology, 65:726–736.
Huelsenbeck JP, B Larget, and DL Swofford. 2000. A compound Poisson process for relaxing the molecular clock. Genetics, 154:1879–1892.
Kendall, DG. 1948. On the generalized “birth-and-death” process. The Annals of Mathematical Statistics, 19:1–15.
Maddison W, P Midford, and S Otto. 2007. Estimating a binary character’s effect on speciation and extinction. Systematic Biology, 56:701.
Moore, BR, S Höhna, MR May, B Rannala, and JP Huelsenbeck. 2016. Critically evaluating the theory and performance of Bayesian analysis of macroevolutionary mixtures. Proceedings of the National Academy of Sciences, USA, (in press).
Nee, S, RM May, and PH Harvey. 1994. The reconstructed evolutionary process. Philosophical Transactions: Biological Sciences, 344:305–311.
Rabosky DL. 2014. Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PLoS One, 9:e89543.
Rabosky DL. 2016. Is the posterior on the number of shifts overly sensitive to the prior? http://bamm-project.org/prior.html
Rabosky, DL, M Grundler, C Anderson, JJ Shi, JW Brown, H Huang, JG Larson, et al. 2014. BAMMtools: an R package for the analysis of evolutionary dynamics on phylogenetic trees. Methods in Ecology and Evolution, 5:701–707.
Rannala B. 2002. Identifiability of parameters in MCMC Bayesian inference of phylogeny. Systematic Biology, 51:754–760.
Steeman, ME, MB Hebsgaard, RE Fordyce, SY Ho, DL Rabosky, R Nielsen, C Rahbek, H Glenner, MV Sørensen, and E Willerslev. 2009. Radiation of extant cetaceans driven by restructuring of the oceans. Systematic Biology, 58:573–585.
Every spring for the last sixteen years, we have held the Bodega Applied
Phylogenetic workshop for graduate students interested in applying
phylogenetic methods to diverse topics in biology. For logistical reasons,
it is necessary for the course to take a sabbatical in the spring of
2016. The Bodega course return in the spiring of 2017.
We look forward to seeing you next year!
Brian Moore & Peter Wainwright
Organizers
Hey Treethinkers!
Just a quick update on some recent work—by
the marvelous Mike May and sensational Sebastian Höhna—that we’re very excited about in the Moore lab.
First, we have a paper in review that describes a new Bayesian approach for detecting mass-extinction events. Briefly, this is a novel method for detecting mass-extinction events from phylogenies estimated from molecular sequence data. We develop our approach in a Bayesian statistical framework, which enables us to harness prior information on the frequency and magnitude of mass-extinction events. The approach is based on an episodic stochastic-branching process model in which rates of speciation and extinction are constant between rate-shift events. We model three types of events: (1) instantaneous tree-wide shifts in speciation rate; (2) instantaneous tree-wide shifts in extinction rate, and; (3) instantaneous tree-wide mass-extinction events.
Each of the events is described by a separate compound Poisson process (CPP) model,
where the waiting times between each event are exponentially distributed with event-specific rate parameters. The magnitude of each event is drawn from an event-specific prior distribution. Parameters of the model are then estimated using a reversible-jump Markov chain Monte Carlo (rjMCMC) algorithm. We demonstrate via simulation that this method has substantial power to detect the number of mass-extinction events, provides unbiased estimates of the timing of mass-extinction events, while exhibiting an appropriate (i.e., below 5%) false discovery rate even in the case of background diversification rate variation. Finally, we provide an empirical application of this approach to conifers, which reveals that this group has experienced two major episodes of mass extinction. This new approach—the CPP on Mass Extinction Times (CoMET) model—provides an effective tool for identifying mass-extinction events from molecular phylogenies, even when the history of those groups includes more prosaic temporal variation in diversification rate.
This paper is available from the bioRxiv here.
We’ve also submitted an application note for our new R package, TESS 2.0, a Bayesian software package implementing the CoMET model and many other tasty methods for inferring rates of lineage diversification. Briefly, TESS implements statistical approaches for estimating rates of lineage diversification (speciation — extinction) from phylogentic trees. The program provides a flexible Bayesian framework for specifying an effectively infinite array of diversification models—where diversification rates are constant, vary continuously, or change episodically through time—and implements numerical methods to estimate parameters of these models from molecular phylogenies.
We provide robust Bayesian methods for assessing the relative fit of these models of lineage diversification to a given study tree–-e.g., where stepping-stone simulation is used to estimate the marginal likelihoods of competing models, which can then be compared using Bayes factors. We also provide Bayesian methods for evaluating the absolute fit of these branching-process models to a given study tree—i.e., where posterior-predictive simulation is used to assess the ability of a candidate model to generate the observed phylogenetic data.
This paper is available from the bioRxiv here.
Finally, all this good stuff is implemented in the newly released TESS 2.0 R package (including the source code, comprehensive user manual, and example files) is available from CRAN here.
Following up on my previous post, here is the list of ‘Must Read’ papers in phylogenetics that were suggested on Twitter. I think that this is a great start, even though it is missing some classics and some important topics (divergence time estimation, for example). Thanks to everyone for chipping in with their thoughts and thanks again to Matt Hahn and Matt Pennell for getting the conversation started.
I apologize if I missed anyone’s contributions. Feel free to suggest additions, either here in the comments or on twitter with the hashtag #mustreadphylo.
Bull, J. J., Huelsenbeck, J. P., Cunningham, C. W., Swofford, D. L., & Waddel, P. J. (1993). Partitioning and combining data in phylogenetic analysis. Systematic Biology, 42(3), 384–397.
Cavalli-Sforza, L. L., & Edwards, a W. F. (1967). Phylogenetic analysis. Models and estimation procedures. The American Journal of Human Genetics, 19, 233–257.
Edwards, S. V. (2009). Is a new and general theory of molecular systematics emerging? Evolution, 63, 1–19.
Felsenstein, J. (1973). Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Systematic Biology, 22, 240–249.
Felsenstein, J. (1978). Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology, 27, 401–410.
Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17, 368–376.
Felsenstein, J. (1985). Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39, 783–791.
Felsenstein, J. (1985). Phylogenies and the comparative method. American Naturalist, 125, 1–15.
Goldman, N. (1993). Statistical tests of models of DNA substitution. Journal of Molecular Evolution, 36, 182–198.
Hillis, D. M., & Bull, J. J. (1993). An Empirical Test of Bootstrapping as a Method for Assessing Confidence in Phylogenetic Analysis. Systematic Biology, 42, 182–192.
Holder, M., & Lewis, P. O. (2003). Phylogeny estimation: traditional and Bayesian approaches. Nature Reviews. Genetics, 4, 275–284.
Kumar, S., Filipski, A. J., Battistuzzi, F. U., Kosakovsky Pond, S. L., & Tamura, K. (2012). Statistics and truth in phylogenomics. Molecular Biology and Evolution, 29, 457–472.
Maddison, W. P. (1997). Gene Trees in Species Trees. Systematic Biology, 46, 523–536.
Pauling, L., & Zuckerkandl, E. (1963). Chemical paleogenetics. Acta Chem. Scand, 17, S9 – S16.
Sullivan, J., & Swofford, D. (1997). Are Guinea Pigs Rodents?? The Importance of Adequate Models in Molecular Phylogenetics. Journal of Mammalian Evolution, 4, 77–86.
This post is sparked by an ongoing conversation on twitter that was kicked off when Matthew Hahn and Matt Pennell got to talking about developing a list of papers that should be required reading for graduate students with an interest in phylogenetics. This a good question, and I can’t recall seeing one. I start teaching on phylogenetics in our graduate core course here at UH next week and the 2015 Bodega workshop is only a few weeks away, so I’m finding this to be a timely and useful conversation.
.@mwpennell Speaking of which, have you ever seen a list of papers that should be required reading for every phylogenetics student?
— Matthew Hahn (@3rdreviewer) February 12, 2015
@mwpennell Let's just start on twitter. I say Felsenstein 1973, 1978, 1981, and 1985.
— Matthew Hahn (@3rdreviewer) February 12, 2015
There are already several good suggestions from folks on twitter, including, well….most of Joe Felsenstein’s early phylogenetics papers and his book, Maddison’s 1997 paper and Edwards 2009 paper on gene tree conflicts, and Sullivan and Swofford’s 1997 paper on the importance of adequate models (of course, guinea pigs are also a noble beast deserving of study in their own right).
Please jump into the conversation on twitter with your suggestions, or leave them here in the comments. I’ll post an update with a bibliography in a few days. Thanks to Matt and Matt for bringing this up!
.@3rdreviewer @mwpennell Thinking Felsenstein '81 MLE paper and chp. 10 (on history) from his book as context, but interested in other ideas
— Bob Thomson (@rcthomson) February 12, 2015
.@mwpennell @rcthomson I'd have to go with Edwards 2009 for a recent paper that touches on some of the same topics.
— Matthew Hahn (@3rdreviewer) February 13, 2015
Applications are now being accepted for the 2015 Workshop in Applied Phylogenetics. This year’s workshop will run from March 7 to 14 at the Bodega Bay Marine Lab on the northern California Coast. The application deadline is January 10th. See the 2015 workshop page for more information and instructions to apply.