Primary Contact: Jeremy Brown
- 1 PuMA & AMP Tutorial (3.9.14)
- 2 Introduction and Theory
- 3 Tutorials
- 3.1 Model Selection in jModelTest2
- 3.2 Assessing Model Adequacy with PuMA
- 4 Related Software
- 5 References and Further Reading
PuMA & AMP Tutorial (3.9.14)
Introduction and Theory
Models of the DNA substitution process are fundamental to statistical phylogenetic inference. They can loosely be thought of as interpreters of the phylogenetic information in our sequence data. In the same way that a language translator receives information in one language and translates that information into a language that is understandable to others, a model of sequence evolution translates the information in a set of sequences into phylogenetic information that can be directly interpreted by a biologist. However, each model that we might use for inference makes a different set of assumptions about how DNA evolves, resulting in differences of interpretation about the phylogenetic information contained in the sequence. There are two closely related questions that then arise:
- Which model is most appropriate for analyzing my set of DNA sequences?
- How do I know that the model I’ve chosen will give me trustworthy results?
Answering the first question is known as model selection and answering the second question is known as assessing model plausibility (or model adequacy).
“Essentially, all models are wrong, but some are useful.” – George E.P. Box
The use of statistical approaches to select an appropriate model of sequence evolution for phylogenetic inference is well-established and built on a robust literature (Sullivan and Joyce, 2005). The more complex a model becomes, the better it will fit to the data. However, this fit comes with a cost. More complex models require the estimation of more parameters, while the finite sample of data contains the same amount of information. The use of overly simplistic models of evolution that ignore fundamentally important processes occurring in nature can result in systematic error (bias) in the results, while the use of overly complex models can result in stochastic error (inflated variance). All model selection methods try to find “a best approximating model” (Burnham and Anderson 1998) that balances these two competing sources of error (Fig. 1), but they are often based on different theoretical foundations. Some of these theoretical foundations (with their associated model selection criteria) are:
- Frequentist Hypothesis Testing (Likelihood Ratio Test – LRT)
- Information Theory (Akaike’s “an Information Criterion” – AIC)
- Decision Theory (Decision Theoretic Risk – DT or R)
- Mix of Bayesian motivation and information theory (Bayesian information criterion – BIC)
- Bayesian Model Selection (Bayes factors – BF)
- Bayesian Hypothesis Testing or Model Averaging (Posterior probabilities – PP)
For the sake of concision, we won’t discuss the derivations and theoretical bases of these various criteria here, but see the list of references below for much more information about their use in phylogenetics and beyond. One final point worth noting is that various simulation studies have shown that modest overparameterization generally results in smaller inferential error than modest underparameterization, at least in a Bayesian framework (e.g., Huelsenbeck and Rannala, 2004; Lemmon and Moriarty, 2004; Brown and Lemmon, 2007). (Note that this does NOT mean that severe overparameterization is better than modest underparameterization.) Therefore, if two models seem to fit roughly equally well according to one or more model selection criteria and model-averaged estimates can’t be obtained, it’s usually better to err on the side of complexity.
Model Plausibility (Adequacy)
“We do not like to ask, ‘Is our model true or false?’, since probability models in most data analyses will not be perfectly true. The more relevant question is, ‘Do the model’s deficiencies have a noticeable effect on the substantive inferences?’.” – Gelman et al., 2004, Bayesian Data Analysis
While model selection criteria aim to pick a model that best approximates natural processes out of a set of candidate models, they make no guarantee that the chosen model is sufficient to provide unbiased inferences. This is particularly true when all of the candidate models share a set of common assumptions. For instance, many models of nucleotide substitution assume that
- Sites are independent of one another
- Processes are homogeneous across a sequence
- Processes are homogeneous across a tree
- Codon structure is unimportant
- All genes share the same tree topology
If all of the candidate models assume these things, it’s impossible for a model selection criterion to tell that one or more of these assumptions is strongly violated in empirical data. It is, therefore, a good idea to ask if the chosen model is “good enough”. One intuitive, highly flexible, and very underutilized approach for answering this question is to perform predictive simulations. In a frequentist setting, this is often called parametric bootstrapping and in a Bayesian context it is called posterior predictive simulation. The logic of both approaches is simple. If the chosen model provides a sufficient approximation to the natural world, data simulated under the assumed model should “look” very similar to the empirical data. The only difference between the frequentist and Bayesian approaches is that parametric bootstrapping involves simulating many datasets from the ML estimate of the tree and model parameters, while posterior predictive simulation draws trees and parameter values from the posterior distribution for simulation.
Key to the success of predictive simulation for assessing model adequacy is the way in which the “look” of a dataset is quantified. An almost infinite number of potential test statistics exist for assessing “look”, but relatively little work has gone into their development or testing in phylogenetics. Here are a few proposals:
- The multinomial likelihood (Goldman, 1993a; Huelsenbeck et al., 2001; Bollback, 2002)
- Chi-squared statistic for homogeneity of nucleotide frequencies (Huelsenbeck et al., 2001)
- Frequency of invariant sites (Goldman, 1993b)
- Number of unique site patterns (Goldman, 1993b)
- Number of “parallel” sites (Goldman, 1993b)
Unfortunately, such statistics are rarely used to assess model adequacy for empirical analyses. Figure 2 shows model plausibility assessment using the multinomial likelihood test statistic for a classic example of an inadequate model producing a misleading phylogenetic estimate (D’Erchia et al., 1996; Sullivan and Swofford, 1997). If the authors of the original study had used predictive assessment of model adequacy, they might have treated their results with caution. Lastly, it’s important to keep in mind that rejecting the plausibility of a model based on predictive simulation does not guarantee that the phylogenetic estimate is wrong. Similarly, not rejecting the plausibility of a model does not guarantee that the phylogenetic estimate is correct. Predictive simulation only directly tells us whether or not our models can recapitulate certain features of the data. However, if two groups of studies (A and B) support different tree topologies and the analyses in group A consistently fail model plausibility tests while those in group B do not, it would seem prudent to treat the results in A with more skepticism than those in B.
These tutorials use one example data file: primates.nex.
Model Selection in jModelTest2
There are many different model selection frameworks, criteria, and software packages that one could use. Here, we will focus on one popular and free software package for model selection, jModelTest2, which implements several different model selection criteria employing maximum likelihood (ML) scores. Caveats: Note that not all models one might wish to use can be set up in jModelTest2. With a little more effort, you can perform manual model testing as long as you can get an ML score for all of the models to be tested. Also, other model selection frameworks might be more appropriate for Bayesian analyses (notably the use of Bayes factors or reversible model jumping). More detail on jModelTest2 can be found on the program’s wiki.
jModelTest2 can be downloaded from: http://code.google.com/p/jmodeltest2/. After download, unzip the archive. Details on the theory behind the model selection criteria employed by jModelTest2 and instructions on running the program can be found on the jModelTest2 wiki: http://code.google.com/p/jmodeltest2/wiki/Index.
jModelTest2 is written in Java, so it requires Java to be previously installed on your system. Java will come pre-installed on any standard Mac or PC and is easily obtained for *nix systems. Since Java is platform-independent, jModelTest should look and run similarly across different platforms. To start jModelTest2 open a Terminal window, navigate to the jmodeltest-2.1.2 folder, and type
java -jar jModelTest.jar
After executing the file, the jModelTest2 console window should appear. Note that the jModelTest2 .jar file can only be run out of the jmodeltest-2.1.2 folder.
Download the primates.nex file linked at the top of this tutorial. This is a NEXUS file containing mtDNA sequences for 12 primate species (Hayasaka et al., 1988). Click on the File menu at the top of the console window, then select Load DNA alignment. jModelTest2 should print out some information about your dataset in the console and let you know that it was successfully loaded. For our primates data set, it should load 12 sequences and 898 sites.
Getting ML Scores
All of the model selection criteria in jModelTest2 use ML scores to compare models. Therefore, you need to begin by obtaining these scores for all the models that you would like to compare. jModelTest2 uses PhyML to calculate likelihoods (the original Modeltest program used PAUP*).
To calculate ML scores, click on the Analysis menu in the console window, and then select Compute likelihood scores. A window should appear with various options for calculating likelihoods and setting the number of processors you want jModelTest2 to use. Here is a brief explanation of the likelihood calculation settings:
Number of substitution schemes – A substitution scheme specifies how many parameters are used to describe the relative propensities of one base to be substituted by another (sometimes called exchangeabilities). The simplest scheme specifies that all substitution types are equally probable. The most complex scheme allows all 6 substitution types to have a different rate. There are many schemes of intermediate complexity with different numbers of unique rate parameters. If “3” is selected for this option, only the simplest, most complicated, and one intermediate (different rates of transition and transversion) are considered. If “203” is selected, all possible substitution schemes are compared (203 is also known as the 6th Bell number).
Base frequencies – Selecting the box labeled “+F” will include models that estimate the equilibrium frequencies of the four DNA nucleotides. If this box is not checked, only models assuming equal nucleotide frequencies will be considered.
Rate variation – Selecting the box labeled “+I” will include models that estimate a proportion of invariable sites. Similarly, selecting the box that is labeled as “+G” will include models that use a gamma distribution to estimate the distribution of rates across sites. The “nCat” value specifies the number of discrete categories used to approximate the gamma distribution. Not selecting either box will only compare models that assume equal rates across sites (not advisable for any empirical data).
Base tree for likelihood calculations – This option allows the user to specify the topology used when calculating model likelihoods. Previous work has suggested that the tree used for likelihood calculations when comparing models need only be a reasonable approximation of the true tree. Therefore, it might be acceptable to use a fast algorithm such as neighbor-joining to obtain this tree. jModelTest2 includes two options for specifying fixed tree topologies (i.e., the same tree topology is used when scoring each model). The first, “Fixed BIONJ-JC”, will estimate the tree with the BIONJ algorithm using distances calculated from the JC model. The second, “Fixed user topology”, allows you to load a tree from file. Note that LRTs require the comparison of nested models and different bifurcating tree topologies are not nested. Therefore, a fixed tree topology must always be used for calculating likelihood scores if LRTs will be used for model selection. jModelTest2 should (and does) prohibit you from doing this. The other two options allow the tree to vary across models. “BIONJ” will use the BIONJ algorithm to estimate separate topologies for each model, using distances calculated with that model. Selecting “ML optimized” will use PhyML to do a tree search for the maximum likelihood topology separately for each model.
As long as scoring the models does not take a prohibitively long time, it is generally advisable to compare many different models. For the primates data set, select an intermediate diversity of models (11 substitution schemes, include models with non-equal base frequencies, and include I and G rate variation models). Select the “Fixed BIONJ-JC” tree. Click the “Compute Likelihoods” button. A progress box will update you on the number of models that have been scored and show the progress for each thread (if multiple threads allowed). On my machine, scoring these 88 models for primates.nex takes about 47 seconds. Run times will vary across computers.
Model Selection – AIC
To compare models using Akaike’s an Information Criterion (AIC; Akaike, 1974), first select the Analysis menu from the console window, then select Do AIC calculations…. These calculations should take far less time than the calculation of the likelihood scores. When finished, jModelTest2 will display the results in the console window. At any time after the calculation of one (or more) model selection criteria, the summarized results can be displayed by selecting the Results menu from the console window, and then selecting Show results table. Roughly speaking, the AIC score of a model measures the relative amount of information lost when that model is used to approximate reality. Therefore, the model with the minimum AIC score is to be preferred. The relative plausibility of different models can be calculated from the difference in AIC scores across models and lead to Akaike weights. Beginning with the model with highest weight (lowest AIC score), the plausible set of models can be found by successively adding the next highest weight until a cumulative weight of 0.95 is reached. The AICc correction can also be selected before performing AIC calculations. This correction is typically applied when samples sizes are small relative to the complexity of the model, but should converge on standard AIC scores as dataset sizes increase. Therefore, there seems to be little reason not to use AICc. Model selection criteria based on information or decision theory (AIC, BIC, and DT) do not require the comparison of nested models, as do the various strategies employing likelihood ratio tests (LRTs).
Model Selection – BIC
To compare models using the Bayesian Information Criterion (BIC; Schwarz, 1978), first select the Analysis menu from the console window, then select Do BIC calculations…. Results are immediately displayed in the console window and can be summarized by selecting the “Results” menu, and then selecting “Show results table”.
While the BIC stems from a different theoretical framework, it is closely allied with AIC. BIC scores and weights are interpreted similarly to their AIC counterparts. However, BIC generally penalizes complex models more harshly and often end up favoring simpler models.
Model Selection – DT
Decision theory is based upon the idea that one should minimize risk when making a decision. Specifically applied to phylogenetic model selection, if we have a set of possible models from which to choose, where each model has an associated probability of being ‘true’ and some penalty is assessed for choosing that model when it is not true, the risk associated with choosing that model is the expected cost if that model is not true times the probability that it is not true. Many possible cost (or penalty) functions exist. Minin et al. (2003) introduced decision-theoretic model choice to phylogenetics and suggested a cost function based on the degree of branch length misestimation. Much more detail on this approach can be found in their paper. See also Abdo et al. (2005).
To compare models using Minin et al.’s DT approach, first select the “Analysis” menu, then select “Do DT calculations…”. As for other criteria, results are immediately displayed in the console window and can be summarized by selecting the “Results” menu, followed by “Show results table”.
Note that there is concern about the interpretation of the DT weights (although not scores) as calculated by jModelTest2 (see jModelTest2 manual). Therefore, DT weights should be interpreted with caution.
Model Selection – LRT
Historically, likelihood ratio tests (LRTs) have frequently been used to select models. However, concern has been expressed about this approach (see Posada and Buckley (2004) as well as Burnham and Anderson (1998)). LRTs rely on a frequentist model testing approach. As it turns out, if data have been generated by Model A (a simpler model) and are analyzed with both Model A and Model B (a more complex model within which model A is nested), the expected distribution of twice the log-likelihood ratio
LRT = 2*ln[L(model B)/L(model A)] = 2*(ln[L(model B)] - ln[L(model A)])
calculated using the data follows a chi-squared distribution with degrees of freedom equal to the number of free parameters that differ between models*. In other words, model B will always have a higher maximum likelihood because it’s more complex. However, we can use this likelihood ratio test to see if the improvement in likelihood is great enough to indicate that B really captures something about the data that A does not.
To compare models using LRT’s in jModelTest2, select the “Analysis” menu, then select “Do hLRT calculations…”. An options dialog box will appear. The “Confidence level LRT” lets you set the preferred alpha for your LRT to reject a more complex model. The other available options allow you to specify the hierarchy in which your models are compared. For instance, if starting from the simplest model (forward selection), which parameters would you like to test first? They can be ordered in the “Hypotheses order” box. Note that there are some constraints on this order to keep the tests nested, but jModelTest2 will tell you when you try and run tests in a non-nested manner. You can also start from the most complex model and move towards the simple (backward selection). The order of tests can also be determined dynamically (dLRTs). Under this strategy, the next test performed is the test that provides the largest increase (forward selection) or smallest decrease (backward selection) in likelihood (among model comparisons involving a single hypothesis).
Two important caveats to note about LRTs: (1) Only nested models can be compared. (2) If multiple tests are conducted to compare a range of models, there are usually multiple strategies for the sequence in which LRTs are conducted. Different sequences of tests can lead to different preferred models and it is unclear which strategy for conducting tests is to be preferred.
* The expected chi-squared distribution is different when the simpler model involves fixing a parameter from the more complex model at a boundary. In this case, the expected distribution is a mix of chi-squared distributions.
Model Selection for Primate Data
Following the above directions, load the file primates.nex into jModelTest2, calculate likelihood scores for 88 models (11 substitution schemes, unequal nucleotide frequencies, +I, and +G) on a fixed BIONJ-JC tree, and then calculate all model selection criteria (AIC, AICc, BIC, DT, and hLRT [alpha level:0.01, forward selection, default hypothesis ordering]). Look at the results summary window. Examine the models and compare likelihood scores. Which model is guaranteed to have the highest likelihood? If all calculations worked correctly, the following models should have been selected: AIC: TrN+G AICc: TrN+G BIC: HKY+G DT: HKY+G hLRT: HKY How do the chosen models differ? For reference, see the figure below from Swofford et al. (1996) that outlines the differences between models. For the information critera (AIC, AICc, and BIC), what are the weights of the model chosen by that criterion versus other criteria? Why did hLRT not choose a model with gamma-distributed rate variation? Which model would you use? Why?
Assessing Model Adequacy with PuMA
The assessment of model adequacy has been relatively unexplored in phylogenetics. PuMA is software written in Java to perform posterior predictive assessment of partitioned (and unpartitioned) model adequacy based on the multinomial likelihood test statistic. The development of alternative test statistics for assessing model adequacy is an active area of research.
Posterior predictive assessment of model adequacy requires that Bayesian analysis has already been performed. In this case, I will use output files from analyses in MrBayes 3.1.2 assuming either an HKY+G or JC model of sequence evolution. For details on performing Bayesian analyses, see the MrBayes tutorial. Example output files can be downloaded above.
The PuMA package can be downloaded from http://code.google.com/p/phylo-puma/. It contains PuMA (executable and source code), seq-gen (executable and source code), example files, and a manual. PuMA relies on seq-gen to simulate data, so please make sure that seq-gen runs and is in the same folder as PuMA when running an analysis. To see if it runs, open a Terminal window, navigate to the folder containing seq-gen and type
If seq-gen runs properly, it should spit out a statement about proper program usage.
If the seq-gen executable downloaded with PuMA does not run, you likely need to recompile it from source. If so, simply open a Terminal window, navigate to the seq-gen source folder, type
to remove the compiled files already present, and then type
A new version of seq-gen should then appear. If this doesn’t work and you’re using a Mac, make sure you’ve downloaded the Developer Tools from Apple (which contains compilers). Unfortunately, PuMA is not set up to run on PC’s.
Place the PuMAv0.906 application (the one with the puma image) and seq-gen in each of the folders containing MrBayes output files that you’ve downloaded. PuMAv0.906 can be started simply by double-clicking on the icon.
If double-clicking the application doesn’t work, replace the application file with PuMAv0.906.jar. Open a Terminal window, navigate to the folder, and start PuMA by typing
java -jar PuMAv0.906.jar
Set the directory containing your MrBayes output files by clicking the button and selecting any one of your .p or .t files. Note that PuMA will useall .p and .t files in this folder for analysis, so only include files that you wish to use. Type a filename for PuMA output. This file will contain the multinomial likelihoods for each simulated dataset, the multinomial likelihood for the empirical dataset, and the estimated posterior predictive p-value.
Bayesian Analysis Options
PuMA can perform posterior predictive simulation using output files from either MrBayes or BayesPhylogenies. In this case, select the radio button next to “MrBayes”.
These analyses have been performed using models that assume homogeneity across all sites in the data set, so select the radio button next to “Single Partition” as the partitioning scheme. PuMA can also take output files from partitioned models (see the manual for details).
We will not be using the program MrConverge here, so a manual burn-in will need to be entered. Note that this burn-in is interpreted as the number of samples to discard, not the number of generations to ignore. A burn-in value of 100 is appropriate for both of these analyses. Burn-in values should be determined by looking for concordance of sampling (both of model parameter values and tree topologies) across runs.
After selecting all of the appropriate options, simply click the “Submit” button and wait. The status bar at the bottom of the PuMA window will update you as PuMA progresses. Some of the analysis steps (e.g., simulation of data sets) can take several minutes. New folders and files will be created in the folder containing your analysis files while PuMA runs. Once the run has finished, you will be notified in the status bar.
Interpreting and Visualizing Output
Once the run is complete, an output file (as you named it) will appear in your folder. This file begins with a (usually long) list of the simulated datasets and their corresponding multinomial likelihoods. Scroll to the bottom of the file. There you will find the multinomial likelihood for the empirical data set, as well as an associated posterior predictive p-value. This p-value is calculated as the percentage of multinomial likelihoods from simulated datasets that are larger than the multinomial likelihood from the empirical dataset. In an ideal situation (i.e., where our model is doing a very good job), this p-value should be close to 0.5. When the model or priors are flawed, this value will fall farther out in the tails of the simulated null distribution. In other words, the model is assessed as adequate when the empirical multinomial likelihood is a reasonable draw from the posterior distribution of multinomial likelihoods.
To visualize the null distribution, download the R script pumaHist.r from the PuMA website. Open the script in an R GUI or another text editor and change the file name for the multinomial likelihoods as appropriate. Change the R working directory to the folder containing your PuMA output file and execute the commands in the script. After seeing the initial histogram, you may need to alter the position of the arrow representing the empirical multinomial likelihood (the last line of the script). You will learn more about R during other sessions this week.
Primate Model Adequacy
If properly done, you should get the following posterior predictive p-values for analyses of primates.nex:
JC: 0.0 HKY+G: 0.965
The associated histograms look like this for JC (Top) and HKY+G (Bottom):
Clearly the empirical multinomial likelihood (shown by the arrow) is a more reasonable draw from HKY+G than from JC. Note, however, that the empirical value still falls in the tails of the HKY+G distribution, suggesting that aspects of the true process of sequence evolution important in determining the frequency of different site patterns are not fully captured even by HKY+G.
AMP is software …
DT-ModSel is software …
jModelTest is software that implements various model selection tests via a pipeline of other freely available software( PhyML,Consense, and ReadSeq). Note that executables for all of the software included in this pipeline are provided within the jModelTest download.
MAPPS is software …
MrModelTest is software …
PuMA is software …
References and Further Reading
Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19:716-723.
Abdo, Z., V.N. Minin, P. Joyce, and J. Sullivan. 2005. Accounting for uncertainty in the tree topology has little effect on the decision-theoretic approach to model selection in phylogeny estimation. Molecular Biology and Evolution 22:691-703.
Bollback, J.P. 2002. Bayesian model adequacy and choice in phylogenetics. Molecular Biology and Evolution 19:1171-1180.
Bollback, J.P. 2005. Posterior mapping and predictive distributions. In Statistical methods in molecular evolution (Nielsen, R. Ed.) Springer, New York.
Brown, J.M. and R. ElDabaje. 2009. PuMA: Bayesian analysis of partitioned (and unpartitioned) model adequacy. Bioinformatics 25:537-538.
Brown, J.M. and A.R. Lemmon. 2007. The importance of data partitioning and the utility of Bayes factors in Bayesian phylogenetics. Systematic Biology 56:643-655.
Burnham, K.P. and D.R. Anderson. 1998. Model selection and inference: a practical information-theoretic approach. Springer, New York.
D’Erchia, A.M., C. Gissi, G. Pesole, C. Saccone, and U. Arnason. 1996. The guinea-pig is not a rodent. Nature 381:597-600.
Gelman, A., J.B. Carlin, H.S. Stern, and D.B. Rubin. 2004. Ch. 6 – Model checking and improvement in Bayesian Data Analysis, Second Edition. Chapman and Hall, Boca Raton.
Goldman, N. 1993a. Statistical tests of models of DNA substitution. Journal of Molecular Evolution 36:182-198.
Goldman, N. 1993b. Simple diagnostic statistical tests of models for DNA substitution. Journal of Molecular Evolution 37:650-661.
Hayasaka, K., T. Gojobori, and S. Horai. 1988. Molecular phylogeny and evolution of primate mitochondrial DNA. Molecular Biology and Evolution5:626-644.
Huelsenbeck, J.P. and B. Rannala. 2004. Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models. Systematic Biology 53:904-913.
Johnson, J.B. and K.S. Omland. 2003. Model selection in ecology and evolution. Trends in Ecology and Evolution 19:101-108.
Lemmon, A.R. and E.C. Moriarty. 2004. The importance of proper model assumption in Bayesian phylogenetics. Systematic Biology 53:265-277.
Minin, V., Z. Abdo, P. Joyce, and J. Sullivan. 2003. Performance-based selection of likelihood models for phylogeny estimation. Systematic Biology 52:674-683.
Posada, D. 2008. jModelTest: Phylogenetic Model Averaging. Molecular Biology and Evolution 25:1253-1256.
Posada, D. and T.R. Buckley. 2004. Model selection and model averaging in phylogenetics: advantages of Akaike Information Criterion and Bayesian approaches over likelihood ratio tests. Systematic Biology 53:793-808.
Ripplinger, J. and J. Sullivan. 2008. Does choice in model selection affect affect maximum likelihood analysis? Systematic Biology 57:76-85.
Ripplinger, J. and J. Sullivan. 2010. Assessment of substitution model adequacy using frequentist and Bayesian methods. Molecular Biology and Evolution 27:2790-2803.
Schwarz, G. 1978. Estimating the dimension of a model. Annals of Statistics 6:461-464.
Sullivan, J. and P. Joyce. 2005. Model selection in phylogenetics. Annual Review of Ecology, Evolution, and Systematics 36:445-466.
Sullivan, J. and D.L. Swofford. 1997. Are guinea pigs rodents? The importance of adequate models in molecular phylogenetics. Journal of Mammalian Evolution 4:77-86.
Swofford, D.L., G.J. Olsen, P.J. Waddell, and D.M. Hillis. 1996. Ch. 11 – Phylogenetic inference in Molecular Systematics, Second Edition (D.M. Hillis, C. Moritz, and B.K. Mable, Eds.), Sinauer, Sunderland.