Points of View

Syst. Biol. 63(3):436–441, 2014 © The Author(s) 2013. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: [email protected] DOI:10.1093/sysbio/syt103 Advance Access publication December 18, 2013

The Shape of Modern Tree Reconstruction Methods THÉRÈSE A. HOLTON1,2 , MARK WILKINSON3 , AND DAVIDE PISANI4,∗

Received 18 July 2013; reviews returned 22 August 2013; accepted 2 December 2013 Associate Editor: Susanne Renner

One characteristic of inferred phylogenetic trees is their shape. Rooted binary trees are more balanced, or symmetric, to the extent that sister groups contain similar number of leaves. Premised on the idea that macroevolutionary processes may leave a strong signature in the shape of phylogenetic trees, inferred tree shapes can be compared with expectations under probabilistic models of speciation and extinction in an attempt to make macroevolutionary inferences (e.g., Harvey and Purvis 1991; Guyer and Slowinski 1993; Kirpatrick and Slatkin 1993; Mooers and Heard 1997; Bortolussi et al. 2006). Colless (1982) first suggested that distinct tree reconstruction methods (TRMs) may differ in their tendencies to produce more or less balanced trees. Thus, a concern with using inferred tree balance in macroevolutionary studies is the possibility that patterns of tree balance might depend upon the particular TRM employed (Heard 1992). Putative methodological biases in inferred tree shape have been investigated in several studies. Heard (1992) found no tree-shape biases in a collection of empirical trees inferred using different TRMs (see also Stam 2002), whereas Colless (1995) presented evidence to support his contention that cladistic trees tended to be more unbalanced than phenetic ones. Huelsenbeck and Kirkpatrick (1996) found with simulations that all investigated TRMs were biased toward more unbalanced trees and that the bias was strongest with maximum likelihood (ML). Although there has been some investigation and demonstration of tree-shape biases in supertree methods (Wilkinson et al. 2005; Kupczok 2011), and recent studies (Vinh et al. 2011; Zhu and Steel 2013) that have addressed the different distributions of tree shapes generated from random data by parsimony and quartet puzzling (QP), there have been no comparative studies of the properties of the commonly used TRMs with non-random data

since the work of Huelsenbeck and Kirkpatrick (1996). This is despite considerable changes and advances, such as the advent of Bayesian methods. Rather, recent studies of tree shape, through the use of null models of tree balance, have focused on whether biological factors can be construed to be causing inferred trees to be more imbalanced than expected in the absence of adaptive radiation (e.g., Harcourt-Brown et al. 2001; Blum and Francois 2006). Here, we use a genomic-scale data set of protein families to investigate whether the TRMs most frequently used in modern phylogenetics differ in their propensities to produce more or less balanced topologies. Significant differences would be prima facie evidence that at least some of the methods are biased with respect to tree shape.

MATERIALS AND METHODS Our empirical investigations employed the single protein families from a genomic-scale data set of 2216 protein alignments previously used to generate a metazoan supertree (Holton and Pisani 2010). The amino acid (AA) alignments in this data set have each been previously shown to convey significant clustering information. In addition, these alignments support a species phylogeny (under both supertree and supermatrix analyses) that most zoologists would deem to be valid: non-phylogenetic signals in the data do not seem particularly strong. Because these data were assembled for a different purpose there should be no tree shape-associated bias in this sample, which is large enough to afford statistical power. We have no a priori reason to believe that the sample should not be representative of protein alignments more generally. To enhance statistical power, only trees with more than six taxa were used (Rogers 1994; Rogers 1996; Harcourt-Brown et al. 2001) and,

Downloaded from http://sysbio.oxfordjournals.org/ at Universidade de Vigo on May 16, 2014

1 Department of Biology, The National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland; 2 UCD Complex and Adaptive Systems Laboratory/UCD Conway Institute of Biomolecular and Biomedical Science/School of Medicine and Medical Science, University College Dublin, Dublin 4, Ireland; 3 Department of Life Sciences, The Natural History Museum, London SW7 5BD, UK; and 4 School of Earth Sciences and School of Biological Sciences, The University of Bristol, Woodland Road, Bristol BS8 1UG, UK ∗ Correspondence to be sent to: School of Biological Sciences and School of Earth Sciences, University of Bristol, Woodland Road, Bristol BS8 1UG; E-mail: [email protected]

436

[12:34 7/4/2014 Sysbio-syt103.tex]

Page: 436

436–441

2014

POINTS OF VIEW

[12:34 7/4/2014 Sysbio-syt103.tex]

QP analyses were performed using TREE-PUZZLE 5.1 (Schmidt et al. 2002), under the default, auto-selected model with a gamma correction (four rate categories), and the alpha parameter and the AA frequencies estimated from the data. Because this method tended to produce polytomous trees, tests of tree-shape bias including QP had to use a reduced sample of 126 trees. Trees derived from each TRM using the same alignment were equivalently rooted. Since defining outgroups with multiple species could have lead to issues of outgroup monophyly/non-monophyly when different TRMs found different trees, we always used one single taxon as the outgroup. Where possible, we attempted to root trees in a biologically meaningful way. Accordingly, in the main set of analyses Nematostella vectensis (31 protein families) or in its absence, Trichoplax adherens (15 protein families), was used as a single outgroup. In their joint absence (205 protein families), a biologically meaningful one-taxon outgroup could not be defined, as the data set included multiple protostomes and deuterostomes. Therefore, in such cases, the trees were rooted on the first species in the sequence alignment. This rooting strategy should not introduce any bias because we expect shape differences between equivalently rooted trees to reflect shape differences in the corresponding unrooted trees. To test this, we performed a final analysis where all trees were rooted on the first species in the sequence alignment irrespective of the presence of N. vectensis or T. adherens. When rooting the trees the outgroup taxon was not deleted from the resultant rooted tree. This is relevant when comparing the imbalance of real trees against that of trees sampled from a theoretical Yule distribution because, as pointed out to us by one of the anonymous reviewers, if rooted Yule trees are, like the empirical trees in this study, arbitrarily re-rooted on a single leaf, a new distribution of more imbalanced trees is generated. To maintain comparability, simulated Yule trees need to be arbitrarily re-rooted on a single leaf prior to calculation of their balance. Accordingly, while in all our figures, plotted values for the imbalance of PDA trees have been obtained analytically in apTreeshape (Bortolussi et al. 2006), Yule values were obtained from a real sample of Yule trees, generated in apTreeshape (Bortolussi et al. 2006), arbitrarily re-rooted before balance estimation. The number of Yule trees generated and the distribution of their taxa was equivalent to that observed in our real data sets (Supplementary Fig. S1, available at http://dx.doi.org/10.5061/dryad.22d26). Hence the reader needs to consider the potential effect of sampling errors when contrasting the imbalance of trees obtained using real data against our Yule trees. Alternative approaches can be used to assess the shape of a tree (e.g., Colless 1982; McKenzie and Steel 2000; Aldous 2001). Here, tree shape was quantified using the Colless index (Colless 1982), which increases with tree asymmetry, and the cherry count (McKenzie and Steel 2000), which is positively correlated with balance. The Colless index is the sum of absolute values |L–R| of each internal node of a rooted tree, where L and R are the

Page: 437

Downloaded from http://sysbio.oxfordjournals.org/ at Universidade de Vigo on May 16, 2014

following Heard (1992), only binary trees (i.e., those lacking polytomies) were retained. These restrictions resulted in a sample of 251 suitable protein families for between 7 and 27 taxa (Supplementary Fig. S1, available at http://dx.doi.org/10.5061/dryad.22d26) that were used to test a core set of four TRMs: standard maximum likelihood (ML), maximum parsimony (MP), neighbor joining (NJ), and Bayesian inference (BI). In addition, tests were also performed that included Quartet Puzzling (QP) maximum-likelihood analyses, but using a reduced data set (see below). The balance of the trees obtained from these methods was also compared against two theoretical distributions: Yule (Yule 1924), and Proportional-to-DistinguishableArrangements (PDA; under which all tree topologies for any given number of leaves are equiprobable—Rosen 1978). MP and NJ trees were inferred using PAUP (Swofford 1998). MP analyses used heuristic searches with 100 random addition sequences and TBR branch swapping run to completion, or for up to 6 h, retaining all equally optimal trees. NJ analyses used observed P-distances. Distances for the NJ analyses were not calculated using a substitution model, which is likely to have had a negative impact on the accuracy of the trees built using this approach. However, this was done deliberately to explicitly evaluate the extent to which (i) different levels of parametrization (from no-parametrization in NJ, to the use of a highly parameterized model in BI; see below), and (ii) the use of methods with greatly varying performance (from NJ to BI and ML) affected the imbalance of recovered trees. BI employed PhyloBayes (Lartillot and Philippe 2004), which, similarly to MrBayes (Ronquist et al. 2012), assigns equal prior probability to each labeled tree (i.e., it uses PDA as a prior on tree topologies). For each protein family two independent runs were performed under the LG + G model (Le and Gascuel 2008) with convergence determined using the automatic stopping rule of PhyloBayes. A burnin of 100 trees was discarded from each run prior to construction of a consensus tree using the bpcomp program. To investigate (and discount) the influence of model selection upon our results (see also above), BI was also applied to all protein families using the CAT + G (Lartillot and Philippe 2004) and JTT + G (Jones et al. 1992) models (see Supplementary Material for more details). ML trees were constructed using PhyML 3.0 (Guindon et al. 2010) and additionally by RAxML (Stamatakis et al. 2005), with AA model and number of rate categories determined by Modelgenerator (Keane et al. 2006) as per Holton and Pisani (2010). Further to this, ML analyses were also performed under the LG + G model in RAxML to test whether using the same model for all data sets, or the objectively selected (best fitting) model for each data set, resulted in the recovery of trees with different levels of imbalance (see Supplementary Material for more details). ML analyses always returned binary trees, enabling the comparison of PhyML and RAxML trees using an extended sample of 1000 protein families.

437

436–441

438 TABLE 1.

SYSTEMATIC BIOLOGY Comparisons of TRMs marked by an X

BI trees

Consensus Sample mean Sample mode Sample median Random selection

Most parsimonious trees Mean

Mode

Median

Random

X X

X

X

X

X X X

numbers of leaves in the left and right daughter clades of a node respectively. The cherry count is simply the number of nodes whose daughters are just a pair of leaves. The use of cherry counts in addition to Colless indices is important because cherry counts are minimally impacted by the position of the root and provide a good approximation of the balance of the unrooted topologies underlying our rooted trees. Colless indices were calculated using apTreeshape (Bortolussi et al. 2006), with subsequent normalization for tree size using the denominator (n−1)(n−2) (Heard 1992; confirmed by 2 Colless 1995), where n is equal to the number of taxa. Cherry counts were calculated in R with the APE package (Paradis et al. 2004). Where multiple optimal MP trees occurred, tree balance metrics were calculated for each tree, and their mean, median, mode, and the value of a randomly selected MP tree were used in comparative analyses. Similarly, BI analyses used the majority-rule consensus (of the trees sampled after convergence and excluding the burn-in), a randomly selected tree and the mean, median, and mode of the tree balance measures of a sample of one of every hundred individual trees (sampled from the trees saved from the mcmc chains after convergence). Using these variants we made 16 separate comparisons (Table 1). Balance metrics for the same data set analysed with different TRMs are not independent and, therefore, cannot be used to test for variation between methods using standard statistical approaches (Huelsenbeck and Kirkpatrick 1996). Thus, we used the Friedman test (Friedman 1937), implemented in R, to perform a nonparametric analysis of variance. The null hypothesis of this test is that all subjects have come from a population with the same median, essentially considering all treatments to have the same effect (Siegel and Castellan 1988). Friedman P-values were Bonferroni corrected to account for the 16 multiple comparisons (Table 1) such that significance at the 0.01 level corresponds to a Bonferroni corrected P-value of 0.0006. Where the Friedman test identified significant variance a post hoc Wilcoxon–Nemenyi–McDonaldThompson test (Hollander and Wolfe 1999) was implemented in R with the Coin package (Hothorn

[12:34 7/4/2014 Sysbio-syt103.tex]

et al. 2008) to determine which TRM methods had significant differences in variance. We used the Wilcoxon signed-rank test, implemented in R, to compare PhyML and RAxML tree measures. Here, we present primarily the results from comparisons of Colless indices using BI consensus trees and, where multiple optimal trees were encountered, the median balance of the MP trees. Parallel results from alternative comparisons are given in the Supplementary Material (available from Dyrad http://dx.doi.org/10.5061/dryad.22d26). We also compared methods through simulation by inferring trees from random binary (0,1), 4 states (DNA), and 20 states (AA) data. Methodological details of the simulation studies are reported in the Supplementary Material.

RESULTS Boxplots representing the Colless indices for each TRM are shown in Figure 1. Using Colless indices, the Friedman test reveals a significant difference in tree shape between the considered TRMs (df = 3, 2 = 40.6232, P = 7.86×10−9 ). This result is supported by all tests, irrespective of which descriptive statistic (i.e., mean, median, mode, or random value) is used to represent Colless indices in cases where there are multiple trees and when cherry counts are used instead of Colless indices (Supplementary Table S1 and Supplementary Fig. S2–S5, available at http://dx.doi.org/10.5061/dryad.22d26). Post hoc Wilcoxon–Nemenyi–McDonald-Thompson testing (Table 2) revealed significant differences

Downloaded from http://sysbio.oxfordjournals.org/ at Universidade de Vigo on May 16, 2014

Note: All comparisons used the single ML and NJ trees. Where analyses returned sets of trees (MP, BI) we compared the corresponding means, modes, medians, and randomly selected trees. Additionally, we compared all of these measures from parsimony trees with the BI consensus tree. Each of these eight comparisons was performed using normalized Colless indices and with cherry counts.

VOL. 63

FIGURE 1. Boxplot summary of normalized Colless indices for trees produced with four TRMs. Here, BI consensus trees are used and, where multiple optimal trees were encountered, the median of the Colless indices of the MP trees. The leftmost boxplot gives the distribution of Colless values expected under PDA and the rightmost column the distribution expected under the Yule model.

Page: 438

436–441

2014

POINTS OF VIEW

439

TABLE 2. Post hoc Wilcoxon–Nemenyi–McDonald-Thompson tests of Colless indices of trees produced by different TRMs TRMs

P-value (adj)

MP versus Bayesian NJ versus Bayesian ML versus Bayesian NJ versus MP ML versus MP ML versus NJ

0.678 0.992 7.57 × 10−6 0.497 4.04 × 10−8 3.61 × 10−5

[12:34 7/4/2014 Sysbio-syt103.tex]

FIGURE 2. Boxplot summary of normalized Colless indices for trees derived under two alternative ML software implementations.

Downloaded from http://sysbio.oxfordjournals.org/ at Universidade de Vigo on May 16, 2014

between ML and all other TRMs (reflected in all 16 data comparisons; Supplementary Table S2, available at http://dx.doi.org/10.5061/dryad.22d26) with ML yielding the most balanced trees as measured by the Colless index (Fig. 1; see also Supplementary Figs. S2 and S4, available at http://dx.doi.org/10.5061/dryad.22d26). All methods produce trees that are significantly more balanced than trees from a theoretical PDA distribution, while all methods, except ML, generate trees that are comparably balanced to Yule trees. Interestingly, ML generates trees that are significantly more balanced than Yule (Fig. 1). Comparisons of BI and ML trees produced using different models revealed no significant differences in terms of balance (Supplementary Tables S3 and S4, respectively, available at http://dx.doi.org/10.5061/dryad.22d26). QP applied to random data produced distributions that are similar to Yule (Vinh et al. 2011; Zhu and Steel 2013), whereas QP applied to real data appears similar to all TRMs except ML (which produces trees significantly more balanced than QP; Supplementary Fig. S6, available at http://dx.doi.org/10.5061/dryad.22d26). Supplementary Figure S7 shows that these results hold also when all trees are equivalently re-rooted using the first taxon in the data set irrespective of whether N. vectensis or T. adherens are included in the considered alignment. A Wilcoxon signed-rank test of the Colless indices of ML trees, produced with PhyML 3.0 and RAxML, for 1000 protein families, revealed a significant difference (V = 166 219, P = 6.46×10−5 ). The corresponding boxplot (Fig. 2) shows that RAxML produces even more balanced trees than PhyML as judged with the Colless index. In contrast, analyses conducted using the cherry count found no significant difference (V = 54 028.5, P = 0.568; Supplementary Fig. S8, available at http://dx.doi.org/10.5061/dryad.22d26). Analyses of the shape of trees generated from random data sets (Fig. 3 and Supplementary Figs. S9 and S10, available at http://dx.doi.org/10.5061/dryad.22d26) reveal an interesting gradient, with MP trees being the most PDAlike (in agreement with Zhu and Steel 2013) and NJ trees being even more balanced than Yule trees. ML was found to be the method returning the most PDA-like trees apart from MP. This is in stark contrast with the results of real data sets where ML trees are the most balanced. Overall, comparison of Figures 1 and 3 show that when

FIGURE 3. Boxplot summary of normalized Colless indices for trees derived from random (no signal) data sets. The rightmost and leftmost columns, respectively, indicate the theoretical distribution expected under PDA and Yule. TRMs have been ordered from the one that returns the most PDA-like trees (MP) to the one that returns trees that are most Yule-like (NJ). These results represent simulations of 20 states (AA) random data sets, where median values are used for MP and BI trees. Simulations for binary and 4-states data sets are reported in Supplementary Figures S8 and S9.

phylogenetic methods are used to analyse random or real data sets they tend to return trees of different levels of imbalance. It is clear that there is an effect caused by the presence of signal in the data, but the direction of the effect is method dependent with ML trees, for example, going from being more PDA-like to more Yulelike, while the opposite is seen for NJ trees. A comparison

Page: 439

436–441

440

of Figure 3, Supplementary Figures S9 and S10 shows that these results hold for all data types (multistate and binary) considered. DISCUSSION

Huelsenbeck and Kirkpatrick generated their simulated trees under a Yule model. Huelsenbeck Kirkpatrick’s (1996) study suggests that their Yule model tends to produce fairly balanced trees, such that errors in tree reconstruction are likely to produce more asymmetric trees than expected under that model. If the empirical trees in our study are generally less balanced than expected under a Yule model, then it is possible that errors in tree reconstruction (as evidenced by TRMs yielding different trees) would be more likely to yield more balanced trees. It is not clear from our results whether there is any bias per se with ML toward balanced trees, or with the other methods investigated toward more unbalanced trees, or indeed some other pattern. Although it is clear that patterns of tree balance arising from real data are different from those resulting from random data, these differences do not seem to provide any clue as to the mechanistic basis for the observed differences in tree balance in terms, for example, of the combinations of phylogenetic signal and random noise that they might contain. Similarly, whereas MP is expected to be biased toward asymmetric trees, because of differences in the maximum number of steps of sets of characters that fit asymmetric or symmetric trees (Thorley and Wilkinson 2003; Wilkinson et al. 2005), this does not explain MP’s similarity to NJ, BI, and QP, and its significant difference from ML in our sample of real data sets. In the case of ML and BI, we found that choice of model did not affect inferred tree shape. We also speculate that the use of PDA priors in BI software might impact upon the probability of inferring more asymmetric trees in cases of error. This first, large scale, study of empirical patterns of the balance of modern TRMs demonstrates significant differences between TRMs that are consistent with the existence of biases, without elucidating which methods are biased, or why. Given the importance of empirical patterns of tree balance to macroevolutionary studies, these remain fundamental open questions that merit further attention. Huelsenbeck and Kirkpatrick (1996) outlined a parametric bootstrapping type approach for investigating patterns of tree balance that would account for biases in TRMs in macroevolutionary studies. Fortunately, the applicability of their approach does not depend on the direction or strength of any methodological bias.

Downloaded from http://sysbio.oxfordjournals.org/ at Universidade de Vigo on May 16, 2014

Our results are inconsistent with Heard’s (1992) study of the balance of empirical trees sampled from the literature. That study differs from ours in employing fewer trees overall, particularly fewer ML trees, in not imposing a lower limit on numbers of taxa, and in using a simple, perhaps confounding, classification of methods, and trees as either cladistic or phenetic (including both UPGMA and NJ). These differences may have negatively impacted upon the statistical power of Heard’s analysis and contributed to his finding of no variance between TRMs. Although important historically, we consider Heard’s (1992) pioneering study to be of little relevance to current understanding of empirical patterns of phylogenetic tree balance. Huelsenbeck and Kirkpatrick (1996) simulated eight taxon trees under a Yule model, and associated sequence alignments of 100 or 500 nt produced under the Jukes and Cantor model (Jukes and Cantor 1969). Using a variety of balance measures, including the Colless index, they found that trees inferred from the alignments by different TRMs (ML, MP, NJ, UPGMA) were more unbalanced than expected, especially when rates of evolution were high. They found this bias to be strongest in ML and that it was correlated with accuracy. Only when inferred trees are inaccurate is there any scope for tree balance to be incorrectly inferred or differ, on average, from expectations. These authors show that ML was the least accurate method in their simulations and this might explain why this method suffered the largest average bias. The direction of the effect can be explained if it is assumed that when inferred trees are incorrect, their error is essentially randomly sampled from a tree balance distribution, in which asymmetric relationships are more common than in the null Yule model used to generate the trees, for example, a distribution where all topologies are equally probable (Huelsenbeck and Kirkpatrick 1996). Our analyses of empirical data also show that the most frequently used TRMs in modern phylogenetics produce trees with varying degrees of balance. However, in marked contrast to the results from Huelsenbeck Kirkpatrick’s (1996) simulations, we find that ML tends to yield the most balanced trees, and that the distinction from the other methods is highly significant statistically. Furthermore, this pattern is found using two alternative modern implementations of ML and, as judged by Colless indices, is stronger for RAxML than for PhyML. In contrast QP, which also uses ML, and BI, which is related to ML, return trees with a level of imbalance that cannot be distinguished from that of other methods. We suspect the difference between our empirical results and those of Huelsenbeck and Kirkpatrick (1996) simulations is, at least in part, attributable to the fact that

[12:34 7/4/2014 Sysbio-syt103.tex]

VOL. 63

SYSTEMATIC BIOLOGY

SUPPLEMENTARY MATERIAL Data available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.22d26.

FUNDING This study was supported by an “IRCSET EMBARK Initiative” PhD scholarship [awarded to T.A.H.]; partially supported by a Science Foundation Ireland

Page: 440

436–441

2014

POINTS OF VIEW

Research Frontiers Programme [11-RFP-EOB-3106 to D.P.]; by the High Performance Computing (HPC) Facility at NUI Maynooth; and by the Irish Centre for High-End Computing (ICHEC).

ACKNOWLEDGMENTS We thank the editorial team and reviewers for their insightful comments that led to substantial improvements.

REFERENCES

[12:34 7/4/2014 Sysbio-syt103.tex]

Jukes T., Cantor C. 1969. Evolution of protein molecules. In: Munro H.N., editor. Mammalian protein metabolism.New York: Academic Press. Keane T., Creevey C., Pentony M., Naughton T., McLnerney J. 2006. Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol. Biol. 6:29. Kirkpatrick M., Slatkin M. 1993. Searching for evolutionary patterns in the shape of a phylogenetic tree. Evolution 47:1171–1181. Kupczok A. 2011. Consequences of different null models on the tree shape bias of supertree methods. Syst. Biol. 60:218–225. Lartillot N., Philippe H. 2004. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21:1095–1109. Le S., Gascuel O. 2008. An improved general amino acid replacement matrix. Mol. Biol. Evol. 25:1307–1320. McKenzie A., Steel M. 2000. Distributions of cherries for two models of trees. Math. Biosci. 164:81–92. Mooers A., Heard S. 1997. Inferring evolutionary process from phylogenetic tree shape. Q. Rev. Biol. 72:31–54. Paradis E., Claude J., Strimmer K. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290. Rogers J. 1994. Central moments and probability distribution of Colless’s coefficient of tree imbalance. Evolution 48:2026–2036. Rogers J. 1996. Central moments and probability distributions of three measures of phylogenetic tree imbalance. Syst. Biol. 45:99–110. Ronquist, F., Teslenko, M., Van Der Mark, P., Ayres, D.L., Darling, A., Höhna, S., Larget, B., Liu, L., Suchard, M.A., Huelsenbeck, J. P. 2012. MrBayes 3.2: Efficient bayesian phylogenetic inference and model choice across a large model space. Systematic Biology 61:539–542. Rosen D.E. 1978. Vicariant pattern and historical biogeography. Syst. Zool. 27:159–188. Schmidt H. A., Strimmer K., Vingron M., von Haeseler A. 2002. TREEPUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502–504. Siegel S., Castellan N.J. 1988. Nonparametric statistics for the behavioral sciences.New York: McGraw-Hill. Stam E. 2002. Does imbalance in phylogenies reflect only bias? Evolution 56:1292–1295. Stamatakis A., Ludwig T., Meier H. 2005. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21:456–463. Swofford D.L. 1998. PAUP*. Phylogenetic analysis using parsimony (* and other methods). 4th ed. Sunderland (MA): Sinauer Associates. Thorley J.L., Wilkinson M. 2003. A view of supertree methods. In: Janovitz M.F., Lapointe F., McMorris F.R., Mirkin B., Roberts F.S., editors. Bioconsensus. p. 183–195. DIMCAS series in discrete mathematics and theoretical computer science. Vol. 61. Providence, Rhode Island: The American Mathematical Society. Vinh L.S., Fuehrer A., von Haeseler A. 2011. Random tree-puzzle leads to the Yule–Harding distribution. Mol. Biol. Evol. 28:873–877. Wilkinson M., Cotton J., Creevey C., Eulenstein O., Harris S., Lapointe F., Levasseur C., McInerney J., Pisani D., Thorley J. L. 2005. The shape of supertrees to come: tree shape related properties of fourteen supertree methods. Syst. Biol. 54:419–431. Yule G. 1924 A mathematical theory of evolution, based on the conclusions of Dr. JC Willis FRS. Philos T Roy Soc Lon. B 213: 21–87. Zhu S., Steel M. 2013 Does random tree puzzle produce Yule–Harding trees in the many-taxon limit? Math. Biosci. 243:109–116.

Page: 441

Downloaded from http://sysbio.oxfordjournals.org/ at Universidade de Vigo on May 16, 2014

Aldous D.J. 2001. Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today. Stat. Sci. 16:23–34. Blum M., Francois O. 2006. Which random processes describe the tree of life? A large-scale study of phylogenetic tree imbalance. Syst. Biol. 55:685–691. Bortolussi N., Durand E., Blum M., Francois O. 2006. apTreeshape: statistical analysis of phylogenetic tree shape. Bioinformatics 22:363–364. Colless D. 1982. Review of phylogenetics: the theory and practice of phylogenetic systematics, by E. 0. Wiley. Syst. Zool. 31:100–104. Colless, D. 1995. Relative symmetry of cladograms and phenograms: an experimental study. Syst. Biol. 44:102–108. Friedman M. 1937. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 32:675–701. Guindon S., Dufayard J.F., Lefort V., Anisimova M., Hordijk W., Gascuel O. 2010. New algorithms and methods to estimate maximumlikelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59:307–321. Guyer C., Slowinski J. 1993. Adaptive radiation and the topology of large phylogenies. Evolution 47:253–263. Harcourt-Brown K., Pearson P., Wilkinson M. 2001. The imbalance of paleontological trees. Paleobiology 27:188–204. Harvey P.H., Purvis A. 1991. Comparative methods for explaining adaptations. Nature 351:619–624. Heard S. 1992. Patterns in tree balance among cladistic, phenetic, and randomly generated phylogenetic trees. Evolution 46:1818–1826. Hollander M., Wolfe D.A. 1999. Nonparametric statistical methods. New York: Wiley. Holton T.A., Pisani D. 2010. Deep genomic-scale analyses of the Metazoa reject Coelomata: evidence from single- and multigene families analyzed under a supertree and supermatrix paradigm. Genome Biol. Evol. 2:310–324. Hothorn T., Hornik K., van de Wiel M., Zeileis A. 2008. Implementing a class of permutation tests: the coin package. J. Stat. Softw. 28:1–23. Huelsenbeck J., Kirkpatrick M. 1996. Do phylogenetic methods produce trees with biased shapes? Evolution 50:1418–1424. Jones D.T., Taylor W.R., Thornton J.M. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275–282.

441

436–441

The shape of modern tree reconstruction methods.

The shape of modern tree reconstruction methods. - PDF Download Free
242KB Sizes 0 Downloads 0 Views