MOLECULAR PHYLOGENETICS AND EVOLUTION
Vol. 1, No. 1, March, pp. 53-58, 1992

Phylogenetic Inference Based on Matrix Representation of Trees¹

MARK A. RAGAN

Institute for Marine Biosciences, National Research Council of Canada, 1411 Oxford St, Halifax, Nova Scotia, Canada B3H 3Z1

Received September 12, 1991; revised March 16, 1992

¹ Issued as NRCC No. 33767.

Rooted phylogenetic trees can be represented as matrices in which the rows correspond to termini, and columns correspond to internal nodes (elements of the n-tree). Parsimony analysis of such a matrix will fully recover the topology of the original tree. The maximum size of the represented matrix depends only on the number of termini in the tree; for a tree derived from molecular sequences, the represented matrix may be orders of magnitude smaller than the original data matrix. Representations of multiple trees (which may or may not have identical termini) can readily be combined into a single matrix; columns of discrete-character-state data can be added and, if desired, weighted differentially. Parsimony analysis of the resulting composite matrix yields a hybrid supertree which typically provides greater resolution than conventional consensus trees. Use of this method is illustrated with examples involving multiple tRNA genes in organelles and multiple protein-coding genes in eukaryotes.

INTRODUCTION

Morphology and macromolecules: conflict or congruence? Although, as argued by Moritz and Hillis (1990), conflicts between the two kinds of evidence are sometimes overemphasized, the stark dichotomy remains: phylogenetic reconstruction is typically based either on molecular sequences or on “classical” (morphological and other phenotypic) characters, but infrequently on both simultaneously.

For the most part, the existence of this dichotomy cannot be blamed on differences inherent in the natures of the two kinds of characters. Like many classical data, nucleotides and amino acid residues are intrinsically discrete-state (if unordered) characters. Nor is the dichotomy strictly required methodologically. “Cladism is conquering everywhere” in morphology-based phylogenetics (Hull, 1989, p. 13), owing at least in part (Hull, 1988) to the force of its underlying logic (Farris, 1983), its maximization of information content (Farris, 1979), and the availability of user-friendly microcomputer implementations (Platnick, 1987). In molecular phylogenetics, parsimony methods have not so extensively supplanted all others; certain distance approaches (Saitou and Nei, 1987) behave well in simulations (Gouy and Li, 1990) and have computational advantages, while the statistically more tractable likelihood-based methods (Felsenstein, 1981) may also perform well in simulations (Hasegawa et al., 1991) and appear to some to be “more in the spirit of mainstream scientific investigation” than parsimony analysis (Bishop et al., 1987). Nonetheless, parsimony methods are widely employed in molecular phylogenetics (Wolters and Erdmann, 1988; Patterson, 1989; Loomis and Smith, 1990).

Instead, the major impediment to integrating classical and molecular analyses is the lack of an explicit, theoretically based, flexible, and computationally feasible framework for combining both kinds of data into a single hybrid matrix. This problem is in fact a subset of the more general lack of an explicit framework for combining any two dissimilar data sets, e.g., protein-sequence with nucleotide-sequence data, for phylogenetic analysis.

Two approaches have been put forward to address these problems, one potentially exact but computationally inefficient, the other inherently approximate. In the former, the various data sets (aligned orthologous molecular sequences, classical discrete-state characters) are simply pooled into a single hybrid matrix, which is then analyzed, e.g., by parsimony, perhaps after weighting to balance the contributions of individual classes of characters (Miyamoto, 1985). Although this approach preserves all the resolution inherent in the data, it has several problems. Matrices including multiple gene sequences can consist of tens of thousands of characters (nucleotides), and as such be difficult or impossible to handle with existing software or in a microcomputer environment. If (as described above) the classical characters are to be interpreted using parsimony, molecular-sequence data in the same hybrid matrix would have to be analyzed similarly. Inclusion of protein sequences would necessitate the use of a specialized character-state-transition table (e.g., a “Dayhoff matrix”) with some, but not all, columns of data. Maintaining such a


multigene matrix could furthermore require realignment as new sequences are added (Hogeweg and Hesper, 1984; Hein, 1990), and the complete matrix would have to be reanalyzed after each adjustment.

The second approach involves constructing separate trees for classical characters and for each type of molecular sequence, and from these deriving a consensus tree (Hillis, 1987). This approach allows complex problems to be modularized for easier computation, but has drawbacks as well. The loss of resolution inherent in consensus methods is well known. By losing contact with the original data, consensus methods fail to take into account the relative strengths of character evidence among data sets, and hence “cannot resolve conflict and ambiguity according to evidence” (Miyamoto, 1985). Existing methods for calculating consensus trees further require that the same set of termini (objects) be present in each tree, and do not allow for relative weighting of trees. Difficult problems remain in merging partially overlapping (Gordon, 1986) and disjunct trees (Brossier, 1990).

I now define an alternative method, based on graph-theoretical relationships between trees and matrices (Ponstein, 1966), to address some of these problems. This approach, which I term matrix representation with parsimony analysis (MRP), resembles the former method in proceeding from a single hybrid matrix, but like the latter approach incorporates topological information from derived trees. MRP does not require that all termini be identical or even overlapping between trees, and relative weighting of trees appears possible (although I do not develop methods for relative weighting at this time). I illustrate MRP with two examples and discuss some of its advantages and limitations.

METHOD

It has been appreciated at least since Poincaré (1901) that graphs (of which trees represent a special case) may be completely represented as matrices. At least five different formalisms implementing this representation have been developed (Ponstein, 1966); one of these, with properties particularly suited to the problem at hand, is described below. Identical or similar representations have occasionally seen limited use in phylogenetics (Farris, 1973; O’Grady and Deets, 1987).

Consider a set $S$ of $n$ objects (termini) related by an ordered tree. Let $S = \{1, 2, \ldots, n\}$, and let $P(S)$ be the set of all subsets of $S$. As described by Bobisud and Bobisud (1972) and Margush and McMorris (1981), an n-tree is a subset $T$ of $P(S)$ which satisfies three conditions:

1. $S \in T$, $\emptyset \notin T$.
2. $\{i\} \in T$ for all $i \in S$.
3. If $A, B \in T$ with $A \cap B \neq \emptyset$, then $A \subseteq B$ or $B \subseteq A$.
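The three conditions can be checked mechanically. The following is a minimal sketch for illustration only, not part of the original method; the function name `is_n_tree` and the choice to represent elements of the n-tree as Python sets are assumptions.

```python
from itertools import combinations


def is_n_tree(clusters, n):
    """Return True if `clusters` (an iterable of sets) is an n-tree on S = {1, ..., n}."""
    S = frozenset(range(1, n + 1))
    T = {frozenset(c) for c in clusters}
    # T must be a subset of P(S): every cluster contains only objects of S.
    if any(not c <= S for c in T):
        return False
    # Condition 1: S is in T and the empty set is not.
    if S not in T or frozenset() in T:
        return False
    # Condition 2: every singleton {i} is in T.
    if any(frozenset({i}) not in T for i in S):
        return False
    # Condition 3: any two overlapping clusters must be nested.
    return all(a <= b or b <= a for a, b in combinations(T, 2) if a & b)


# A four-terminus "ladder" tree: ((1,2),3),4
ladder = [{1, 2, 3, 4}, {1}, {2}, {3}, {4}, {1, 2}, {1, 2, 3}]
print(is_n_tree(ladder, 4))             # True
# Adding {2, 3} overlaps {1, 2} without nesting, violating condition 3.
print(is_n_tree(ladder + [{2, 3}], 4))  # False
```

Only the non-singleton clusters carry topological information, which is why the matrix representation described next gives columns to those elements alone.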

The topology of the tree can be fully represented by a matrix in which rows correspond to the objects (termini) and columns correspond to the elements of the n-tree. As no additional information is contributed by establishing columns for the termini themselves (although they satisfy the three conditions above and thus are elements of the n-tree), individual termini are not afforded columns. If the ith terminus is in the jth element of the n-tree, matrix element ij is scored 1; if not, it is scored 0. No topological information is lost in this representation; the tree and matrix are equivalent data structures.

This process of representation may alternatively be described as follows. If the tree is unrooted, it first must be rooted by appropriate criteria, e.g., by outgroup comparison or by the method of Iwabe et al. (1989); then all internal nodes are numbered. A matrix is constructed with a row for each terminus (including the outgroup) and a column for each internal node. If, according to the tree, terminus i is descendant from node j, matrix element ij is scored 1; if not, 0. The outgroup descends from no internal node, so all its character-states are coded 0. The matrix so encoded from any given tree will necessarily be homoplasy-free, and parsimony analysis of this matrix of “artificial synapomorphies” will return the original tree.

Matrices constructed in this way will have at most n-1 columns (for strictly bifurcating trees), a number independent of the size of the original data matrix from which the tree was derived. For each tree derived from molecular-sequence data, the represented matrix may thus be orders of magnitude smaller than the original data matrix.

Representations of multiple, identically rooted trees can readily be combined into a single matrix simply by appending additional columns and, if necessary, rows. Matrix elements corresponding to termini not present in a given tree are coded as missing data.
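The encoding and combination steps described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the author's implementation: the nested-tuple input format, the function names `clusters`, `represent`, and `combine`, the row label "outgroup", and the use of "?" for missing data are all hypothetical choices. The parsimony search itself (done in the paper with programs such as HENNIG86) is not shown.

```python
def clusters(tree):
    """Return (leaves, internal_clusters) for a rooted tree given as nested tuples of names."""
    if isinstance(tree, str):            # a terminus contributes no column
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clusters(child)
        leaves |= child_leaves
        found += child_found
    return leaves, found + [leaves]      # one cluster per internal node, root last


def represent(tree, outgroup="outgroup"):
    """Score terminus i as '1' in column j if it descends from internal node j, else '0'."""
    leaves, nodes = clusters(tree)
    matrix = {t: "".join("1" if t in c else "0" for c in nodes) for t in leaves}
    matrix[outgroup] = "0" * len(nodes)  # the outgroup descends from no internal node
    return matrix


def combine(*matrices):
    """Append the columns of several representations; absent termini are scored '?'."""
    termini = sorted(set().union(*matrices))
    combined = {t: "" for t in termini}
    for m in matrices:
        width = len(next(iter(m.values())))
        for t in termini:
            combined[t] += m.get(t, "?" * width)
    return combined


# Two small hypothetical trees with partially overlapping termini.
tree_a = (("A", "B"), ("C", "D"))   # internal nodes: {A,B}, {C,D}, {A,B,C,D}
tree_b = (("A", "B"), "E")          # internal nodes: {A,B}, {A,B,E}

for terminus, row in combine(represent(tree_a), represent(tree_b)).items():
    print(f"{terminus:<9}{row}")
```

In this sketch terminus C, which is absent from the second tree, receives the row `011??`, and terminus E, absent from the first, receives `???01`. Columns of discrete-state character data could be appended to each row in the same way before the matrix is passed to a parsimony program.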
Representations of diverse trees can be combined in this way, regardless of whether the trees were produced from molecular or nonmolecular data, or whether they were constructed using parsimony, distance, maximum-likelihood, or other methods. Columns of discrete-state character data can be added as well. The final hybrid matrix is then analyzed by parsimony.

RESULTS

Example 1: Organellar tRNA Genes

As a consequence of recent intensive studies of organelle genomes, primary structures of genes encoding a number of isoaccepting transfer RNAs are now available, especially for fungal and animal mitochondria, and for chlorophyll a/b-containing chloroplasts (Sprinzl et al., 1987). In comparison with many other coded macromolecules, transfer RNAs are small and highly constrained structurally, and contain regions difficult to align for comparative studies; tRNAs from mitochondria of vertebrates are especially difficult to align (Nicoghosian et al., 1987; Cedergren et al., 1991). Moreover, there is evidence that the acceptor specificities of certain tRNA species may be interchanged in vitro (Normanly et al., 1986; Hou and Schimmel, 1988), although there is no convincing evidence that this has occurred in vivo, apart from suppressor tRNAs (Cedergren et al., 1991). Furthermore, tRNA genes (tDNAs) of plant mitochondria have unexpectedly complex evolutionary histories (Joyce and Gray, 1989). Thus, although providing a challenging but computationally tractable test of the MRP method, trees based on organellar tDNA sequences may not adequately represent phylogenetic relationships among either the host eukaryotes or their organelles.

A 9 x 673 sequence matrix of mitochondrial and chloroplastic genes encoding alanyl (anticodon UGC), asparaginyl (GUU), aspartyl (GUC), glutaminyl (UUG), glutamyl (UUC), histidinyl (GUG), phenylalanyl (GAA), tyrosinyl (GUA), and valyl (UAC) tRNAs (Manzara et al., 1987; Sprinzl et al., 1987; Trinkl et al., 1989; B. F. Lang, personal communication) was constructed on the basis of published alignments (Sprinzl et al., 1987). Nucleotides were treated as unordered (nonadditive) characters. Implicit enumeration of this matrix using HENNIG86 (Farris, 1988) yielded one tree of length 1265, with CI = 0.72 and RI = 0.62 (Fig. 1). The tree was rooted between chloroplast and mitochondrial tDNAs as shown.

Individual most-parsimonious trees for each of the nine tDNAs were then calculated (Table 1) by implicit enumeration using HENNIG86. Eight of the tDNAs gave single most-parsimonious trees; these eight, plus the three equally parsimonious histidinyl tDNA trees, were represented as a single 10 x 88 matrix. Alternatively, the three histidinyl tDNA trees were replaced by the corresponding Nelson consensus tree, yielding a 10 x 70 represented matrix. MRP analysis of either matrix gave a tree topologically identical to that derived by direct analysis of the original 9 x 673 matrix (Fig. 1, and topology 2 in Table 1). CI values were 0.89 and 0.87 from the 10 x 88 and 10 x 70 represented matrices, respectively; RI values were 0.92 and 0.90.

FIG. 1. Most-parsimonious tree relating tDNAs in nine organelles. Tree calculated directly from original 9 x 673 matrix and rooted between chloroplasts and mitochondria. Termini: Euglena gracilis chloro (Eg), Nicotiana tabacum chloro (Nt), Aspergillus nidulans mito (An), Torulopsis glabrata mito (Tg), Saccharomyces cerevisiae mito (Sc), Schizosaccharomyces pombe mito (Sp), Drosophila yakuba mito (Dy), mouse mito (m), Homo sapiens mito (Hs).

TABLE 1

Most-Parsimonious Trees of Individual tDNAs

Topology 1: (Eg,Nt)(Sp((Sc,Tg)(An(Dy(m,Hs)))))
Topology 2: (Eg,Nt)(An((Sc,Tg)(Sp(Dy(m,Hs)))))
Topology 3: (Eg,Nt)((Sp(Sc,Tg))(An(Dy(m,Hs))))
Topology 4: (Eg,Nt)((Dy(m,Hs))(Sp(An(Sc,Tg))))
Topology 5: (Eg,Nt)(Sp(An((Sc,Tg)(Dy(m,Hs)))))
Topology 6: (Eg,Nt)(An((Sp(Sc,Tg))(Dy(m,Hs))))
Topology 7: (Eg,Nt)(An,Sp(Sc,Tg)(Dy(m,Hs)))

tDNA  Anticodon  Number of trees  Topology  Nelson consensus  CI    RI
Ala   UGC        1                1         -                 0.73  0.61
Asn   GUU        1                2         -                 0.74  0.64
Asp   GUC        1                2         -                 0.74  0.70
Gln   UUG        1                3         -                 0.71  0.61
Glu   UUC        1                2         -                 0.73  0.66
His   GUG        3                1/5/6     7                 0.75  0.59
Phe   GAA        1                2         -                 0.74  0.66
Tyr   GUA        1                4         -                 0.76  0.66
Val   UAC        1                4         -                 0.73  0.62

From the individual tRNA trees, strict, 50% majority-rule, Nelson, and Adams consensus trees were

derived using routines in PAUP (Swofford, 1990) and HENNIG86; all showed the same trichotomy and had the topology (Eg,Nt)(An,Sp(Sc,Tg)(Dy(m,Hs))). The clade Sp(Dy(m,Hs)), resolved in the MRP result despite appearing in only four of the individual tRNA trees (Topology 2, Table 1), occurs because the MRP tree including this clade is most parsimonious overall based on the data (equally weighted n-tree elements).

Example 2: Phylogeny of Dictyostelium

In a recent examination of the phylogenetic position of the slime mold Dictyostelium, Loomis and Smith (1990) presented parsimony (PROTPARS: Felsenstein, 1982) and distance trees of eight individual proteins: four enzymes of pyrimidine biosynthesis, actin, myosin heavy chain, calmodulin, and RAS. Based on a visual examination of the sixteen individual trees, these authors concluded that Dictyostelium is more closely related to animals than to ascomycetes, contradicting results (Bachellerie and Michot, 1989; Sogin et al., 1989; Hendriks et al., 1991) based on rRNA sequences. Although informal, this visual approach embodies the basis of most consensus techniques (Margush and McMorris, 1981; Adams, 1972; Nelson, 1979) in recognizing monophyletic groups which appear frequently in the data.

This comparison can be conducted more formally using MRP; consensus methods are not helpful, as the eight individual trees do not all share identical termini. To prevent the matrices from becoming altogether too sparse, termini were collapsed from 29 to 10 as described in the note to Table 2. Representing the eight parsimony trees (Loomis and Smith, 1990) as described above yields the matrix shown in Table 2.

TABLE 2

Matrix Representation of Eight Parsimony Trees from Loomis and Smith (1990)

               Column Number
               000000000111111111122222222
               123456789012345678901234567
Outgroup       000000000000000000000000000
Animal         111111111111001101101111111
Dictyoste      111111111111111111111111111
[Rows for Entamoeba, Ascoyeast, Ascofilam, Eubacteria, Acanthamo, Plant, Ciliate, and Chlamydo are not legible in this copy.]

Columns  Tree
01-03    ATCase parsimony
04-06    DHOase parsimony
07-10    OPRTase parsimony
11-14    OMPase parsimony
15-17    Actin parsimony
18-20    Myosin heavy chain head parsimony
21-25    Calmodulin parsimony
26-27    RAS parsimony

Note. Where necessary, terminal branches have been collapsed into the monophyletic groups “animals,” “ascomycetous yeasts,” “filamentous ascomycetes,” “plants,” “ciliates,” and “eubacteria” to restrict the number of taxa under consideration. Because “ascomycetous yeasts” do not appear monophyletic in the actin parsimony tree, termini corresponding to Thermomyces and Saccharomyces actins were not represented in the matrix above; see Loomis and Smith (1990) for further discussion of actin trees.

To a first approximation, data weighting is probably unnecessary, as each of the eight proteins has about the same number of amino acid residues. Implicit enumeration using HENNIG86 yields all 21 most-parsimonious trees, each of length 29, with CI = 0.93 and RI = 0.91. Each of the 21 trees shows (a) a bifurcation between eubacteria and eukaryotes at the deepest (most basal) node, (b) ascomycetous yeasts forming a sister group with the remaining eukaryotes, and (c) animals, Acanthamoeba, Dictyostelium, and Entamoeba constituting a monophyletic group. The relative branching order of filamentous ascomycetes, Chlamydomonas, ciliates, and plants was unstable, as expected from the paucity of relevant data for these organisms (Table 2). The Nelson (1979) consensus tree (Fig. 2), calculated from these 21 trees using HENNIG86, corroborates the conclusions reached by Loomis and Smith (1990). MRP analysis of the eight distance trees presented by Loomis and Smith (1990) returns Dictyostelium and animals as a monophyletic group in a single tree of length 30, with CI = 0.63 and RI = 0.72 (result not shown).

FIG. 2. Nelson consensus MRP tree of parsimony trees from Loomis and Smith (1990) after removal of outgroup. Tree rooted between eubacteria and eukaryotes.

These results can be compared with those obtained by direct PROTPARS analysis of the 10 x 2382 matrix of concatenated amino acid sequences. Because results of PROTPARS analysis can depend on the order of data input, 11 runs with 10 different random-number seeds were conducted on an SGI 4D/280 UNIX workstation. The results (Table 3) fall into two groups. From each of seven runs, the clade animal (Dictyostelium, Acanthamoeba, Entamoeba) was recovered as in the MRP tree. In the other four analyses, Chlamydomonas and spinach were inserted unpredictably into this clade; this may be an artifact arising from the paucity of sequence information (one protein sequence each) for Chlamydomonas, spinach, Acanthamoeba, and Entamoeba. All PROTPARS analyses of the concatenated matrix differed from the MRP result, however, in showing the filamentous ascomycete (not Saccharomyces) to occur on the deepest branch among eukaryotes. This difference could be an artifact of sparsity of data (in either the concatenated sequence or the represented matrix) or of the manner in which termini were collapsed, and need not indicate a serious problem with either the PROTPARS or the MRP methods themselves.

TABLE 3

Results of Direct PROTPARS Analyses of 10 x 2382 Concatenated Protein-Sequence Matrix

Jumble No.  Tree length  No. trees  Consensus topology ([ means 100% inclusion)
1           3450 (a)     25         E. coli [. [Animal (Dicty (Acanth, Ent
2           3451 (a)     25         E. coli [. [Animal (Dicty (Acanth, Ent
3           3450         38         E. coli [. [Animal (Acanth (Dicty, Ent
4           3450         35         E. coli [. . [Animal (Acanth (Dicty, Ent
5           3450         36         E. coli [. [Animal (Acanth (Dicty, Ent
6           3450         31         E. coli [. [Animal (Acanth (Dicty, Ent
7           3450         25         E. coli [. . [Animal (Acanth (Dicty, Ent
8           3449         31         E. coli [. . . [Animal [Acanth (Ent (Chlamy (Dicty,Spinach
9           3449         36         E. coli [. . [Animal [Chlamy ((Dicty,Spinach)(Acanth,Ent
10          3449         29         E. coli [. [Animal [(Acanth,Ent)(Chlamy (Dicty,Spinach
11          3449         46         E. coli [. [Animal [Chlamy (Ent (Acanth (Dicty,Spinach

(a) The first and second runs utilized the same random-number seed and found identical trees in the same order, but assessed tree length differently in the two cases.

DISCUSSION

Representation of trees as matrices of artificial synapomorphies opens up a number of possibilities in phylogenetic analysis. Single matrices representing multiple genes, RNAs, and/or proteins can readily be constructed from published trees without reference to the original aligned sequences. Discrete-state characters such as morphological and biochemical data can be added to these matrices prior to parsimony analysis. Matrices representing large numbers of genes will typically be small enough for microcomputer analysis using existing software such as HENNIG86, PAUP, or PHYLIP. Unlike existing consensus techniques, MRP does not require that all subtrees possess identical termini.

As with existing consensus methods, MRP loses contact with original data potentially supporting alternative trees; hence, in the absence of a system of character weighting which preserves this information in the represented matrix, conflicts and ambiguities cannot be resolved according to the original evidence (e.g., sequences). As illustrated in the tDNA example above, however, the resulting MRP tree can provide considerably greater resolution than available consensus methods.
Clades are resolved because their presence contributes to maximization of the parsimony of the entire MRP tree based on the represented topological information, not (as in consensus methods) because that cluster appears in some proportion of the initial trees.

Two situations which cause problems for the MRP method have been identified. Where a single data set yields many equally good (e.g., most-parsimonious) trees, representing each tree individually may not afford the scale of data reduction indicated above; in extreme cases, the represented matrix could be larger than the initial data set. In such cases it may be necessary to represent a consensus tree, not all the individual most-parsimonious trees. The second situation involves the “rogue branch.” Two trees may be completely identical, except for a single branch which appears at a radically different place in each (for an example see Fig. 2 of Hillis, 1987). In this situation strict consensus methods behave poorly, collapsing all intervening branches to the first common node; the Adams consensus method (Adams, 1972), which collapses only the rogue branch to the first common node, better preserves internal structure. In this situation MRP unhelpfully yields a large number of trees. Either a combination of the initial data matrices or Adams consensus is the preferred approach in such situations.

The behavior of MRP trees during weighting or pruning has not been systematically explored. Using features in existing phylogenetic software, matrix columns representing individual trees or subtrees can easily be weighted differentially for parsimony analysis. In the case of parsimony trees, these individual column weights could be based on the number and type of synapomorphies supporting each internal node or n-tree element; alternative approaches based on differential compatibility have been explored by Penny and Hendy (1985) and by Sharkey (1989). With distance or likelihood trees, where nodes do not necessarily correspond to any real synapomorphies, the weighting criteria seem less obvious. The application of pruning algorithms to studying the structure of consensus supertrees has been described by Finden and Gordon (1985).

Applications of MRP can be envisaged not only among microcomputer-based phylogenetic analyses of multigene molecular-sequence data and in hybrid classical/molecular phylogenetics, but perhaps eventually in comparative analysis of results from genomic-sequencing projects.

ACKNOWLEDGMENTS

I am grateful to Allan D. Gordon and David L. Swofford for guiding me to the literature; to William H. E. Day for the n-tree formulation and for reviewing the manuscript; to Michael Zuker and Joseph Felsenstein for assistance with PROTPARS analysis of the Dictyostelium matrix; to Douglas W. Smith, B. Franz Lang, and Richard B. Hallick for help in compiling molecular-sequence matrices; to two anonymous reviewers for insightful criticism, especially regarding limitations of the method; and to the Canadian Institute for Advanced Research for travel funding.

Note added in proof. A referee has called to my attention that this method has been discovered independently by Bernard Baum (1992) and by Brent Mishler (unpublished).

REFERENCES

Adams, E. N., III (1972). Consensus techniques and the comparison of taxonomic trees. Syst. Zool. 21: 390-397.
Bachellerie, J.-P., and Michot, B. (1989). Evolution of large subunit rRNA structure. The 3’ terminal domain contains elements of secondary structure specific to major phylogenetic groups. Biochimie 71: 701-709.
Baum, B. R. (1992). Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees. Taxon 41: 3-10.

Bishop, M. J., Friday, A. E., and Thompson, E. A. (1987). Inference of evolutionary relationships. In “Nucleic Acid and Protein Sequence Analysis, A Practical Approach” (M. J. Bishop and C. J. Rawlings, Eds.), pp. 359-385, IRL Press, Oxford.
Bobisud, H. M., and Bobisud, L. E. (1972). A metric for classifications. Taxon 21: 607-613.
Brossier, G. (1990). Piecewise hierarchical clustering. J. Classif. 7: 197-216.
Cedergren, R., Abel, Y., and Sankoff, D. (1991). Evaluating gene versus genome evolution. NATO Adv. Study Inst. Ser. H 57: 87-100.
Farris, J. S. (1973). On comparing the shapes of taxonomic trees. Syst. Zool. 22: 50-54.
Farris, J. S. (1979). The information content of the phylogenetic system. Syst. Zool. 28: 483-519.
Farris, J. S. (1983). The logical basis of phylogenetic analysis. Adv. Cladistics 2: 7-36.
Farris, J. S. (1988). HENNIG86 documentation, version 1.5.
Felsenstein, J. (1981). Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. 17: 368-376.
Felsenstein, J. (1982). Numerical methods for inferring evolutionary trees. Q. Rev. Biol. 57: 379-404.
Finden, C. R., and Gordon, A. D. (1985). Obtaining common pruned trees. J. Classif. 2: 255-276.
Gordon, A. D. (1986). Consensus super-trees: The synthesis of rooted trees containing overlapping sets of labeled leaves. J. Classif. 3: 335-348.
Gouy, M., and Li, W.-H. (1990). Evolutionary relationships among primary lineages of life inferred from rRNA sequences. In “The Ribosome: Structure, Function, Evolution” (W. E. Hill, A. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger, and J. R. Warner, Eds.), pp. 573-578, Am. Soc. Microbiol., Washington.
Hasegawa, M., Kishino, H., and Saitou, N. (1991). On the maximum likelihood method in molecular phylogenetics. J. Mol. Evol. 32: 443-445.
Hein, J. (1990). A unified approach to alignment and phylogenies. In “Methods in Enzymology” (R. F. Doolittle, Ed.), Vol. 183, pp. 626-645, Academic Press, San Diego.
Hendriks, L., Goris, A., Van de Peer, Y., Neefs, J.-M., Vancanneyt, M., Kersters, K., Hennebert, G. L., and De Wachter, R. (1991). Phylogenetic analysis of five medically important Candida species as deduced on the basis of small ribosomal RNA sequences. J. Gen. Microbiol. 137: 1223-1230.
Hillis, D. M. (1987). Molecular versus morphological approaches to systematics. Annu. Rev. Ecol. Syst. 18: 23-42.
Hogeweg, P., and Hesper, B. (1984). The alignment of sets of sequences and the construction of phyletic trees: An integrated method. J. Mol. Evol. 20: 175-186.
Hou, Y.-M., and Schimmel, P. (1988). A simple structural feature is a major determinant of the identity of a transfer RNA. Nature (London) 333: 140-145.
Hull, D. L. (1988). “Science as a Process. An Evolutionary Account of the Social and Conceptual Development of Science,” Univ. of Chicago Press, Chicago.
Hull, D. L. (1989). The evolution of phylogenetic systematics. In “The Hierarchy of Life” (B. Fernholm, K. Bremer, and H. Jörnvall, Eds.), pp. 3-15, Elsevier Biomedical, Amsterdam.
Iwabe, N., Kuma, K.-i., Hasegawa, M., Osawa, S., and Miyata, T. (1989). Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc. Natl. Acad. Sci. USA 86: 9355-9359.

Joyce, P. B. M., and Gray, M. W. (1989). Chloroplast-like transfer RNA genes expressed in wheat mitochondria. Nucleic Acids Res. 17: 5461-5476.
Loomis, W. F., and Smith, D. W. (1990). Molecular phylogeny of Dictyostelium discoideum by protein sequence comparison. Proc. Natl. Acad. Sci. USA 87: 9093-9097.
Manzara, T., Hu, J.-X., Price, C. A., and Hallick, R. B. (1987). Characterization of the TrnD, TrnK, PsaA locus of Euglena gracilis chloroplast DNA. Plant Mol. Biol. 8: 327-336.
Margush, T., and McMorris, F. R. (1981). Consensus n-trees. Bull. Math. Biol. 43: 239-244.
Miyamoto, M. M. (1985). Consensus cladograms and general classifications. Cladistics 1: 186-189.
Moritz, C., and Hillis, D. M. (1990). Molecular systematics: Context and controversies. In “Molecular Systematics” (D. M. Hillis and C. Moritz, Eds.), pp. 1-10, Sinauer, Sunderland, MA.
Nelson, G. (1979). Cladistic analysis and synthesis: Principles and definitions, with a historical note on Adanson’s “Familles des Plantes” (1763-1764). Syst. Zool. 28: 1-21.
Nicoghosian, K., Bigras, M., Sankoff, D., and Cedergren, R. (1987). Archetypical features of tRNA families. J. Mol. Evol. 26: 341-346.
Normanly, J., Ogden, R. C., Horvath, S. J., and Abelson, J. (1986). Changing the identity of a transfer RNA. Nature (London) 321: 213-219.
O’Grady, R. T., and Deets, G. B. (1987). Coding multistate characters, with special reference to the use of parasites as characters of their hosts. Syst. Zool. 36: 268-279.
Patterson, C. (1989). Phylogenetic relations of major groups: Conclusions and prospects. In “The Hierarchy of Life” (B. Fernholm, K. Bremer, and H. Jörnvall, Eds.), pp. 471-488, Elsevier Biomedical, Amsterdam.
Penny, D., and Hendy, M. D. (1985). Testing methods of evolutionary tree construction. Cladistics 1: 266-278.
Platnick, N. I. (1987). An empirical comparison of microcomputer parsimony programs. Cladistics 3: 121-144.
Poincaré, H. (1901). Second complément à l’analysis situs. Proc. London Math. Soc. 32: 277-308.
Ponstein, J. (1966). “Matrices in Graph and Network Theory,” Van Gorcum, Assen, Netherlands.
Saitou, N., and Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4: 406-425.
Sharkey, M. J. (1989). A hypothesis-independent method of character weighting for cladistic analysis. Cladistics 5: 63-86.
Sogin, M. L., Edman, U., and Elwood, H. (1989). A single kingdom of eukaryotes. In “The Hierarchy of Life” (B. Fernholm, K. Bremer, and H. Jörnvall, Eds.), pp. 133-143, Elsevier Biomedical, Amsterdam.
Sprinzl, M., Hartmann, T., Meissner, F., Moll, J., and Vorderwülbecke, T. (1987). Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 15(Suppl.): r53-r188.
Studier, J. A., and Kepler, K. J. (1988). A note on the neighbor-joining algorithm of Saitou and Nei. Mol. Biol. Evol. 5: 729-731.
Swofford, D. L. (1990). PAUP documentation, version 3.0f.
Trinkl, H., Lang, B. F., and Wolf, K. (1989). Nucleotide sequence of the gene encoding the small ribosomal RNA in the mitochondrial genome of the fission yeast Schizosaccharomyces pombe. Nucleic Acids Res. 17: 6730.
Wolters, J., and Erdmann, V. A. (1988). Cladistic analysis of ribosomal RNAs: The phylogeny of eukaryotes with respect to the endosymbiotic theory. BioSystems 21: 209-214.
