Evolution is a theory. Before we start this subject, we need to agree on what we mean by theory. For scientists, a theory is a hypothesis that has never been rejected in spite of thorough testing from every possible perspective. This definition is not understood by society at large; in fact, society often uses the word theory as a synonym for speculation. Thus we get such phrases as "just a theory" or "cockamamie theory" in common media. This unfortunate misconception leads to much confusion. Evolution is a theory because it is a hypothesis that has been thoroughly tested from every possible perspective and has never failed to explain what we have observed. The evidence for evolution is vast and diverse; evolution is not based solely upon some "fragmentary fossil record," as some elements in society may claim. In fact, the "blind watchmaker" argument often given by "special creationists" as an analogy for evolution applies better to the "special creation" belief than it does to evolution! There is a mountain of evidence for evolution, and only a tiny fraction of it is fossil-based; the rest is genetic, physiological, developmental, phytogeographic, etc. It has been said that "nothing in biology makes sense except in the light of evolution," and this statement is based upon solid logic. So a theory is as close to fact as allowances for error due to chance alone permit; if evolution belonged to the realms of chemistry and physics, where chance error is not acknowledged, it would be called a law. Evolution is not speculative.
In prerequisite courses, evolution has been defined as the change in allele frequencies from one generation to the next. Such changes in allele frequencies are reflected in genotype and phenotype frequencies. Thus, as organisms evolve, we expect to see changes in their DNA, enzymes, physiology, development, morphology, etc. Ultimately these changes result in speciation. Of course, this implies that we have a decent species concept, and that is problematic for many organisms, especially plants (more on that later in the course)!
Phylogeny is the reconstruction of evolutionary history. Since we cannot directly observe past evolutionary changes, we cannot reconstruct evolutionary history with certainty. Rather, inferences concerning the evolutionary relationships among existing (extant) groups of organisms are made by comparing molecular, morphological, or physiological characters. Of course, a great deal of information can also be obtained from close examination of the characters of extinct organisms found in the fossil record.
All organisms living today are the product of a series of gradual or abrupt changes in the past. Ancestral intermediate forms were adapted to some transient historical environment. The environment and the sympatric organisms in the community vary continuously, so evolution will continue into the future. Many of the ancestral forms failed to adapt to a changing environment, and their lineages became extinct (A', C'). Other forms were especially suited to a particular environment that has persisted to the present day and are thus extant in a more-or-less relict state (D'). Thus, if we suppose that a species G has evolved from an ancestor A, we may still find representatives of stages D' and E" extant while other lineages and various intermediate forms became extinct:
The diagram above is called a dendrogram or cladogram. The evolutionary pathway has a tree-like structure. There are major axes (internodes), such as A-G and E-E", which are pathways of anagenesis, and branching points (nodes), where the evolutionary process called cladogenesis occurs. A clade is a group of organisms including a common ancestor and all of its descendants; in other words, a clade is monophyletic. The group E"-E-G would be a clade, but the group E"-D-G would not be a clade, as it would not include D'. The group C-G (along the main axis of this dendrogram) is not a clade because it includes the ancestor C but leaves out some of its descendants (such as D' and E"); a group like this is called paraphyletic. A good example of a paraphyletic group of vertebrates would be reptiles (C-G), where C' might be dinosaurs, D' might be birds, E" might be mammals, and G might be extant reptiles. Of course, the more ancient a cladogenesis event is, the more difficult it is to gather all of its descendants (extinct and extant). In our example, there are nine extinct species that belong in the clade originating at step A. Are all nine members present in the fossil record?
Examination of the fossil record allows us to: 1) make inferences about the nature of past environments, 2) reconstruct ancient biological communities, 3) estimate the abundance of component species, 4) examine rates of evolutionary change, and 5) understand the origin of the assemblage of extant organisms. However, interpreting the fossil record is not trivial. Because of their structure and habitat, some species have outstanding fossil records, while others may have a poor record. While a fossil can be very helpful in phylogenetic analysis, fossils are not required for phylogenetic reconstruction.
Rather, much phylogenetic reconstruction is based upon examination of the characteristics of extant species. The character states of the most ancient ancestor (A in our example) are all called plesiomorphic. As those characters evolve, new states appear, which are called apomorphic. A character state that evolves early (say at step D in our example) may be found in all descendants of D; an apomorphy that is shared among these descendants is called a synapomorphy. Phylogenetic reconstruction in cladistics depends upon analysis of shared-derived characteristics (synapomorphies) to show hypothetical pathways of evolution.
Darwin wrote that a natural classification must be based on genealogy. In the 1950s and 1960s Hennig published an explicit technique for basing classification on genealogy, a method he called phylogenetic systematics. This revolutionary method is now usually called cladistics. Phylogenetic systematics usually consists of two different processes:
Step 1. Select a group of species (or genera, or families, etc.) that you believe to be monophyletic. This study group is termed the ingroup.
Step 2. Select the molecular, morphological, or physiological characters you plan to use. Divide each character into alternative conditions (states). You want to find as many characters as possible that have two or more states. Here are some examples:
Step 3. Polarize the states of each character. For each character, you need to determine which state is the oldest (plesiomorphic) and which state(s) is/are derived (apomorphic). In the example above, you need to know whether the superior ovary or the inferior ovary appeared first in the history of your group, etc. There are several methods of doing this, which provide better evidence when used in tandem:
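One widely used way to polarize a two-state character is outgroup comparison: the state found in the outgroup is taken as plesiomorphic (0) and the other state as apomorphic (1). The sketch below assumes a single outgroup labeled "OG" and characters with only two states; the ovary observations in it are hypothetical, not data from this course.

```python
def polarize(states_by_taxon, outgroup="OG"):
    """Code a two-state character by outgroup comparison.

    The outgroup's state is treated as plesiomorphic (0); the other
    state is treated as apomorphic (1). Assumes exactly two states.
    """
    plesiomorphic = states_by_taxon[outgroup]
    if len(set(states_by_taxon.values())) > 2:
        raise ValueError("this sketch only handles two-state characters")
    return {
        taxon: 0 if state == plesiomorphic else 1
        for taxon, state in states_by_taxon.items()
    }


# Hypothetical observations of ovary position, for illustration only.
ovary = {"OG": "superior", "A": "inferior", "B": "superior", "C": "inferior"}
coded = polarize(ovary)
print(coded)  # {'OG': 0, 'A': 1, 'B': 0, 'C': 1}
```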
Step 4. Build a data matrix. In a table, list each organism and record the states of the selected characters. The outgroup (OG) should have the plesiomorphic (0) state in all characters. If a taxon lacks a character, it is best to code it as an additional state called "absent." This happens with Character 3 for species B because it is hairless.
We will work as a class team to analyze this data matrix:
| Taxon / Character number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| OG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
| B | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| C | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 |
| D | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 |
| E | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
Step 5. Make the phylogenetic tree from trunk to twigs. Taxa are grouped by shared-derived (apomorphic) character states. Start with the characters that group the most taxa (8, 4), enclosing within a circle all of the taxa that share the same apomorphic state. Then move on to those that group fewer taxa (3, 10, then 1, 6, 9). Leave until last those characters (2, 5, 7) for which only one taxon has the apomorphic state (autapomorphy). We must be willing to make numerous trees, abandoning earlier versions as we work. Note that the shape of the tree is determined by the characters.
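The order in which to consider the characters can be worked out by counting, for each character, how many ingroup taxa carry the apomorphic state. The sketch below does that for the first data matrix, typed in as one string of states per taxon (that encoding and the function name are our own).

```python
# First data matrix from the text: one string of 0/1 states per taxon,
# characters 1-10 read left to right.
MATRIX = {
    "OG": "0000000000",
    "A": "0011100101",
    "B": "1001001100",
    "C": "0111010111",
    "D": "0011010111",
    "E": "1001000100",
}


def apomorphy_counts(matrix, outgroup="OG"):
    """Count, for each character (numbered from 1), the ingroup taxa with state 1."""
    ingroup = [taxon for taxon in matrix if taxon != outgroup]
    n_chars = len(matrix[outgroup])
    return {
        i + 1: sum(matrix[taxon][i] == "1" for taxon in ingroup)
        for i in range(n_chars)
    }


counts = apomorphy_counts(MATRIX)
# Characters grouping the most taxa come first; sorted() is stable, so ties
# keep their original numerical order.
order = sorted(counts, key=lambda ch: -counts[ch])
autapomorphies = [ch for ch in order if counts[ch] == 1]
print(order)           # [4, 8, 3, 10, 1, 6, 9, 2, 5, 7]
print(autapomorphies)  # [2, 5, 7]
```

Characters 2, 5 and 7 come out last because each is apomorphic in a single taxon, which is why they cannot help group taxa.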
Node - where branching (cladogenesis = speciation) occurs
Internode - the pathway between nodes (anagenesis). A character state change is indicated with a slash across the internode of a tree. Note that any internode can be cut, everything above it rotated 180 degrees, and the phylogenetic tree is not fundamentally changed.
Sister Group - a species or a higher monophyletic group hypothesized to be the closest genealogical relative of a given taxon exclusive of the ancestral species of both taxa.
Which taxon or taxa is/are most closely related to taxon C?
Which taxon or taxa is/are most closely related to taxon A?
What is the sister group of A? ...of B? ...of C?
After slashing each character state change on the internodes, how many steps long is this tree?
See how well you can put a tree together for the following data matrix... (hint: 4, 8, 3, 10, 1, 9, 2, 5, 7, 6)
| Taxon / Character number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| OG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
| B | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| C | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 |
| D | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 |
| E | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
You may use the * symbol to show that a character changes from the plesiomorphic to the apomorphic state more than once. In this data matrix, we are forced to put one of the characters on the tree more than once. When a character state evolves independently in two places in a cladogram, it is called a homoplasy. The shortest tree (11 steps) shows parallel or convergent evolution of the apomorphic state of that character. Whether the independent appearance of a character state is a parallelism or a convergence depends on the degree of separation in the cladogram. For example, wings evolving in insects and birds would most likely be a convergence; hairs evolving twice in a group of legumes would most likely be a parallelism.
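Counting slashes can also be done mechanically with Fitch's parsimony algorithm, a standard procedure that this page does not itself describe. The sketch below applies it to the second data matrix on our own reading of the 11-step tree, written as nested Python tuples with the outgroup at the root; that tree encoding is an assumption of the sketch.

```python
# Second data matrix from the text (character 6 is now apomorphic in E).
MATRIX = {
    "OG": "0000000000",
    "A": "0011100101",
    "B": "1001001100",
    "C": "0111010111",
    "D": "0011010111",
    "E": "1001010100",
}

# Hypothetical encoding of the 11-step tree: (OG, ((B, E), (A, (C, D)))).
TREE = ("OG", (("B", "E"), ("A", ("C", "D"))))


def fitch(node, matrix, i):
    """Return (possible states, minimum changes) for character i below node."""
    if isinstance(node, str):  # a leaf: its observed state, no changes
        return {matrix[node][i]}, 0
    left_states, left_steps = fitch(node[0], matrix, i)
    right_states, right_steps = fitch(node[1], matrix, i)
    shared = left_states & right_states
    if shared:
        return shared, left_steps + right_steps
    # No state in common: one more change is needed within this subtree.
    return left_states | right_states, left_steps + right_steps + 1


n_chars = len(MATRIX["OG"])
steps = [fitch(TREE, MATRIX, i)[1] for i in range(n_chars)]
homoplastic = [i + 1 for i, s in enumerate(steps) if s > 1]
print(steps)        # [1, 1, 1, 1, 1, 2, 1, 1, 1, 1]
print(sum(steps))   # 11
print(homoplastic)  # [6]
```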
Another tree we could consider for this data matrix has 12 steps and shows gain (0→1) of the apomorphic state near the base of the tree and reversal (1→0) in two different places on the tree. A reversal is another kind of homoplasy; one must consider whether the gain of a function is more plausible than the loss of a function before accepting either of these two cladograms as the most acceptable.
The goal of cladistics is to construct the most parsimonious phylogenetic tree. The most parsimonious tree has the fewest character state changes. We can also describe this most-parsimonious tree as having the minimum length, the fewest inferred evolutionary steps, or as being the shortest phylogenetic tree.
In our second example, there was a single most-parsimonious tree with 11 character state changes (11 evolutionary steps). There is no shorter tree for that data matrix; however, it is critical to note that our runner-up with 12 steps might be considered most likely under some circumstances.
A computer can assist you in finding the most-parsimonious tree from a data matrix. We will attempt to use some simple computer software to do this. The software we use is free and available at:
After that page has been rendered in your web browser, scroll down to find your combination of hardware and operating system. Then download both the documentation and C sources as well as the executable files for your computing system. This software is much more primitive (and faster) than you are probably used to, and the documentation is quite opaque in my opinion. Basically, the executables are a series of short programs that work on a source file (your data matrix!) that you place in the same folder (subdirectory).
The source file is a plain-text file with the following structure. The first line of the file has three spaces, the number of taxa, three spaces, and then the number of characters. Each subsequent line has 10 characters for the taxon name (padded with spaces), followed by one digit for the state of each character used. It will make your work easiest if the outgroup is the first taxon. Thus, the source file for our Second Example looks like this:
...6...10¦
OG........0000000000¦
A.........0011100101¦
B.........1001001100¦
C.........0111010111¦
D.........0011010111¦
E.........1001010100¦
¦
I have used a . symbol to indicate the spaces and the ¦ symbol to show the end of the line (where you hit return or enter). This file needs to be a plain-text file, so when you save it to the hard-drive you need to specify that in the Save-As dialog box in Microsoft Word. Sometimes Simple Text is a better word-processor for creating this file. If you are a Wintel operator, you are on your own.
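If you would rather generate the source file than type it, the sketch below builds the same text from a dictionary. It follows the layout described above (three spaces before each number on the first line, taxon names padded to 10 characters); the function name phylip_infile is our own, and saving the result as a plain-text file is left to you.

```python
# Second data matrix from the text; the outgroup is listed first.
MATRIX = {
    "OG": "0000000000",
    "A": "0011100101",
    "B": "1001001100",
    "C": "0111010111",
    "D": "0011010111",
    "E": "1001010100",
}


def phylip_infile(matrix):
    """Return the source-file text: a header line, then one line per taxon."""
    n_chars = len(next(iter(matrix.values())))
    lines = [f"   {len(matrix)}   {n_chars}"]
    for taxon, states in matrix.items():
        if len(taxon) > 10:
            raise ValueError(f"taxon name {taxon!r} is longer than 10 characters")
        if len(states) != n_chars:
            raise ValueError(f"taxon {taxon!r} does not have {n_chars} states")
        # Pad the name with spaces to exactly 10 characters.
        lines.append(taxon.ljust(10) + states)
    return "\n".join(lines) + "\n"


print(phylip_infile(MATRIX), end="")
```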
Now that your source file is saved in the correct folder (subdirectory) and is in the correct format, you can double-click any of the executable files to analyze the data matrix.
Finding the Most-Parsimonious Tree. The executable Penny will find the most-parsimonious tree very quickly. When it opens a window, it will first ask you for the filename of your source file; it is a primitive dialog box, and the > symbol is basically a prompt. Just type the filename for your source file and hit return or enter. If your matrix has the outgroup as the first taxon, you can skip changing the options and just hit the Y key and then return or enter. The software will find your most-parsimonious tree and create an outfile and a treefile in the folder (subdirectory). These files can be opened in Simple Text or Microsoft Word and printed out. The treefile is not too helpful at this point.
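To see what a parsimony program is doing, the sketch below scores every possible tree for the second data matrix and keeps the shortest. It is not Penny's algorithm (Penny uses a branch-and-bound search, which avoids scoring every tree); plain exhaustive enumeration is only practical for a handful of taxa. Trees are nested tuples with the outgroup OG attached at the root, and all function names are our own.

```python
# Second data matrix from the text.
MATRIX = {
    "OG": "0000000000",
    "A": "0011100101",
    "B": "1001001100",
    "C": "0111010111",
    "D": "0011010111",
    "E": "1001010100",
}


def fitch(node, i):
    """Return (possible states, minimum changes) for character i below node."""
    if isinstance(node, str):
        return {MATRIX[node][i]}, 0
    left_states, left_steps = fitch(node[0], i)
    right_states, right_steps = fitch(node[1], i)
    if left_states & right_states:
        return left_states & right_states, left_steps + right_steps
    return left_states | right_states, left_steps + right_steps + 1


def tree_length(tree):
    """Total number of character state changes (steps) on a tree."""
    return sum(fitch(tree, i)[1] for i in range(len(MATRIX["OG"])))


def add_leaf(tree, leaf):
    """Yield every tree made by attaching leaf to one branch of tree."""
    yield (tree, leaf)  # attach above the current root
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in add_leaf(left, leaf):
            yield (new_left, right)
        for new_right in add_leaf(right, leaf):
            yield (left, new_right)


def all_trees(taxa):
    """Every rooted, fully bifurcating tree for the given taxa."""
    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [new for tree in trees for new in add_leaf(tree, leaf)]
    return trees


ingroup_trees = all_trees(["A", "B", "C", "D", "E"])
scored = [(tree_length(("OG", t)), t) for t in ingroup_trees]
best = min(length for length, _ in scored)
best_trees = [t for length, t in scored if length == best]
print(len(ingroup_trees))  # 105
print(best)                # 11
print(len(best_trees))     # 1
```

The search agrees with the text: exactly one tree of 11 steps, and no shorter one.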
Printing out the Tree. The executable Drawgram will access the treefile that Penny created and produce a plotfile to print out. Double-click on Drawgram, and it will prompt you for a font to use. Just type font1 at the prompt and hit return or enter. Then you can choose what to do for the printer; for our departmental computers, hit L for Laserwriter and hit return or enter. Next, you can choose whether or not to see a preview; then hit Y and return or enter to produce the plotfile. The software will create the plotfile but will not print it out; you have to drag the plotfile onto the icon for your printer to print it.
For the tree we have made from our Second Example, calculate the consistency index (C.I.) for each character, one at a time. The consistency index for a character with two states (0 and 1) is 1 divided by the number of character state changes of that character. The consistency index for a character with three states (0, 1 and 2) is 2 divided by the number of character state changes of that character. Thus, for character 6, the C.I. is 1/2; "2" is in the denominator because this character changes state twice on the tree. The C.I. for characters 1-5 and 7-10 is 1.
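The rule above amounts to (number of states - 1) divided by the number of changes, since a character with n states needs at least n - 1 changes on any tree. The sketch below applies it to the Second Example, assuming the per-character slash counts have already been read off the 11-step tree (they are typed in by hand here).

```python
# Second data matrix from the text.
MATRIX = {
    "OG": "0000000000",
    "A": "0011100101",
    "B": "1001001100",
    "C": "0111010111",
    "D": "0011010111",
    "E": "1001010100",
}

# Slashes (state changes) counted for each character on the 11-step tree;
# only character 6 changes twice.
STEPS = [1, 1, 1, 1, 1, 2, 1, 1, 1, 1]


def consistency_indices(matrix, steps):
    """C.I. per character: (number of states - 1) / observed changes."""
    columns = zip(*matrix.values())  # one tuple of states per character
    return {
        number: (len(set(column)) - 1) / changes
        for number, (column, changes) in enumerate(zip(columns, steps), start=1)
    }


ci = consistency_indices(MATRIX, STEPS)
print(ci[6])  # 0.5
print([c for c, value in ci.items() if value == 1.0])  # [1, 2, 3, 4, 5, 7, 8, 9, 10]
```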
See what you can do using Phylip to create a most-parsimonious tree for this data matrix:
| Taxon / Character number | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| OG | 0 | 0 | 0 | 0 | 0 |
| A | 1 | 0 | 0 | 0 | 0 |
| B | 1 | 1 | 0 | 1 | 0 |
| C | 1 | 1 | 0 | 1 | 1 |
| D | 1 | 1 | 1 | 0 | 1 |
| E | 1 | 1 | 1 | 0 | 1 |
Consensus Trees
Suppose your data set results in not just one most-parsimonious tree, but two or more different trees all having the fewest character state changes. These equally most-parsimonious trees are equally plausible evolutionary scenarios.
The Strict Consensus Tree
This is a new tree, constructed by you or the computer, that shows only the agreement among the equally most-parsimonious trees: it contains only those clades found in every one of them. The strict consensus tree has less resolution, but we have more confidence in it than in any one of the equally most-parsimonious trees.
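A strict consensus can be computed by listing the clades (the set of taxa descended from each node) in every tree and keeping only those found in all of them. The two trees below are hypothetical examples in nested-tuple notation, not results from the data matrices on this page.

```python
def clades(tree):
    """Return every clade in a nested-tuple tree as a frozenset of taxon names."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        taxa = walk(node[0]) | walk(node[1])
        found.add(taxa)
        return taxa

    walk(tree)
    return found


def strict_consensus(trees):
    """Clades present in every one of the trees."""
    shared = clades(trees[0])
    for tree in trees[1:]:
        shared &= clades(tree)
    return shared


# Two hypothetical equally most-parsimonious trees.
tree_1 = ("OG", ("A", (("B", "C"), ("D", "E"))))
tree_2 = ("OG", ("A", ("B", ("C", ("D", "E")))))

consensus = strict_consensus([tree_1, tree_2])
for clade in sorted(consensus, key=len):
    print(sorted(clade))
# ['D', 'E']
# ['B', 'C', 'D', 'E']
# ['A', 'B', 'C', 'D', 'E']
# ['A', 'B', 'C', 'D', 'E', 'OG']
```

The clade {B, C} from the first tree and {C, D, E} from the second disagree, so neither survives; in the consensus, B, C and the D+E pair branch from a single unresolved node, which is the loss of resolution described above.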
All shared-derived character states are commonly given equal weight in cladistics, but systematists know that some changes occur less readily than others: loss of a structure is easier than gain of a structure, and loss of a restriction site is easier than gain of a restriction site. With computer programs we can now weight characters, but this introduces some subjectivity! How much harder is it for the ovary position to change than for the plant to evolve forked hairs from unbranched hairs: 5X or 10X?
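One simple way to express "loss is easier than gain" is to charge a different cost for each direction of change. The sketch below applies hypothetical costs to the two reconstructions of character 6 discussed for the second data matrix: two independent gains on the 11-step tree versus one gain plus two reversals on the 12-step tree. The other nine characters change once each (a gain) on both trees, so they add the same cost to both and are left out. The cost values are arbitrary, which is exactly the subjectivity mentioned above.

```python
# Changes in character 6 implied by each tree, as (gains 0->1, losses 1->0).
RECONSTRUCTIONS = {
    "11-step tree": (2, 0),  # two independent gains (homoplasy)
    "12-step tree": (1, 2),  # one gain near the base, two reversals
}


def weighted_cost(gains, losses, gain_cost, loss_cost):
    """Cost of a reconstruction when gains and losses are weighted differently."""
    return gains * gain_cost + losses * loss_cost


def preferred(gain_cost, loss_cost):
    """Name of the reconstruction with the lowest weighted cost."""
    return min(
        RECONSTRUCTIONS,
        key=lambda name: weighted_cost(*RECONSTRUCTIONS[name], gain_cost, loss_cost),
    )


print(preferred(gain_cost=1, loss_cost=1))  # 11-step tree
print(preferred(gain_cost=3, loss_cost=1))  # 12-step tree
```

With equal costs the 11-step tree wins (2 versus 3), but once a gain is taken to be three times as costly as a loss, the 12-step tree becomes cheaper (5 versus 6).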
This page © Ross E. Koning 1994.
Send comments and bug reports to Ross Koning at rkoning@snet.net.