Phylogenetics, the study of evolutionary relationships among organisms, historically relied on comparing physical characteristics such as bone structure. This approach often led to misinterpretations because similar traits can evolve independently in different lineages (convergent evolution). Molecular biology introduced a more direct method: comparing the sequences of genetic material, specifically DNA and RNA. Because DNA is inherited and changes accumulate in it over time, comparing sequences provides a quantifiable record of ancestry that can trace lineages beyond what the fossil record or morphology alone can reveal.
The Molecular Clock and Common Ancestry
The use of DNA to measure evolutionary time rests on the molecular clock hypothesis, first proposed in the 1960s, which holds that mutations accumulate at a roughly constant rate over millions of years. The accumulated differences therefore act as a measurable clock for evolutionary time, most reliably in genomic regions under little pressure from natural selection.
When two species share a recent common ancestor, their DNA sequences are highly similar because they have had less time to diverge; more differences indicate a more distant common ancestor and a longer period of evolutionary separation. To estimate how long ago two lineages split, scientists measure the genetic distance between them (differences per site) and divide it by twice the estimated per-lineage mutation rate, because each lineage has been accumulating changes independently since the split. The rate is often calibrated using known dates from the fossil record or geological events.
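The calculation can be sketched in Python. This is a minimal illustration under stated assumptions: the sequences are already aligned, distance is the uncorrected p-distance (the fraction of compared sites that differ, skipping gapped columns), and the clock is strict with a hypothetical rate of 1e-9 substitutions per site per year. The function names `p_distance` and `divergence_time` are our own.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance: the fraction of compared sites that differ.

    Assumes the two sequences are already aligned; columns with a gap ('-')
    in either sequence are skipped in this simple sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def divergence_time(distance: float, rate_per_site_per_year: float) -> float:
    """Strict-clock estimate T = K / (2r).

    K is the distance between the two sequences (differences per site) and r is
    the per-lineage rate; the factor 2 reflects that both lineages accumulate
    changes independently after the split.
    """
    return distance / (2 * rate_per_site_per_year)


if __name__ == "__main__":
    seq_1 = "ACGTACGTACGTACGTACGT"
    seq_2 = "ACGTACGAACGTACGTACGA"
    k = p_distance(seq_1, seq_2)
    print(k)  # 0.1 (2 differences over 20 sites)
    # Hypothetical rate of 1e-9 substitutions per site per year, illustration only
    print(divergence_time(k, 1e-9))  # roughly 5e7 years
```

With 2 differences over 20 sites, K = 0.1, and at the assumed rate the estimate is about 50 million years. The p-distance ignores repeated changes at the same site, so real analyses usually apply a substitution-model correction before dividing by the rate.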
Preparing Genetic Data Through Sequence Alignment
Before DNA sequences can be compared for relatedness, they are arranged in a multiple sequence alignment (MSA). This procedure lines up the sequences from all species under study so that each column holds homologous positions: nucleotides descended from the same position in a common ancestor. Alignment quality is foundational, because every downstream phylogenetic analysis assumes these homology assignments are correct.
The alignment process introduces gaps to account for insertions or deletions (indels) that occurred during evolution. A gap signifies that a DNA segment was either lost in one lineage or gained in another since their common ancestor. How gaps are used depends on the method: some analyses score them as evolutionary events that add to the measure of divergence, while many treat gapped positions as missing data or exclude them. The resulting alignment matrix is the raw data set from which computational programs infer the evolutionary tree.
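Multiple sequence alignment programs are considerably more elaborate, but the core idea of placing gaps to maximize similarity already appears in pairwise global alignment. The sketch below implements the Needleman-Wunsch dynamic-programming algorithm for two sequences only, not a full MSA; the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment (Needleman-Wunsch) with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks an inserted gap.
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair two bases
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, top, bottom = needleman_wunsch("ACGTTGCA", "ACGTGCA")
    print(score)  # 5
    print(top)
    print(bottom)
```

Aligning ACGTTGCA against ACGTGCA gives a score of 5: seven matched bases and a single gap marking one inferred indel. Practical aligners usually charge more to open a gap than to extend it (affine penalties) and combine many pairwise alignments, for example along a guide tree.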
Computational Methods for Building Phylogenetic Trees
The aligned DNA sequences are fed into specialized computer programs that use various algorithms to reconstruct the most probable evolutionary history. These methods fall into two broad categories, distance-based and character-based, each with its own mathematical approach.
Distance methods are computationally efficient: they first reduce each pair of sequences to a single number representing their overall evolutionary distance. Algorithms such as Neighbor-Joining then use this distance matrix to join sequences pair by pair, at each step choosing the pair that minimizes a criterion correcting for unequal rates of change among lineages, so that the tree is built from the tips inward.
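A compact Neighbor-Joining sketch follows. It takes a symmetric distance matrix (for example, one built from pairwise p-distances), repeatedly joins the pair that minimizes the standard criterion Q(a, b) = (n - 2)·d(a, b) - r(a) - r(b), where n is the number of remaining nodes and r is a row sum, and returns the tree in Newick notation. The function name and the five-taxon matrix in the usage example are illustrative assumptions, not data from a real study.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Neighbor-Joining on a square, symmetric distance matrix with zero diagonal.

    names: unique taxon labels (assumed free of Newick special characters).
    Returns an unrooted tree as a Newick string; the root placement is arbitrary.
    """
    # Dict-of-dicts keyed by node label, so merged nodes are easy to add
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        # Join the pair with the smallest Q value (first one wins on ties)
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node u to a and to b
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.4g},{b}:{lb:.4g})"
        # Distances from u to every other remaining node
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
                d[c][u] = d[u][c]
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    x, y = nodes
    return f"({x},{y}:{d[x][y]:.4g});"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(neighbor_joining(taxa, matrix))  # ((c:4,(a:2,b:3):3),(d:2,e:1):2);
```

Note that the first pair joined (a and b) has the lowest Q value, which here coincides with the smallest distance; in general the Q correction can make Neighbor-Joining join a pair that is not the closest in raw distance.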
In contrast, character-based methods analyze each nucleotide position, or “character,” individually across all species simultaneously.
Character-Based Methods
Maximum Parsimony seeks the tree that requires the fewest total evolutionary changes to explain the observed sequence differences. Maximum Likelihood and Bayesian methods use explicit statistical models of DNA evolution to calculate the probability of the observed data given a particular tree and its branch lengths. Maximum Likelihood searches among candidate topologies for the tree that maximizes this probability, while Bayesian methods combine the likelihood with prior distributions to estimate the posterior probability of trees, typically by Markov chain Monte Carlo sampling. Because the choice of method and evolutionary model can influence the result, researchers often compare several approaches.
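Parsimony scoring of a fixed tree can be illustrated with Fitch's algorithm, which counts the minimum number of substitutions needed at each alignment column on a rooted binary tree. In this sketch trees are nested Python tuples and the alignment is a dictionary of equal-length sequences; both representations, and the four-taxon toy alignment, are assumptions made for illustration. It only scores given topologies, whereas a real parsimony program also searches the space of possible trees.

```python
def fitch_site(tree, states):
    """Fitch's algorithm for one alignment column.

    tree: a tip name (str) or a 2-tuple of subtrees (rooted, binary).
    states: maps each tip name to its nucleotide in this column.
    Returns (candidate ancestral states at this node, changes counted below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        # The children can agree on a state: no extra substitution needed here
        return shared, left_changes + right_changes
    # No shared state: keep the union and count one substitution
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Total minimum number of substitutions over all columns for a fixed tree."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


if __name__ == "__main__":
    # Toy alignment, invented for illustration
    alignment = {"A": "AAGT", "B": "AAGC", "C": "GTAC", "D": "GTAT"}
    print(parsimony_score((("A", "B"), ("C", "D")), alignment))  # 5
    print(parsimony_score((("A", "C"), ("B", "D")), alignment))  # 8
```

The topology grouping A with B and C with D needs 5 changes versus 8 for the alternative, so parsimony would prefer it.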
Reading and Interpreting Tree Diagrams
The final product is the phylogenetic tree, a visual hypothesis of the species’ evolutionary history. The tips represent the species or genes compared, and the lines, called branches, illustrate the evolutionary lineages connecting them. The points where branches meet are nodes, each representing a hypothetical common ancestor from which descendant lineages diverged.
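These parts of a tree map naturally onto a recursive data structure. The sketch below uses a hypothetical `Node` class in which tips have no children, internal nodes stand for inferred ancestors, and each node stores the length of the branch leading to it; `to_newick` writes the tree in Newick format, a widely used plain-text notation for trees. The taxon names and branch lengths in the example are placeholders, not real data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a phylogenetic tree (hypothetical structure for illustration).

    A node with no children is a tip; any other node is an internal node that
    stands for an inferred common ancestor of its descendants.
    """
    name: str = ""
    branch_length: Optional[float] = None  # length of the branch above this node
    children: List["Node"] = field(default_factory=list)

    def is_tip(self) -> bool:
        return not self.children

    def tips(self) -> List[str]:
        """Names of all tips descending from (or equal to) this node."""
        if self.is_tip():
            return [self.name]
        return [name for child in self.children for name in child.tips()]

    def to_newick(self) -> str:
        """Serialize the subtree in Newick notation (without the final ';')."""
        text = self.name
        if self.children:
            text = "(" + ",".join(c.to_newick() for c in self.children) + ")" + self.name
        if self.branch_length is not None:
            text += f":{self.branch_length:g}"
        return text


if __name__ == "__main__":
    # Placeholder taxa and branch lengths, not real data
    ancestor_ab = Node(children=[Node("A", 0.1), Node("B", 0.12)], branch_length=0.3)
    root = Node(children=[ancestor_ab, Node("C", 0.4)])
    print(root.to_newick() + ";")  # ((A:0.1,B:0.12):0.3,C:0.4);
    print(root.tips())             # ['A', 'B', 'C']
```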
A clade is a group consisting of a node (an ancestor) and all of its descendants. Branch lengths often carry information: they may be proportional to the amount of evolutionary change, such as substitutions per site, or, when a molecular clock is applied, represent the estimated time since divergence. To assess reliability, scientists attach confidence measures such as bootstrap values to internal branches; a bootstrap value is the percentage of trees rebuilt from resampled data that recover the same grouping. A value near 100% indicates strong support for the clade defined by that branch.
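The bootstrap idea can be outlined in code. Each replicate resamples alignment columns with replacement, a tree is rebuilt from each replicate with the same method used on the original data, and a clade's support is the percentage of replicate trees containing it. In this sketch trees are nested tuples treated as rooted, the tree-building step is left out (replicate trees are passed in directly), and all function names are hypothetical; software working with unrooted trees usually counts bipartitions (splits) instead of rooted clades.

```python
import random
from collections import Counter


def clades(tree):
    """Set of clades (frozensets of tip names) in a nested-tuple rooted tree.

    Each internal node defines one clade: all tips descending from it.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found


def bootstrap_columns(alignment, rng):
    """One bootstrap replicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def clade_support(reference_tree, replicate_trees):
    """Percentage of replicate trees that contain each clade of the reference tree."""
    counts = Counter()
    for tree in replicate_trees:
        counts.update(clades(tree))
    n = len(replicate_trees)
    return {c: 100.0 * counts[c] / n for c in clades(reference_tree)}


if __name__ == "__main__":
    rng = random.Random(1)
    print(bootstrap_columns({"A": "ACGTAC", "B": "ACGAAC"}, rng))
    reference = (("A", "B"), ("C", "D"))
    # Hypothetical replicate trees standing in for trees rebuilt from resampled data
    replicates = [reference, reference, (("A", "C"), ("B", "D")), reference]
    for clade, pct in clade_support(reference, replicates).items():
        print(sorted(clade), pct)
```

Here the clades {A, B} and {C, D} each appear in three of the four replicate trees, giving 75% support, while the clade of all tips is trivially 100%.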