How Are Phylogenetic Trees Constructed?

A phylogenetic tree visually represents the evolutionary history and relationships among species or other groups of organisms. It illustrates how these groups diverged and evolved from common ancestors over time, serving as a diagrammatic hypothesis of life’s branching pattern and shared ancestry.

Foundational Data

Constructing phylogenetic trees relies on various types of biological data. Molecular data, such as DNA, RNA, and protein sequences, are frequently used due to their abundance and comparability. Compared with physical characteristics, these sequences are also generally less likely to mislead through convergent evolution, in which unrelated organisms independently develop similar traits.

Most evolutionary relationships are inferred from molecular sequence data because genetic material can be sequenced quickly, inexpensively, and reliably. Molecular data can also reveal evolutionary relationships at various taxonomic levels. Morphological data, including physical characteristics and anatomical features, also play a role, particularly for fossil species where molecular information is unavailable.

Preparing the Data

Before a phylogenetic tree can be built from molecular data, the collected sequences must be aligned. Sequence alignment arranges DNA, RNA, or protein sequences so that regions of similarity line up and corresponding positions can be compared across species.

Alignment ensures that homologous positions are correctly compared across different sequences. Gaps are often inserted into sequences to account for insertions or deletions that occurred during evolution, so that corresponding sites line up accurately. Without a sound alignment, site-by-site comparisons would pair up unrelated positions, and any evolutionary distances inferred from them would be unreliable.
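To make the role of gaps concrete, here is a minimal sketch of pairwise global alignment in the style of the Needleman-Wunsch algorithm. The article does not name an alignment algorithm, so the choice of method, the scoring values (match, mismatch, gap), and the two toy sequences are all assumptions made for illustration. Real analyses usually align many sequences at once with dedicated software.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences with a linear gap penalty.

    The scoring values are illustrative assumptions, not recommended settings.
    Returns (aligned_a, aligned_b, score).
    """
    n, m = len(a), len(b)

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align two residues
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Toy sequences (made up): the second is missing one base relative to the first.
aligned_a, aligned_b, score = global_align("GATTACA", "GATACA")
print(aligned_a)
print(aligned_b)
print("score:", score)
```

The shorter sequence receives a single gap character, which is how an alignment records a presumed insertion or deletion while keeping the remaining columns comparable.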

Building the Tree

Construction of a phylogenetic tree involves computational methods that infer evolutionary relationships from the prepared data. These methods can broadly be categorized into distance-based, parsimony, and likelihood-based approaches. Each applies a different criterion for deciding which tree best explains the data.

Distance-based methods

Distance-based methods begin by calculating a numerical “distance” or dissimilarity score between each pair of sequences, reflecting their genetic differences. Algorithms like Unweighted Pair Group Method with Arithmetic Mean (UPGMA) or Neighbor-Joining (NJ) then use these distances to group organisms, placing those with the smallest distances closer together on the tree. UPGMA assumes that all lineages evolve at the same rate, whereas Neighbor-Joining does not. Because they are computationally efficient, these methods are useful for large datasets or exploratory analysis.
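The two steps described above, computing pairwise distances and then clustering on them, can be sketched as follows. This is a simplified UPGMA that returns only the branching order as nested tuples and does not compute branch lengths. The distance measure (the proportion of differing sites), the taxon names A to D, and the aligned sequences are made-up assumptions for illustration.

```python
from itertools import combinations


def p_distance(s, t):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(s, t) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, dist):
    """Cluster taxa by repeatedly merging the closest pair.

    dist maps frozenset({a, b}) to a distance. Returns a nested-tuple topology.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of tips
    d = dict(dist)
    while len(sizes) > 1:
        # Pick the pair of clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance to the merged cluster is the size-weighted mean of its parts.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


# Toy aligned sequences (made up for illustration).
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACTTAT",
    "D": "TCGAGCTTGT",
}
distances = {
    frozenset((x, y)): p_distance(seqs[x], seqs[y])
    for x, y in combinations(seqs, 2)
}
tree = upgma(list(seqs), distances)
print(tree)
```

With these toy sequences, A and B differ at a single site, so they are merged first; C and then D join in order of increasing average distance.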

Parsimony methods

Parsimony methods operate on the principle of simplicity, seeking the tree that requires the fewest evolutionary changes or mutations to explain the observed data. The approach does not claim that evolution always takes the shortest route; rather, it treats the explanation requiring the fewest character state changes across the tree as the preferred hypothesis. Biologists use computer programs to evaluate numerous possible trees and select the one with the minimum number of inferred changes.
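Counting the changes a given tree requires can be sketched with Fitch's algorithm, one common way to score a fixed tree under parsimony. The article does not name a specific algorithm, so this choice is an assumption, as are the nested-tuple tree encoding, the taxon names, and the toy alignment. The sketch scores the three possible groupings of four taxa and keeps the lowest.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree is a tip name (str) or a 2-tuple of subtrees; states maps tip -> character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can share a state, so no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree, so at least one change occurred below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Toy aligned sequences (made up for illustration).
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACTTAT",
    "D": "TCGAGCTTGT",
}
# The three possible ways to pair up four taxa.
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
scores = {t: parsimony_score(t, alignment) for t in candidates}
best = min(scores, key=scores.get)
print(scores)
print("most parsimonious:", best)
```

With only four taxa every candidate can be scored exhaustively. The number of possible trees grows very quickly with the number of taxa, which is why real programs rely on heuristic searches.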

Likelihood-based methods

Likelihood-based methods, which include Maximum Likelihood and Bayesian Inference, use statistical models of evolution to evaluate how well each candidate tree explains the observed data. Maximum Likelihood analysis calculates the probability of observing the sequence data given a specific tree and an evolutionary model (the likelihood), then selects the tree under which that probability is highest. Bayesian Inference, a related approach, combines prior knowledge about possible trees with the data’s likelihood to generate a posterior probability for various trees, identifying the most probable evolutionary history. These methods can account for varying rates of evolution and different types of mutations, providing a more nuanced statistical framework for tree inference.
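The likelihood calculation can be illustrated on the smallest possible case: two sequences joined by a single branch, where the only unknown is the branch length. The sketch below assumes the Jukes-Cantor (JC69) substitution model, which the article does not mention; the model choice, the toy sequences, and the simple grid search are all assumptions made for illustration. Real software searches over tree shapes as well as branch lengths and model parameters.

```python
import math


def jc69_log_likelihood(s, t, d):
    """Log-likelihood of two aligned, gap-free sequences separated by branch length d.

    d is the expected number of substitutions per site under the assumed JC69 model.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e  # probability a site holds the same base at both ends
    p_diff = 0.25 - 0.25 * e  # probability of one specific different base
    log_l = 0.0
    for x, y in zip(s, t):
        # 0.25 is the equal base frequency that JC69 assumes for the starting base.
        log_l += math.log(0.25 * (p_same if x == y else p_diff))
    return log_l


# Toy aligned sequences (made up for illustration).
seq1 = "ACGTACGTAC"
seq2 = "ACGAACTTAT"

# Maximum likelihood by brute force: try many branch lengths, keep the best one.
grid = [i / 1000 for i in range(1, 2001)]
d_hat = max(grid, key=lambda d: jc69_log_likelihood(seq1, seq2, d))

# For this simple model the maximum also has a closed form, used here as a check.
p = sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)
d_closed = -0.75 * math.log(1 - 4 * p / 3)

print("grid estimate:", d_hat)
print("closed form:  ", round(d_closed, 4))
```

The estimated branch length is larger than the raw proportion of differing sites because the model allows for multiple substitutions at the same site, which simple counting misses.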

Understanding the Tree’s Story

Once constructed, a phylogenetic tree illustrates evolutionary divergence and relatedness. Reading it correctly depends on understanding a few key components.

The points where branches split or end are known as nodes. Internal nodes, where a lineage splits, represent the inferred common ancestor of the groups descending from that point, while terminal nodes (the tips) represent the species or groups being compared, whether living or extinct.

The lines connecting these nodes are called branches, representing evolutionary lineages. Depending on how the tree is drawn, branch lengths may be proportional to the amount of evolutionary change or to elapsed time, or they may carry no meaning at all. A clade encompasses a common ancestor and all of its descendants. Clades are nested, meaning smaller clades are contained within larger ones, reflecting the hierarchical nature of evolution.
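The nesting of clades can be made concrete with a short sketch that lists every clade in a small rooted tree. The nested-tuple encoding and the taxon names are assumptions made for illustration: a tip is a string and an internal node is a tuple of its children.

```python
def clades(tree):
    """Return (tips below this node, list of every clade at or below this node)."""
    if isinstance(tree, str):
        tips = frozenset([tree])
        return tips, [tips]
    tips = frozenset()
    found = []
    for child in tree:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found.extend(child_clades)
    found.append(tips)  # this node together with all of its descendants
    return tips, found


def is_clade(group, tree):
    """True if the named tips form a complete clade in the tree."""
    return frozenset(group) in clades(tree)[1]


# Hypothetical rooted tree: D branches off first, then C, leaving A and B as closest relatives.
tree = ("D", ("C", ("A", "B")))

for clade in sorted(clades(tree)[1], key=len):
    print(sorted(clade))

print(is_clade({"A", "B"}, tree))
print(is_clade({"B", "C"}, tree))
```

In this example the clade containing A and B sits inside the clade containing A, B, and C, which in turn sits inside the whole tree. The group B plus C is not a clade, because their common ancestor also gave rise to A.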

A tree can be either rooted or unrooted. A rooted tree explicitly shows a single common ancestor for all organisms in the tree, indicating the overall direction of evolution. In contrast, an unrooted tree illustrates the relationships among groups without specifying a common ancestor or an evolutionary direction. Interpreting a phylogenetic tree reveals patterns of how species have diversified and how closely related they are, based on their shared ancestral history.
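One way to see what an unrooted tree keeps and what it leaves out is to reduce a rooted tree to its splits: the ways that cutting a single internal branch divides the tips into two groups. The sketch below assumes the same illustrative nested-tuple encoding of a rooted tree, with hypothetical taxon names. Rooted trees that differ only in where the root is placed yield the same set of splits.

```python
def tip_sets(tree):
    """List the set of tips below every node; the last entry covers the whole tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    sets, below = [], frozenset()
    for child in tree:
        child_sets = tip_sets(child)
        sets.extend(child_sets)
        below |= child_sets[-1]
    sets.append(below)
    return sets


def splits(tree):
    """Non-trivial two-way divisions of the tips, ignoring where the root sits."""
    sets = tip_sets(tree)
    all_tips = sets[-1]
    result = set()
    for s in sets:
        rest = all_tips - s
        # Keep only divisions with at least two tips on each side.
        if len(s) > 1 and len(rest) > 1:
            result.add(frozenset([s, rest]))
    return result


# Two hypothetical rootings of the same underlying relationships...
rooted_1 = (("A", "B"), ("C", "D"))
rooted_2 = ("A", ("B", ("C", "D")))
# ...and a tree with genuinely different relationships.
rooted_3 = (("A", "C"), ("B", "D"))

print(splits(rooted_1) == splits(rooted_2))
print(splits(rooted_1) == splits(rooted_3))
```

The first two trees imply different ancestors at the root but share the single split that separates A and B from C and D, so they correspond to the same unrooted tree. The third groups the tips differently and does not.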