What Is Incomplete Lineage Sorting in Genetics?

Incomplete Lineage Sorting (ILS) is an evolutionary phenomenon in which the genealogical history of a specific DNA segment does not match the evolutionary history of the species that carry it. This discordance arises because ancestral alleles, the genetic variants present in an ancestral species, can persist across multiple speciation events. ILS is a common complication in reconstructing relationships between closely related organisms. The term describes the failure of these ancestral gene versions to sort into distinct, species-specific lineages before the next speciation event occurs. As a result, different parts of a single organism’s DNA may tell conflicting stories about its evolutionary past.

Understanding Gene Trees and Species Trees

To understand this discordance, it is helpful to distinguish between a species tree and a gene tree. A species tree represents the true evolutionary history of a group of organisms, illustrating the sequence of population splits that led to the different species we see today. It shows when populations became reproductively isolated and embarked on separate evolutionary paths.

A gene tree, conversely, traces the ancestry of a single gene or a specific region of the genome. It shows the history of the ancestral alleles for that DNA segment, moving backward until all copies converge into a single common ancestral sequence. Ideally, the branching pattern of every gene tree would perfectly mirror the species tree. However, because genes are passed down within populations, their individual histories are subject to random genetic processes.

Although every organism within a species shares the same species history, different genes can have independent histories, because their ancestral versions were passed on differently across speciation events.

The Coalescence Process and ILS Mechanism

The population genetic mechanism underlying ILS is described by the coalescent process. Looking backward in time, the lineages of the gene copies in a population merge, or coalesce, until all of them trace back to a single common ancestral copy. How long this takes depends on population size, because in a larger population it takes more generations for genetic drift to eliminate all but one ancestral lineage.
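As a rough illustration of how population size sets the timescale of coalescence, the sketch below uses the standard neutral coalescent expectations for a diploid population of constant effective size. The function names and the example population sizes are assumptions made for illustration and do not come from any particular study.

```python
def expected_pair_coalescence(n_e: float) -> float:
    """Expected generations for two gene copies to reach a common ancestor.

    Assumes a diploid, neutrally evolving population of constant
    effective size n_e (standard coalescent result: 2 * Ne).
    """
    return 2.0 * n_e


def expected_tmrca(n_e: float, n_copies: int) -> float:
    """Expected generations until n sampled copies share a single ancestor.

    Standard coalescent result under the same assumptions:
    4 * Ne * (1 - 1/n).
    """
    if n_copies < 2:
        raise ValueError("need at least two gene copies")
    return 4.0 * n_e * (1.0 - 1.0 / n_copies)


# Hypothetical population sizes, chosen only to show the scaling.
for n_e in (1_000, 10_000, 100_000):
    print(n_e, expected_pair_coalescence(n_e), expected_tmrca(n_e, 10))
```

Under these assumptions the waiting time grows in direct proportion to population size, which is why large ancestral populations leave more room for lineages to remain unsorted.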

ILS occurs when the genetic lineages of a particular gene fail to coalesce into a single ancestral copy before the next speciation event. For example, an ancestral species may contain two different versions of a gene, a condition known as ancestral polymorphism. If this ancestral species splits, both daughter species might initially inherit both versions, meaning the gene copies have not yet fully “sorted” into the species lineages.

If the time interval between the first speciation event and a second one is short, the ancestral gene versions may not have had enough generations to sort. In that case, no single allele has become fixed in the intermediate species population. When the second split occurs, the random sampling of the remaining genetic diversity can produce a gene tree that suggests an incorrect relationship among the resulting three species.

ILS is also more likely when the ancestral population was large, as a larger population maintains higher genetic diversity and requires a longer time for coalescence. The resulting pattern is often called deep coalescence, in which the common ancestor of the gene copies predates the species split itself. A short time interval between speciation events and a large ancestral population size are the primary factors that increase the probability of ILS.
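The combined effect of interval length and population size can be made concrete for the simplest case of three species. Under the standard neutral coalescent model, with one gene copy sampled per species, the probability that a gene tree disagrees with the species tree is (2/3) * exp(-T), where T is the length of the interval between the two splits measured in units of 2 * Ne generations. The sketch below assumes that model; the function names and the example values are illustrative assumptions, not estimates for any real group.

```python
import math


def coalescent_units(generations: float, n_e: float) -> float:
    """Convert a time interval in generations to coalescent units (diploid)."""
    return generations / (2.0 * n_e)


def prob_discordant(t_units: float) -> float:
    """Probability that a three-species gene tree conflicts with the species tree.

    Assumes one sampled copy per species, neutrality, no gene flow, and a
    constant-size ancestral population; t_units is the interval between
    the two speciation events in coalescent units.
    """
    return (2.0 / 3.0) * math.exp(-t_units)


# Same hypothetical interval (100,000 generations), two hypothetical Ne values.
for n_e in (10_000, 100_000):
    t = coalescent_units(100_000, n_e)
    print(f"Ne={n_e}: T={t:.2f}, P(discordant)={prob_discordant(t):.3f}")
```

Shrinking the interval or enlarging the ancestral population both reduce T, and the discordance probability approaches its ceiling of two thirds as T approaches zero.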

How ILS Affects Phylogenetic Inference

The presence of ILS creates a challenge for phylogenetic inference, the reconstruction of the evolutionary history of species. If a researcher selects only a few genes to build a phylogenetic tree, and those genes are affected by ILS, the resulting tree topology may misrepresent the true species relationships. The gene tree might be statistically well supported yet topologically different from the actual species history.

This issue is pronounced in instances of rapid evolutionary radiation, where multiple speciation events occur in quick succession. The short intervals between species splits maximize the probability of ILS across the genome. Consequently, different genes within the same species set can provide conflicting evolutionary signals, leading to a mosaic of possible relationships.

To overcome the misleading effects of individual gene trees, modern phylogenetics relies on a genomic approach known as phylogenomics. Researchers analyze hundreds or thousands of independent genes across the entire genome. By using multispecies coalescent models, scientists account for the expected level of gene tree discordance and statistically infer the species tree. Rather than treating discordance as error, this approach models ILS explicitly and uses the volume of data to recover the shared species-level signal.
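The logic of pooling many loci can be sketched with a toy simulation. The code below is a simplified stand-in, not a full multispecies coalescent method: it assumes a true species tree ((A,B),C), one gene copy per species, and independent loci, simulates a gene tree topology for each locus, and then takes the most frequent topology as the species tree estimate. All names, the interval length, and the number of loci are hypothetical.

```python
import random
from collections import Counter

TOPOLOGIES = ["((A,B),C)", "((A,C),B)", "((B,C),A)"]


def simulate_gene_tree(t_units: float, rng: random.Random) -> str:
    """Draw one gene tree topology given the assumed species tree ((A,B),C).

    The A and B lineages coalesce within the internal branch if an
    exponential waiting time (rate 1 in coalescent units) is shorter than
    t_units. Otherwise all three lineages enter the root population, where
    each pair is equally likely to coalesce first.
    """
    if rng.expovariate(1.0) < t_units:
        return TOPOLOGIES[0]
    return rng.choice(TOPOLOGIES)


def tally_topologies(n_loci: int, t_units: float, seed: int = 0) -> Counter:
    """Simulate n_loci independent loci and count each topology."""
    rng = random.Random(seed)
    return Counter(simulate_gene_tree(t_units, rng) for _ in range(n_loci))


def majority_topology(counts: Counter) -> str:
    """Return the most frequent gene tree topology."""
    return counts.most_common(1)[0][0]


counts = tally_topologies(n_loci=5000, t_units=0.5, seed=1)
print(counts)
print("estimated species tree:", majority_topology(counts))
```

Any single locus in this simulation has a substantial chance of showing the wrong topology, yet with thousands of loci the concordant topology is expected to be the most common one. Real analyses use far richer models, and with more than three species the most frequent gene tree is not guaranteed to match the species tree.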

Real-World Observations of Incomplete Lineage Sorting

Incomplete Lineage Sorting is a widely observed pattern in the genomes of many organisms. One frequently cited example involves the divergence of humans, chimpanzees, and gorillas. While the species tree suggests that humans and chimpanzees are the closest relatives, a minority of human genes show a closer relationship to the corresponding gorilla genes than to the chimpanzee genes.

Estimates suggest that roughly 23% of the human genome exhibits a gene tree topology that groups either humans or chimpanzees more closely with gorillas than with each other. This discordance is a textbook example of deep coalescence, indicating that the common ancestor of these three species maintained substantial genetic variation. This ancestral variation was then randomly sorted into the descendant species lineages, leading to conflicting patterns in modern genomes.
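As a back-of-the-envelope reading of that figure, the standard three-species coalescent relationship, in which the discordant fraction equals (2/3) * exp(-T), can be inverted to ask what interval length T (in units of 2 * Ne generations) would produce a given amount of discordance. This is only a sketch: it assumes that all of the discordance is due to ILS and that the simple neutral model applies, and the function name is hypothetical.

```python
import math


def internal_branch_from_discordance(frac_discordant: float) -> float:
    """Invert P = (2/3) * exp(-T) to get T in coalescent units.

    Assumes discordance comes only from ILS among three species under a
    neutral, constant-size model; valid for 0 < P <= 2/3.
    """
    if not 0.0 < frac_discordant <= 2.0 / 3.0:
        raise ValueError("discordant fraction must be in (0, 2/3]")
    return -math.log(1.5 * frac_discordant)


t_units = internal_branch_from_discordance(0.23)
print(f"implied interval: {t_units:.2f} coalescent units")  # about 1.06
```

Under these assumptions, 23% discordance corresponds to an interval of roughly one coalescent unit between the gorilla split and the human-chimpanzee split, that is, a number of generations comparable to twice the ancestral effective population size.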

ILS is also prevalent in groups that have undergone recent and rapid bursts of speciation. Studies of certain groups of birds, such as Darwin’s finches and other passerine families, show high levels of gene tree discordance due to ILS. In some primate phylogenies, ILS is estimated to affect up to 64% of the genome.