Tajima’s D: A Test for Natural Selection

Population geneticists use statistical tools to understand the evolutionary history of organisms by analyzing their DNA. One such tool is Tajima’s D, a test developed in 1989 that examines patterns of genetic variation. The test’s goal is to determine if a gene is evolving randomly under a “neutral” model or if its evolution is influenced by non-random processes. These processes include natural selection or major demographic events, and identifying them helps researchers find genomic regions that hold clues about a species’ history.

Foundations of the Test: Neutrality and Diversity

The basis for Tajima’s D is the Neutral Theory of Molecular Evolution. This theory suggests that most genetic changes at the molecular level are governed by random chance rather than natural selection. Most genetic variation within a population results from a steady accumulation of mutations and their subsequent random fluctuation in frequency, a process known as genetic drift. In a population that has maintained a stable size, these forces of mutation and drift are expected to reach an equilibrium.

Tajima’s D compares two measurements of genetic diversity. The first is nucleotide diversity (π), the average number of genetic differences between any two DNA sequences chosen at random from the population sample.
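As a minimal sketch of how π can be computed, the snippet below averages the number of differences over every pair of sequences. It assumes the sequences are already aligned and of equal length. The function names and the four toy sequences are made up for illustration and are not real data.

```python
from itertools import combinations

def pairwise_differences(seq_a, seq_b):
    """Count the positions at which two aligned sequences differ."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

def nucleotide_diversity(sequences):
    """Average number of differences over all pairs of sequences (pi)."""
    pairs = list(combinations(sequences, 2))
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    return total / len(pairs)

# Toy alignment (illustrative only): 4 sequences, 5 sites each.
sample = ["ATGCA", "ATGCA", "ATGTA", "ACGTA"]
pi = nucleotide_diversity(sample)
print(round(pi, 4))
```

Here π is reported per sequence, not per site. Dividing by the sequence length would give a per-site value, which is the other common convention.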

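The second measure is the number of segregating sites (S), a count of the positions within the DNA sequence that show variation in the sample. For example, if a specific location is an ‘A’ in some sequences and a ‘G’ in others, that location is one segregating site. Because S grows with the number of sequences sampled, it is not compared with π directly. It is first divided by a sample-size correction factor, a1 = 1 + 1/2 + ... + 1/(n − 1) for a sample of n sequences, which gives Watterson’s estimator (θW = S / a1). Under the neutral theory for a population at equilibrium, π and θW both estimate the same underlying quantity, the population mutation parameter θ, so they are expected to be roughly equal.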
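The count of segregating sites, and its rescaling by sample size into Watterson’s estimator, can be sketched in the same way. The helper names and the toy alignment are again illustrative assumptions, and the alignment is assumed to contain no gaps or missing data.

```python
def segregating_sites(sequences):
    """Count alignment columns that contain more than one distinct base."""
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)

def watterson_theta(sequences):
    """Rescale S by a1 = 1 + 1/2 + ... + 1/(n - 1) for n sequences."""
    n = len(sequences)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(sequences) / a1

# Toy alignment (illustrative only): columns 2 and 4 are variable.
sample = ["ATGCA", "ATGCA", "ATGTA", "ACGTA"]
S = segregating_sites(sample)
theta_w = watterson_theta(sample)
print(S, round(theta_w, 4))
```

For this toy sample the rescaled value (about 1.09) is close to the average pairwise difference of the same four sequences (about 1.17), which is the kind of agreement the neutral equilibrium model predicts.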

Interpreting Tajima’s D Values

The Tajima’s D statistic summarizes the difference between nucleotide diversity (π) and the number of segregating sites (S) once S has been rescaled for sample size. That difference is divided by an estimate of its standard deviation, so the statistic is expressed in standardized units. When observed patterns of genetic variation align with the predictions of the Neutral Theory for a stable population, the value of Tajima’s D will be close to zero. This outcome is consistent with neutral evolution, without significant influence from selective pressures or major population size changes, and it serves as the baseline, or null hypothesis, for the test.
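The full calculation can be sketched as follows. The constants follow the variance formula usually attributed to Tajima’s 1989 derivation, written here from the standard textbook presentation. The function name, argument names, and example inputs are assumptions made for illustration.

```python
import math

def tajimas_d(pi, num_seg_sites, n):
    """Tajima's D from pi, the number of segregating sites S, and sample size n.

    Returns NaN when there are no segregating sites, because the
    statistic is undefined in that case.
    """
    if num_seg_sites == 0:
        return math.nan
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Estimated variance of the difference between pi and S / a1.
    variance = e1 * num_seg_sites + e2 * num_seg_sites * (num_seg_sites - 1)
    return (pi - num_seg_sites / a1) / math.sqrt(variance)

# Illustrative inputs: 4 sequences, 2 segregating sites, pi = 7/6.
d_example = tajimas_d(7 / 6, 2, 4)
print(round(d_example, 3))
```

A value computed from so small a sample is only a demonstration of the arithmetic. Real analyses use many more sequences, and significance is usually judged against simulations of the neutral model, not from the raw value alone.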

A negative Tajima’s D value (D < 0) indicates an excess of rare, low-frequency genetic variants compared to what would be expected under neutrality. One cause is a recent and rapid population expansion. As a population grows, new mutations arise, but they are initially rare and have not had time to spread widely, leading to an abundance of low-frequency variants. Another cause is a “selective sweep,” where a newly beneficial mutation quickly increases in frequency, carrying nearby linked DNA with it and erasing previous genetic variation. The variation that reappears in the region afterward comes from new mutations, which are still rare, so the region shows the same excess of low-frequency variants.

Conversely, a positive Tajima’s D value (D > 0) signifies a deficit of rare variants and a higher-than-expected number of variants at intermediate frequencies. This pattern can be the result of a recent population bottleneck, where a sharp reduction in population size has occurred. Such an event randomly eliminates many rare variants, leaving behind a smaller set of variants at more moderate frequencies. Another explanation for a positive value is balancing selection, a process where natural selection actively maintains multiple different versions of a gene in the population.
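The link between variant frequency and the sign of D can be illustrated with a short sketch. A site at which k of n sequences carry the variant differs in k × (n − k) of the n × (n − 1) / 2 possible pairs, so rare variants add little to π while intermediate-frequency variants add a lot. Each site still counts once toward S. The function names and allele counts below are invented for illustration.

```python
def pi_from_allele_counts(variant_counts, n):
    """Pi from the number of sequences carrying the variant at each site."""
    num_pairs = n * (n - 1) / 2
    return sum(k * (n - k) for k in variant_counts) / num_pairs

def theta_w_from_sites(num_sites, n):
    """Watterson's estimator: S divided by a1 = 1 + 1/2 + ... + 1/(n - 1)."""
    a1 = sum(1.0 / i for i in range(1, n))
    return num_sites / a1

n = 10
rare_sites = [1, 1, 1, 1, 1]          # every variant seen in one sequence
intermediate_sites = [5, 5, 5, 5, 5]  # every variant seen in half the sample

# The sign of this difference is the sign of Tajima's D.
rare_diff = pi_from_allele_counts(rare_sites, n) - theta_w_from_sites(5, n)
inter_diff = pi_from_allele_counts(intermediate_sites, n) - theta_w_from_sites(5, n)
print(round(rare_diff, 3), round(inter_diff, 3))
```

Both toy samples have the same number of segregating sites, so only the frequencies of the variants change the result: the all-rare sample gives a negative difference and the intermediate-frequency sample a positive one.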

Applications in Genetic Research

Scientists use Tajima’s D as a scanning tool to sift through data in an organism’s genome. By calculating the D statistic for thousands of genes, researchers can identify specific genes whose values deviate significantly from zero. These genes are flagged as candidates for having been influenced by natural selection, pending confirmation by other evidence. For example, studies in the human genome have revealed that genes involved in immune responses show signs of selection, which helps pinpoint the genetic basis of adaptations to pathogens.
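One simple way to flag candidates in such a scan is to rank genes by their D values and take the extremes of the empirical distribution. The sketch below assumes the D values have already been computed for each gene. The function name, the tail fraction, the gene labels, and the values are all hypothetical placeholders, not results from any study.

```python
from statistics import median

def flag_outlier_genes(d_by_gene, tail_fraction=0.025):
    """Return the genes in the lowest and highest tails of the D distribution."""
    ranked = sorted(d_by_gene.items(), key=lambda item: item[1])
    # Always report at least one gene per tail, even for tiny inputs.
    k = max(1, int(len(ranked) * tail_fraction))
    lowest = [gene for gene, _ in ranked[:k]]
    highest = [gene for gene, _ in ranked[-k:]]
    return lowest, highest

# Hypothetical per-gene values, for illustration only.
d_values = {
    "gene_A": -0.3, "gene_B": 0.1, "gene_C": -2.4,
    "gene_D": -0.5, "gene_E": 2.1, "gene_F": -0.2,
}
lowest, highest = flag_outlier_genes(d_values)
background = median(d_values.values())
print(lowest, highest, background)
```

Ranking against the genome’s own distribution, and reporting the median as a background level, is a design choice: it keeps a genome-wide shift in D from being mistaken for a signal at individual genes.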

The test is also used for reconstructing the demographic history of a species. Demographic events affect the whole genome, whereas selection acts on particular regions. So while a single gene with a negative Tajima’s D might indicate selection, a consistent pattern of negative values across the entire genome is better explained by a past population expansion. Patterns of this kind have been cited in support of the “Out of Africa” model of human history, which posits that modern human populations expanded from a small founding group. Researchers use these findings alongside other population genetic tests to build a more complete picture of a species’ past.