Restriction site-associated DNA sequencing, or RAD sequencing, is a genomic method for sampling the same specific regions across the genomes of many individuals. The underlying principle is genome reduction: instead of sequencing an organism’s entire genome, which is costly, RAD sequencing targets only the DNA sequences next to the cut sites of a chosen restriction enzyme.
This targeted approach allows researchers to efficiently compare genetic variations, particularly single nucleotide polymorphisms (SNPs), among many samples. Much like comparing massive libraries by reading only page 100 of every book, RAD sequencing samples the same positions in every genome and so provides a comparable, representative snapshot. This makes it a powerful tool for studies in population genetics and evolutionary biology.
The RAD Sequencing Laboratory Process
The process begins with extracting high-quality DNA. The first step is DNA digestion, where a specific restriction enzyme acts as molecular scissors, cutting the DNA only where its recognition sequence occurs. This cleaves the long DNA strands into many smaller fragments, each ending in the remnant of that recognition sequence. How many fragments result depends on how often the recognition sequence occurs in the genome.
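The digestion step can be imitated on a computer. The sketch below is a minimal illustration, assuming the enzyme EcoRI (recognition sequence GAATTC, cut after the first G) and a made-up toy sequence. It treats only one DNA strand, and the function name `digest` and its parameters are hypothetical.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut `sequence` at every occurrence of `site` and return the fragments.

    `cut_offset` is the position inside the site where the strand is cut
    (1 for EcoRI, which cuts G^AATTC). Single-strand simplification.
    """
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments


# Toy sequence invented for illustration; it contains two EcoRI sites.
toy_genome = "TTACGGAATTCCGATAGGCTAGAATTCTTGAC"
fragments = digest(toy_genome)
for fragment in fragments:
    print(len(fragment), fragment)
```

Every fragment after the first begins with AATTC, the remnant of the cut site. RAD sequencing relies on this shared signature to anchor its reads.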
Next, small, custom-designed DNA sequences called adapters are attached to the fragment ends through ligation. These adapters contain a unique molecular barcode, a short sequence that differs for each individual sample. This barcode allows researchers to mix samples for sequencing and later trace the data back to its source.
After adapters are attached, the DNA from all individuals is pooled into a single tube in a step called multiplexing. This combined library is then subjected to random shearing, often using sonication, to break the fragments into smaller pieces suitable for sequencing platforms. The goal is to create fragments where the original cut site and adapter are still present at one end.
The next stage is size selection. Fragments that fall within a narrow, predetermined size range are isolated, often using gel electrophoresis. This step retains fragments of a length the sequencing platform can handle and discards the rest. Because it sorts by length alone, enrichment for fragments that carry the restriction site and the ligated adapter depends on the amplification that follows.
The final step before sequencing is PCR amplification. The size-selected fragments are used as a template in a polymerase chain reaction (PCR) to create many copies of each fragment. Because the PCR primers bind to the adapter sequences, fragments that carry adapters are amplified preferentially. Amplification also raises the DNA concentration to a level the sequencing instrument can work with. The resulting library is then loaded onto a sequencer to generate the raw genetic reads.
Analyzing RAD Sequencing Data
After sequencing, the raw data undergoes several computational steps, starting with demultiplexing. A bioinformatics script sorts the mixed reads into separate bins for each individual by reading the unique molecular barcodes attached during the laboratory phase.
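As a rough illustration of demultiplexing, the sketch below sorts reads by the barcode at their start. It assumes that all barcodes have the same length and must match exactly. The barcodes, sample names, and the function `demultiplex` are invented for the example, and real tools typically also tolerate sequencing errors in the barcode.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Sort reads into per-sample bins using the barcode at the start of each read.

    Assumes equal-length barcodes and exact matches only.
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    bins = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        # Reads whose barcode is not recognised go to a separate bin.
        sample = barcode_to_sample.get(barcode, "unassigned")
        bins[sample].append(insert)  # store the read with its barcode removed
    return dict(bins)


# Hypothetical barcodes and reads for two samples.
barcodes = {"ACGT": "fish_01", "TGCA": "fish_02"}
reads = ["ACGTAATTCGGA", "TGCAAATTCGGT", "ACGTAATTCGGA", "GGGGAATTCGGA"]
bins = demultiplex(reads, barcodes)
for sample, sample_reads in bins.items():
    print(sample, sample_reads)
```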
With the data sorted, the next step is quality filtering. Because sequencing technologies are not perfect, software is used to identify and discard low-quality reads, trim away adapter sequences, and remove data that does not meet accuracy criteria. This cleanup ensures subsequent analyses are based on reliable genetic information.
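One common form of quality filtering can be sketched as follows, assuming quality scores are stored in the widely used Phred+33 text encoding, where each character's code minus 33 gives the score. The threshold of 20 and the function names are illustrative choices, not a recommendation. Real pipelines usually also trim adapters and low-quality read ends.

```python
def mean_phred(quality_string, offset=33):
    """Average Phred score of a read, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def filter_reads(records, min_mean_quality=20):
    """Keep (sequence, quality) pairs whose mean quality meets the threshold."""
    return [
        (seq, qual)
        for seq, qual in records
        if qual and mean_phred(qual) >= min_mean_quality
    ]


# Toy records: 'I' encodes Phred 40 (high quality), '#' encodes Phred 2 (very low).
records = [
    ("AATTCGGA", "IIIIIIII"),
    ("AATTCGGT", "########"),
    ("AATTCGGC", "IIII####"),
]
kept = filter_reads(records)
for seq, qual in kept:
    print(seq, mean_phred(qual))
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so a score of 20 means roughly one expected error in 100 bases.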
The filtered reads from all individuals are then clustered to assemble RAD loci. Software groups identical or highly similar reads together, and each cluster represents a RAD locus: a specific location in the organism’s genome adjacent to a restriction enzyme cut site. For organisms without a previously sequenced genome, this step is performed de novo, discovering and assembling the loci from the reads alone. When a reference genome is available, reads can instead be aligned to it.
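The clustering idea can be shown with a deliberately simple greedy scheme: each read joins the first cluster whose seed read differs from it by at most a set number of mismatches, and otherwise starts a new cluster. This is a toy sketch that assumes equal-length reads and counts only substitutions. The function names and the mismatch threshold are hypothetical, and dedicated RAD pipelines use more sophisticated methods.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_mismatches=1):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_read, member_reads)
    for read in reads:
        for seed, members in clusters:
            if len(seed) == len(read) and hamming(seed, read) <= max_mismatches:
                members.append(read)
                break
        else:
            # No sufficiently similar seed found: this read founds a new locus.
            clusters.append((read, [read]))
    return clusters


# Toy reads: the first three differ by at most one base, the last two are identical.
reads = ["AATTCGGA", "AATTCGGT", "AATTCGGA", "AATTCCCT", "AATTCCCT"]
loci = cluster_reads(reads)
for seed, members in loci:
    print(seed, len(members))
```

The mismatch threshold is a real trade-off. Set it too low and different alleles of one locus end up in separate clusters; set it too high and similar but distinct genomic regions merge into one.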
The final analytical step is SNP calling. A single nucleotide polymorphism, or SNP, is a variation at a single position in a DNA sequence among individuals. With reads from all individuals aligned at each RAD locus, the software scans each position to identify sites where individuals carry different DNA bases. The result is a matrix recording the genotype of each individual at thousands of SNP positions.
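A bare-bones version of SNP calling at one locus might look like the sketch below. It assumes the reads of each individual are already aligned and of equal length, and it accepts a base as a real allele only if it is seen at least `min_count` times. The individuals, reads, and function names are invented, and production SNP callers use statistical models of sequencing error instead of a fixed count.

```python
from collections import Counter


def call_genotype(reads, position, min_count=2):
    """Bases seen at least `min_count` times at `position`, e.g. 'A' or 'A/T'."""
    counts = Counter(read[position] for read in reads)
    alleles = sorted(base for base, n in counts.items() if n >= min_count)
    return "/".join(alleles) if alleles else "."  # '.' marks a missing call


def call_snps(reads_by_individual, min_count=2):
    """Return {position: {individual: genotype}} for variable positions only."""
    locus_length = len(next(iter(reads_by_individual.values()))[0])
    snp_matrix = {}
    for pos in range(locus_length):
        genotypes = {
            ind: call_genotype(reads, pos, min_count)
            for ind, reads in reads_by_individual.items()
        }
        observed = {b for g in genotypes.values() if g != "." for b in g.split("/")}
        if len(observed) > 1:  # more than one base across individuals: a SNP
            snp_matrix[pos] = genotypes
    return snp_matrix


# Toy locus: a single-base entry means homozygous, 'A/T' means heterozygous.
locus = {
    "fish_01": ["AATTCGGA"] * 4,
    "fish_02": ["AATTCGGA"] * 2 + ["AATTCGGT"] * 2,
    "fish_03": ["AATTCGGT"] * 4,
}
snps = call_snps(locus)
print(snps)
```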
Key Applications in Research
The resulting SNP dataset is a powerful resource for answering a diverse set of biological questions.
- Population genetics: Researchers study genetic differences within and between populations. The data can reveal patterns of genetic diversity, identify population boundaries, and measure gene flow. For example, it can show whether fish from different parts of a lake form one intermingling population or several genetically separate ones.
- Phylogenetics: By comparing thousands of SNPs across closely related species, researchers can build detailed evolutionary trees, known as phylogenies. These trees illustrate how different species are related and can help estimate when they diverged, for example when untangling relationships among island-dwelling birds.
- Conservation genetics: This data informs the management of threatened or endangered species. By assessing the genetic health and diversity of a population, managers can identify populations with low genetic diversity at risk of inbreeding or highlight unique genetic lineages that warrant specific protection.
- Genetic mapping: Scientists link genetic markers (SNPs) to observable traits, a process known as quantitative trait locus (QTL) mapping. This is used in agriculture to identify the genetic basis of desirable traits like disease resistance in crops or faster growth rates in farmed fish, allowing for more efficient selective breeding.
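To make the population-genetics application more concrete, the sketch below computes two standard summary statistics for a single SNP: expected heterozygosity (one minus the sum of squared allele frequencies) and a simple fixation index, F_ST = (H_T - H_S) / H_T. It assumes diploid genotypes written as "A/T" and equal sample sizes per population. The data and function names are invented, and real analyses average over many SNPs with estimators that correct for sample size.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes written like 'A/T'."""
    alleles = [allele for g in genotypes for allele in g.split("/")]
    counts = Counter(alleles)
    return {allele: n / len(alleles) for allele, n in counts.items()}


def expected_heterozygosity(genotypes):
    """1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in allele_frequencies(genotypes).values())


def fst(populations):
    """Simple F_ST = (H_T - H_S) / H_T; assumes equal sample sizes."""
    h_s = sum(expected_heterozygosity(g) for g in populations.values()) / len(populations)
    pooled = [g for genotypes in populations.values() for g in genotypes]
    h_t = expected_heterozygosity(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Invented genotypes at one SNP for fish from two parts of a lake.
lake = {
    "north": ["A/A", "A/A", "A/T", "A/A"],
    "south": ["T/T", "T/T", "A/T", "T/T"],
}
print(expected_heterozygosity(lake["north"]))
print(fst(lake))
```

Values of F_ST near 0 suggest the groups interbreed freely, while values approaching 1 indicate that they share few alleles.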
Variations of the RAD Method
The original RAD sequencing protocol has inspired several variations tailored for different research goals or budgets. One widely used alternative is double-digest RAD sequencing (ddRAD), which employs two different restriction enzymes. Using a second enzyme creates fragments with known cut sites at both ends, providing greater control over which fragments are sequenced. This precision leads to more consistent data and eliminates the need for the random shearing step.
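The effect of a second enzyme can be explored with an in silico double digest. The sketch below is an illustration under stated assumptions: the enzyme pair is EcoRI (G^AATTC) and MspI (C^CGG), only one strand is modeled, and a size window stands in for the size selection that ddRAD protocols typically apply. The sequence, window, and function names are invented.

```python
def cut_positions(sequence, site, cut_offset):
    """Positions at which an enzyme with the given site cuts the sequence."""
    positions = []
    pos = sequence.find(site)
    while pos != -1:
        positions.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    return positions


def ddrad_fragments(sequence, enzyme_a, enzyme_b, min_len, max_len):
    """Fragments flanked by one cut from each enzyme and within the size window."""
    cuts = [(p, "A") for p in cut_positions(sequence, *enzyme_a)]
    cuts += [(p, "B") for p in cut_positions(sequence, *enzyme_b)]
    cuts.sort()
    selected = []
    # Consecutive cuts bound one fragment of the complete double digest.
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if left != right and min_len <= end - start <= max_len:
            selected.append(sequence[start:end])
    return selected


ECORI = ("GAATTC", 1)  # cuts G^AATTC
MSPI = ("CCGG", 1)     # cuts C^CGG

# Toy sequence invented for illustration, with two sites for each enzyme.
toy_genome = "TTGAATTCACGTACGTCCGGTTTTCCGGACGAATTCAA"
selected = ddrad_fragments(toy_genome, ECORI, MSPI, min_len=10, max_len=20)
print(selected)
```

Because both ends of a retained fragment are fixed by cut sites, the same fragments are recovered from every individual that shares those sites. This is one way to see why the approach can yield more consistent data.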
Another popular method is Genotyping-by-Sequencing (GBS), a more streamlined and cost-effective relative of RAD sequencing. The GBS laboratory workflow is simpler, involving fewer steps and reagents, which reduces the cost per sample. This makes it well suited to generating data from hundreds or thousands of samples, as is common in agricultural genetics.
These variations highlight the flexibility of restriction enzyme-based sequencing. The choice between RAD, ddRAD, or GBS depends on the scientific question, the number of samples, the organism’s genome size, and the available budget. Researchers select the protocol that best balances data density, consistency, and cost. This adaptability has made these methods a staple of modern genomics.