DNA, or deoxyribonucleic acid, is the biological instruction manual for all known life, organized into the double helix structure. This molecule resembles a twisted ladder, with sides made of sugar and phosphate units, and rungs formed by pairs of nitrogenous bases. These four bases are Adenine (A), Thymine (T), Cytosine (C), and Guanine (G), which pair complementarily (A with T, and C with G). The specific order of these base pairs along the DNA strand is the base sequence, which contains the code directing an organism’s development and function. Reading this sequence has evolved from laborious chemical reactions to automated, high-speed digital processes.
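The pairing rule above can be written as a small lookup table. The following is a minimal sketch rather than a reference implementation; the function names are illustrative, and the reversal step assumes the standard convention that the two strands of a double helix run in opposite directions.

```python
# Pairing rules from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the partner base for each position, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Return the partner strand as it would be read in its own direction."""
    return complement(strand)[::-1]

print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Because each base has exactly one partner, knowing the sequence of one strand is enough to reconstruct the other.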
The Foundational Sanger Method
The first widely adopted method for reading DNA was Sanger sequencing, or the chain-termination method, published by Frederick Sanger and colleagues in 1977. This technique established the fundamental principle of reading bases sequentially, though it is slow compared to modern methods. The method uses a DNA polymerase enzyme to synthesize a new DNA strand complementary to the template being read. Alongside normal nucleotides, the reaction mixture includes chemically modified nucleotides called dideoxynucleotides (ddNTPs).
Unlike normal nucleotides, ddNTPs lack the 3'-hydroxyl group to which the next base would attach. When DNA polymerase incorporates a ddNTP, synthesis of that strand stops. In the original protocol, scientists ran four separate reactions, each containing one type of ddNTP (A, T, C, or G); automated versions instead label each ddNTP with a different fluorescent dye and combine them in a single reaction. Either way, termination at random positions generates fragments of every possible length. These fragments are separated by size using electrophoresis, allowing the sequence to be read one base at a time from the shortest fragment to the longest.
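The size-based readout can be illustrated with a toy model. This sketch assumes an idealized reaction that yields exactly one terminated fragment per position, each tagged with the identity of its final base; the function names and the example sequence are hypothetical.

```python
import random

def chain_terminated_fragments(new_strand: str, seed: int = 0):
    """Idealized reaction: one fragment per position, as (length, terminal base).

    The list is shuffled to mimic an unsorted mixture in the reaction tube.
    """
    fragments = [(i, new_strand[i - 1]) for i in range(1, len(new_strand) + 1)]
    random.Random(seed).shuffle(fragments)
    return fragments

def read_by_size(fragments) -> str:
    """Sort fragments from shortest to longest and read off each terminal base."""
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(label for _, label in ordered)

mixture = chain_terminated_fragments("GATTACA")
print(read_by_size(mixture))  # GATTACA
```

Sorting by length plays the role of the size separation: the shortest fragment reveals the first base, the next shortest the second, and so on.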
The Current Standard: Next-Generation Sequencing
Sanger sequencing is a low-throughput process, reading only a single DNA fragment at a time. Next-Generation Sequencing (NGS) transformed the field by reading millions of fragments concurrently, most commonly through an approach called Sequencing by Synthesis (SBS). NGS begins by fragmenting the genome into small pieces, typically a few hundred base pairs long, and attaching specialized adapter sequences to their ends.
These adapter-tagged fragments are loaded onto a solid surface, called a flow cell, where they are amplified into clusters of identical DNA copies so that each cluster produces a signal strong enough to detect. This massive parallelization allows a single instrument to analyze millions of starting molecules simultaneously. The core SBS chemistry involves adding DNA polymerase and a mixture of all four types of fluorescently labeled nucleotides to the flow cell.
Each nucleotide is engineered with a reversible termination group, ensuring only one base is added per strand during each cycle. After incorporation, a camera captures the fluorescent signal from every cluster, identifying the base just added (A, T, C, or G). The terminator and fluorescent tag are then chemically cleaved and washed away, preparing the strand for the next cycle. This cycle repeats up to a few hundred times, building the sequence of each cluster base by base. A single run can generate gigabytes to terabytes of data, which require extensive computational analysis, or bioinformatics, to assemble the reads into a genome.
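The cycle of incorporation, imaging, and cleavage can be sketched as a loop over clusters. This is a simplified model under stated assumptions: the colour assigned to each base is a hypothetical placeholder, each template is written in the order the polymerase encounters it, and no chemistry errors are modelled.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical dye colours; real instruments use their own labelling schemes.
COLOURS = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_COLOUR = {colour: base for base, colour in COLOURS.items()}

def sequence_by_synthesis(templates, cycles):
    """Read every cluster in parallel, adding one base per cluster per cycle."""
    reads = ["" for _ in templates]
    for cycle in range(cycles):
        # Incorporation and imaging: each cluster adds the base that pairs with
        # its template at this position and emits that base's colour.
        image = []
        for template in templates:
            if cycle < len(template):
                image.append(COLOURS[PAIRS[template[cycle]]])
            else:
                image.append(None)  # this cluster has no template left to copy
        # Base calling: translate each observed colour back into a base.
        for index, colour in enumerate(image):
            if colour is not None:
                reads[index] += BASE_FOR_COLOUR[colour]
        # Cleavage: the dye and terminator are removed before the next cycle;
        # nothing needs to be stored for that step in this model.
    return reads

print(sequence_by_synthesis(["TACG", "GGCA"], cycles=4))  # ['ATGC', 'CCGT']
```

The outer loop runs once per cycle regardless of how many clusters there are, which is why adding more clusters increases output without lengthening the run.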
Emerging Third-Generation Technologies
Third-Generation Sequencing (TGS) moves beyond the short reads of NGS by reading single, long molecules of DNA directly. These technologies generally eliminate the time-consuming amplification steps required by previous methods.
Nanopore Sequencing
Nanopore sequencing uses a protein pore embedded in a membrane. As a single DNA strand is guided through this tiny pore by a motor protein, the few bases inside the pore at any moment obstruct it in a characteristic way. This obstruction causes a measurable change in the electrical current flowing across the membrane. A computer algorithm interprets these electrical signals to determine the sequence of bases in real time.
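The idea of translating current into bases can be shown with a deliberately simplified decoder. This sketch assumes the signal has already been split into one event per base and that each base has its own characteristic current level; the level values are invented for illustration, and production basecallers rely on far more elaborate statistical models.

```python
from statistics import mean

# Hypothetical characteristic current levels (in picoamperes) for each base.
LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}

def call_base(samples) -> str:
    """Pick the base whose level is closest to the mean of the noisy samples."""
    observed = mean(samples)
    return min(LEVELS, key=lambda base: abs(LEVELS[base] - observed))

def basecall(events) -> str:
    """Decode a list of events, each a list of current samples for one base."""
    return "".join(call_base(samples) for samples in events)

events = [
    [79.1, 81.3, 80.4],   # close to the A level
    [124.0, 126.2],       # close to the T level
    [96.0, 94.5, 95.2],   # close to the C level
    [109.0, 111.5],       # close to the G level
]
print(basecall(events))  # ATCG
```

Since each event is decoded as soon as it arrives, the sequence can be reported while the strand is still passing through the pore.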
PacBio SMRT Sequencing
Another TGS method, Single Molecule Real-Time (SMRT) sequencing by Pacific Biosciences (PacBio), uses specialized reaction chambers called zero-mode waveguides (ZMWs). A DNA polymerase is fixed inside each chamber, and as it synthesizes a new strand, it incorporates fluorescently labeled nucleotides. The incorporation of each base is observed as a flash of light, allowing the sequence to be read in real time. Both nanopore and PacBio sequencing produce reads many thousands of bases long, which can span repetitive regions and so simplify the assembly of complex genomes.
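One way to see why read length matters for assembly is to compare two genomes that differ only in the order of the segments between copies of a repeat. The sketch below uses made-up sequences and treats every substring of length k as an error-free read; it is an illustration of the principle, not an assembler.

```python
from collections import Counter

def all_reads(genome: str, k: int) -> Counter:
    """Count every substring of length k, standing in for error-free reads."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))

REPEAT = "GATTACA"  # a 7-base repeat that occurs three times (made-up example)
genome_a = "CC" + REPEAT + "TTT" + REPEAT + "AAA" + REPEAT + "GG"
genome_b = "CC" + REPEAT + "AAA" + REPEAT + "TTT" + REPEAT + "GG"

# Reads shorter than the repeat cannot tell the two arrangements apart.
print(all_reads(genome_a, 5) == all_reads(genome_b, 5))  # True

# Reads that span the repeat plus a base on each side can.
print(all_reads(genome_a, 9) == all_reads(genome_b, 9))  # False
```

With 5-base reads both genomes yield exactly the same collection of reads, so no assembler could choose between them; 9-base reads bridge the repeat and resolve the ambiguity.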