How to Sequence DNA: The Process and Major Methods

DNA sequencing is a laboratory technique that determines the precise order of nucleotides within a DNA molecule. These chemical building blocks are adenine (A), thymine (T), cytosine (C), and guanine (G). Their specific arrangement holds the biological information that guides the development and operation of cells, allowing scientists to understand the fundamental blueprint of life.

Why We Sequence DNA

DNA sequencing offers wide-ranging applications across various fields, providing insights that impact health, research, and beyond. In medicine, it plays a role in personalized treatments, allowing doctors to tailor therapies based on an individual’s genetic makeup. It also aids in diagnosing genetic diseases and identifying pathogens like viruses and bacteria. For instance, sequencing can detect disease indicators years before a typical diagnosis, which is important in cancer treatment for patient outcomes.

In research, DNA sequencing helps scientists understand gene function, including how genes direct an organism’s growth and interact with each other. It also explores evolutionary biology by tracing ancestry and relationships between species. This knowledge can support drug discovery by identifying potential drug targets and understanding drug-gene interactions.

Beyond medicine and research, DNA sequencing has applications in forensics, where it is used to identify individuals and assist in solving crimes. By analyzing DNA evidence from crime scenes, investigators can generate unique genetic fingerprints to link evidence to suspects. In agriculture, sequencing contributes to improving crop yields and enhancing disease resistance in plants. It enables researchers to identify desirable traits, such as drought tolerance or better nutrition, which can then be incorporated into new plant varieties through breeding programs.

The Fundamental Process of DNA Sequencing

DNA sequencing aims to determine the exact order of the four nucleotide bases: adenine (A), thymine (T), cytosine (C), and guanine (G). These bases pair specifically—A with T, and C with G—forming the foundation of the DNA double helix.

The general process involves breaking down a long DNA molecule into smaller, manageable fragments. These fragments are then individually read to identify the sequence of bases within them. Special markers or signals are used to distinguish each type of base as it is identified. Once the sequences of these smaller pieces are determined, computational tools are used to reassemble them, much like putting together a puzzle, to reconstruct the full DNA sequence. This assembly relies on overlapping regions between the fragments to correctly align them into a complete and continuous sequence.

Major DNA Sequencing Technologies

The evolution of DNA sequencing has seen significant advancements, moving from slower, single-fragment methods to rapid, high-throughput approaches. Sanger sequencing, also known as the “chain termination method,” emerged in the late 1970s and was the dominant method for nearly 40 years. This technique relies on DNA polymerase to synthesize new DNA strands, incorporating modified nucleotides called dideoxynucleotide triphosphates (ddNTPs). These ddNTPs cause DNA synthesis to terminate randomly when incorporated.

In Sanger sequencing, fluorescently labeled ddNTPs are incorporated into growing DNA strands, creating fragments of varying lengths. These fragments are then separated by size, and the order of the fluorescent labels is read to determine the DNA sequence. While accurate for individual DNA pieces, Sanger sequencing is relatively slow and expensive for large-scale projects, processing only one fragment at a time.

Next-Generation Sequencing (NGS), also known as massively parallel sequencing, enabled the simultaneous sequencing of millions to billions of DNA fragments. Introduced commercially around 2005, NGS platforms leverage a “sequencing by synthesis” (SBS) chemistry. In this approach, DNA fragments are prepared and amplified to create clonal clusters on a flow cell. Fluorescently labeled nucleotides are added one at a time; as each correct nucleotide is incorporated, a signal is detected and the fluorescent tag is removed, allowing the next nucleotide to be added in a cyclical process. This massively parallel nature allows NGS to generate vast amounts of DNA sequence data at a faster pace and lower cost compared to Sanger sequencing, making whole-genome sequencing more feasible.