How Does Next-Generation Sequencing (NGS) Work?

Next-Generation Sequencing (NGS) determines the precise order of nucleotides within DNA or RNA molecules at very large scale. Often called massively parallel sequencing, the technology reads millions of DNA fragments simultaneously. NGS succeeded traditional Sanger sequencing, which is slow and labor-intensive because each reaction reads only one fragment at a time. The parallel nature of NGS drastically increased throughput, making large-scale projects like whole-genome sequencing feasible and routine. The process has four stages: converting genetic material into a specialized library, amplifying fragments into detectable clusters, chemically reading the sequence, and processing the resulting data.

Preparing the Genetic Material

The initial phase of NGS involves converting the raw biological sample into a standardized sequencing library. This preparation ensures the genetic material is compatible with the sequencing instrument. The starting material, which can be DNA or cDNA converted from RNA, is first broken down into smaller pieces, typically 200 to 600 base pairs long. Fragmentation is achieved through physical methods, such as sonication, or enzymatic methods.

After fragmentation, the ends of these short pieces are enzymatically repaired to create blunt ends. A single adenine base is often then added to the 3’ end of each fragment, a step known as A-tailing. The overhanging adenine pairs with a single thymine overhang on the adapters, which prepares the fragments for the attachment of these synthetic DNA sequences.

Adapters are short, customized oligonucleotides ligated to both ends of every fragmented piece. These specialized sequences serve several functions:

  • Providing binding sites for primers used during amplification and sequencing.
  • Containing sequences complementary to oligonucleotides fixed to the flow cell surface, allowing physical attachment.
  • Incorporating molecular barcodes, or indexes, which allow multiple distinct samples to be sequenced simultaneously.
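The role of the index sequences can be illustrated with a minimal sketch of demultiplexing, the step that sorts pooled reads back into their samples. The sample names, index sequences, and one-mismatch tolerance below are assumptions chosen for illustration, not values from any particular kit or instrument.

```python
from collections import defaultdict

# Hypothetical sample sheet: index sequence -> sample name (illustrative values).
SAMPLE_INDEXES = {
    "ACGTAC": "sample_A",
    "TGCATG": "sample_B",
}

def demultiplex(reads, sample_indexes, max_mismatches=1):
    """Group (index_read, insert_read) pairs by the sample whose index matches."""
    bins = defaultdict(list)
    for index_read, insert_read in reads:
        # Collect every sample whose index is within the mismatch tolerance.
        hits = [
            name
            for index, name in sample_indexes.items()
            if len(index) == len(index_read)
            and sum(a != b for a, b in zip(index, index_read)) <= max_mismatches
        ]
        # Assign only when exactly one sample matches; otherwise leave undetermined.
        key = hits[0] if len(hits) == 1 else "undetermined"
        bins[key].append(insert_read)
    return dict(bins)

reads = [
    ("ACGTAC", "GGATTC"),  # exact match to sample_A
    ("TGCATC", "CCTAGA"),  # one mismatch away from sample_B
    ("GGGGGG", "TTTTAA"),  # matches no known index
]
result = demultiplex(reads, SAMPLE_INDEXES)
```

Tolerating a small number of mismatches lets reads with a sequencing error in the index still be assigned, provided the indexes in the pool differ from one another by more than that tolerance.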

Massively Parallel Amplification

After library preparation, the fragments must be amplified to generate a strong signal for the sequencer’s detection system. Because the optical detection used by these platforms cannot reliably register the fluorescence from a single molecule, each unique fragment must be copied many times over. This amplification is performed in situ, directly on the surface of the flow cell, a glass slide coated with millions of fixed primers.

The most common technique for this clonal amplification is bridge amplification. Here, a single-stranded library fragment attached to the surface bends over and anneals to a nearby complementary primer, forming a “bridge” between the two types of primer fixed to the flow cell. A DNA polymerase extends the primer along the fragment, creating a double-stranded bridge. Denaturation separates the two strands, and each single strand folds over to anneal to another nearby complementary primer, forming a new bridge.

Repeated cycles of denaturation and extension result in the formation of dense, localized clusters of identical DNA molecules. Each cluster originates from a single initial library fragment and acts as a distinct sequencing reaction. A single flow cell can support hundreds of millions of these clusters, generating the massively parallel array of templates needed for detection.

The Core Sequencing Reaction

With the clonal clusters established, the instrument begins determining the sequence using Sequencing by Synthesis (SBS). SBS is a cyclical process where a single base is identified at a time for every cluster across the flow cell. The reaction relies on the controlled incorporation of special fluorescently labeled nucleotides, which also function as reversible terminators.

In each cycle, all four nucleotides (A, C, G, and T) are introduced to the flow cell simultaneously. In the classic four-color chemistry, each is tagged with a distinct fluorescent dye. The terminator component ensures that the polymerase adds only a single complementary base to the growing DNA strand per cycle.

After incorporation, the flow cell is washed to remove unincorporated bases. A high-resolution camera then captures the fluorescent signal emitted by the newly added base in every cluster. The specific color recorded at each location indicates the identity of the incorporated base.

Following imaging, a chemical cleavage step removes both the fluorescent dye and the terminating component from the nucleotide. This deblocking action prepares the 3’ end of the strand to accept the next incoming nucleotide. The entire cycle of adding bases, imaging, and cleaving is repeated hundreds of times. This sequential readout generates a short sequence of data, known as a “read,” for every cluster on the flow cell.
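The incorporate, image, and cleave cycle can be sketched as a toy simulation for a single cluster. The dye colors are hypothetical, and the template string is assumed to be written in the order the polymerase encounters it; real instruments work from image intensities rather than clean color labels.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical dye assignment for illustration; actual dye sets are platform-specific.
DYE_FOR_BASE = {"A": "green", "C": "red", "G": "blue", "T": "yellow"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}

def sequence_by_synthesis(template, cycles):
    """Simulate SBS on one cluster and return (read, colors seen per cycle)."""
    colors = []
    for position in range(min(cycles, len(template))):
        # 1. Incorporation: the terminator allows exactly one complementary base.
        incorporated = COMPLEMENT[template[position]]
        # 2. Imaging: the camera records the dye color of the newly added base.
        colors.append(DYE_FOR_BASE[incorporated])
        # 3. Cleavage: dye and terminator are removed, so the next loop
        #    iteration can extend the strand by one more base.
    read = "".join(BASE_FOR_DYE[color] for color in colors)
    return read, colors

read, colors = sequence_by_synthesis("TACG", cycles=3)
```

The number of cycles sets the read length: three cycles on this four-base template yield a three-base read.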

Data Processing and Interpretation

The output from the sequencing instrument is a massive collection of raw images and fluorescent intensity measurements, not a linear genetic sequence. The first computational step is base calling, where raw light signals are digitally converted into nucleotide bases (A, T, C, or G). Each base call is assigned a quality score (Phred or Q-score), which expresses on a logarithmic scale the estimated probability that the base was called incorrectly. A score of 30, for example, corresponds to an error probability of 1 in 1,000.
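The relationship between a Phred score Q and the error probability P is Q = -10 × log10(P). The sketch below converts in both directions and decodes a quality string, assuming the common FASTQ convention in which each score is stored as the ASCII character with code Q + 33; the function names are illustrative.

```python
import math

def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)

def error_probability_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)

def decode_fastq_qualities(quality_string, offset=33):
    """Decode an ASCII-encoded quality string (assumed Phred+33) into scores."""
    return [ord(character) - offset for character in quality_string]

scores = decode_fastq_qualities("I5!")
error_rates = [phred_to_error_probability(q) for q in scores]
```

Each step of 10 in the score is a tenfold change in the error probability, so Q20 means 1 error in 100 and Q40 means 1 in 10,000.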

These digital sequences, or reads, are then subjected to quality filtering. This process removes reads that are too short or have unacceptably low quality scores, as these introduce errors that compromise accuracy. Adapter sequences that remain at the ends of reads, which happens when a fragment is shorter than the read length, are also computationally trimmed to ensure accurate downstream analysis.
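A minimal sketch of trimming and filtering follows. The adapter string, length and quality thresholds, and function names are all assumptions for illustration, and the matching is exact, whereas dedicated trimming tools also tolerate mismatches.

```python
ADAPTER = "AGATCGGAAG"  # illustrative adapter prefix, assumed for this example

def trim_adapter(sequence, qualities, adapter, min_overlap=3):
    """Cut a read at a full adapter match, or at an adapter prefix at its 3' end."""
    cut = sequence.find(adapter)
    if cut == -1:
        # No full match: check whether the read ends partway into the adapter.
        longest = min(len(adapter) - 1, len(sequence))
        for overlap in range(longest, min_overlap - 1, -1):
            if sequence.endswith(adapter[:overlap]):
                cut = len(sequence) - overlap
                break
    if cut == -1:
        return sequence, qualities
    return sequence[:cut], qualities[:cut]

def passes_filter(sequence, qualities, min_length=5, min_mean_quality=20):
    """Keep reads that are long enough and have an acceptable mean quality."""
    if not sequence or len(sequence) < min_length:
        return False
    return sum(qualities) / len(qualities) >= min_mean_quality

def clean_reads(reads, adapter=ADAPTER):
    """Trim adapters first, then drop reads that fail the filter."""
    kept = []
    for sequence, qualities in reads:
        sequence, qualities = trim_adapter(sequence, qualities, adapter)
        if passes_filter(sequence, qualities):
            kept.append(sequence)
    return kept

raw_reads = [
    ("ACGTTGCAAGATCGGAAG", [30] * 18),  # full adapter after 8 insert bases
    ("ACGTTGCATTAGAT", [30] * 14),      # read ends 4 bases into the adapter
    ("ACGTACGTAC", [10] * 10),          # low mean quality
    ("ACG", [40] * 3),                  # too short
]
cleaned = clean_reads(raw_reads)
```

Trimming runs before filtering because removing the adapter can shorten a read below the minimum length.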

The cleaned, high-quality reads are now ready for alignment or assembly. Alignment involves computationally mapping the short reads back to a known reference genome, such as the human genome. Algorithms determine the most probable location where each read originated. For samples without a known reference, de novo assembly pieces the short reads together based on overlapping sequences to construct a new genome. Biologists use the final aligned data set to identify genetic variants, positions where the sample differs from the reference, and draw conclusions about the sequenced sample from them.
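The idea of finding the most probable origin of a read can be sketched with an exhaustive scan that slides the read along the reference and keeps the position with the fewest mismatches. This is a simplification under stated assumptions: the sequences and function names are made up, gaps are ignored, and production aligners rely on indexed data structures rather than testing every position.

```python
def align_read(read, reference):
    """Return (position, mismatches) of the best ungapped placement of the read."""
    best_position, best_mismatches = None, len(read) + 1
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        # Strict comparison keeps the leftmost position when there is a tie.
        if mismatches < best_mismatches:
            best_position, best_mismatches = start, mismatches
    return best_position, best_mismatches

def candidate_variants(read, reference, position):
    """List (reference_position, reference_base, read_base) for each mismatch."""
    return [
        (position + offset, reference[position + offset], base)
        for offset, base in enumerate(read)
        if base != reference[position + offset]
    ]

reference = "GATTACAGGCTTACCGA"  # toy reference sequence
read = "CAGGGTTA"                # differs from the reference at one base

position, mismatches = align_read(read, reference)
variants = candidate_variants(read, reference, position)
```

A single mismatching read is only a candidate: in practice, a variant is called when many independent reads covering the same position agree, since an isolated mismatch may be a sequencing error.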