Target enrichment next-generation sequencing (NGS) is a technique used in genetics and molecular biology. It allows scientists to focus on specific parts of a genome rather than sequencing the entire genetic code. This method captures regions of interest within a DNA or RNA sample for focused sequencing. The goal is to gain detailed genetic information from these selected areas.
Why Focus on Specific DNA Regions?
Sequencing an entire genome can be expensive and can generate vast amounts of data that are irrelevant to a specific research question. Target enrichment concentrates sequencing effort on particular genes or regions of interest, which reduces the cost per sample and makes large-scale studies more feasible.
Focusing on specific regions also allows for much greater sequencing depth. Researchers can achieve depths of 1,000x or higher in targeted areas, compared with the roughly 30x typical of whole-genome sequencing, which makes it possible to detect rare genetic variants that whole-genome sequencing might miss. This increased depth improves the accuracy of variant identification, especially for low-frequency mutations. Narrowing the scope of sequencing also produces less data, which simplifies analysis, requires fewer computational resources, and shortens the time from sample to result.
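The depth advantage follows from simple arithmetic: mean depth is roughly the number of on-target bases divided by the size of the target. The sketch below illustrates this with made-up numbers; the read count, the 1 Mb panel size, and the 70% on-target fraction are assumptions chosen for illustration, not figures from a real assay.

```python
def mean_target_depth(total_reads: int, read_length: int,
                      on_target_fraction: float, target_size_bp: int) -> float:
    """Approximate mean depth: on-target bases divided by target size."""
    on_target_bases = total_reads * read_length * on_target_fraction
    return on_target_bases / target_size_bp

# Hypothetical run: 10 million reads of 150 bp each.
reads, read_len = 10_000_000, 150

# Whole human genome (about 3.1 Gb); every read counts as on-target.
wgs_depth = mean_target_depth(reads, read_len, 1.0, 3_100_000_000)

# Hypothetical 1 Mb gene panel, assuming 70% of reads land on target.
panel_depth = mean_target_depth(reads, read_len, 0.7, 1_000_000)

print(f"WGS: {wgs_depth:.2f}x, panel: {panel_depth:.0f}x")
```

With these assumed inputs, the same sequencing output gives about 0.5x across a whole genome but about 1,050x across the panel.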
How Target Enrichment Works
The workflow for target enrichment sequencing involves several steps: DNA fragmentation, library preparation, enrichment, and subsequent sequencing on an NGS platform. The enrichment step isolates specific DNA regions from the rest of the genome. Two primary methods are used for this isolation: hybrid capture and amplicon sequencing.
Hybrid Capture
Hybrid capture, also known as sequence capture, uses specially designed probes to “bait” and isolate target DNA fragments. The DNA sample is first fragmented, and adapters are added to create a sequencing library. Biotinylated oligonucleotide probes, complementary to the regions of interest, are then added and bind to their matching DNA fragments through hybridization. The probe-bound DNA fragments are isolated using streptavidin-coated magnetic beads, which bind to the biotin on the probes, allowing unwanted DNA to be washed away. This method is suited for capturing larger target regions, including entire exomes or large gene panels.
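As a rough illustration of what "complementary to the regions of interest" means in practice, the sketch below tiles a target sequence with overlapping probes and writes each probe as the reverse complement of the stretch it should hybridize to. The function names, the toy sequence, and the probe length and step are all hypothetical; real probes are much longer (commonly on the order of 100 nucleotides), and real designs also account for GC content and repeats, which this sketch ignores.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def tile_probes(target: str, probe_len: int = 120, step: int = 60) -> List[Tuple[int, str]]:
    """Return (start, probe) pairs that tile the target with overlapping probes."""
    if len(target) <= probe_len:
        return [(0, reverse_complement(target))]
    last_start = len(target) - probe_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the final bases are covered
    return [(s, reverse_complement(target[s:s + probe_len])) for s in starts]

# Toy 14-base target with deliberately short probes so the output is readable.
probes = tile_probes("ACGTTGCAAGGCTA", probe_len=6, step=4)
for start, probe in probes:
    print(start, probe)
```

Overlapping the probes means each base of the target is covered by more than one bait, which is one common way to make capture less sensitive to a single poorly performing probe.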
Amplicon Sequencing
Amplicon sequencing, also referred to as PCR-based enrichment, uses polymerase chain reaction (PCR) to amplify specific DNA regions. Short DNA primers are designed to flank the desired genomic sequences. These primers bind on either side of each target region, and PCR then creates many copies of only those DNA segments, known as amplicons. After amplification, sequencing adapters and unique barcode indexes are added to the amplicons before they are sequenced. This approach is generally faster and more cost-effective than hybrid capture for small panels, making it suitable for smaller target regions or for applications with low amounts of starting DNA.
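The idea of primers flanking a target can be sketched as a simple "in silico PCR": find the forward primer on the template, find where the reverse primer would anneal on the opposite strand, and take everything in between. This is a minimal sketch assuming exact primer matches and a single template strand given 5' to 3'; the function name and the sequences are hypothetical, and real primer design also considers mismatches, melting temperature, and off-target binding.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def in_silico_amplicon(template: str, fwd_primer: str, rev_primer: str) -> Optional[str]:
    """Return the amplicon (primers included), or None if a primer has no exact match."""
    template = template.upper()
    start = template.find(fwd_primer.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the given
    # strand we search for its reverse complement downstream of the forward primer.
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]

template = "TTGACCATGGCTAGCTAGGATCCGTACGATCGTTAA"
amplicon = in_silico_amplicon(template, fwd_primer="CCATGG", rev_primer="TCGTAC")
print(amplicon)
```

Only the stretch between and including the two primer sites is copied; the bases outside the primers never appear in the amplicon, which is what restricts sequencing to the target.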
Where Target Enrichment is Used
Target enrichment next-generation sequencing is applied across various scientific and medical fields:
- Clinical diagnostics: It helps identify genetic mutations linked to inherited diseases, such as cystic fibrosis, or to specific types of cancer. This allows for more precise diagnoses and personalized treatment strategies.
- Cancer research: The technique detects somatic mutations, which are genetic changes that arise in tumor cells rather than being inherited. It enables deep sequencing of tumor samples, including those with low tumor content, degraded DNA from formalin-fixed paraffin-embedded (FFPE) tissue, or fragmented circulating tumor DNA (ctDNA). This high sensitivity allows mutations present in a small fraction of cells to be detected, down to a variant allele frequency of 0.1–0.2%.
- Microbiology: The method is used for characterizing microbial communities and identifying specific pathogens. It can resolve microbial DNA or RNA sequences from complex samples dominated by host DNA, which is useful for difficult-to-culture bacteria or viruses.
- Pharmacogenomics: Target enrichment helps researchers understand how individual genetic variations influence a person’s response to medications. This information can guide tailored drug dosages and prescriptions, improving treatment effectiveness and reducing adverse reactions.
- Agrigenomics: The technique supports large-scale genotyping projects in areas like aquaculture, livestock farming, and seed breeding. It allows for cost-effective and efficient screening of thousands to tens of thousands of genetic markers, accelerating the identification of desirable traits in crops and livestock.
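The 0.1–0.2% variant allele frequency mentioned for cancer research depends heavily on depth. A minimal sketch of why, assuming each read independently carries the variant with probability equal to the allele frequency (a binomial model) and ignoring sequencing errors entirely; the requirement of at least 5 supporting reads is a hypothetical calling threshold, not a standard.

```python
from math import comb

def prob_detect(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads variant-supporting reads) under a binomial model."""
    p_fewer = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_fewer

vaf = 0.002      # 0.2% variant allele frequency
min_alt = 5      # hypothetical minimum number of supporting reads
results = {depth: prob_detect(depth, vaf, min_alt) for depth in (30, 1000, 5000)}

for depth, p in results.items():
    print(f"{depth:>5}x: P(>= {min_alt} variant reads) = {p:.3f}")
```

Under these assumptions the variant is effectively undetectable at 30x, rarely reaches the threshold at 1,000x, and reaches it in most cases at 5,000x. Real assays working at such low frequencies also have to separate true variants from sequencing errors, often with error-suppression methods such as unique molecular identifiers, which this sketch does not model.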
Considerations for Target Enrichment Sequencing
Target enrichment sequencing offers many advantages, but coverage uniformity, off-target reads, and data analysis each need attention to ensure reliable results.
Coverage Uniformity
Coverage uniformity refers to how evenly each targeted region is sequenced. Uneven coverage can lead to “holes” in the data, where some regions are sequenced at a much lower depth than others, potentially hindering variant detection. Low-quality or degraded DNA samples can impact this uniformity, leading to reduced coverage in certain areas.
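Uniformity can be summarized with a few simple numbers. The sketch below computes the fraction of positions covered at 20% of the mean depth or more, lists the positions that fall below that line, and divides the mean by the 20th-percentile depth. That last ratio is a simplified stand-in for the fold-80 base penalty reported by tools such as Picard, not a reimplementation of it. The depth values and the 20% cutoff are assumptions for illustration.

```python
from statistics import mean

def uniformity_metrics(depths, low_fraction=0.2):
    """Summarize how evenly a list of per-position depths is covered."""
    avg = mean(depths)
    threshold = low_fraction * avg
    # Positions whose depth falls below the chosen fraction of the mean.
    low_positions = [i for i, d in enumerate(depths) if d < threshold]
    fraction_ok = (len(depths) - len(low_positions)) / len(depths)
    # 20th percentile by nearest rank, using integer arithmetic.
    ordered = sorted(depths)
    rank = max(1, (len(ordered) * 20 + 99) // 100)
    p20 = ordered[rank - 1]
    fold80 = avg / p20 if p20 > 0 else float("inf")
    return {
        "mean_depth": avg,
        "fraction_ok": fraction_ok,
        "low_positions": low_positions,
        "fold80": fold80,
    }

# Toy per-position depths with two poorly covered positions.
depths = [100, 120, 90, 110, 5, 95, 105, 0, 115, 100]
metrics = uniformity_metrics(depths)
print(metrics)
```

A ratio close to 1 indicates even coverage; a large ratio, as in this toy example, means much more sequencing would be needed to bring the poorly covered positions up to a usable depth.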
Off-Target Reads
Off-target reads are sequences from non-targeted regions that are carried through enrichment and sequenced anyway. Target enrichment reduces them but does not eliminate them, especially with hybrid capture methods, and every off-target read consumes sequencing capacity. These non-specific reads are filtered out during data analysis so that the focus remains on the regions of interest.
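The on-target rate is the fraction of aligned reads that overlap any target interval. A minimal sketch, assuming reads and targets are given as (chromosome, start, end) tuples with half-open coordinates; the intervals are invented for illustration, and a real pipeline would read them from alignment and BED files and use an indexed interval search rather than this brute-force comparison.

```python
def overlaps(read, target) -> bool:
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    r_chrom, r_start, r_end = read
    t_chrom, t_start, t_end = target
    return r_chrom == t_chrom and r_start < t_end and t_start < r_end

def on_target_rate(reads, targets) -> float:
    """Fraction of reads overlapping at least one target interval."""
    on_target = sum(any(overlaps(r, t) for t in targets) for r in reads)
    return on_target / len(reads)

# Hypothetical target regions and aligned reads.
targets = [("chr7", 1000, 1500), ("chr17", 200, 400)]
reads = [
    ("chr7", 950, 1100),   # overlaps the first target
    ("chr7", 1400, 1550),  # overlaps the first target
    ("chr7", 1500, 1650),  # starts exactly where the target ends: off-target
    ("chr17", 250, 400),   # overlaps the second target
    ("chr1", 1000, 1150),  # different chromosome: off-target
]

rate = on_target_rate(reads, targets)
print(f"On-target rate: {rate:.0%}")
```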
Bioinformatics and Quality Control
Large datasets generated by NGS, even with targeted approaches, necessitate specialized bioinformatics tools and expertise for processing and interpretation. The analysis workflow involves steps such as aligning reads to a reference genome, variant calling, and filtering out duplicate reads or low-quality data. Quality control (QC) steps are important throughout the workflow, from initial DNA input to sequencing, to ensure high-quality and reliable data. These QC measures help identify and address potential issues that could compromise result accuracy.
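The filtering step can be sketched in a few lines. The example below drops reads with low mapping quality and, among reads that share the same chromosome, start position, and strand, keeps only the one with the highest mapping quality. The dictionary fields and the cutoff of 20 are assumptions for illustration; established tools such as Picard MarkDuplicates or samtools markdup use more information than this (for example, mate positions in paired-end data).

```python
def filter_reads(reads, min_mapq=20):
    """Drop low-quality reads and collapse positional duplicates."""
    best = {}
    for read in reads:
        if read["mapq"] < min_mapq:
            continue  # low mapping quality: discard
        key = (read["chrom"], read["start"], read["strand"])
        # Keep the highest-quality read among those sharing the same key.
        if key not in best or read["mapq"] > best[key]["mapq"]:
            best[key] = read
    return list(best.values())

# Hypothetical aligned reads.
reads = [
    {"name": "r1", "chrom": "chr7", "start": 1000, "strand": "+", "mapq": 60},
    {"name": "r2", "chrom": "chr7", "start": 1000, "strand": "+", "mapq": 40},  # duplicate of r1
    {"name": "r3", "chrom": "chr7", "start": 1000, "strand": "-", "mapq": 60},
    {"name": "r4", "chrom": "chr7", "start": 1200, "strand": "+", "mapq": 5},   # low quality
]

kept = filter_reads(reads)
print([r["name"] for r in kept])
```

Position-based duplicate removal fits hybrid capture data, where fragments start at random positions. In amplicon data every read from the same amplicon starts at the same place by design, so this kind of filter would usually discard most of the genuine signal.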