What Is Index Hopping in DNA Sequencing?

Index hopping is a phenomenon in DNA sequencing in which reads from one sample are incorrectly assigned to another sample in the same run. This misassignment reduces the accuracy of genomic data: reads end up in the wrong sample’s dataset, which can lead to incorrect conclusions in downstream analyses. Understanding this technical issue is important for ensuring the reliability of biological insights derived from DNA sequencing.

How We Read DNA: A Quick Overview

Modern DNA sequencing techniques can process many samples at once. This efficiency comes from multiplexing, also called indexing, in which a unique molecular “barcode” is added to each DNA sample. These barcodes are attached to the DNA fragments during library preparation.

These unique identifiers allow researchers to combine many DNA samples into a single sequencing run. After sequencing, computational tools sort the reads back to their original samples based on these barcodes, a process known as demultiplexing. This strategy increases throughput and reduces costs.
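To make the demultiplexing step concrete, here is a minimal Python sketch. The barcode sequences, sample names, and the `demultiplex` function are all hypothetical, and the matching is exact; production demultiplexers typically tolerate a small number of barcode mismatches and work on instrument output files rather than in-memory tuples.

```python
from collections import defaultdict

# Hypothetical sample sheet: index (barcode) sequence -> sample name.
SAMPLE_SHEET = {
    "ATCACG": "sample_A",
    "CGATGT": "sample_B",
    "TTAGGC": "sample_C",
}

def demultiplex(reads, sample_sheet):
    """Group (index, insert) reads by sample using exact barcode matching.

    Reads whose index is not in the sample sheet go to 'undetermined'.
    """
    bins = defaultdict(list)
    for index_seq, insert_seq in reads:
        sample = sample_sheet.get(index_seq, "undetermined")
        bins[sample].append(insert_seq)
    return dict(bins)

# Made-up reads: (index sequence, insert sequence).
reads = [
    ("ATCACG", "GGCTAACT"),
    ("CGATGT", "TTGACCAA"),
    ("ATCACG", "ACGTTGCA"),
    ("GGGGGG", "CATCATCA"),  # index not in the sample sheet
]

bins = demultiplex(reads, SAMPLE_SHEET)
for sample, inserts in sorted(bins.items()):
    print(sample, len(inserts))
```

Note that a sorter like this trusts the barcode completely: if a read carries the wrong index, it is placed in the wrong sample without any warning.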

The Mix-Up: What Is Index Hopping?

Index hopping occurs when sequencing reads are assigned to the wrong sample because they carry another sample’s index. The index sequences act as unique molecular tags, but free index-bearing adapters or primers left over in the pooled mixture can anneal to DNA fragments from other samples and be extended during amplification, so the new copy carries an index that does not belong to its sample of origin. The resulting DNA fragments then carry a swapped or “hopped” index.

This phenomenon is most pronounced on sequencing platforms that use patterned flow cells and exclusion amplification (ExAmp) chemistry, such as Illumina’s HiSeq 4000, NovaSeq 6000, and NextSeq 2000 systems. In these systems, DNA fragments and amplification primers exist in solution rather than being permanently bound to a surface, which increases the chance that free-floating adapters anneal to unintended fragments.

The result is the amplification of reads with the wrong index combinations, and these reads still pass standard quality filters. A significant contributor to index hopping is residual adapter contamination from library preparation: incomplete removal of indexing primers or unligated adapters leaves a pool of free-floating barcodes that can participate in cross-sample annealing.

Keeping Our Data Clean: Solutions and Strategies

Index hopping can skew data, leading to false positives or negatives, and sometimes necessitating re-sequencing. For instance, in single-cell RNA sequencing, even low levels of index hopping (less than 1%) can generate “phantom molecules” that complicate downstream analysis, affecting cell characterization and potentially overestimating cell numbers. In clinical sequencing, sample cross-talk needs to be kept below 0.2% to meet regulatory standards.

To address this, researchers employ several strategies, with “unique dual indexing” (UDI) being a widely adopted solution. UDI involves tagging each sample with two distinct barcodes, one on each end of the DNA fragment, where each individual barcode, and not just the pair, is used for only one sample within the pooled run. This is what distinguishes UDI from combinatorial dual indexing, in which the pair is unique but individual barcodes are shared between samples. If index hopping occurs with a UDI setup, the resulting hopped read will carry an unexpected combination of barcodes that does not match any valid sample in the run, allowing it to be computationally filtered out during data analysis.
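The filtering logic can be sketched in a few lines of Python. Everything here is illustrative: the index sequences, sample names, and the `classify_pair` function are hypothetical, and matching is exact. The sketch also includes a combinatorial sheet, where the same indexes are reused across samples, to show why a unique pair alone is not enough to expose a hop.

```python
# Hypothetical UDI sample sheet: every i7 and every i5 index is used exactly once.
UDI_SHEET = {
    ("AAGGTT", "CCTTAA"): "sample_A",
    ("GGCCAA", "TTAACC"): "sample_B",
    ("TTCCGG", "GGAATT"): "sample_C",
}

# Hypothetical combinatorial sheet: pairs are unique, but single indexes are reused.
COMBINATORIAL_SHEET = {
    ("AAGGTT", "CCTTAA"): "sample_A",
    ("AAGGTT", "TTAACC"): "sample_B",
    ("GGCCAA", "CCTTAA"): "sample_C",
    ("GGCCAA", "TTAACC"): "sample_D",
}

def classify_pair(i7, i5, sheet):
    """Return the sample name, 'hopped', or 'undetermined' for one index pair."""
    if (i7, i5) in sheet:
        return sheet[(i7, i5)]
    known_i7 = {a for a, _ in sheet}
    known_i5 = {b for _, b in sheet}
    # Both indexes belong to the run but were never paired: consistent with a hop.
    if i7 in known_i7 and i5 in known_i5:
        return "hopped"
    return "undetermined"

# A sample_A fragment whose i5 index was swapped for the one used by sample_B.
hopped_read = ("AAGGTT", "TTAACC")

print(classify_pair(*hopped_read, UDI_SHEET))            # flagged as hopped
print(classify_pair(*hopped_read, COMBINATORIAL_SHEET))  # silently assigned to sample_B
```

With the UDI sheet the swapped pair matches no sample and can be discarded; with the combinatorial sheet the same pair is a valid entry, so the read is counted toward the wrong sample.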

Additional approaches to reduce index hopping include optimizing library preparation to minimize residual free adapters and primers, since high levels of these can increase hopping rates. For example, PCR-free library preparation methods may show higher rates of index hopping than methods that include a PCR amplification step, possibly because they involve fewer cleanup steps that remove unligated adapters. Storing libraries individually at -20°C and pooling them just before sequencing can also help mitigate the issue. As for UDIs themselves, their use has been shown to reduce sample cross-contamination from 1% or more down to below 0.01% on some systems.
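Percentages like these describe the fraction of reads that carry a swapped index. As a rough illustration of how such a fraction might be computed from a UDI run, the Python sketch below counts reads whose two indexes are both known but were never paired in the sample sheet. The `swapped_fraction` function, the index sequences, and the read counts are all made up for the example, and the result is only an observed indicator, not a calibrated measurement.

```python
from collections import Counter

def swapped_fraction(pair_counts, expected_pairs):
    """Fraction of index-matched reads whose (i7, i5) pair is not an expected pair.

    Reads with an unrecognized index are ignored, since they are more likely
    sequencing errors or contaminants than hops.
    """
    known_i7 = {i7 for i7, _ in expected_pairs}
    known_i5 = {i5 for _, i5 in expected_pairs}
    valid = swapped = 0
    for (i7, i5), n in pair_counts.items():
        if (i7, i5) in expected_pairs:
            valid += n
        elif i7 in known_i7 and i5 in known_i5:
            swapped += n
    total = valid + swapped
    return swapped / total if total else 0.0

# Hypothetical UDI pairs and illustrative read counts (not measured data).
expected = {("AAGGTT", "CCTTAA"), ("GGCCAA", "TTAACC")}
counts = Counter({
    ("AAGGTT", "CCTTAA"): 4950,
    ("GGCCAA", "TTAACC"): 4950,
    ("AAGGTT", "TTAACC"): 60,   # unexpected pairing of two known indexes
    ("GGCCAA", "CCTTAA"): 40,   # unexpected pairing of two known indexes
    ("AAGGTT", "NNNNNN"): 25,   # unknown i5, excluded from the estimate
})

rate = swapped_fraction(counts, expected)
print(f"observed swapped fraction: {rate:.2%}")
```

Tracking this number across runs is one way to check whether changes to library cleanup or pooling practice are having the intended effect.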