How to Do RNA Sequencing: From Sample to Data

RNA sequencing, often called RNA-seq, offers a comprehensive view into the activity of genes within a cell or tissue. This laboratory technique measures the abundance of RNA molecules, which are transient copies of genetic instructions found in DNA. By quantifying these RNA molecules, scientists can determine which genes are active, or “expressed,” and to what extent, providing insights into cellular functions and responses. RNA-seq is a versatile tool that has transformed our understanding of biological processes and disease mechanisms by revealing the dynamic nature of gene expression.

Preparing Samples for the Sequencer

Preparing biological samples for RNA sequencing begins with extracting RNA from cells or tissues. This process requires careful handling to preserve the integrity of these fragile molecules. The quality and quantity of the isolated RNA are assessed using specialized instruments, because degraded RNA skews sequencing results. Researchers look for an RNA Integrity Number (RIN), a score from 1 (heavily degraded) to 10 (intact), and typically aim for scores above 7 to ensure reliable data.
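As a minimal sketch of the quality check described above, the snippet below filters samples by RIN. The sample names and RIN readings are hypothetical, and the cutoff of 7 is simply the value quoted in the text; individual labs may set a different threshold.

```python
RIN_THRESHOLD = 7.0  # cutoff quoted in the text; an assumption, not a universal rule


def passes_rin(samples, threshold=RIN_THRESHOLD):
    """Return the names of samples whose RIN is strictly above the threshold."""
    return [name for name, rin in samples.items() if rin > threshold]


# Hypothetical RIN readings for four extracted samples.
samples = {"liver_1": 9.1, "liver_2": 6.4, "brain_1": 7.0, "brain_2": 8.3}
usable = passes_rin(samples)
print(usable)
```

Samples at or below the cutoff would usually be re-extracted rather than carried into library preparation.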

Following extraction, messenger RNA (mRNA) molecules, which carry protein-coding instructions, are often isolated from other RNA types like ribosomal RNA (rRNA). This is because rRNA is highly abundant and can obscure signals from less common mRNA molecules. The purified mRNA is then converted into complementary DNA (cDNA) using an enzyme called reverse transcriptase. This cDNA is more stable than RNA and serves as the template for subsequent sequencing steps.

The cDNA molecules are then fragmented into smaller pieces, typically 150 to 500 base pairs in length, and specialized adapter sequences are attached to both ends of each fragment. These adapters contain sequences necessary for binding to the sequencing instrument’s flow cell, and for the amplification and identification of each fragment during sequencing. This series of steps, from RNA isolation to adapter ligation, is known as library preparation; its product is a library of cDNA fragments ready for sequencing.

Generating Sequence Reads

Once the cDNA libraries are prepared, they are loaded onto a specialized sequencing instrument, such as those utilizing sequencing by synthesis. The prepared library fragments are first attached to a solid surface called a flow cell, where they are amplified into clusters of identical DNA copies. Each cluster contains roughly a thousand copies of a single DNA fragment, making the signal strong enough for detection.

During the sequencing process, all four fluorescently labeled nucleotides are supplied together in each cycle, but each carries a reversible blocking group so that only one base can be added per cycle. An enzyme called DNA polymerase incorporates the nucleotide that is complementary to the next position on the template fragment. After incorporation, the clusters are illuminated and each emits a fluorescent signal whose color is captured by cameras within the sequencing instrument. This signal corresponds to the specific base (A, T, C, or G) that was incorporated.

After each base is incorporated and its signal recorded, the fluorescent label and the blocking group are chemically removed, allowing the next nucleotide to be added. This cyclical process of nucleotide incorporation, signal detection, and label removal is repeated once for every base in the read, tens to hundreds of times, for each cluster on the flow cell. The instrument then compiles these sequential signals to reconstruct the sequence of each cDNA fragment, generating millions or even billions of short sequence “reads” from the entire sample. Each read represents a small piece of the original RNA transcript, typically between 50 and 300 base pairs long.
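The cycle-by-cycle signal detection can be illustrated with a toy base caller. This is only a sketch under simplifying assumptions: it supposes one intensity value per base per cycle, ordered A, C, G, T, and simply picks the brightest channel. The intensity values are invented, and real instruments apply far more elaborate signal correction and quality scoring.

```python
BASES = "ACGT"  # assumed channel order for this sketch


def call_bases(intensities):
    """Call one base per cycle by choosing the brightest of four channels.

    intensities: a list of cycles, each a 4-item sequence of channel
    intensities in the order A, C, G, T.
    """
    read = []
    for cycle in intensities:
        brightest = max(range(4), key=lambda i: cycle[i])
        read.append(BASES[brightest])
    return "".join(read)


# Hypothetical signal from one cluster over four sequencing cycles.
cluster_signal = [
    (0.91, 0.03, 0.04, 0.02),
    (0.05, 0.02, 0.88, 0.05),
    (0.02, 0.04, 0.03, 0.91),
    (0.10, 0.80, 0.06, 0.04),
]
print(call_bases(cluster_signal))  # prints "AGTC"
```

Each cycle contributes exactly one base, which is why read length equals the number of cycles run.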

Interpreting the Data

After generating millions of short sequence reads, the raw data must be processed computationally to extract meaningful biological information. The first step in this bioinformatics pipeline involves aligning, or mapping, these short reads to a known reference genome for the organism being studied. Software algorithms match each read to its corresponding location on the genome, tolerating small differences caused by genetic variation or sequencing errors. Because mature mRNA lacks introns, RNA-seq aligners must also handle reads that span the junction between two exons. Combined with a gene annotation, the aligned positions indicate which gene each read originated from.
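The idea of read mapping can be sketched with a toy seed-and-extend aligner. This is an illustration, not how production aligners are implemented: it indexes short substrings (k-mers) of a made-up reference, uses the first k bases of each read as a seed, and accepts a hit if the full read matches with at most a set number of mismatches. It ignores insertions, deletions, reverse-strand reads, and spliced reads. The reference string, reads, and function names are all hypothetical.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def align_read(read, reference, index, k, max_mismatches=1):
    """Return (position, mismatches) for the best hit, or None if unmapped.

    The first k bases of the read act as the seed; every seed hit is
    extended across the whole read and scored by counting mismatches.
    """
    best = None
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


reference = "ACGTTGCATGCCGATAGCTAGGCTA"  # made-up reference sequence
index = build_index(reference, k=4)

print(align_read("GCCGATAG", reference, index, k=4))  # exact match
print(align_read("TGCATGCG", reference, index, k=4))  # one mismatch tolerated
print(align_read("TTTTTTTT", reference, index, k=4))  # no seed hit, unmapped
```

Allowing a small number of mismatches is what lets a read with a sequencing error or a genuine variant still find its place on the reference.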

Once the reads are mapped, the next step is to quantify gene expression levels. This is done by counting the number of reads that align to each gene. A higher number of reads mapping to a specific gene suggests that the gene was more actively expressed, or transcribed, in the original biological sample. These raw counts are then normalized to account for differences in sequencing depth between samples and for differences in gene length, since a longer gene yields more fragments, and therefore more reads, at the same expression level. Normalization allows for accurate comparisons of gene expression levels.
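Counting and normalization can be sketched as follows. The example uses transcripts per million (TPM), one common scheme that corrects for both gene length and sequencing depth; the text does not name a specific method, so this choice is an assumption. Gene names, coordinates, and read positions are hypothetical, and genes are treated as simple non-overlapping intervals.

```python
def count_reads(read_starts, gene_intervals):
    """Count reads whose start position falls inside each gene interval.

    gene_intervals maps gene name -> (start, end), with end exclusive.
    Reads that fall outside every gene are left uncounted.
    """
    counts = {gene: 0 for gene in gene_intervals}
    for pos in read_starts:
        for gene, (start, end) in gene_intervals.items():
            if start <= pos < end:
                counts[gene] += 1
                break
    return counts


def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM)."""
    # Step 1: correct for gene length (reads per kilobase of gene).
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: correct for sequencing depth by scaling the total to one million.
    scale = sum(rpk.values()) / 1_000_000
    return {gene: value / scale for gene, value in rpk.items()}


# Hypothetical annotation and mapped read start positions.
gene_intervals = {"geneA": (0, 1000), "geneB": (2000, 5000)}
read_starts = [10, 500, 2500, 2600, 4999, 1500]

counts = count_reads(read_starts, gene_intervals)
lengths_bp = {gene: end - start for gene, (start, end) in gene_intervals.items()}
print(counts)
print(tpm(counts, lengths_bp))
```

In this example geneB collects more raw reads than geneA, yet geneA ends up with the higher TPM because it is a third of the length.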

Researchers then compare these normalized gene expression levels between different experimental conditions, such as treated versus untreated cells, or diseased versus healthy tissues. Statistical analyses identify genes that show significant changes in expression, meaning they are “upregulated” (increased expression) or “downregulated” (decreased expression) under specific conditions. Data visualization tools, such as heatmaps or volcano plots, represent these complex expression patterns, helping researchers identify affected pathways or biological processes.
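The comparison between conditions is usually summarized as a log2 fold change. The sketch below computes it for a few hypothetical genes and labels each one. It is deliberately simplified: the pseudocount of 1 and the threshold of 1 (a twofold change) are assumptions chosen for illustration, and a real analysis would use replicate samples and a statistical test rather than a fold-change cutoff alone.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Log2 ratio of treated to control expression.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify(lfc, threshold=1.0):
    """Label a gene by the direction and size of its log2 fold change."""
    if lfc >= threshold:
        return "upregulated"
    if lfc <= -threshold:
        return "downregulated"
    return "unchanged"


# Hypothetical normalized expression values: (treated, control).
expression = {"geneA": (7, 1), "geneB": (1, 7), "geneC": (10, 9)}

for gene, (treated, control) in expression.items():
    lfc = log2_fold_change(treated, control)
    print(gene, round(lfc, 2), classify(lfc))
```

A volcano plot places this log2 fold change on the horizontal axis and the statistical significance of each gene on the vertical axis.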

Real-World Applications of RNA Sequencing

RNA sequencing is used across many fields of biological and medical research, providing insights into gene activity under diverse conditions.

Cancer Research

In cancer research, RNA-seq is used to identify gene expression signatures that characterize different tumor types, to predict patient response to therapies, and to discover new drug targets. Researchers compare gene activity in tumor cells versus healthy cells to pinpoint genes aberrantly expressed in cancer, leading to a deeper understanding of disease progression.

Infectious Diseases

The technique is applied in understanding infectious diseases, helping scientists unravel how pathogens like viruses or bacteria interact with host cells by observing changes in host gene expression during infection. This can reveal host immune response pathways or identify genes the pathogen manipulates for its survival. Such studies are important for developing new vaccines and antiviral or antibiotic treatments by targeting these specific host-pathogen interactions.

Developmental Biology

RNA-seq contributes to developmental biology, allowing researchers to track gene expression changes as an organism grows and differentiates from a single cell into complex tissues and organs. This provides a molecular blueprint of development, revealing the precise timing and location of gene activation that orchestrates cellular fate and tissue formation. Understanding these processes can shed light on developmental disorders and regenerative medicine strategies.

Agriculture

In agriculture, RNA-seq supports efforts to improve crop yields and resilience by identifying genes associated with desirable traits, such as drought tolerance, pest resistance, or enhanced nutritional content. By analyzing gene expression in different plant varieties or under various environmental stresses, scientists can pinpoint genetic markers for selective breeding programs. This application helps develop crops better adapted to changing climates and capable of meeting global food demands.