Decoding genetic information helps scientists understand biological processes and disease origins. Specialized computational tools analyze this data, uncovering intricate details about how our bodies function. One such tool, rMATS, assists scientists in examining specific changes in how genetic instructions are read. It provides insights into various biological states, from normal development to disease progression, by systematically comparing genetic information between different conditions.
The Biological Process of Alternative Splicing
Genetic information is organized into genes, blueprints for building proteins. Each gene contains coding segments called exons and non-coding regions called introns. Splicing removes introns, joining exons to form a continuous messenger RNA (mRNA) molecule.
While basic splicing produces a single mRNA, many genes undergo alternative splicing. This mechanism allows a single gene to produce multiple distinct mRNA molecules and different proteins. Alternative splicing selectively includes or excludes certain exons, or uses different start and end points, to generate diverse protein versions from the same genetic blueprint. This flexibility expands the functional diversity of proteins within a cell.
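As a toy illustration of the idea above, the following sketch joins exons into two different mRNAs from the same gene. The exon names and sequences are invented for illustration and do not correspond to any real gene.

```python
# Toy gene: exon names mapped to made-up RNA sequences (illustrative only).
EXONS = {"E1": "AUGGCC", "E2": "GAUUAC", "E3": "UGGUAA"}


def splice(exon_order):
    """Join the chosen exons, in order, into one mRNA string."""
    return "".join(EXONS[name] for name in exon_order)


# Two isoforms from the same gene: exon E2 is either included or skipped.
isoforms = {
    "inclusion": splice(["E1", "E2", "E3"]),
    "skipping": splice(["E1", "E3"]),
}

for name, mrna in isoforms.items():
    print(name, mrna, len(mrna))
```

The two strings differ only by the presence of E2, which is the simplest picture of how one gene can yield more than one transcript.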
Introducing the rMATS Tool
Scientists need methods to detect differences in alternative splicing patterns between biological conditions, such as healthy versus diseased cells. The rMATS tool, short for Replicate Multivariate Analysis of Transcript Splicing, is bioinformatics software designed for this purpose. It identifies “differential alternative splicing events”, meaning changes in how genes are spliced, by analyzing RNA-sequencing (RNA-Seq) data.
rMATS takes RNA-Seq data as input; this data provides a snapshot of the RNA molecules present in a sample. Its objective is to determine statistically whether specific alternative splicing events occur at different rates between two experimental groups. The tool is particularly useful when researchers have multiple biological samples, or replicates, for each condition, because replicates let the variability within each group be estimated and make the comparison statistically sound.
Splicing Events Detected by rMATS
rMATS detects five main types of alternative splicing events, each representing a distinct way a gene’s mRNA can be assembled. Sorting events into these patterns lets researchers categorize and understand the nature of splicing changes.
Skipped Exon (SE)
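One common event is a Skipped Exon (SE). In this scenario, an exon typically included in the mRNA transcript in one condition is entirely left out, or “skipped,” in another. This yields a shorter mRNA. The resulting protein may simply lack the segment encoded by that exon or, if the exon’s length is not a multiple of three nucleotides, be changed by a shift in the reading frame; in either case its function may be altered or lost.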
Alternative 5′ Splice Site (A5SS)
An Alternative 5′ Splice Site (A5SS) event involves the use of two or more different donor sites. The donor (5′) splice site marks the 5′ end of an intron, so it lies at the 3′ end of the upstream exon. The downstream boundary of that exon can therefore vary, producing an mRNA transcript that contains a longer or shorter version of the exon. Depending on the length and reading frame of the extra segment, this change can add or remove amino acids inside the protein, alter a protein domain, or shift the reading frame.
Alternative 3′ Splice Site (A3SS)
Conversely, an Alternative 3′ Splice Site (A3SS) occurs when two or more different acceptor sites are used. The acceptor (3′) splice site marks the 3′ end of an intron, so it lies at the 5′ end of the downstream exon. This alters the upstream boundary of that exon, resulting in an mRNA transcript with a longer or shorter version of the exon. Such variations can insert or delete amino acids, modify the protein’s interaction sites, or shift the reading frame.
Mutually Exclusive Exons (MXE)
Mutually Exclusive Exons (MXE) describe a situation where one of two or more exons is included in the final mRNA transcript, but never both. If one exon is present, the other is excluded, and vice versa. This mechanism allows for the production of distinct protein isoforms from the same gene, each containing a different internal segment.
Retained Intron (RI)
Finally, a Retained Intron (RI) event happens when an intron, which is normally removed during splicing, remains within the mature mRNA transcript. This can introduce premature stop codons, which may cause the transcript to be degraded by nonsense-mediated decay or yield a truncated, often non-functional protein. Aberrant intron retention has been observed in several disease states, including cancers.
The rMATS Analysis Workflow
Analyzing alternative splicing with rMATS begins with high-throughput RNA-sequencing data, typically as aligned reads. These reads are stored in BAM (Binary Alignment Map) files, which record where each RNA sequence maps to the reference genome. A gene annotation file in GTF format is also provided, defining gene structures, including exon and intron locations.
rMATS proceeds with its analysis in two main steps: “prep” and “post”. In the “prep” step, the software processes BAM files, extracting and quantifying reads that support different splicing outcomes for each potential alternative splicing event. For instance, it counts reads spanning splice junctions indicating exon inclusion versus skipping.
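The counting idea for a skipped exon can be sketched in a few lines. This is a simplified illustration, not the rMATS implementation: it assumes each read has already been reduced to a single (donor, acceptor) junction, and the function name, coordinate convention, and coordinates are all hypothetical.

```python
from collections import Counter


def count_se_junctions(junctions, up_end, exon_start, exon_end, down_start):
    """Count junction reads supporting inclusion vs. skipping of one exon.

    Each junction is a (donor, acceptor) pair: the last exonic base before
    the intron and the first exonic base after it (assumed convention).
    """
    # Inclusion is supported by either junction flanking the cassette exon.
    inclusion = {(up_end, exon_start), (exon_end, down_start)}
    # Skipping is supported by the junction that jumps over the exon.
    skipping = (up_end, down_start)

    counts = Counter()
    for junction in junctions:
        if junction in inclusion:
            counts["inclusion"] += 1
        elif junction == skipping:
            counts["skipping"] += 1
        # Junctions unrelated to this event are ignored.
    return counts["inclusion"], counts["skipping"]


# Made-up junction reads around an exon spanning positions 201-300.
reads = [(100, 201), (300, 401), (100, 401), (100, 201), (500, 601)]
inc, skip = count_se_junctions(
    reads, up_end=100, exon_start=201, exon_end=300, down_start=401
)
print(inc, skip)  # 3 1
```

Real data also contain reads that fall entirely within the exon body, multi-junction reads, and strand information, none of which this sketch handles.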
The “post” step integrates processed data from all samples and applies a statistical model to compare splicing patterns between experimental groups. rMATS accounts for read count uncertainty within samples and biological variability across replicates. This analysis determines whether observed splicing differences are significant, producing a list of differential alternative splicing events.
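As a rough sketch of how such a run might be launched, the snippet below only assembles a command line and prints it; nothing is executed. The flags follow the rMATS turbo command-line interface as commonly documented, and should be checked against `--help` for the installed version. All file and directory names are hypothetical; `b1.txt` and `b2.txt` are assumed to be text files listing the comma-separated BAM paths for each group.

```python
import shlex


def rmats_command(b1, b2, gtf, out_dir, tmp_dir, read_length,
                  task="both", threads=4):
    """Return an rMATS invocation as an argument list (not executed)."""
    return [
        "python", "rmats.py",
        "--b1", b1,                # BAM list for group 1
        "--b2", b2,                # BAM list for group 2
        "--gtf", gtf,              # gene annotation
        "-t", "paired",            # paired-end reads (assumed here)
        "--readLength", str(read_length),
        "--nthread", str(threads),
        "--od", out_dir,           # final output directory
        "--tmp", tmp_dir,          # intermediate files from the prep step
        "--task", task,            # "prep", "post", or "both"
    ]


cmd = rmats_command("b1.txt", "b2.txt", "genes.gtf", "out", "tmp",
                    read_length=100)
print(shlex.join(cmd))
```

Running the two steps separately (`task="prep"` per batch of samples, then `task="post"` once) can be convenient on a cluster, while `"both"` is the simplest choice for a single machine.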
Interpreting and Applying rMATS Findings
After rMATS analysis, the output lists each detected splicing event together with its read counts and test statistics; researchers then filter this list for statistically significant differential events. A key metric is the “Percent Spliced In” (PSI or Ψ) value, reported by rMATS as the inclusion level, which quantifies the proportion of transcripts including a particular exon or splice variant. For example, a PSI of 0.90 for a skipped exon means that, among transcripts of that gene spanning the event, an estimated 90% include the exon. rMATS calculates PSI values for each sample and reports the difference in average PSI (ΔPSI) between groups.
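The arithmetic behind PSI and ΔPSI can be sketched as follows. The sketch assumes the usual length-normalized definition, in which inclusion and skipping read counts are each divided by an effective length before taking the ratio; the function names, counts, and lengths are made up for illustration.

```python
def psi(inc_reads, skip_reads, inc_len, skip_len):
    """Length-normalized inclusion level for one sample.

    inc_len and skip_len are effective lengths of the inclusion and
    skipping forms (illustrative values are used below).
    """
    inc = inc_reads / inc_len
    skip = skip_reads / skip_len
    total = inc + skip
    # No supporting reads at all: PSI is undefined for this sample.
    return inc / total if total > 0 else None


def delta_psi(group1, group2):
    """Difference between the mean PSI of two groups of replicates."""
    return sum(group1) / len(group1) - sum(group2) / len(group2)


# Two replicates per condition, with made-up counts and lengths.
condition1 = [psi(180, 10, 2, 1), psi(160, 20, 2, 1)]
condition2 = [psi(40, 30, 2, 1), psi(60, 30, 2, 1)]

print([round(p, 3) for p in condition1])
print([round(p, 3) for p in condition2])
print(round(delta_psi(condition1, condition2), 3))
```

Normalizing by length matters because the inclusion form usually offers more positions for a read to land on than the skipping form, so raw counts alone would overstate inclusion.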
To assess reliability, rMATS also provides statistical values such as the p-value and the False Discovery Rate (FDR). The p-value is the probability of observing a difference at least as large as the one measured if there were no true difference in splicing between the groups. The FDR, derived from the p-values by multiple-testing correction, estimates the expected proportion of false positives among the events called significant, which matters because thousands of events are tested at once. Researchers typically filter results for events with a substantial absolute ΔPSI and a low FDR (e.g., less than 0.05 or 0.01), indicating a reliable differential splicing event.
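The filtering step might look like the sketch below. It assumes a tab-separated table with `FDR` and `IncLevelDifference` columns, as in rMATS result files such as `SE.MATS.JC.txt`; real files contain many more columns, and the rows here are invented. The Benjamini-Hochberg function is included only to show how an FDR can be derived from p-values; it is a common procedure and is not claimed to reproduce rMATS internals exactly. The thresholds are examples, not recommendations.

```python
import csv
import io


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_events(handle, max_fdr=0.05, min_abs_dpsi=0.1):
    """Keep rows with FDR below max_fdr and |delta PSI| at least min_abs_dpsi."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if "NA" in (row["FDR"], row["IncLevelDifference"]):
            continue  # skip events without a usable estimate
        if (float(row["FDR"]) < max_fdr
                and abs(float(row["IncLevelDifference"])) >= min_abs_dpsi):
            kept.append(row)
    return kept


# Invented rows standing in for a result table.
SAMPLE = (
    "GeneID\tPValue\tFDR\tIncLevelDifference\n"
    "geneA\t0.001\t0.004\t0.35\n"
    "geneB\t0.02\t0.0267\t0.03\n"
    "geneC\t0.4\t0.4\t-0.25\n"
    "geneD\t0.003\t0.006\t-0.2\n"
)

kept = filter_events(io.StringIO(SAMPLE))
print([row["GeneID"] for row in kept])
print([round(q, 4) for q in benjamini_hochberg([0.001, 0.02, 0.4, 0.003])])
```

Taking the absolute value of ΔPSI keeps events that change in either direction, since a negative value simply means inclusion is higher in the second group.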
Interpreting rMATS findings requires connecting the data to biological function. A researcher might identify a gene where rMATS shows a significant retained intron in cancer cells compared to healthy cells. This splicing change could lead to a non-functional protein, contributing to uncontrolled cell growth or other cancer hallmarks. These computational findings guide further laboratory experiments to validate changes and investigate their functional consequences, linking molecular alterations to disease mechanisms or cellular behaviors.