Repbase: The Database for Repetitive DNA

Repbase is a specialized bioinformatics database dedicated to repetitive DNA sequences found within eukaryotic genomes. It provides a comprehensive collection of these patterns, serving as a reference for scientists worldwide. Repbase supports genomic research, from identifying genetic variations to exploring evolutionary relationships between species.

Understanding Repetitive DNA

Repetitive DNA refers to sequences that appear multiple times throughout an organism’s genome. The repeated unit may be as short as a single nucleotide or extend to hundreds or even thousands of nucleotides. In many eukaryotic organisms, repetitive DNA makes up a substantial portion of the genome, sometimes exceeding 50% of the total genetic material.

There are two main categories of repetitive DNA: tandem repeats and interspersed repeats. Tandem repeats are copies of a sequence unit arranged head to tail, directly adjacent to one another. Examples include microsatellites (repeat units of 1-9 base pairs) and minisatellites (units of 10-100 base pairs), although the exact size cutoffs vary between sources. Tandem repeats are concentrated in specific regions such as telomeres, which protect chromosome ends, and centromeres, which are involved in chromosome segregation during cell division.
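
To make the idea of a tandem repeat concrete, the following is a minimal sketch that finds perfect tandem repeats with a regular expression. The function name, its parameters, and the default thresholds are illustrative assumptions, not part of Repbase or any published tool; real repeat finders also tolerate mismatches and indels, which this sketch does not.

```python
import re

def find_tandem_repeats(seq, min_unit=1, max_unit=9, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    Illustrative only: detects exact repeats of a unit of min_unit..max_unit
    bases that occurs at least min_copies times in a row.
    """
    # Lazy quantifier: the shortest unit that repeats is reported first,
    # so "AAAAAA" is described as A x 6 rather than AA x 3.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

# A made-up sequence containing a CA dinucleotide repeat and a poly-A run.
hits = find_tandem_repeats("TTGCACACACAGGAAAAAAT")
```

Start positions are 0-based. Raising `min_copies` reduces chance hits in long sequences.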

Interspersed repeats are scattered throughout the genome, often separated by non-repetitive DNA. The most common type is the transposable element (TE), often called a “jumping gene” for its ability to move or copy itself within the genome. Transposable elements are divided into two classes: retrotransposons and DNA transposons. Retrotransposons, such as LINEs and SINEs, spread by a “copy-and-paste” mechanism: the element is transcribed into RNA, the RNA is reverse-transcribed into DNA, and the new DNA copy is inserted at a new location while the original stays in place. Most DNA transposons instead move directly as DNA through a “cut-and-paste” mechanism, in which the element is excised from one site and reinserted at another.
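
The difference between the two mechanisms can be shown with a toy model. This sketch treats a genome as a simple list of labeled segments; the function names and the list representation are assumptions made for illustration and ignore all molecular detail.

```python
def cut_and_paste(genome, src, dst):
    """Toy DNA transposon: remove the element at index src, reinsert at dst.

    dst is interpreted as an index in the genome after the excision.
    """
    new_genome = list(genome)
    element = new_genome.pop(src)
    new_genome.insert(dst, element)
    return new_genome

def copy_and_paste(genome, src, dst):
    """Toy retrotransposon: keep the original and insert a copy at dst."""
    new_genome = list(genome)
    new_genome.insert(dst, new_genome[src])
    return new_genome

genome = ["geneA", "TE", "geneB", "geneC"]
moved = cut_and_paste(genome, 1, 3)    # copy number stays at 1
copied = copy_and_paste(genome, 1, 3)  # copy number grows to 2
```

In this toy model, only copy-and-paste raises the copy number with each event.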

The Significance of Repetitive DNA

Repetitive DNA, especially transposable elements, shapes genome evolution. Because these elements can move and multiply, they can cause genomic rearrangements such as insertions, deletions, and inversions. These changes can alter gene order, create new genes, or modify existing gene functions over time, influencing the adaptation and diversification of species.

Repetitive sequences also play a role in gene regulation and chromosomal structure. Some repetitive elements can act as regulatory signals, influencing gene expression by serving as binding sites for proteins. Repetitive DNA is also a component of heterochromatin, a tightly packed form of DNA that organizes chromosomes and regulates gene accessibility.

Misregulation or abnormal expansion of repetitive DNA can contribute to human diseases. For example, the expansion of specific tandem repeats is linked to over 40 human genetic disorders, many of which affect the brain. Examples include Huntington’s disease, Fragile X syndrome, and certain spinocerebellar ataxias, in which an increased number of repeat copies can disrupt gene function or lead to toxic protein products.

What is Repbase?

Repbase is a curated bioinformatics database of repetitive DNA sequences from eukaryotic species. Developed in 1992 for human sequences, it has expanded to include over 100 species, encompassing animals, plants, and fungi. Repbase provides a comprehensive, regularly updated reference library of known repetitive sequences.

The database contains over 44,000 sequences, most of which are consensus sequences representing large repeat families. Entries are classified systematically by repeat type, which keeps the data organized and accessible. The Genetic Information Research Institute (GIRI), a non-profit organization, maintains and curates Repbase through ongoing updates and manual review.
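
A consensus sequence summarizes many diverged copies of a repeat family as a single representative sequence. The sketch below uses a simple majority rule over copies that are already aligned. It illustrates the concept only: the source does not describe how Repbase curators build their consensus sequences, and the function name is hypothetical.

```python
from collections import Counter

def majority_consensus(aligned_copies):
    """Majority-rule consensus of equal-length, pre-aligned sequences.

    Gaps are written as "-". Columns made up entirely of gaps are dropped.
    Ties go to the base seen first in the column.
    """
    if not aligned_copies:
        raise ValueError("need at least one sequence")
    length = len(aligned_copies[0])
    if any(len(s) != length for s in aligned_copies):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_copies):
        bases = Counter(b.upper() for b in column if b != "-")
        if bases:
            consensus.append(bases.most_common(1)[0][0])
    return "".join(consensus)

# Four made-up copies, each carrying a different mutation or gap.
copies = ["ACGTAC", "ACGAAC", "ATGTAC", "ACGT-C"]
consensus = majority_consensus(copies)
```

Because each copy mutates independently, the majority base at each position tends to approximate the sequence of the original element.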

Repbase also publishes Repbase Reports, a monthly electronic journal documenting newly identified repetitive DNA elements. These new entries are incorporated into the main Repbase database. This continuous curation and expansion have made Repbase a standard reference for researchers who study repetitive DNA or who analyze and annotate genomes.

How Repbase Aids Scientific Discovery

Repbase plays a central role in genome annotation. When scientists sequence a new genome, they use Repbase as a reference library to identify repetitive sequences and “mask” them, that is, mark them so that downstream analyses can skip them. Masking separates repeats from unique sequence, including protein-coding regions, so that gene prediction and similar analyses can focus on the non-repetitive parts of the genome.
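
Masking itself is a simple operation once the repeat coordinates are known. The sketch below assumes the coordinates have already been found by some repeat-detection tool and are given as 0-based, half-open intervals; the function names are hypothetical. It shows the two common conventions: soft masking (lowercase) and hard masking (replacement with N).

```python
def mask_repeats(seq, intervals, soft=True):
    """Mask 0-based, half-open (start, end) intervals in seq.

    soft=True lowercases the masked bases; soft=False replaces them with N.
    """
    chars = list(seq.upper())
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)

def masked_fraction(masked_seq):
    """Fraction of positions that are soft-masked (lowercase) or hard-masked (N)."""
    if not masked_seq:
        return 0.0
    masked = sum(1 for c in masked_seq if c.islower() or c == "N")
    return masked / len(masked_seq)

soft = mask_repeats("ACGTACGTAC", [(2, 5)])
hard = mask_repeats("ACGTACGTAC", [(2, 5)], soft=False)
```

Soft masking keeps the underlying bases available to tools that can use them, while hard masking removes them from consideration entirely.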

Scientists also use Repbase to study evolutionary relationships. Because copies of transposable elements accumulate mutations after they insert, comparing repetitive sequences across different species can reveal shared ancestry and divergence. Repbase data is also used to investigate gene expression, as repetitive elements can influence nearby gene activity.
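
One simple way to quantify how far a repeat copy has drifted is its divergence from the family consensus. The sketch below computes the raw proportion of differing sites and the standard Jukes-Cantor correction for multiple substitutions at the same site. The source does not name a specific method, so the choice of Jukes-Cantor here is an assumption, and the function names are hypothetical.

```python
import math

def p_distance(copy, consensus):
    """Fraction of aligned, ungapped positions where copy differs from consensus."""
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(copy.upper(), consensus.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - 4p/3); requires p < 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

p = p_distance("ACGTACGTAC", "ACGTACGTAA")  # one mismatch in ten sites
d = jukes_cantor(p)
```

All else being equal, copies with higher divergence from the consensus are usually interpreted as older insertions.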

Several bioinformatics tools rely on Repbase data to perform these analyses. Programs such as RepeatMasker and Censor use the Repbase library to identify and classify repetitive elements in newly sequenced DNA. These tools let researchers analyze vast amounts of genomic data efficiently. This is increasingly relevant in disease research, where the role of repetitive DNA in genetic disorders and cancer is gaining attention.
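
The output of such tools is typically a table of annotated repeat hits that can be summarized with a short script. The sketch below assumes the whitespace-delimited column layout conventionally used in RepeatMasker `.out` files (score, three percentages, query name, query begin, query end, bases left, strand, repeat name, class/family, and so on); verify this against the output of your own version before relying on it. The function name and the example rows are made up for illustration.

```python
from collections import defaultdict

def summarize_repeatmasker_out(lines):
    """Sum annotated bases per repeat class from RepeatMasker-style .out lines.

    Overlapping hits are counted more than once, so totals are upper bounds.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        # Skip blank and header lines: data rows start with an integer score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])  # 1-based, inclusive
        repeat_class = fields[10].split("/")[0]      # "LINE" from "LINE/L1"
        totals[repeat_class] += end - begin + 1
    return dict(totals)

# Made-up rows in the assumed layout; not real RepeatMasker output.
example_rows = [
    "   SW   perc perc perc  query     position in query    matching  repeat",
    "score   div. del. ins.  sequence  begin end   (left)   repeat    class/family",
    "",
    "  463   1.3  0.6  1.7  seq1  101  200  (800) +  L1_EX   LINE/L1   1    100  (0)   1",
    "  210  12.0  0.0  0.0  seq1  301  350  (650) C  ALU_EX  SINE/Alu  (0)  50   1     2",
    "  150   5.0  0.0  0.0  seq1  401  450  (550) +  L1_EX   LINE/L1   1    50   (50)  3",
]
summary = summarize_repeatmasker_out(example_rows)
```

To process a real file, pass an open file handle instead of the list, for example `summarize_repeatmasker_out(open(path))`, where `path` points to your own `.out` file.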