LTR Retrotransposons: Structure, Replication, and Impact

Our understanding of the genome has transformed from a view of a static library of instructions to that of a dynamic and responsive entity. A significant part of this dynamism comes from mobile genetic elements, DNA sequences capable of moving to new positions within the genetic landscape. Among the most widespread of these are the Long Terminal Repeat (LTR) retrotransposons, which populate the genomes of most eukaryotic organisms, from yeast to humans. These elements are considered ancient components of their host’s genetic material.

Structure of LTR Retrotransposons

An LTR retrotransposon is a self-contained genetic unit that carries all the information needed for its own duplication. Its anatomy is defined by two direct repeats at its ends, known as Long Terminal Repeats (LTRs), which are identical in sequence when the element first inserts. These LTRs contain the promoter and terminator sequences, the “start” and “stop” signals that the host cell’s machinery uses to read a gene.

Between the LTRs lies the internal coding region, which houses the genes for the element’s life cycle. The first gene, gag (group-specific antigen), codes for structural proteins that self-assemble into a protective particle for the replication process.

The other primary gene is pol (polymerase), which produces the enzymes for retrotransposition. These enzymes include reverse transcriptase, which creates a DNA copy from an RNA template, and integrase, which inserts the new DNA copy into the host’s genome. The overall structure of these elements, and of the genes they carry, is highly conserved across species.
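The layout described above (LTR, internal coding region, LTR) can be sketched as a small data structure. This is a toy illustration rather than a real annotation tool: the class name LTRElement, the function terminal_repeat_length, and the short sequences are all made up for the example, and the two LTRs are assumed to be perfectly identical, as they are immediately after insertion.

```python
from dataclasses import dataclass


@dataclass
class LTRElement:
    """Toy model of an LTR retrotransposon (hypothetical, for illustration)."""
    ltr: str       # sequence shared by both terminal repeats
    internal: str  # internal region that would carry gag and pol

    def sequence(self) -> str:
        # The same LTR sequence flanks the internal region on both sides.
        return self.ltr + self.internal + self.ltr


def terminal_repeat_length(seq: str, min_len: int = 4) -> int:
    """Length of the longest non-overlapping prefix of seq that is also
    its suffix, or 0 if none is at least min_len long."""
    for n in range(len(seq) // 2, min_len - 1, -1):
        if seq[:n] == seq[-n:]:
            return n
    return 0


# Made-up sequences, far shorter than any real element.
element = LTRElement(ltr="TGTTGGAATA", internal="ATGGCCATGAAA")
print(terminal_repeat_length(element.sequence()))  # prints 10
```

Real LTRs are typically hundreds to thousands of base pairs long and drift apart through mutation, so practical detection relies on approximate matching rather than the exact comparison used here.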

The Replication Process

The replication of LTR retrotransposons is a “copy-and-paste” mechanism that allows them to multiply without leaving their original location. The cycle begins with transcription, where the host cell’s enzymes read the retrotransposon’s DNA and produce a single-stranded RNA molecule.

This RNA transcript, along with the reverse transcriptase and integrase enzymes from the pol gene, is packaged into a new particle formed by Gag proteins. Inside this protected environment, reverse transcription occurs. The reverse transcriptase enzyme uses the RNA transcript as a guide to synthesize a double-stranded DNA version of the retrotransposon.

The new DNA copy is then transported into the cell’s nucleus. There, the integrase enzyme cuts the host’s chromosomal DNA and pastes the new retrotransposon copy into the break. Integration is often close to random, although some elements favor particular genomic regions, so a new copy can land in many different places in the genome, a feature with significant consequences for the host.
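The copy-and-paste cycle described in this section can be sketched as three string operations. This is a deliberately simplified model: the function names are invented for the example, the genome is a plain string, transcription and reverse transcription are reduced to swapping T and U, and the insertion site is assumed to be chosen uniformly at random.

```python
import random


def transcribe(dna: str) -> str:
    """Host transcription: the DNA element is read into RNA (T becomes U)."""
    return dna.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """Reverse transcriptase: the RNA template is copied back into DNA."""
    return rna.replace("U", "T")


def integrate(genome: str, element: str, rng: random.Random) -> str:
    """Integrase: paste the new DNA copy at a randomly chosen position.
    Uniform site choice is an assumption of this sketch."""
    pos = rng.randrange(len(genome) + 1)
    return genome[:pos] + element + genome[pos:]


def retrotranspose(genome: str, element: str, rng: random.Random) -> str:
    """One full cycle: transcription, reverse transcription, integration.
    The source copy is never cut out of the genome."""
    rna = transcribe(element)
    new_copy = reverse_transcribe(rna)
    return integrate(genome, new_copy, rng)


# Made-up sequences for demonstration only.
element = "TGTTGGATGCCCTGTTGG"
genome = "ACGTACGTAC" + element + "GGCATTCAGA"
new_genome = retrotranspose(genome, element, random.Random(1))
print(len(genome), len(new_genome))
```

Because nothing is excised, each cycle lengthens the genome by exactly one element. In this toy model a new copy can also land inside an older copy and interrupt it.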

Connection to Retroviruses

The structure and replication strategy of LTR retrotransposons closely resemble those of retroviruses like HIV, and the two groups are believed to be evolutionary relatives. The primary distinction lies in the env (envelope) gene. In retroviruses, this gene codes for surface proteins that form an outer envelope, allowing the virus to bud from one cell and infect others. Most LTR retrotransposons lack a functional env gene, which confines them to their host cell and prevents them from becoming infectious particles.

Some LTR retrotransposons, known as Endogenous Retroviruses (ERVs), are direct remnants of ancient retroviral infections. These are retroviruses that infected the germline cells—sperm or egg—of an ancestor and became a permanent, heritable part of the genome. Over millions of years, many ERVs have accumulated mutations that render them inactive, making them molecular fossils that provide a direct record of past viral encounters.

Impact on Genome Evolution

The activity of LTR retrotransposons over millions of years has been a significant force in shaping genomes. One of their most obvious impacts is on genome size, as these elements make up a substantial fraction of the DNA in many organisms. For example, they constitute about 8% of the human genome and can account for over 75% of the total DNA in plants like maize and barley.

Their ability to insert into new locations can cause mutations. If a retrotransposon lands in the middle of a protein-coding gene, it can disrupt its function. An insertion near a gene can also have significant effects, as the promoter within the element’s LTR can override normal regulation, causing a gene to be turned on or off at the wrong time or in the wrong tissue. This process of insertional mutagenesis is a source of genetic variation.
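The two outcomes described above, disruption of a coding sequence and altered regulation from a nearby insertion, can be expressed as a simple coordinate check. The Gene class, the classify_insertion function, and the 1,000-base regulatory window are all hypothetical choices for this sketch. Real regulatory effects depend on far more than distance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # first position of the coding region (0-based, inclusive)
    end: int    # position just past the coding region (exclusive)


def classify_insertion(pos: int, gene: Gene, regulatory_window: int = 1000) -> str:
    """Roughly classify an insertion at pos relative to one gene.
    regulatory_window is an arbitrary illustrative distance."""
    if gene.start <= pos < gene.end:
        return "disrupts coding sequence"
    if gene.start - regulatory_window <= pos < gene.end + regulatory_window:
        return "may alter regulation"
    return "no predicted effect on this gene"


gene = Gene(name="geneA", start=5000, end=8000)
for site in (6000, 4500, 20000):
    print(site, classify_insertion(site, gene))
```

With these made-up coordinates, the three sites fall inside the gene, just upstream of it, and far away from it, respectively.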

Despite their disruptive potential, these elements are also a source of evolutionary innovation. The host genome can co-opt sequences from inactive retrotransposons for new functions in a process known as molecular domestication. Their regulatory sequences can be repurposed to create new networks for controlling host genes, and entire elements can sometimes evolve into new, functional genes, providing raw material for natural selection.

Host Cell Regulation

Given the potential for disruption from active LTR retrotransposons, host organisms have evolved defense mechanisms to keep them in check, resulting in an evolutionary arms race. The primary strategy the cell uses is epigenetic silencing, which acts like a molecular lock to keep the elements dormant.

This process involves chemically modifying the DNA of the retrotransposons or the proteins that package it. One method is DNA methylation, where chemical tags called methyl groups are attached directly to the retrotransposon’s DNA. These tags block the cell’s transcription machinery from accessing and reading the element.

Another method is histone modification. Histones are the proteins around which DNA is tightly coiled, and by modifying these proteins, the cell can pack regions containing retrotransposons more tightly. This condensed structure, known as heterochromatin, makes the DNA physically inaccessible and silent. These controls help maintain genomic stability by keeping most LTR retrotransposons inactive.