The Process of Transcription in Eukaryotes

Transcription is the fundamental biological process that copies genetic information from a DNA sequence into a complementary RNA molecule; it is the first step in gene expression. In eukaryotes, transcription takes place in the nucleus, physically separated from translation, which occurs in the cytoplasm. This compartmentalization, together with the packaging of DNA into chromatin, adds layers of complexity and regulation not seen in prokaryotic cells. Transcription produces many classes of functional RNA, including the messenger RNAs that serve as templates for protein synthesis.

The Specific Roles of Eukaryotic RNA Polymerases

The transcription machinery in eukaryotes relies on three distinct nuclear RNA polymerases, each dedicated to a specific set of genes. This division of labor allows the cell to produce, and separately regulate, the major classes of RNA it needs. RNA Polymerase I (Pol I) is localized to the nucleolus, where it transcribes a large precursor rRNA that is processed into the 18S, 5.8S, and 28S ribosomal RNAs (rRNAs) of mammals, which make up the majority of the cell's rRNA. These rRNA molecules form the structural and catalytic core of the ribosome.

RNA Polymerase III (Pol III) transcribes smaller RNA molecules, several of which are crucial for translation. Its main products are transfer RNAs (tRNAs), which deliver amino acids to the ribosome, and the 5S rRNA, a component of the large ribosomal subunit. Pol III also synthesizes other small nuclear and cytoplasmic RNAs, such as U6 snRNA and the 7SL RNA of the signal recognition particle.

RNA Polymerase II (Pol II) transcribes all protein-coding genes, producing the precursor messenger RNA (pre-mRNA) that is processed into mature mRNA. Pol II also transcribes the precursors of microRNAs (miRNAs) and most small nuclear RNAs (snRNAs), which function in gene regulation and RNA processing. Reflecting the large number and diversity of genes it serves, Pol II is the most extensively regulated of the three polymerases and depends on many accessory proteins to initiate transcription.
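
As a compact summary of this division of labor, the Python sketch below records which polymerase transcribes each RNA class discussed here. The dictionary `POLYMERASE_FOR_RNA` and the helper `rna_classes_for` are illustrative names, not part of any library. The table is deliberately simplified: it lists only well-established examples (including U6 snRNA and 7SL RNA, both made by Pol III) and omits many minor transcripts.

```python
# Simplified, illustrative mapping of RNA classes to the eukaryotic
# polymerase that transcribes them (hypothetical names; minor RNAs omitted).
POLYMERASE_FOR_RNA = {
    "pre-rRNA (processed into 18S, 5.8S, 28S rRNA)": "Pol I",
    "pre-mRNA": "Pol II",
    "miRNA precursors": "Pol II",
    "U1, U2, U4, U5 snRNAs": "Pol II",
    "tRNA": "Pol III",
    "5S rRNA": "Pol III",
    "U6 snRNA": "Pol III",
    "7SL RNA": "Pol III",
}


def rna_classes_for(polymerase: str) -> list:
    """Return the RNA classes in the table transcribed by `polymerase`."""
    return [rna for rna, pol in POLYMERASE_FOR_RNA.items() if pol == polymerase]


if __name__ == "__main__":
    for pol in ("Pol I", "Pol II", "Pol III"):
        print(f"{pol}: {', '.join(rna_classes_for(pol))}")
```

A flat dictionary keeps the lookup readable; a real annotation pipeline would instead derive this from gene-level annotations.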

Initiation: Building the Pre-Initiation Complex

Transcription initiation for protein-coding genes begins with the assembly of the Pre-Initiation Complex (PIC) at the gene’s promoter. This process is orchestrated by General Transcription Factors (GTFs) that recruit Pol II to the DNA. At promoters that contain a TATA box, the first step is recognition of this sequence motif, located about 25 to 30 nucleotides upstream of the transcription start site, by the TATA-binding protein (TBP), a subunit of the TFIID complex.
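
To make the position rule concrete, the sketch below scans a coding-strand sequence for TATA-like motifs and keeps only those starting roughly 20 to 35 nucleotides upstream of the transcription start site. It assumes the commonly quoted consensus TATAWAWR (W = A or T, R = A or G); the window bounds, the function name `find_tata_boxes`, and the example sequence are illustrative choices, not a validated promoter-prediction method. Real promoters vary considerably, and many lack a TATA box entirely.

```python
import re

# Commonly quoted TATA-box consensus TATAWAWR (W = A/T, R = A/G).
# The lookahead (?=...) lets overlapping candidates be reported.
TATA_MOTIF = re.compile(r"(?=(TATA[AT]A[AT][AG]))")


def find_tata_boxes(seq, tss_index, window=(-35, -20)):
    """Return (offset, motif) pairs for TATA-like motifs whose first base lies
    within `window` relative to the transcription start site (TSS).

    seq:       coding-strand DNA, written 5'->3'
    tss_index: 0-based index of the +1 nucleotide in `seq`
    Offsets are negative upstream, so -1 is the base just before +1.
    The window is an illustrative assumption, not a fixed biological rule.
    """
    hits = []
    for match in TATA_MOTIF.finditer(seq.upper()):
        offset = match.start() - tss_index
        if window[0] <= offset <= window[1]:
            hits.append((offset, match.group(1)))
    return hits


# Synthetic example: a TATA box placed 30 nt upstream of the +1 position.
promoter = "G" * 10 + "TATAAAAG" + "C" * 22 + "ATGGCC"
print(find_tata_boxes(promoter, tss_index=40))  # [(-30, 'TATAAAAG')]
```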

TBP binding bends the DNA sharply, creating a platform for the sequential recruitment of the other GTFs. In the classic stepwise model, TFIIA and TFIIB join first; Pol II then enters the complex bound to TFIIF, followed by TFIIE and TFIIH. TFIIB helps position Pol II over the transcription start site, while TFIIF escorts the polymerase to the promoter and stabilizes its binding. The fully assembled PIC is then poised to begin transcribing the gene.
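
The stepwise recruitment of GTFs can be written as a small dependency check. This Python sketch encodes the classic in vitro assembly sequence (TFIID, then TFIIA and TFIIB, then Pol II with TFIIF, then TFIIE and TFIIH) as prerequisites and rejects any order that adds a factor too early. The `PREREQUISITES` table and the `assemble` and `is_complete` functions are hypothetical illustrations. In cells, some factors may arrive as larger preassembled complexes, so the strict ordering is a simplification.

```python
# Classic stepwise PIC assembly on a TATA-containing promoter (simplified).
# Each component maps to the components that must already be bound.
PREREQUISITES = {
    "TFIID": set(),                      # TBP subunit binds the TATA box
    "TFIIA": {"TFIID"},                  # stabilizes TBP on DNA
    "TFIIB": {"TFIID"},                  # binds TBP-DNA, helps set the start site
    "Pol II/TFIIF": {"TFIID", "TFIIB"},  # polymerase arrives with TFIIF
    "TFIIE": {"Pol II/TFIIF"},
    "TFIIH": {"TFIIE"},                  # recruited with the help of TFIIE
}


def assemble(order):
    """Bind components in `order`; raise ValueError if one arrives too early."""
    bound = set()
    for component in order:
        missing = PREREQUISITES[component] - bound
        if missing:
            raise ValueError(f"{component} needs {sorted(missing)} bound first")
        bound.add(component)
    return bound


def is_complete(bound):
    """True once every general factor and Pol II are present."""
    return bound == set(PREREQUISITES)


classic = ["TFIID", "TFIIA", "TFIIB", "Pol II/TFIIF", "TFIIE", "TFIIH"]
print(is_complete(assemble(classic)))  # True
```

Encoding prerequisites rather than one fixed list means any order consistent with the dependencies (for example, TFIIB before TFIIA) is also accepted.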

The transition from a closed complex to an open complex, in which the DNA strands separate to form the transcription bubble, is driven by TFIIH, whose ATP-dependent XPB subunit (traditionally described as a helicase) drives unwinding of the promoter DNA. The final step of initiation is promoter escape, triggered when the CDK7 kinase subunit of TFIIH phosphorylates the C-terminal domain (CTD) of the largest Pol II subunit. The CTD consists of tandem repeats of the seven-amino-acid consensus sequence Tyr-Ser-Pro-Thr-Ser-Pro-Ser (26 repeats in yeast, 52 in humans), and phosphorylation of serine 5 within these repeats acts as a molecular switch. This modification helps release Pol II from most of the GTFs, allowing it to leave the promoter and enter the elongation phase.
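
The heptad numbering used for CTD phosphorylation is easiest to see on an idealized sequence. This Python sketch builds a CTD from consensus repeats (Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7) and locates every Ser5, the residue phosphorylated by TFIIH's CDK7 kinase at promoter escape. The function names `consensus_ctd` and `heptad_sites` are hypothetical, the 26-repeat length mirrors the yeast CTD, and real CTDs (especially the 52-repeat human one) contain many non-consensus repeats. Treat it as a toy model of the numbering, not of real sequence.

```python
HEPTAD = "YSPTSPS"  # consensus repeat: Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7


def consensus_ctd(n_repeats):
    """Idealized CTD made of `n_repeats` perfect consensus heptads."""
    return HEPTAD * n_repeats


def heptad_sites(ctd, position, residue):
    """0-based indices of `residue` at heptad `position` (1-7) in each full repeat.

    Repeats where that position holds a different residue are skipped,
    which matters only for non-consensus sequences.
    """
    size = len(HEPTAD)
    sites = []
    for start in range(0, len(ctd) - size + 1, size):
        i = start + position - 1
        if ctd[i] == residue:
            sites.append(i)
    return sites


ctd = consensus_ctd(26)                  # yeast-like CTD length (toy assumption)
ser5 = heptad_sites(ctd, 5, "S")         # candidate Ser5 phosphorylation sites
ser2 = heptad_sites(ctd, 2, "S")         # Ser2 sites, marked later in elongation
print(len(ctd), len(ser5), ser5[:3])     # 182 26 [4, 11, 18]
```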

Elongation and Signaling the End

In the elongation phase, RNA Polymerase II moves processively along the DNA template strand, reading it in the 3′ to 5′ direction. The polymerase unwinds the double helix ahead of it, maintaining a transcription bubble of about 14 base pairs. Within this bubble, the enzyme synthesizes the complementary RNA strand in the 5′ to 3′ direction, using ribonucleoside triphosphates as substrates.
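
Base pairing and strand direction together fix the RNA sequence. The sketch below derives an RNA from a template strand written 5′ to 3′ (the usual way sequences are stored), walking it from the 3′ end as the polymerase does and pairing A with U, T with A, G with C, and C with G. The function name `transcribe` and the example sequences are illustrative. The result should equal the coding strand with T replaced by U.

```python
# Watson-Crick pairing from a DNA template base to the RNA base added opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3):
    """Return the RNA (written 5'->3') made from a DNA template strand given 5'->3'.

    The polymerase reads the template 3'->5' while extending the RNA 5'->3',
    so the template is traversed in reverse and each base is paired.
    """
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3.upper()))


coding = "ATGAAATTTGGG"        # coding (non-template) strand, 5'->3'
template = "CCCAAATTTCAT"      # its reverse complement, 5'->3'
rna = transcribe(template)
print(rna)                                 # AUGAAAUUUGGG
print(rna == coding.replace("T", "U"))     # True
```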

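The newly synthesized RNA briefly forms a short RNA-DNA hybrid, about eight to nine base pairs long, with the template before peeling away, allowing the DNA to re-anneal behind the enzyme. Elongation is not continuous: the polymerase can pause and backtrack, and it relies on elongation factors to maintain speed and fidelity. TFIIS, for example, rescues backtracked Pol II by stimulating cleavage of the displaced 3′ end of the RNA. The Pol II CTD remains heavily phosphorylated, with serine 2 phosphorylation increasing as the polymerase moves toward the 3′ end of the gene, and serves as a dynamic binding site for RNA-processing factors.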

Termination for Pol II-transcribed genes is tightly coupled to processing of the 3′ end of the nascent RNA. It is signaled by transcription of a specific sequence, typically the AAUAAA polyadenylation signal (PAS) or a close variant such as AUUAAA. Once transcribed, this signal is recognized by the cleavage and polyadenylation specificity factor (CPSF) and associated proteins, which travel with the polymerase by binding the Pol II CTD.
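
Where cleavage happens relative to the signal can be illustrated with a short scan. This sketch locates the first AAUAAA hexamer and then looks for a CA dinucleotide 10 to 30 nucleotides past it, cutting after the A. This reflects a common but not universal pattern at mammalian cleavage sites. The distance window, the CA preference, measuring from the end of the hexamer, and the function name `predict_cleavage` are all simplifying assumptions. Real sites also depend on downstream GU- or U-rich elements and on protein factors not modeled here.

```python
def predict_cleavage(pre_mrna, pas="AAUAAA", min_dist=10, max_dist=30):
    """Toy prediction of the 3'-end cleavage point of a pre-mRNA.

    Finds the first `pas` hexamer, then the first CA dinucleotide starting
    `min_dist` to `max_dist` nt past the end of the hexamer (assumed window),
    and returns the index just after that A, so pre_mrna[:cut] is the
    upstream product. Returns None if no signal or no suitable CA is found.
    """
    pas_start = pre_mrna.find(pas)
    if pas_start == -1:
        return None
    pas_end = pas_start + len(pas)
    # A CA starting at index j needs j + 1 to be a valid index.
    last = min(pas_end + max_dist, len(pre_mrna) - 2)
    for j in range(pas_end + min_dist, last + 1):
        if pre_mrna[j:j + 2] == "CA":
            return j + 2
    return None


pre = "GGGGG" + "AAUAAA" + "G" * 15 + "CA" + "G" * 10
cut = predict_cleavage(pre)
upstream, downstream = pre[:cut], pre[cut:]
print(cut, upstream[-8:], len(downstream))  # 28 GGGGGGCA 10
```

The upstream piece is the product that receives the poly-A tail; the downstream piece models the RNA still attached to the polymerase.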

These factors cleave the transcript, usually 10 to 30 nucleotides downstream of the signal, releasing the upstream pre-mRNA while the downstream RNA stays attached to the still-transcribing Pol II. Two models have been proposed for how transcription then stops, and current evidence suggests they are not mutually exclusive. The Torpedo model proposes that a 5′ to 3′ exonuclease (XRN2 in humans, Rat1 in yeast) enters at the uncapped 5′ end of the downstream fragment and degrades it until it catches up with Pol II and dislodges the polymerase from the DNA. The Allosteric model proposes that the binding of cleavage factors induces a conformational change in Pol II that destabilizes the enzyme, causing it to dissociate.
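
At its core, the torpedo model is a pursuit problem. After cleavage, the exonuclease starts at the new 5′ end of the downstream RNA and must close the gap to a polymerase that keeps transcribing. The sketch below computes the catch-up time and the distance Pol II travels past the cleavage site, assuming both enzymes move at constant speeds. The function name `torpedo_catch_up` and all numbers in the example are hypothetical inputs chosen for easy arithmetic, not measured rates.

```python
import math


def torpedo_catch_up(gap_nt, pol_speed, xrn2_speed):
    """Return (seconds, nt past the cleavage site) at which XRN2 reaches Pol II.

    gap_nt:     length of the downstream RNA when cleavage occurs (nt)
    pol_speed:  Pol II elongation rate (nt/s), assumed constant
    xrn2_speed: XRN2 degradation rate (nt/s), assumed constant
    If XRN2 is not faster than Pol II, it never catches up (returns inf).
    """
    if xrn2_speed <= pol_speed:
        return math.inf, math.inf
    closing_speed = xrn2_speed - pol_speed
    t = gap_nt / closing_speed
    return t, gap_nt + pol_speed * t


# Hypothetical inputs: 100 nt gap, Pol II at 40 nt/s, XRN2 at 60 nt/s.
print(torpedo_catch_up(100, 40, 60))  # (5.0, 300.0)
```

Lowering `pol_speed` shortens both the catch-up time and the distance. This is one proposed way the torpedo and allosteric ideas can work together, since a polymerase that slows after the PAS is easier for the exonuclease to reach.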

Preparing RNA for Translation

The primary transcript (pre-mRNA) undergoes three major processing steps before it becomes a mature mRNA that can be exported from the nucleus for translation. Although these are often called post-transcriptional modifications, they usually begin while transcription is still underway.

The first modification, 5′ capping, occurs very early, when the nascent RNA is only a few dozen nucleotides long. The cap is 7-methylguanosine, a modified guanine nucleotide joined to the first nucleotide of the transcript through an unusual 5′-to-5′ triphosphate linkage. The cap protects the transcript from 5′ to 3′ degradation and is later recognized by the cap-binding protein eIF4E, which helps recruit the ribosome to initiate protein synthesis.

The second modification is 3′ polyadenylation: poly(A) polymerase adds a long tail of adenosine residues (the poly-A tail) to the newly cleaved 3′ end. This tail enhances the stability of the mRNA and facilitates its export from the nucleus.
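
The two end modifications can be pictured with a tiny data model. In this sketch, a transcript records its body sequence, an optional 5′ cap written in the conventional m7G(5′)ppp(5′)N notation, and the length of its poly-A tail. The class `Transcript` and the functions `add_cap`, `add_poly_a`, and `render` are hypothetical names. The default tail length of 200 is only an illustrative round number in the range often quoted for mammalian mRNAs; actual lengths vary between transcripts and change over an mRNA's lifetime.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Transcript:
    body: str                  # RNA sequence, 5'->3'
    cap: Optional[str] = None  # 5' cap structure, if added
    poly_a: int = 0            # number of adenosines in the 3' tail


def add_cap(t):
    """Attach 7-methylguanosine via a 5'-to-5' triphosphate bridge."""
    return replace(t, cap="m7G(5')ppp(5')")


def add_poly_a(t, length=200):
    """Extend the 3' end with `length` adenosines (illustrative default)."""
    return replace(t, poly_a=t.poly_a + length)


def render(t):
    """Write the transcript as cap + body + tail, 5' end on the left."""
    return (t.cap or "") + t.body + "A" * t.poly_a


mrna = add_poly_a(add_cap(Transcript("AUGGCUUAA")), length=5)
print(render(mrna))  # m7G(5')ppp(5')AUGGCUUAAAAAAA
```

Freezing the dataclass makes each modification return a new object, mirroring the idea that processing produces a distinct, more mature transcript.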

The third and most complex modification is RNA splicing, which removes introns and joins exons together (exons include the untranslated regions as well as the protein-coding sequence). Splicing is carried out by the spliceosome, a large ribonucleoprotein complex assembled from small nuclear ribonucleoproteins (snRNPs). The spliceosome recognizes conserved sequences at the 5′ splice site (usually beginning with GU), the branch point (which contains a key adenosine), and the 3′ splice site (usually ending with AG). It then excises the intron as a looped lariat structure and ligates the adjacent exons. Because exons can be included or skipped, a single pre-mRNA can also undergo alternative splicing, in which different combinations of exons encode multiple distinct proteins.
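
Once exon boundaries are known, both constitutive and alternative splicing can be sketched as string operations. The Python example below joins exons given as half-open (start, end) coordinates. It checks that each removed segment begins with GU and ends with AG, the rule followed by the great majority of introns. It then enumerates the isoforms produced by including or skipping internal exons. The function names `splice` and `exon_skipping_isoforms`, the toy sequence, and the hand-supplied coordinates are hypothetical. Exon skipping is only one of several alternative-splicing patterns, and the sketch does not model lariat chemistry or branch-point recognition.

```python
from itertools import combinations


def splice(pre_mrna, exons):
    """Join exons given as ordered, half-open (start, end) intervals.

    Each segment removed between consecutive exons must begin with GU and
    end with AG; otherwise a ValueError is raised.
    """
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        removed = pre_mrna[left_end:right_start]
        if not (removed.startswith("GU") and removed.endswith("AG")):
            raise ValueError(f"segment {left_end}-{right_start} lacks GU...AG ends")
    return "".join(pre_mrna[start:end] for start, end in exons)


def exon_skipping_isoforms(pre_mrna, exons):
    """All mRNAs that keep the first and last exons plus any subset of internal ones."""
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            isoforms.append(splice(pre_mrna, [first, *kept, last]))
    return isoforms


# Toy gene: exon 1, intron 1, exon 2, intron 2, exon 3.
pre = "AUGAAA" + "GUAAGUCUUUCAG" + "CCCGGG" + "GUAUGUUUUCUAG" + "UAA"
exons = [(0, 6), (19, 25), (38, 41)]
print(splice(pre, exons))                  # AUGAAACCCGGGUAA
print(exon_skipping_isoforms(pre, exons))  # ['AUGAAAUAA', 'AUGAAACCCGGGUAA']
```

Skipping an internal exon still passes the GU...AG check, because the removed segment then starts with the first intron's GU and ends with the second intron's AG.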