How Was the Genetic Code Deciphered?

The deciphering of the genetic code in the 1960s solved one of the central problems of molecular biology: how the four-letter alphabet of nucleic acids (adenine, guanine, cytosine, and thymine in DNA, with uracil replacing thymine in RNA) is translated into the twenty-amino-acid alphabet of proteins. This translation is the final step of the information flow described by the Central Dogma, and it requires precise rules by which a four-symbol code specifies a twenty-symbol product. Scientists needed to determine the minimum number of nucleotides required to specify a single amino acid and then map every combination to its corresponding amino acid or signal.

Establishing the Triplet Structure of the Code

The first step was determining the size of the “word” specifying one amino acid. Simple arithmetic constrained the answer: a one-base code could specify only four amino acids and a two-base code only sixteen (\(4^2=16\)), so at least three bases were needed to cover all twenty standard amino acids (\(4^3=64\)). In 1961, Francis Crick, Sydney Brenner, and colleagues provided genetic evidence for this triplet structure using bacteriophage T4 and the mutagen proflavin.
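The counting argument above can be checked by brute force. The following is a minimal sketch; the function names `codewords` and `min_codon_length` are illustrative choices, not established terminology.

```python
from itertools import product

BASES = "ACGU"        # the four RNA bases
AMINO_ACIDS = 20      # number of standard amino acids to be encoded


def codewords(length):
    """Return every possible word of the given length over the four bases."""
    return ["".join(p) for p in product(BASES, repeat=length)]


def min_codon_length(n_symbols=len(BASES), n_targets=AMINO_ACIDS):
    """Smallest word length whose number of combinations covers all targets."""
    length = 1
    while n_symbols ** length < n_targets:
        length += 1
    return length


for n in (1, 2, 3):
    print(n, len(codewords(n)))   # 4, 16 and 64 combinations

print(min_codon_length())         # 3
```

Doublets fall four short of the twenty amino acids, while triplets leave 44 combinations to spare, which is the room later filled by redundancy and stop signals.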

Proflavin causes frameshift mutations by inserting or deleting single base pairs, shifting the gene’s reading frame. This typically results in a non-functional protein. The researchers found that combining a single insertion and a single deletion often restored function, suggesting the reading frame was re-established.

Crucially, function was also restored when three insertions or three deletions were combined in the same gene. This showed that the genetic message is read sequentially, from a fixed starting point, in non-overlapping units of three bases: adding or removing a multiple of three bases alters only the stretch between the mutations and leaves the downstream reading frame intact.
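The logic of the frameshift experiments can be illustrated with a toy message made of three-letter words. This is only an analogy, not the researchers' data: the sentence, the placeholder base "X", and the helper names are all assumptions made for the sketch.

```python
ORIGINAL = "THEFATCATATETHEBIGRAT"   # reads THE FAT CAT ATE THE BIG RAT


def read_triplets(message):
    """Split a message into non-overlapping units of three, dropping any incomplete tail."""
    usable = len(message) - len(message) % 3
    return [message[i:i + 3] for i in range(0, usable, 3)]


def insert_base(message, pos, base="X"):
    """Return the message with one extra symbol inserted before index pos."""
    return message[:pos] + base + message[pos:]


def delete_base(message, pos):
    """Return the message with the symbol at index pos removed."""
    return message[:pos] + message[pos + 1:]


def restored_tail(original, mutant):
    """Count the trailing triplets on which the two readings agree."""
    count = 0
    pairs = zip(reversed(read_triplets(original)), reversed(read_triplets(mutant)))
    for a, b in pairs:
        if a != b:
            break
        count += 1
    return count


one_insertion = insert_base(ORIGINAL, 4)
insertion_then_deletion = delete_base(one_insertion, 8)
three_insertions = insert_base(insert_base(one_insertion, 6), 8)

for label, mutant in [("+1", one_insertion),
                      ("+1 -1", insertion_then_deletion),
                      ("+3", three_insertions)]:
    print(label, " ".join(read_triplets(mutant)), restored_tail(ORIGINAL, mutant))
```

A single insertion scrambles every downstream word, whereas an insertion paired with a deletion, or three insertions together, garble only the words between the changes and leave the rest of the sentence readable.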

The First Experimental Assignment of a Codon

The genetic evidence for a triplet code was quickly followed by the first direct biochemical assignment of a codon. In 1961, Marshall Nirenberg and Heinrich Matthaei developed a cell-free protein synthesis system using ruptured E. coli bacteria. This in vitro system contained all the necessary cellular machinery (ribosomes, transfer RNA (tRNA), and enzymes) but lacked a natural messenger RNA (mRNA) template.

By adding synthetic RNA, they controlled the message being translated. Their breakthrough used a synthetic RNA molecule composed entirely of uracil bases, known as poly-U. They set up 20 reaction tubes, each containing the extract, poly-U, and all 20 amino acids, with a different amino acid radioactively labeled in each tube.

The tube with phenylalanine showed dramatic incorporation into a newly synthesized protein chain. This experiment proved that the repeating Uracil sequence directed the production of a polypeptide composed solely of phenylalanine. Thus, the first codon was established: UUU codes for Phenylalanine.

Systematically Mapping the Remaining Codons

The poly-U experiment was a monumental first step, but each homopolymer (such as poly-U or poly-A) contains only one kind of codon, so this approach could assign at most four codons. Systematically deciphering the rest of the 61 sense codons required more sophisticated methods.

Defining Codons Using Repeating Polymers

Chemist Har Gobind Khorana developed techniques to synthesize RNA polymers with defined, repeating sequences. For instance, Khorana synthesized a polymer with an alternating uracil and guanine sequence, (UG)n, which is read as alternating UGU and GUG codons whichever reading frame is used. The resulting polypeptide contained an alternating sequence of cysteine and valine, showing that UGU and GUG code for these two amino acids, although this experiment alone could not tell which codon specified which. By creating different repeating sequences, such as (AAG)n, Khorana’s team narrowed down the possibilities for many other codons.
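Which codons a repeating polymer presents to the ribosome can be worked out mechanically. The sketch below assumes translation may start in any of the three reading frames; `codons_by_frame` and `distinct_codons` are hypothetical helper names introduced for illustration.

```python
def codons_by_frame(repeat_unit, copies=6):
    """List the codons read from a repeating polymer in each of the three frames."""
    message = repeat_unit * copies
    frames = {}
    for offset in range(3):
        # Step through the message three bases at a time, keeping complete triplets only.
        frames[offset] = [message[i:i + 3]
                          for i in range(offset, len(message) - 2, 3)]
    return frames


def distinct_codons(repeat_unit, copies=6):
    """Set of all codons the polymer can present, across all reading frames."""
    return {codon
            for codons in codons_by_frame(repeat_unit, copies).values()
            for codon in codons}


for unit in ("U", "UG", "AAG"):
    print(unit, sorted(distinct_codons(unit)))
```

A homopolymer such as poly-U exposes a single codon, (UG)n exposes two codons that alternate in every frame, and (AAG)n exposes three codons but only one per frame, so each frame yields a polypeptide built from a single amino acid.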

The Triplet Binding Assay

The final piece came from Nirenberg’s lab with the development of the filter-binding assay, or triplet binding assay, in collaboration with Philip Leder. This technique used chemically synthesized, short RNA triplets. They found that a specific triplet could bind a ribosome and recruit the transfer RNA (tRNA) carrying its corresponding amino acid. Because ribosome complexes are retained on a filter while unbound tRNA passes through, labeling one amino acid at a time revealed which amino acid was bound to the ribosome-triplet complex. This simple, rapid method allowed Nirenberg and Leder to assign the majority of the 64 codons, and together with Khorana’s polymer results it completed the genetic code table by 1966.

Finalizing the Code and Its Key Characteristics

The combined work revealed a complete genetic code with several specific characteristics. With 64 possible codons and only 20 amino acids, the code was found to be redundant, meaning multiple codons specify the same amino acid (e.g., Leucine is coded by six different triplets).

This redundancy provides protection against single-base mutations, as a change in the third position often still results in the correct amino acid incorporation. The coding sequence begins with the start codon AUG, which codes for Methionine and signals the ribosome to begin translation.
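The redundancy and the tolerance of third-position changes can both be checked directly against the standard codon table. The sketch below encodes the table as a 64-letter string of one-letter amino acid codes (with `*` for stop) in U, C, A, G order; the variable and function names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in the order UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of codons assigned to each amino acid (and to '*', the stop signal).
degeneracy = Counter(CODON_TABLE.values())


def third_position_silent_fraction(table=CODON_TABLE):
    """Fraction of third-base substitutions in sense codons that keep the amino acid."""
    silent = total = 0
    for codon, aa in table.items():
        if aa == "*":
            continue  # only sense codons are considered
        for base in BASES:
            if base == codon[2]:
                continue
            total += 1
            silent += table[codon[:2] + base] == aa
    return silent / total


print(degeneracy["L"], degeneracy["M"], degeneracy["*"])   # leucine, methionine, stop
print(round(third_position_silent_fraction(), 2))
```

Leucine appears six times and methionine once, and well over half of all third-position substitutions leave the encoded amino acid unchanged.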

Translation concludes when the ribosome encounters one of three stop codons: UAA, UAG, or UGA. These “nonsense” codons specify no amino acid and instead signal the release of the completed protein chain. Notably, the code was found to be nearly universal: each codon carries the same meaning across almost all forms of life, underscoring a common evolutionary origin.
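The start and stop rules can be sketched as a small translation routine. To keep the example short it uses a deliberately partial codon table limited to codons discussed in this article; the table, the `translate` function, and the example message are assumptions for illustration, and real translation initiation involves more than locating the first AUG.

```python
START_CODON = "AUG"
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Deliberately partial table: only a handful of sense codons, for illustration.
PARTIAL_TABLE = {
    "AUG": "Met",
    "UUU": "Phe",
    "UGU": "Cys",
    "GUG": "Val",
    "AAG": "Lys",
}


def translate(mrna):
    """Read triplets from the first AUG until a stop codon or the end of the message."""
    start = mrna.find(START_CODON)
    if start == -1:
        return []                      # no start codon, nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break                      # stop codons add no amino acid
        peptide.append(PARTIAL_TABLE.get(codon, "???"))  # "???" marks codons missing from the partial table
    return peptide


# Two leading bases, a start codon, four sense codons, a stop codon, and a trailing triplet.
message = "GG" + "AUG" + "UUU" + "UGU" + "GUG" + "AAG" + "UAA" + "UUU"
print("-".join(translate(message)))    # Met-Phe-Cys-Val-Lys
```

The bases before AUG and the triplet after UAA are ignored, mirroring how the start and stop codons delimit the coding sequence.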