Why Is Gene Similarity Lower Than Protein Similarity?

Genes and proteins are fundamental components of life. Genes serve as blueprints for building and maintaining organisms, while proteins carry out most cellular functions. Genes are DNA segments storing genetic information in a sequence of nucleotides. Proteins are complex molecules of amino acids, which fold into unique three-dimensional structures to perform their diverse roles.

When related genes are compared, their nucleotide sequences are often less similar than the protein sequences they encode. This raises the question of why gene similarity and protein similarity do not track each other exactly.

From Gene to Protein: The Molecular Journey

The journey from gene to protein is a two-step process called gene expression. The first step, transcription, occurs in the nucleus of eukaryotic cells, where a gene’s DNA sequence is copied into a messenger RNA (mRNA) molecule. This mRNA then travels into the cytoplasm, carrying the genetic instructions.

In the cytoplasm, mRNA encounters ribosomes, the cellular machinery for protein synthesis. This begins the second step, translation, in which the mRNA sequence is read in groups of three nucleotides called codons. Each codon specifies a particular amino acid, and transfer RNA (tRNA) molecules bring the corresponding amino acids to the ribosome. The amino acids are then linked together, forming a polypeptide chain that folds into a functional protein.
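The two steps can be sketched in a few lines of Python. This is a minimal illustration, not a model of the cellular machinery: it assumes the standard genetic code, treats transcription as copying the coding strand with T replaced by U, and ignores splicing and start-codon search. The function names transcribe and translate are chosen here for illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered U, C, A, G at each position.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Simplified transcription: the mRNA matches the coding strand with T -> U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA one codon at a time, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCTCTGTAA")
print(mrna)             # AUGGCUCUGUAA
print(translate(mrna))  # MAL (methionine, alanine, leucine)
```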

The Redundant Genetic Code

The genetic code itself is a primary reason for the observed difference in gene and protein similarity. The genetic code is degenerate, or redundant, meaning more than one three-nucleotide codon can specify the same amino acid. There are 64 possible codons, of which 61 specify amino acids and 3 signal the end of translation, but only 20 common amino acids, so most amino acids are encoded by multiple codons. For example, leucine and serine are each specified by six different codons.
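The redundancy can be counted directly. The sketch below assumes the standard genetic code, written with DNA bases and one-letter amino acid codes, and tallies how many codons map to each amino acid. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that specify each amino acid.
codons_per_amino_acid = Counter(CODON_TABLE.values())
stop_codons = codons_per_amino_acid.pop("*")
sense_codons = sum(codons_per_amino_acid.values())

print(f"{sense_codons} sense codons, {stop_codons} stop codons")
for amino_acid, count in sorted(codons_per_amino_acid.items()):
    print(amino_acid, count)
```

In this table leucine (L) and serine (S) each have six codons, while methionine (M) and tryptophan (W) have only one.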

This redundancy allows flexibility in the genetic sequence without altering the resulting protein. A single nucleotide change within a gene might produce a new codon that still codes for the same amino acid. Such a change is called a silent or synonymous mutation because it has no effect on the protein’s amino acid sequence. Consequently, two genes can differ in nucleotide sequence because of these silent mutations, yet produce proteins with identical amino acid sequences, resulting in higher protein similarity despite lower gene similarity.
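A small made-up example makes the effect concrete. The two sequences below are hypothetical, not real genes, and were constructed so that every difference is synonymous. The identity function is a deliberately simple measure that assumes the sequences are already aligned, equal in length, and free of gaps.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a: str, b: str) -> float:
    """Fraction of matching positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical sequences that differ only by synonymous codon choices.
gene_a = "ATGCTGTCTGGCAAA"
gene_b = "ATGTTAAGCGGAAAG"

protein_a = translate(gene_a)
protein_b = translate(gene_b)

print(f"nucleotide identity: {identity(gene_a, gene_b):.2f}")
print(f"protein identity:    {identity(protein_a, protein_b):.2f}")
```

Here only 8 of the 15 nucleotides match, yet both sequences translate to the same five amino acids.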

Impact of DNA Changes on Protein Structure

Even when a DNA change results in an altered amino acid sequence, its impact on the overall protein can be minimal. This is the case for conservative amino acid substitutions, which replace one amino acid with another that has similar chemical properties, such as size, charge, or hydrophobicity. For instance, replacing valine with leucine, both nonpolar amino acids, is usually less disruptive to protein structure and function than replacing valine with a charged amino acid like aspartic acid.
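One rough way to express this idea in code is to sort the amino acids into property classes and call a substitution conservative when both residues fall in the same class. The four classes below are a coarse, assumed grouping chosen for illustration. Textbooks draw these boundaries differently, and real sequence comparisons usually rely on empirically derived substitution matrices instead.

```python
# Coarse, illustrative grouping of the 20 common amino acids (one-letter codes).
# The class boundaries are an assumption; other groupings are equally defensible.
PROPERTY_GROUPS = {
    "nonpolar": set("GAVLIMFWP"),
    "polar uncharged": set("STCYNQ"),
    "positively charged": set("KRH"),
    "negatively charged": set("DE"),
}


def group_of(amino_acid: str) -> str:
    """Return the name of the property class an amino acid belongs to."""
    for name, members in PROPERTY_GROUPS.items():
        if amino_acid in members:
            return name
    raise ValueError(f"unknown amino acid: {amino_acid!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True when two different residues share the same property class."""
    return original != replacement and group_of(original) == group_of(replacement)


print(is_conservative("V", "L"))  # True: valine and leucine are both nonpolar
print(is_conservative("V", "D"))  # False: aspartic acid is negatively charged
```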

The three-dimensional structure of a protein, which dictates its function, is often robust enough to tolerate minor changes without significant disruption. If an amino acid change occurs in a region of the protein that is not critical for its function or stability, or if the new amino acid is chemically similar to the original one, the protein may still fold correctly and retain its activity. Thus, even with some non-silent mutations, the resulting protein can still be functionally and structurally very similar to the original.