Proteins are complex molecules that serve as the fundamental working units of all living cells. They are essential for virtually every bodily function, from catalyzing chemical reactions and providing structural support to transporting molecules and defending against pathogens. Determining the exact number of distinct human proteins is a considerable challenge because of the biological complexity involved.
The Current Understanding of Protein Numbers
The human genome contains approximately 19,000 to 20,000 protein-coding genes. Each gene can give rise to at least one protein, but the total count of distinct proteins is far higher. Estimates range from 80,000 to 400,000 different protein types, with some projections reaching a million, depending on how a “distinct protein” is defined. The gap between the gene count and the protein count reflects the dynamic nature of protein production and modification within human cells.
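To put the gap in perspective, the figures above can be turned into an average number of distinct proteins per gene. The short Python sketch below is only back-of-the-envelope arithmetic on the ranges quoted in this section; the variable and function names are illustrative, and an average says nothing about how unevenly proteins are spread across individual genes.

```python
# Ranges quoted in the text above
GENES_LOW, GENES_HIGH = 19_000, 20_000
PROTEINS_LOW, PROTEINS_HIGH = 80_000, 400_000


def proteins_per_gene(proteins: int, genes: int) -> float:
    """Average number of distinct proteins per protein-coding gene."""
    return proteins / genes


# Most conservative ratio: fewest proteins spread over the most genes
lowest = proteins_per_gene(PROTEINS_LOW, GENES_HIGH)
# Most generous ratio: most proteins spread over the fewest genes
highest = proteins_per_gene(PROTEINS_HIGH, GENES_LOW)

print(f"{lowest:.1f} to {highest:.1f} distinct proteins per gene on average")
# prints "4.0 to 21.1 distinct proteins per gene on average"
```

Using the one-million projection instead of 400,000 would push the upper end of this average to roughly 50 proteins per gene.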
Why the Number is Dynamic and Complex
The discrepancy between the number of protein-coding genes and the greater number of distinct proteins arises from sophisticated biological mechanisms. One primary mechanism is alternative splicing. As a gene's precursor messenger RNA (mRNA) is processed, its coding segments, called exons, can be included or skipped in different combinations. This allows one gene to produce multiple mRNA transcripts, each of which can encode a different protein variant, or isoform. Over 95% of multi-exon human genes undergo alternative splicing, greatly expanding the potential protein repertoire.
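A toy example can make the effect of alternative splicing concrete. The Python sketch below takes a hypothetical four-segment gene (the labels E1 to E4 are invented for illustration) and lists every transcript obtained by keeping or skipping each optional segment. It models only segment skipping and assumes every combination is possible, which real genes do not guarantee.

```python
from itertools import product


def splice_isoforms(exons, optional):
    """List every transcript made by keeping or skipping each optional exon.

    Exon order is always preserved; only inclusion varies.
    """
    optional = set(optional)
    # Optional exons have two choices (keep, skip); the rest are always kept
    choices = [(True, False) if exon in optional else (True,) for exon in exons]
    isoforms = []
    for keep_flags in product(*choices):
        isoforms.append(tuple(e for e, keep in zip(exons, keep_flags) if keep))
    return isoforms


# Hypothetical gene: E1 and E4 are always included, E2 and E3 may be skipped
exons = ["E1", "E2", "E3", "E4"]
isoforms = splice_isoforms(exons, optional=["E2", "E3"])

for isoform in isoforms:
    print("-".join(isoform))
# E1-E2-E3-E4
# E1-E2-E4
# E1-E3-E4
# E1-E4
```

With two optional segments the gene yields four transcripts; each additional optional segment doubles that upper bound.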
Proteins also undergo post-translational modifications (PTMs) after synthesis. These chemical changes attach functional groups or other proteins to the protein. Examples include phosphorylation, glycosylation, and methylation, among over 650 known types. These modifications can alter a protein’s function, stability, localization, or interactions, further increasing the diversity of proteins and the roles they play within the body.
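The reason modifications inflate the count so quickly is combinatorial. As a rough sketch, assume a protein has a number of sites that can each be independently modified or left unmodified; the number of possible forms is then two raised to the number of sites. The function below is illustrative only, and the result is a theoretical ceiling: sites are not truly independent, and cells produce far fewer forms than this bound allows.

```python
def modification_states(n_sites: int, states_per_site: int = 2) -> int:
    """Upper bound on distinct forms of one protein.

    Assumes each site independently takes one of `states_per_site` states
    (by default: modified or unmodified).
    """
    return states_per_site ** n_sites


for n_sites in (1, 3, 10):
    print(f"{n_sites} site(s): up to {modification_states(n_sites)} forms")
# 1 site(s): up to 2 forms
# 3 site(s): up to 8 forms
# 10 site(s): up to 1024 forms
```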
The Science of Protein Discovery
Scientists employ proteomics to identify, quantify, and characterize proteins on a large scale. This specialized field aims to understand the proteome, the entire set of proteins present in a cell, tissue, or organism at a given time. Proteomics uses advanced technologies to unravel the complexity of protein expression and function.
Mass spectrometry (MS) is a fundamental technique in protein discovery. This method measures the mass-to-charge ratio of ionized molecules, helping identify proteins and their modifications. Proteins are typically broken into smaller pieces called peptides, which are then analyzed by the mass spectrometer. Tandem mass spectrometry (MS/MS) further fragments peptides to determine amino acid sequences, enabling identification against protein databases. Separation methods such as chromatography and electrophoresis are often applied before MS analysis, which makes complex mixtures easier to study.
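The idea of breaking a protein into peptides and measuring mass-to-charge ratios can be sketched in a few lines of Python. The example below makes several assumptions that the text does not specify: it uses trypsin, a commonly used enzyme that cuts after lysine (K) or arginine (R) unless the next residue is proline (P), it uses standard monoisotopic residue masses, and it works on a made-up sequence. It ignores missed cleavages and modifications, so it is a teaching sketch, not a substitute for real search software.

```python
# Monoisotopic masses of amino acid residues, in daltons
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free ends
PROTON = 1.00728   # mass of each charge-carrying proton


def tryptic_digest(sequence: str) -> list[str]:
    """Cut after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # trailing piece without K/R
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mass_to_charge(mass: float, charge: int) -> float:
    """m/z of a peptide carrying `charge` extra protons."""
    return (mass + charge * PROTON) / charge


# Hypothetical sequence chosen to show the proline exception (K followed by P)
peptides = tryptic_digest("GASKPTVRLLK")
for peptide in peptides:
    mass = peptide_mass(peptide)
    print(peptide, round(mass, 4), round(mass_to_charge(mass, 2), 4))
```

Here the sequence is cut into `GASKPTVR` and `LLK`: the lysine followed by proline is left intact. Identification software compares measured values like these against the values predicted for every protein in a database.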
Mapping the Human Proteome
The Human Proteome Project (HPP) is an international scientific collaboration coordinated by the Human Proteome Organization (HUPO). Launched in 2010, the HPP aims to systematically identify and characterize all human proteins, building on knowledge from the Human Genome Project. The project seeks to understand the structure and function of these proteins, including their variants and post-translational modifications.
The HPP’s goals include creating a comprehensive catalog of human proteins, which is important for advancing our understanding of human biology and disease. By mapping the proteome, researchers gain insights into disease mechanisms, identify potential diagnostic markers, and develop more targeted therapies. As of recent reports, approximately 93% of the predicted proteins encoded in the human genome have been identified with protein-level evidence through this collaborative effort.