CpG sites are a fundamental structural feature of the human genome. The name stands for cytosine-phosphate-guanine: a dinucleotide in which a cytosine (C) nucleotide is immediately followed by a guanine (G) nucleotide on the same strand of DNA, the two linked by a phosphate group (the “p”). The notation distinguishes this dinucleotide from a C-G base pair, in which the two bases sit on opposite strands. CpG sites are the primary target of DNA methylation in mammals, which makes them an important marker throughout the genome. Their methylation state strongly influences how the cell’s machinery interprets genetic instructions.
Defining the Dinucleotide and its Location
The CpG dinucleotide is defined by the 5′ to 3′ orientation of the DNA strand, with the cytosine directly upstream of the guanine. Although the genome contains billions of nucleotides, this particular pairing is surprisingly rare, occurring at only about one-fifth of the frequency expected from the genome’s base composition. This scarcity results from a chemical instability. Methylated cytosine tends to deaminate spontaneously to thymine, which converts a CpG into a TpG. Because thymine is a normal DNA base, these changes escape repair more often than other damage, so CpG sites have been steadily lost over evolutionary time.
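The “expected frequency” can be made concrete with a short calculation. This is an illustrative sketch that assumes bases occur independently and that the genome-wide GC content is about 41%, split evenly between C and G. The ratio of 0.2 is the approximate figure from the text, not a new measurement.

$$
\left(\frac{O}{E}\right)_{\mathrm{CpG}} = \frac{f_{\mathrm{CG}}}{f_{\mathrm{C}}\, f_{\mathrm{G}}},
\qquad
f_{\mathrm{C}} = f_{\mathrm{G}} \approx \frac{0.41}{2} = 0.205
\;\Rightarrow\;
f_{\mathrm{C}}\, f_{\mathrm{G}} \approx 0.042
$$

$$
\left(\frac{O}{E}\right)_{\mathrm{CpG}} \approx 0.2
\;\Rightarrow\;
f_{\mathrm{CG}} \approx 0.2 \times 0.042 \approx 0.0084
$$

Under these assumptions, fewer than 1 in 100 adjacent base pairs on a strand would be a CpG, compared with roughly 4 in 100 expected by chance.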
Regions where these dinucleotides cluster are known as CpG islands. These are distinct segments typically spanning 300 to 3,000 base pairs. They are identified by a high combined guanine-plus-cytosine (GC) content and by a much greater density of CpG sites than the rest of the genome. CpG islands are not randomly distributed. Most lie in the regulatory regions at the start of genes, known as promoter regions.
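The island criteria above can be checked mechanically. The following minimal Python sketch assumes the widely used Gardiner-Garden and Frommer thresholds: a window of at least 200 bp, a GC fraction above 50%, and a CpG observed/expected ratio above 0.6. These thresholds are not stated in the text above. The helper names `cpg_obs_exp`, `gc_fraction`, `is_cpg_island` and `find_island_windows` are hypothetical and chosen for illustration. Real island-calling tools also merge and extend overlapping windows, which this sketch omits.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact number of CpG steps.
    return seq.count("CG") * len(seq) / (c * g)


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def is_cpg_island(window: str, min_len: int = 200,
                  min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Island test; the default thresholds are assumed, not taken from the text."""
    return (len(window) >= min_len
            and gc_fraction(window) > min_gc
            and cpg_obs_exp(window) > min_oe)


def find_island_windows(seq: str, window: int = 200, step: int = 1) -> list[int]:
    """Start positions of fixed-size windows that pass the island test."""
    return [start for start in range(0, len(seq) - window + 1, step)
            if is_cpg_island(seq[start:start + window])]


if __name__ == "__main__":
    # Synthetic example: a CpG-rich block flanked by AT-only sequence.
    seq = "AT" * 200 + "CG" * 100 + "AT" * 200
    hits = find_island_windows(seq)
    print(hits[0], hits[-1])  # 301 499
```

The example flags only the windows in which more than half the bases come from the CpG-rich block. This shows why island boundaries depend on the window size and thresholds chosen.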
Approximately 70% of human gene promoters contain a CpG island, which places these islands at the site where transcription is initiated. From this position, they can directly influence the binding of the cell’s transcriptional machinery. The presence of a CpG island is a recognized sign that a gene is likely to be actively expressed in a wide variety of cell types.
The Role of DNA Methylation
The significance of the CpG site lies in its role as the substrate for a chemical modification called DNA methylation. In this process, a methyl group is covalently attached to the fifth carbon of the cytosine ring, producing 5-methylcytosine. The reaction is catalyzed by specialized enzymes called DNA methyltransferases (DNMTs).
The DNMT family falls into two functional categories. The de novo methyltransferases, DNMT3A and DNMT3B, establish new methylation patterns. The maintenance enzyme, DNMT1, preserves existing ones. After DNA replication, each methylated CpG site is hemimethylated: the parental strand carries the mark, but the newly synthesized daughter strand does not. DNMT1 recognizes these hemimethylated sites and copies the mark onto the daughter strand, so the methylation pattern is faithfully inherited by both daughter cells. This mechanism makes methylation a stable, heritable form of gene regulation.
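The maintenance cycle can be illustrated with a toy model. The following Python sketch is a deliberate simplification, not a biochemical simulation. Each CpG site is reduced to one boolean per strand. The hypothetical functions `replicate`, `maintain` and `de_novo` stand in for semi-conservative replication, DNMT1 and DNMT3A/3B, respectively. The model assumes perfectly efficient enzymes, whereas real maintenance methylation is not error-free.

```python
from dataclasses import dataclass


@dataclass
class Duplex:
    # Methylation state at each CpG site on the two strands (True = 5-methylcytosine).
    # CpG is palindromic, so site i has a cytosine on both the top and bottom strand.
    top: list[bool]
    bottom: list[bool]


def replicate(d: Duplex) -> tuple[Duplex, Duplex]:
    """Semi-conservative replication: each daughter keeps one parental strand,
    and the newly synthesized strand starts unmethylated (hemimethylated sites)."""
    n = len(d.top)
    return (Duplex(top=list(d.top), bottom=[False] * n),
            Duplex(top=[False] * n, bottom=list(d.bottom)))


def maintain(d: Duplex) -> Duplex:
    """Toy DNMT1: wherever one strand is methylated, methylate the other strand too."""
    both = [t or b for t, b in zip(d.top, d.bottom)]
    return Duplex(top=both, bottom=list(both))


def de_novo(d: Duplex, sites: list[int]) -> Duplex:
    """Toy DNMT3A/3B: add new marks on both strands at the chosen sites."""
    top, bottom = list(d.top), list(d.bottom)
    for i in sites:
        top[i] = bottom[i] = True
    return Duplex(top=top, bottom=bottom)


if __name__ == "__main__":
    parent = Duplex(top=[True, False, True], bottom=[True, False, True])
    daughter1, daughter2 = replicate(parent)
    print(daughter1)                      # marks only on the inherited (top) strand
    print(maintain(daughter1) == parent)  # DNMT1 restores the parental pattern
```

Because `maintain` only copies marks that already exist on one strand, an unmethylated site stays unmethylated through every cycle. Only `de_novo` can introduce a new mark, which mirrors the division of labor between the two enzyme categories.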
The main consequence of promoter methylation is silencing of the associated gene. Methylation of a promoter’s CpG island acts as a long-term “off switch” for gene activity by preventing the proteins required for transcription from binding to the DNA. The blockage is not only a simple physical barrier, however. The methyl groups also recruit specific methyl-CpG-binding domain (MBD) proteins.
These MBD proteins act as scaffolds that recruit larger complexes, including histone deacetylases, which condense the surrounding chromatin. This compaction makes the gene inaccessible to the transcription machinery. DNA methylation at CpG sites is therefore a central epigenetic mechanism, one that regulates genes without altering the underlying DNA sequence.
CpGs in Developmental Biology and Disease
Precise control of CpG methylation patterns is fundamental to developmental biology because it allows a single fertilized egg to give rise to many specialized cell types. During cellular differentiation, widespread de novo DNA methylation silences genes associated with pluripotency and stem cell identity. This targeted silencing helps ensure that a newly specialized cell, such as a liver cell or a neuron, maintains its specific identity.
CpG methylation also drives genomic imprinting, in which a gene is expressed exclusively from either the maternal or the paternal chromosome. This parent-of-origin-specific activity is established during germ cell formation, when specific Imprinting Control Regions (ICRs) are methylated on one parental allele, silencing that allele. The resulting monoallelic expression pattern is maintained through all subsequent cell divisions, which illustrates how CpG methylation functions as a cellular memory system.
Aberrant methylation patterns at CpG sites are a feature of many human diseases, particularly cancer. Cancer cells often show a dual disruption of their methylation landscape: global hypomethylation combined with localized hypermethylation. Global hypomethylation is a widespread loss of methylation across the genome, especially at repetitive sequences such as LINE-1 elements.
This hypomethylation contributes to genomic instability and an increased mutation rate. In contrast, cancer cells frequently display hypermethylation, or increased methylation, at the CpG islands of tumor suppressor genes. This targeted hypermethylation silences protective genes, such as those involved in DNA repair or cell cycle control, and allows the cell to grow and divide uncontrollably. The methylation state of CpG sites therefore provides a tangible molecular indicator of a cell’s health and developmental history.