Promoter Capture Hi-C: A Technique for Studying Gene Regulation

Promoter Capture Hi-C (PCHi-C) is a method for mapping, at high resolution, the physical contacts between gene promoters and other genomic regions. Promoters are the DNA sequences at which transcription of a gene is initiated. Understanding the long-range interactions that promoters make with distant parts of the DNA is a significant part of deciphering how genes are regulated. The technique shows which parts of the genome are in close physical proximity to a gene’s starting point, even if they are far apart in the linear sequence.

This approach provides a snapshot of the complex folding of DNA within a cell’s nucleus. By identifying these specific contacts, scientists can link regulatory elements, such as enhancers, to the genes they control. This information helps build a more complete picture of the network that governs gene expression. The method focuses specifically on promoters, generating a detailed chart of their interaction landscapes across the genome.

The Foundational Technique of Hi-C

Promoter Capture Hi-C is an extension of Hi-C, a method designed to map physical interactions across the whole genome. The Hi-C process begins by treating cells with a crosslinking chemical, typically formaldehyde. This step freezes the three-dimensional structure of the genome by forming covalent bonds between DNA and the proteins that bridge physically close segments, preserving the interactions as they exist within the cell.

Once the DNA is locked in its folded state, the genome is cut into millions of small fragments using restriction enzymes. These enzymes are proteins that recognize and cut DNA at specific sequence patterns. The ends of these fragments are then filled in and marked with a biotin molecule, a molecular tag that will be used for later isolation.
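The cutting step can be imitated on a computer to see which fragments an enzyme would produce. The sketch below assumes the enzyme HindIII (recognition site AAGCTT, cutting after the first A); the article does not name an enzyme, so this choice, the function name `digest`, and the toy sequence are illustrative assumptions only.

```python
import re

def digest(sequence, site="AAGCTT", cut_offset=1):
    """Return (start, end) coordinates of the fragments produced by cutting
    `sequence` at every occurrence of `site`.

    `cut_offset` is the position inside the site where the cut falls
    (assumed here: HindIII cuts A^AGCTT, so the offset is 1).
    """
    sequence = sequence.upper()
    # A lookahead pattern also finds overlapping occurrences of the site.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", sequence)]
    boundaries = [0] + cuts + [len(sequence)]
    # Pair consecutive boundaries; skip empty fragments at the sequence ends.
    return [(s, e) for s, e in zip(boundaries, boundaries[1:]) if e > s]

toy_genome = "GGCCAAGCTTTTACGGAAGCTTCCGA"  # made-up sequence with two sites
fragments = digest(toy_genome)
print(fragments)
```

Real pipelines run the same idea over a whole reference genome, and the resulting fragment coordinates become the units to which sequencing reads are later assigned.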

A primary step in the Hi-C process is ligation, where the tagged DNA fragments are joined back together. Because of the crosslinking, fragments that were physically near each other in the nucleus are more likely to be ligated together. This creates new, hybrid DNA molecules composed of sequences from different parts of the genome. These molecules serve as a direct record of a genomic interaction, and the collection forms what is known as a Hi-C library.
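Because the cut ends are filled in before they are joined, each ligation leaves a recognizable junction sequence inside the hybrid molecule. A minimal sketch of how such a junction could be located in a sequencing read follows. It assumes an enzyme that leaves a 5' overhang (HindIII is used as the example), and the function names and the read are hypothetical.

```python
def filled_in_junction(site, cut_offset):
    """Junction created when two filled-in ends of a 5'-overhang cut are ligated.

    After fill-in, the left end carries site[:len(site) - cut_offset]
    and the right end carries site[cut_offset:].
    """
    return site[: len(site) - cut_offset] + site[cut_offset:]

def split_at_junction(read, site="AAGCTT", cut_offset=1):
    """Split a read into the two genomic pieces on either side of a junction.

    Returns (read, None) if no junction is present.
    """
    junction = filled_in_junction(site, cut_offset)
    index = read.find(junction)
    if index == -1:
        return read, None
    split = index + len(site) - cut_offset  # end of the left filled-in end
    return read[:split], read[split:]

# Hypothetical read spanning a ligation junction between two fragments.
left, right = split_at_junction("TTGCAAAGCTAGCTTGGCAT")
print(filled_in_junction("AAGCTT", 1), left, right)
```

Each piece can then be aligned to the genome separately, which is how one read ends up reporting a contact between two distant regions.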

The Promoter Capture Hi-C Workflow

The PCHi-C workflow begins where the standard Hi-C process ends, introducing a specific enrichment step. After a Hi-C library is generated, the goal shifts to isolating only those interactions that involve a gene promoter. This targeted approach reduces the complexity of the data and focuses on the connections most relevant for understanding gene activation and control.

This enrichment is achieved through a process called in-solution hybrid selection. Researchers use custom-designed “baits,” which are short, biotinylated RNA molecules synthesized to be complementary to the ends of the restriction fragments that contain known gene promoters, with a bait set typically covering thousands of promoters. When these baits are mixed with the Hi-C library, they hybridize specifically to the DNA molecules that contain a promoter fragment.

Once the RNA baits have hybridized to their target DNA fragments, the complex can be isolated from the rest of the library. Because the baits are tagged with biotin, magnetic beads coated with streptavidin are used to capture them. This step filters the Hi-C library, enriching it for ligation products where at least one of the interacting DNA fragments is a promoter. This promoter-centric library is then sequenced, providing a detailed map of what these specific genomic regions are physically touching.
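The effect of the capture step can be pictured as a filter on fragment pairs. The sketch below is a computational analogy rather than a description of the wet-lab chemistry: fragment IDs, the bait set, and the function names are all made up for illustration.

```python
def filter_captured(pairs, baited_fragments):
    """Keep only pairs in which at least one fragment is a baited promoter fragment."""
    return [(a, b) for a, b in pairs if a in baited_fragments or b in baited_fragments]

def capture_fraction(pairs, baited_fragments):
    """Fraction of pairs that involve at least one baited fragment."""
    if not pairs:
        return 0.0
    return len(filter_captured(pairs, baited_fragments)) / len(pairs)

# Hypothetical Hi-C library: each tuple is one ligation product (fragment IDs).
library = [(1, 7), (2, 3), (4, 1), (5, 6), (9, 9)]
baited = {1, 9}  # hypothetical promoter-containing fragments

captured = filter_captured(library, baited)
print(captured, capture_fraction(library, baited))
```

The same fraction, computed on real sequencing data, is a common way to check how well an enrichment worked.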

Applications in Genomic Research

A primary application of PCHi-C is mapping connections between enhancers and promoters. Enhancers are segments of DNA that can boost a gene’s expression and can be located very far from the gene they regulate, making them difficult to link by linear sequence alone. PCHi-C identifies physical contact between a distant enhancer and a specific gene promoter, providing supporting evidence of a functional regulatory relationship.

This ability to connect enhancers to their target genes is useful for understanding the mechanisms behind diseases. Genome-wide association studies (GWAS) identify genetic variants, often single-nucleotide polymorphisms (SNPs), associated with a disease, but most of these variants lie in non-coding regions of the genome, which makes their functional role hard to determine. Researchers can use PCHi-C to investigate whether a disease-associated SNP is located within an enhancer that physically interacts with a known disease-relevant gene.

If PCHi-C data reveal a physical loop connecting an enhancer containing a risk variant to a gene’s promoter, this offers a plausible mechanistic explanation for the genetic association. It suggests how a change in a non-coding region could affect the regulation of a specific gene, potentially contributing to the disease state. This process helps translate the statistical findings of GWAS into biological hypotheses about how genetic variations contribute to complex diseases.
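In practice, linking variants to genes comes down to an interval overlap: does a SNP position fall inside a region that contacts a promoter? The sketch below shows that logic with invented gene names, coordinates, and SNP identifiers; real analyses would read these from interaction and GWAS files.

```python
def snps_in_interacting_regions(snps, interactions):
    """Return (snp_id, gene) pairs where the SNP lies inside a region
    that interacts with the gene's promoter."""
    hits = []
    for snp_id, snp_chrom, snp_pos in snps:
        for gene, chrom, start, end in interactions:
            # Half-open interval [start, end), as in BED-style coordinates.
            if snp_chrom == chrom and start <= snp_pos < end:
                hits.append((snp_id, gene))
    return hits

# Hypothetical promoter-interacting regions: (gene, chromosome, start, end).
interactions = [
    ("GENE_A", "chr1", 1_200_000, 1_205_000),
    ("GENE_B", "chr2", 500_000, 504_000),
]
# Hypothetical disease-associated SNPs: (id, chromosome, position).
snps = [
    ("rs_example_1", "chr1", 1_203_150),
    ("rs_example_2", "chr1", 3_000_000),
    ("rs_example_3", "chr2", 504_000),
]

candidate_links = snps_in_interacting_regions(snps, interactions)
print(candidate_links)
```

The nested loop is fine for a toy example; genome-scale data would normally use an interval index or a dedicated genomics library instead.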

Interpreting Promoter Capture Hi-C Data

The raw output from a PCHi-C experiment is a large dataset listing millions of interactions. Not all of these detected interactions are biologically meaningful, as many arise from the random motion of DNA within the nucleus or from technical noise. A necessary step is therefore to apply statistical analysis that distinguishes such background contacts from interactions occurring more often than expected. This requires specialized bioinformatics pipelines to process the sequencing data.

To accomplish this, scientists use dedicated software tools, such as CHiCAGO. This tool assesses the interactions for every promoter and compares the observed frequency of an interaction against what would be expected by random chance, considering factors like the linear distance between the two genomic regions. Based on this statistical model, the software assigns a confidence score to each promoter interaction, allowing researchers to filter out noise and focus on statistically significant connections.
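The idea of comparing an observed count with a distance-dependent expectation can be illustrated with a deliberately simplified model. The sketch below is not CHiCAGO's actual statistical model: it assumes the background count falls off as a power of linear distance and tests the observed count with a plain Poisson tail. The `scale` and `exponent` values are arbitrary placeholders.

```python
import math

def expected_count(distance_bp, scale=5e5, exponent=1.0):
    """Assumed background: expected read count decays as a power of distance."""
    return scale * distance_bp ** (-exponent)

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for a Poisson variable with mean `expected`."""
    below = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(observed)
    )
    return max(0.0, 1.0 - below)

# Hypothetical interaction: 20 reads between a promoter and a region 100 kb away.
background = expected_count(100_000)
p_value = poisson_upper_tail(20, background)
print(background, p_value)
```

With these placeholder parameters, 20 reads at 100 kb sit far above the assumed background of 5, so the p-value is very small; the same 20 reads very close to the promoter would be much less surprising.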

The final, filtered data are usually presented with visualization tools that make the complex network of interactions easier to read. A common method is the arc plot, where the linear genome is displayed as a horizontal line. Arcs are drawn to connect gene promoters to the other genomic regions they significantly interact with. The height or color of an arc can represent the strength or statistical significance of the connection, providing a visual map of the regulatory landscape for specific genes or genomic regions.
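The geometry of an arc plot is simple enough to sketch without a plotting library. The function below, whose name and parameters are illustrative assumptions, returns the points of a half-ellipse joining two genomic positions, with the peak height set by an interaction score; the points could then be passed to any plotting tool.

```python
import math

def arc_points(start, end, height, n_points=50):
    """Points of a half-ellipse from `start` to `end` on the genome axis,
    peaking at `height` above the midpoint."""
    center = (start + end) / 2
    radius = (end - start) / 2
    points = []
    for i in range(n_points + 1):
        angle = math.pi * (1 - i / n_points)  # sweep from left end to right end
        x = center + radius * math.cos(angle)
        y = height * math.sin(angle)
        points.append((x, y))
    return points

# Hypothetical interaction between a promoter at 1.0 Mb and a region at 1.4 Mb,
# drawn with a height equal to a made-up score of 7.5.
arc = arc_points(1_000_000, 1_400_000, height=7.5)
print(arc[0], arc[len(arc) // 2], arc[-1])
```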
