The Hi-C Protocol: A Step-by-Step Explanation

The genetic material within each cell is not a simple, linear strand. It is intricately folded into a complex, three-dimensional structure that fits within the microscopic cell nucleus. This organization is fundamental, as DNA folding influences which genes are active and which are silenced. To understand this architecture, scientists developed Hi-C, a method that provides a snapshot of the genome’s 3D structure by identifying which DNA segments are physically close inside the nucleus, even if they are far apart in the linear sequence.

The Core Steps of the Hi-C Experiment

The Hi-C protocol maps the genome’s spatial arrangement in four main stages.

Cross-linking: The protocol begins by treating cells with formaldehyde. This chemical creates covalent cross-links between proteins and DNA that are in close proximity, essentially freezing the genome’s spatial arrangement at a specific moment in time. This process captures both transient and stable contacts.
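Digestion: The cross-linked chromatin is cut into smaller pieces using a restriction enzyme, which recognizes and cuts DNA at a specific short sequence. The choice of enzyme influences the resolution of the final data, as enzymes that cut more frequently generate smaller fragments. For example, in a random sequence an enzyme with a 4-base recognition site is expected to cut about every 256 bp (4^4), whereas one with a 6-base site cuts about every 4,096 bp (4^6).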
Marking and Ligation: The overhangs left by the restriction enzyme are filled in with nucleotides, one of which is labeled with biotin, a small molecular tag. Because the fill-in happens only at the cut ends, the tag marks the original fragment ends and, after ligation, the junctions between them. The marked ends are then joined together by ligation. Because ligation is performed under dilute conditions, it occurs most often between fragments held together by the same cross-link rather than between unrelated fragments, creating chimeric DNA molecules that each represent two regions that were spatially adjacent in the nucleus.
Purification and Sequencing: The cross-links are reversed, and proteins are removed. The DNA is sheared into smaller fragments, and those containing the biotin-tagged junctions are isolated using streptavidin-coated magnetic beads. This enrichment isolates the informative molecules, which are then sequenced to identify the interacting genomic regions.
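The effect of enzyme choice on fragment size in the digestion step can be illustrated with a small in-silico digest. The following is a minimal sketch rather than part of the laboratory protocol: the function name `digest` and the toy sequence are hypothetical, only one DNA strand is modelled, and the two recognition sites used are those of HindIII (AAGCTT, cut after the first A) and MboI (GATC, cut before the G).

```python
import re

def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of the recognition `site`.

    `cut_offset` is the cut position inside the site (0 = before its
    first base). Returns the resulting fragments in order.
    """
    # A lookahead pattern also finds overlapping occurrences of the site.
    pattern = f"(?={re.escape(site)})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

# Hypothetical toy sequence with two HindIII sites and three MboI sites.
toy_sequence = "CCAAGCTTGGGATCTTGATCTTGATCAAAAGCTTCC"

hindiii_fragments = digest(toy_sequence, "AAGCTT", cut_offset=1)
mboi_fragments = digest(toy_sequence, "GATC", cut_offset=0)

print([len(f) for f in hindiii_fragments])  # lengths after the 6-base cutter
print([len(f) for f in mboi_fragments])     # lengths after the 4-base cutter
```

On this toy sequence the 4-base cutter produces more and shorter fragments than the 6-base cutter, which is the reason frequent cutters support finer maps.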

Generating and Interpreting Hi-C Data

After the laboratory work, the process shifts to computational analysis. The sequenced DNA fragments provide the raw data to map the genome’s 3D interactions. Each fragment is sequenced from both ends, yielding a pair of reads that represent the two ligated genomic regions. Each read is mapped to a reference genome to determine its location. The genome is then divided into bins of fixed size, and by counting how many read pairs connect every pair of bins, researchers build an interaction frequency matrix.
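The counting step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a production pipeline: it handles a single chromosome, the read-pair positions are made up, and the function name `build_contact_matrix` and the bin size are hypothetical choices.

```python
def build_contact_matrix(pairs, chrom_length, bin_size):
    """Count read pairs per pair of fixed-size bins on one chromosome."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror the count so the matrix stays symmetric
    return matrix

# Hypothetical mapped positions (in bp) of the two reads of each pair.
read_pairs = [(120, 180), (150, 950), (900, 160),
              (410, 450), (420, 880), (30, 70)]

contact_matrix = build_contact_matrix(read_pairs, chrom_length=1000, bin_size=250)
for row in contact_matrix:
    print(row)
```

Smaller bins give a finer map but need many more read pairs to fill each cell with a meaningful count.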

This matrix is most often visualized as a Hi-C contact map, which serves as the primary output of the experiment. A contact map is a two-dimensional grid in which both the x-axis and the y-axis represent positions along the linear sequence of a chromosome, so the map is symmetric about its diagonal. The color of each pixel indicates the interaction frequency between the two corresponding genomic regions; a more intense color signifies a higher frequency of contact.

A contact map can be read for several features of genomic organization. A strong signal along the map’s diagonal represents frequent interactions between adjacent DNA segments, which are close in space simply because they are close in the sequence; the signal fades as the distance from the diagonal grows. The more telling information comes from off-diagonal signals, which are points of high intensity representing long-range interactions where distant regions of the chromosome are brought into close physical contact.
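One common way to make off-diagonal signals stand out is to divide each entry by the average count seen at the same genomic separation, giving an observed/expected ratio. The sketch below assumes a small symmetric toy matrix with invented counts; the function name `observed_over_expected` is hypothetical, and real analyses usually also correct for other biases first.

```python
def observed_over_expected(matrix):
    """Divide each entry by the mean of its diagonal (same bin separation)."""
    n = len(matrix)
    expected = []
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    ratio = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            e = expected[abs(i - j)]
            ratio[i][j] = matrix[i][j] / e if e > 0 else 0.0
    return expected, ratio

# Invented counts: bins 0 and 3 contact each other more than their
# separation alone would predict.
toy_counts = [
    [10, 4, 2, 6, 1],
    [4, 10, 4, 2, 2],
    [2, 4, 10, 4, 2],
    [6, 2, 4, 10, 4],
    [1, 2, 2, 4, 10],
]

expected, ratio = observed_over_expected(toy_counts)
print(expected)      # mean count at separations 0, 1, 2, 3, 4
print(ratio[0][3])   # above 1: more contact than expected at this distance
```

Ratios near 1 match the distance trend, while values well above 1 flag candidate long-range interactions.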

Scientific Discoveries Using Hi-C

The application of Hi-C led to the discovery of Topologically Associating Domains (TADs). These are neighborhoods along a chromosome where DNA interacts frequently within the domain but less frequently with adjacent domains. TADs are considered fundamental units of chromosome organization, helping to insulate genetic neighborhoods and regulate gene activity within them.
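The idea that contacts drop across a domain border can be turned into a simple boundary detector, often called an insulation score. The following is a minimal sketch on an idealized toy map, assuming two domains of three bins each; the function name `insulation_scores`, the window size, and the contact values are hypothetical.

```python
def insulation_scores(matrix, window):
    """Mean contact count between the `window` bins on either side of a border.

    The score at index b describes the border between bin b-1 and bin b.
    Borders too close to the matrix edge are left as None.
    """
    n = len(matrix)
    scores = [None] * n
    for b in range(window, n - window + 1):
        total = sum(matrix[i][j]
                    for i in range(b - window, b)
                    for j in range(b, b + window))
        scores[b] = total / (window * window)
    return scores

# Idealized map: 8 contacts within a domain, 1 contact across domains.
domain_of_bin = [0, 0, 0, 1, 1, 1]
toy_map = [[8 if domain_of_bin[i] == domain_of_bin[j] else 1
            for j in range(6)] for i in range(6)]

scores = insulation_scores(toy_map, window=2)
valid = [(score, b) for b, score in enumerate(scores) if score is not None]
boundary = min(valid)[1]  # border with the fewest cross-border contacts

print(scores)
print(boundary)
```

The lowest score falls at the border between the two blocks, which is where a TAD boundary would be called in this toy case.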

Hi-C data also revealed that the genome is segregated into two large-scale compartments. The ‘A’ compartment corresponds to active, gene-rich regions (euchromatin), while the ‘B’ compartment is associated with inactive, gene-poor regions (heterochromatin). This partitioning helps organize the nucleus into functionally distinct zones.
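Compartments are commonly assigned from the sign of the leading eigenvector of a correlation matrix computed from the distance-normalized contact map. The sketch below illustrates that idea in plain Python under several assumptions: the observed/expected values are invented to form a clean checkerboard, the helper names `pearson` and `leading_eigenvector` are hypothetical, and the eigenvector is found by simple power iteration. The sign of an eigenvector is arbitrary, so deciding which group is ‘A’ needs extra information such as gene density.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def leading_eigenvector(sym, iterations=100):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    n = len(sym)
    vec = [1.0] + [0.0] * (n - 1)  # arbitrary starting vector
    for _ in range(iterations):
        nxt = [sum(sym[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            break
        vec = [v / norm for v in nxt]
    return vec

# Invented observed/expected map: bins in the same hidden group are
# enriched for contact (1.5), bins in different groups are depleted (0.5).
hidden_group = [0, 0, 1, 1, 0, 1]
toy_ratio = [[1.5 if hidden_group[i] == hidden_group[j] else 0.5
              for j in range(6)] for i in range(6)]

# Correlate the contact profiles (rows) of every pair of bins.
corr = [[pearson(row_i, row_j) for row_j in toy_ratio] for row_i in toy_ratio]
eigvec = leading_eigenvector(corr)
signs = [1 if v > 0 else -1 for v in eigvec]
print(signs)
```

Bins with the same sign share a contact profile and would be placed in the same compartment.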

Hi-C has also been instrumental in identifying specific chromatin loops. These loops are formed when distant genomic elements, such as enhancers and promoters, are brought into direct physical contact. Such interactions are a mechanism of gene regulation, allowing an enhancer element to activate a gene that may be hundreds of thousands of base pairs away on the linear DNA sequence. Mapping these connections has advanced our understanding of how gene expression is controlled.

Variations and Advancements of the Hi-C Technique

The foundational Hi-C protocol has inspired more advanced techniques. One improvement is in situ Hi-C, where ligation is performed within intact nuclei rather than in dilute solution. This modification reduces random ligations between fragments that were not in contact, which improves the signal-to-noise ratio and allows higher-resolution interaction maps. For this reason, in situ Hi-C has become a standard in the field.

For finer detail, researchers developed Micro-C. This method uses micrococcal nuclease instead of restriction enzymes to fragment DNA. Because this nuclease preferentially cuts the linker DNA between nucleosomes (the basic units of DNA packaging), Micro-C can map interactions at nucleosome-level resolution, revealing details not visible with standard Hi-C.

To study how genome architecture varies between cells, scientists created single-cell Hi-C. This adaptation captures 3D genome maps from individual cells, revealing cell-to-cell variability that is averaged out in bulk experiments. This approach is useful for studying complex tissues or developmental processes.