What Is IGV and How Is It Used in Genomics?

The Integrative Genomics Viewer (IGV) is an interactive software tool from the Broad Institute for the visual exploration of large, complex genomic datasets. It allows scientists to view their own data, or publicly available data, in the context of a reference genome. This capability helps researchers investigate genetic information, identify patterns, and formulate new hypotheses about gene function and disease.

Core Functionality and Visualization

IGV functions as a genome browser, presenting data in an intuitive and interactive format. The main interface displays genomic information in horizontal rows known as tracks, aligned to a reference genome sequence at the top. This reference is accompanied by a chromosome ideogram, a visual map of the chromosome, with a red box indicating the specific region in view. The experience is analogous to using a digital map, where different layers of information can be toggled on and off to reveal distinct details.

Users interact with the data through a navigation panel with controls for zooming and panning along the genome. As a user zooms from a whole-chromosome view to a specific gene, the level of detail increases to reveal individual DNA bases. The main data panel is where the various tracks are displayed, each representing a different dataset loaded by the user.

IGV can display multiple tracks simultaneously, allowing for direct comparison between different experiments or samples. Researchers can customize the appearance of these tracks by changing their height, color, and order, which brings the information most relevant to their research question to the foreground and makes it easier to integrate diverse datasets.

Supported Data Types

IGV’s capacity to display a wide array of data types is central to its function. Each type is loaded from its own file format, and each is suited to a different biological question.

Sequence Alignments (BAM/CRAM): These files show how short DNA or RNA sequences from an experiment, called reads, align to the reference genome. In IGV they appear as a “coverage” plot showing the density of reads and, at higher zoom levels, as the individual aligned reads. The coverage plot is useful for assessing data quality and identifying large-scale structural changes. An accompanying index file (typically .bai for BAM or .crai for CRAM) is required so that IGV can access the data quickly without loading the entire file into memory.
Genomic Variants (VCF): Variant Call Format files pinpoint specific locations where a sample’s DNA sequence differs from the reference genome. These differences can be single nucleotide polymorphisms (SNPs) or small insertions and deletions. In IGV, VCF files appear as a track with markings at each variant position, allowing researchers to quickly spot mutations.
Gene Annotations (BED/GFF): These files act like a map of the genome, indicating the precise coordinates of genes, exons, and other functional elements. This allows researchers to see if their experimental data overlaps with known biological features. For instance, a variant in a VCF file can be cross-referenced with the annotation track to see if it falls within a gene.
Quantitative Data (WIG/BigWig): These files store a numerical value for each position or interval across the genome, which IGV draws as a graph or heatmap. This is useful for visualizing data from experiments like RNA-seq, which measures gene expression levels, or ChIP-seq, which identifies where proteins bind to DNA. The resulting tracks show which genes are more active or where regulatory events are occurring.
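The index requirement mentioned for alignment files is a common stumbling block when loading data. The sketch below checks whether a conventionally named index sits next to a data file before it is opened in IGV. The suffix table reflects the usual samtools and tabix naming conventions; it is an assumption about how the files were produced, not a statement of everything IGV accepts, and the function names are illustrative.

```python
from pathlib import Path

# Conventional index suffixes, keyed by data-file extension.
# Assumption: indexes were created with samtools/tabix default naming.
INDEX_SUFFIXES = {
    ".bam": [".bam.bai", ".bai"],
    ".cram": [".cram.crai"],
    ".vcf.gz": [".vcf.gz.tbi", ".vcf.gz.csi"],
}


def candidate_index_paths(data_path):
    """Return the index paths conventionally expected next to data_path."""
    name = str(data_path)
    for ext, suffixes in INDEX_SUFFIXES.items():
        if name.endswith(ext):
            stem = name[: -len(ext)]
            return [Path(stem + suffix) for suffix in suffixes]
    # Formats not listed above (e.g. BigWig) carry no separate index here.
    return []


def has_index(data_path):
    """True if no index is expected, or if one of the candidates exists."""
    candidates = candidate_index_paths(data_path)
    return not candidates or any(p.exists() for p in candidates)
```

For example, `candidate_index_paths("tumor.bam")` lists `tumor.bam.bai` and `tumor.bai`, so a missing index can be reported before IGV raises its own error.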

Practical Applications in Research

IGV’s visualization capabilities are applied to interpret complex data and drive discovery across different fields. The ability to overlay multiple data types allows for a comprehensive view of genomic events in contexts like cancer research, gene expression studies, and epigenetics.

In cancer genomics, a researcher might use IGV to investigate the genetic drivers of a tumor. They could load the DNA sequence data from a tumor sample as a BAM file, along with a VCF file of variants called in that sample. By navigating to a gene like TP53, the researcher can visually inspect the sequence reads to confirm a mutation, assess data quality, and estimate what fraction of reads carry the change. This fraction is only a rough indication of how many cells carry the mutation, since it also depends on tumor purity and copy number.
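The share of reads supporting a mutation at a site is often called the variant allele fraction. A minimal sketch of the arithmetic follows; the read counts are made-up numbers of the kind one would read off IGV's coverage track at a single position, and the function name is illustrative.

```python
def variant_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads at a position that support the variant allele."""
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / total


# Hypothetical counts: 70 reads match the reference, 30 carry the variant.
vaf = variant_allele_fraction(ref_reads=70, alt_reads=30)
```

With these counts the fraction is 30 out of 100 reads. Reads showing a third allele or failing quality filters are ignored in this sketch.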

Another application is in gene expression analysis. A scientist could compare RNA-sequencing data from healthy and diseased tissue, loaded as BigWig files in separate quantitative tracks. By comparing the signal peaks over specific genes, the researcher can identify genes that are turned “on” or “off,” providing clues about the molecular basis of a disorder.
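The visual comparison of two signal tracks can be put in rough numerical terms with a log2 ratio of mean signal over a gene. This is only an illustration of what the eye is doing when comparing peak heights: the signal values are hypothetical, the pseudocount of 1.0 is an arbitrary choice to avoid dividing by zero, and a real differential-expression analysis would use replicates and dedicated statistical tools.

```python
import math


def mean_signal(values):
    """Average signal over a region, given per-position track values."""
    if not values:
        raise ValueError("region has no signal values")
    return sum(values) / len(values)


def log2_fold_change(diseased, healthy, pseudocount=1.0):
    """log2 ratio of mean signal (diseased over healthy) with a pseudocount."""
    numerator = mean_signal(diseased) + pseudocount
    denominator = mean_signal(healthy) + pseudocount
    return math.log2(numerator / denominator)


# Hypothetical per-position signal over the same gene in two samples.
healthy_track = [3.0, 3.0, 3.0]
diseased_track = [7.0, 7.0, 7.0]
change = log2_fold_change(diseased_track, healthy_track)
```

A positive value means higher signal in the diseased sample, a negative value means lower, and zero means no difference.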

In epigenetics, investigators study how chemical modifications to DNA and its associated histone proteins regulate gene activity. An epigenetics researcher might use IGV to view data from a ChIP-seq experiment, which identifies where a protein or histone modification is found across the genome. By loading this data as a BigWig track alongside gene annotations, they can see whether the signal is concentrated near the start of genes, which would be consistent with a role in regulating transcription.
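Checking whether a binding site lies near the start of a gene amounts to a distance test between a peak interval and a transcription start site (TSS). The sketch below assumes 1-based inclusive coordinates on the same chromosome; the 1,000-base window is an arbitrary illustrative threshold, not a standard definition of a promoter.

```python
def distance_to_tss(peak_start, peak_end, tss):
    """Distance in bases from a peak to a TSS; 0 if the peak covers the TSS."""
    if peak_start > peak_end:
        raise ValueError("peak_start must not exceed peak_end")
    if peak_start <= tss <= peak_end:
        return 0
    return min(abs(peak_start - tss), abs(peak_end - tss))


def is_promoter_proximal(peak_start, peak_end, tss, window=1000):
    """True if the peak lies within `window` bases of the TSS (assumed cutoff)."""
    return distance_to_tss(peak_start, peak_end, tss) <= window
```

A peak spanning positions 100 to 200 is 50 bases from a TSS at position 250, so it would count as promoter-proximal under this cutoff.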

Getting Started with IGV

IGV is available in two main versions. The primary version is a downloadable desktop application for Mac, Windows, and Linux, which is best for handling large, local data files. A web-based version, built on the igv.js JavaScript library, is also available for easier sharing and embedding but is better suited to smaller datasets. For most research work, the desktop application is the recommended starting point.

Upon opening the application, the first step is to select a reference genome. IGV provides a drop-down menu with dozens of hosted genomes, such as the human reference genome (hg19 or hg38), which are loaded from the IGV server when selected. The reference provides the genomic context for all other data.

The next step is to load a dataset. A good way to explore the software is to load a sample dataset from the IGV-hosted server via the “File” menu. This includes a variety of data types from public projects and allows you to experiment with the interface without needing your own data files.

Once a genome and data track are loaded, you can navigate to a gene of interest. Typing a gene name, like GAPDH, into the search bar and pressing Enter will jump to that location. This three-step process (loading a genome, loading a data track, and navigating to a gene) forms the basis of most analyses in IGV.
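The same three steps can also be written down as an IGV batch script, which the desktop application can run to reproduce a view without clicking through the menus. The sketch below only builds the script text. The commands it emits (new, genome, load, snapshotDirectory, goto, snapshot, exit) are IGV batch commands, but the file names and output directory are hypothetical placeholders, and the helper function itself is illustrative rather than part of IGV.

```python
def build_igv_batch(genome, tracks, locus, snapshot_dir, snapshot_name):
    """Return IGV batch-script text: load a genome and tracks, go to a locus, snapshot."""
    lines = ["new", f"genome {genome}"]
    # One load command per data file, in the order the tracks should appear.
    lines += [f"load {track}" for track in tracks]
    lines += [
        f"snapshotDirectory {snapshot_dir}",
        f"goto {locus}",
        f"snapshot {snapshot_name}",
        "exit",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical inputs: file names and directory are placeholders.
script = build_igv_batch(
    genome="hg38",
    tracks=["tumor.bam", "tumor.vcf.gz"],
    locus="GAPDH",
    snapshot_dir="snapshots",
    snapshot_name="gapdh.png",
)
```

Saving the returned text to a file and running it through the desktop application's batch-script option should reproduce the genome, track, and locus steps described above, assuming the listed files exist and are indexed.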