What Is a Manhattan Plot and How Is It Used?

A Manhattan plot is a specialized scatter plot primarily used in genetics and genomics research. It serves as a powerful visualization tool to identify genomic regions statistically associated with a specific trait, disease, or characteristic. These plots display a large number of data points, helping researchers make sense of complex genetic datasets and quickly pinpoint potential genetic influences.

Components of a Manhattan Plot

A Manhattan plot visually organizes genetic data to highlight significant findings. The horizontal axis (X-axis) represents genomic location, typically arranged by chromosome number from left to right. Each point represents a single genetic marker, such as a Single Nucleotide Polymorphism (SNP), placed at its chromosomal position. The vertical axis (Y-axis) quantifies the statistical significance of the association between each marker and the studied trait. This significance is commonly expressed as the negative logarithm (base 10) of the p-value, or -log10(p).

A higher point on the Y-axis indicates a smaller p-value and therefore stronger statistical evidence of association. For instance, a p-value of 10^-8 corresponds to a Y-axis value of 8, and 10^-15 to 15, demonstrating that smaller p-values result in greater heights. To improve visual clarity and distinguish chromosomes, points are often colored alternately for successive chromosomes. This plot type is widely used in Genome-Wide Association Studies (GWAS), which can test millions of genetic markers for association with specific traits or diseases.
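The mapping from markers to plot coordinates can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (neg_log10, layout_points), the toy chromosome lengths, and the example markers are all made up for this sketch. It assumes each marker is given as a (chromosome, position, p-value) tuple and that chromosomes are laid end to end along the X-axis.

```python
import math

def neg_log10(p_value):
    """Y-axis value for a marker: the negative base-10 log of its p-value."""
    return -math.log10(p_value)

def layout_points(markers, chrom_lengths):
    """Convert (chrom, pos, p) tuples into (x, y, color_index) plot points.

    chrom_lengths maps chromosome name -> length, in plotting order.
    Chromosomes are placed end to end, so x is the marker position plus
    the combined length of all chromosomes that come before it.
    color_index alternates 0, 1, 0, 1, ... for successive chromosomes.
    """
    offsets = {}
    colors = {}
    running_total = 0
    for index, (chrom, length) in enumerate(chrom_lengths.items()):
        offsets[chrom] = running_total
        colors[chrom] = index % 2
        running_total += length

    return [
        (offsets[chrom] + pos, neg_log10(p), colors[chrom])
        for chrom, pos, p in markers
    ]

# Hypothetical toy data: three short "chromosomes" and three markers.
example_lengths = {"1": 1000, "2": 800, "3": 600}
example_markers = [("1", 100, 0.5), ("2", 50, 1e-8), ("3", 10, 1e-15)]

for x, y, color_index in layout_points(example_markers, example_lengths):
    print(x, round(y, 2), color_index)
```

The resulting (x, y) pairs could be handed to any scatter-plot routine, with color_index selecting between two alternating colors.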

Understanding the Peaks

The most striking features of a Manhattan plot are its “peaks,” which are tall spires of points rising prominently from the baseline. These peaks signify genomic regions where genetic markers show a strong statistical association with the trait or disease under investigation. The height of a peak reflects the strength of the statistical evidence; taller peaks indicate more significant associations. Researchers often use a “significance threshold,” represented by a horizontal line, to separate statistically significant findings from background noise.

Points that rise above this predetermined threshold are considered “hits” or “discoveries,” indicating that markers in that region are unlikely to be associated with the trait by chance alone. A common genome-wide significance threshold in GWAS is a p-value of 5 x 10^-8, which appears at a -log10(p) value of about 7.3 on the plot. These significant peaks highlight genomic regions that may contain genes or regulatory elements influencing the trait, suggesting areas for further, more focused research. These peaks denote a statistical association, not necessarily a direct cause; additional studies are needed to confirm their functional relevance.
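As a quick check on where the threshold line sits, the short Python sketch below converts the 5 x 10^-8 cutoff to the Y-axis scale and filters a list of markers against it. The names (GENOME_WIDE_P, threshold_height, find_hits) and the sample markers are hypothetical, chosen only for illustration; markers are assumed to be (chromosome, position, p-value) tuples.

```python
import math

# Commonly used genome-wide significance cutoff for GWAS.
GENOME_WIDE_P = 5e-8

def threshold_height(p_threshold=GENOME_WIDE_P):
    """Height of the horizontal threshold line on the -log10(p) axis."""
    return -math.log10(p_threshold)

def find_hits(markers, p_threshold=GENOME_WIDE_P):
    """Return markers whose p-value falls below the threshold.

    A p-value below the cutoff is the same as a point above the line.
    """
    return [marker for marker in markers if marker[2] < p_threshold]

# Hypothetical markers: only the second one clears the threshold.
sample_markers = [
    ("1", 12345, 3e-4),
    ("6", 67890, 2e-12),
    ("9", 13579, 1e-7),
]

print(round(threshold_height(), 2))
print(find_hits(sample_markers))
```

Note that the third marker, with p = 10^-7, plots at a height of 7 and so falls just short of the line at roughly 7.3.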

The Origin of the Name

The distinctive name “Manhattan plot” originates from its visual resemblance to the iconic skyline of Manhattan, New York City. The densely packed points, varying in height, especially the prominent “peaks,” create an image akin to skyscrapers towering over lower buildings. This striking visual analogy makes the complex data more intuitive and memorable for researchers and the scientific community. The name reflects the plot’s appearance, where statistical signals rise from a background of less significant data, much like a city’s architectural landscape.