Somatic mutations are genetic alterations that arise in cells after conception. They are therefore neither inherited from parents nor passed on to offspring. These changes accumulate over a person’s lifetime and are a defining feature of cancer. Unlike germline mutations, which are present in every cell from birth, somatic mutations are confined to a specific subset of cells, such as those within a tumor. Sequencing has revealed vast numbers of these mutations. To manage this data, scientists have built systematic repositories that centralize information from thousands of cancer samples, transforming research into the disease’s origins and progression.
The COSMIC Database
The world’s largest resource for this information is the Catalogue of Somatic Mutations in Cancer (COSMIC), maintained by the Wellcome Sanger Institute. Its purpose is to curate and present information on somatic mutations found in human cancers. Launched in 2004 with data from only four genes, it has since expanded to include millions of mutations across thousands of genes. The database documents mutation types, affected genes, and the cancer types in which these alterations are observed.
COSMIC integrates data from scientific publications and large-scale cancer genomics projects. Major contributors include The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC). These collaborations have systematically analyzed the genomes of thousands of tumor samples across numerous cancer types. The data they generate feeds into curated databases such as COSMIC, helping to keep them current and comprehensive.
The curation process combines expert manual review with bioinformatic pipelines. For well-established cancer genes, scientific curators review the published literature to extract detailed mutation data. For data from genome-wide sequencing studies, automated pipelines identify and annotate the genomic variants. This dual approach provides depth of detail for high-impact genes and breadth of coverage across the whole cancer genome.
Key Information within the Catalogue
A cancer mutation catalogue records many kinds of genetic change, and not all of them matter equally. Researchers distinguish between “driver” and “passenger” mutations. Driver mutations give cancer cells a selective growth advantage and so contribute directly to the disease’s development. Passenger mutations, by contrast, are acquired incidentally alongside drivers and confer no growth advantage. Because tumor growth often depends on driver mutations, these are the most promising targets for therapy.
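Driver mutations tend to recur at the same positions in independent tumors, as BRAF V600E does. Passengers are scattered more randomly. One common first-pass heuristic is therefore to count how many distinct samples carry each protein change. The sketch below assumes a hypothetical input of `(sample_id, gene, protein_change)` tuples and uses invented example data. The function name and the `min_samples` cutoff are illustrative. Recurrence alone does not prove driver status, and this is not how COSMIC classifies mutations.

```python
from collections import defaultdict

def recurrent_changes(records, min_samples=3):
    """Count distinct samples per (gene, protein_change) and keep recurrent ones.

    records: iterable of (sample_id, gene, protein_change) tuples.
    Recurrence is only a rough hint of driver status, not a classifier.
    """
    samples_by_change = defaultdict(set)
    for sample_id, gene, change in records:
        # A set, so a sample reporting the same change twice counts once.
        samples_by_change[(gene, change)].add(sample_id)
    return {
        key: len(samples)
        for key, samples in samples_by_change.items()
        if len(samples) >= min_samples
    }

# Invented example data for illustration only (GENE_X is hypothetical).
records = [
    ("S1", "BRAF", "p.V600E"),
    ("S2", "BRAF", "p.V600E"),
    ("S3", "BRAF", "p.V600E"),
    ("S3", "GENE_X", "p.A123T"),
    ("S4", "GENE_X", "p.R456Q"),
]
print(recurrent_changes(records))  # {('BRAF', 'p.V600E'): 3}
```

In practice, recurrence analyses also correct for gene length and background mutation rate. Large genes accumulate passengers simply by chance.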
The catalogue categorizes mutations into several main types.
- Point mutations, one of the most common types, change a single nucleotide in the DNA sequence.
- Insertions and deletions, often called indels, add or remove one or more nucleotides.
- Copy number variations (CNVs) are duplications or deletions of large DNA segments. They can raise or lower the amount of protein produced from the genes in those segments.
- Structural rearrangements are larger changes where pieces of chromosomes are moved, inverted, or exchanged, which can create novel “fusion genes” with cancer-promoting functions.
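Point mutations and indels can usually be told apart from the reference and alternate allele strings, as written in a VCF file. Copy number changes can be classified from an estimated copy number per DNA segment. The sketch below assumes VCF-style alleles, where an indel shares a leading anchor base with the reference. It also assumes a normal diploid baseline of two copies. Both function names are hypothetical and not part of any COSMIC tool. Structural rearrangements need breakpoint information and are not covered here.

```python
def classify_small_variant(ref: str, alt: str) -> str:
    """Classify a variant from VCF-style REF and ALT allele strings.

    Assumes indels carry a shared leading anchor base, as in VCF
    (e.g. REF="A", ALT="ATG" is an insertion of "TG").
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("REF and ALT are identical; not a variant")
    if len(ref) == 1 and len(alt) == 1:
        return "point mutation"
    if len(ref) == len(alt):
        return "multi-nucleotide substitution"
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    return "complex indel"


def classify_copy_number(copy_number: int, baseline: int = 2) -> str:
    """Classify an integer segment copy number against a diploid baseline."""
    if copy_number < 0:
        raise ValueError("copy number cannot be negative")
    if copy_number == 0:
        return "homozygous deletion"
    if copy_number < baseline:
        return "loss"
    if copy_number > baseline:
        return "gain"
    return "neutral"


print(classify_small_variant("C", "T"))    # point mutation
print(classify_small_variant("A", "ATG"))  # insertion
print(classify_copy_number(5))             # gain
```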
How the Catalogue Drives Cancer Research and Treatment
The collection of somatic mutation data has led to the development of targeted therapies. These drugs are designed to attack cancer cells that harbor certain driver mutations. For example, the discovery of the BRAF V600E mutation in many melanomas led to the development of BRAF inhibitor drugs. These have proven effective in patients whose tumors carry this specific alteration.
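Matching a patient’s tumor mutations against known targetable alterations starts with parsing the protein-change notation that catalogues use, such as `p.V600E`. The sketch below only handles simple one-letter missense changes. The lookup table is a hypothetical illustration containing just the BRAF example from the text. It is not a clinical resource, and the function names are invented for this example.

```python
import re

# One-letter amino acid codes; the pattern covers simple missense changes
# such as "p.V600E" only (no indels, frameshifts or three-letter codes).
AA = "ACDEFGHIKLMNPQRSTVWY"
MISSENSE = re.compile(rf"^p\.([{AA}])(\d+)([{AA}])$")

def parse_missense(change):
    """Split a protein change like 'p.V600E' into ('V', 600, 'E')."""
    match = MISSENSE.match(change)
    if match is None:
        raise ValueError(f"unsupported notation: {change!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt

# Hypothetical lookup table for illustration; not a clinical resource.
TARGETED_THERAPIES = {
    ("BRAF", "V", 600, "E"): "BRAF inhibitor",
}

def matching_therapies(tumor_mutations):
    """Return (gene, change, therapy) for each mutation found in the table."""
    hits = []
    for gene, change in tumor_mutations:
        try:
            ref, pos, alt = parse_missense(change)
        except ValueError:
            continue  # skip notations this sketch does not handle
        therapy = TARGETED_THERAPIES.get((gene, ref, pos, alt))
        if therapy is not None:
            hits.append((gene, change, therapy))
    return hits

print(matching_therapies([("BRAF", "p.V600E"), ("TP53", "p.R175H")]))
# [('BRAF', 'p.V600E', 'BRAF inhibitor')]
```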
Genetic information is also transforming cancer diagnostics and prognostics. Specific mutations can serve as biomarkers, helping to classify a tumor more precisely than histopathology alone. Different mutations within the same cancer type can indicate different prognoses or predict whether a tumor will respond to a particular treatment. This enables personalized medicine, in which treatment strategies are tailored to the genetic profile of a patient’s tumor.
The catalogue is also a tool for fundamental cancer biology research. By analyzing mutational patterns across thousands of tumors, scientists can identify new genes involved in cancer. Researchers can also uncover “mutational signatures,” which are characteristic patterns of mutations left by specific processes, such as exposure to ultraviolet light or tobacco smoke. This provides clues about the underlying causes of different cancers.
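Signature analysis typically begins by assigning each single-base substitution to one of 96 classes. There are six substitution types (C>A, C>G, C>T, T>A, T>C, T>G), written from the strand where the mutated base is a pyrimidine. Each type is combined with the 16 possible pairs of 5' and 3' neighbouring bases. The sketch below builds that 96-bin profile for one tumor, assuming a hypothetical input of `(trinucleotide, alt)` pairs taken from the reference sequence. The function names are illustrative. Extracting the signatures themselves, commonly done with non-negative matrix factorization across many tumors’ profiles, is not shown.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def sbs96_class(trinucleotide, alt):
    """Label a substitution with its trinucleotide context, e.g. 'T[C>T]G'.

    trinucleotide: reference base plus one flanking base on each side.
    alt: the mutant middle base. Purine (A/G) reference bases are converted
    to the pyrimidine strand so every class has C or T in the middle.
    """
    trinucleotide, alt = trinucleotide.upper(), alt.upper()
    if trinucleotide[1] == alt:
        raise ValueError("not a substitution")
    if trinucleotide[1] in "AG":
        trinucleotide = reverse_complement(trinucleotide)
        alt = alt.translate(COMPLEMENT)
    return f"{trinucleotide[0]}[{trinucleotide[1]}>{alt}]{trinucleotide[2]}"

def all_sbs96_classes():
    """6 substitution types x 16 flanking-base pairs = 96 classes."""
    return [
        f"{five}[{ref}>{alt}]{three}"
        for ref in "CT"
        for alt in "ACGT" if alt != ref
        for five in "ACGT"
        for three in "ACGT"
    ]

def sbs96_spectrum(mutations):
    """Count mutations per class: one tumor's 96-bin profile."""
    return Counter(sbs96_class(tri, alt) for tri, alt in mutations)

# Invented example mutations.
spectrum = sbs96_spectrum([("TCG", "T"), ("CGA", "A"), ("ACA", "T")])
print(spectrum)  # Counter({'T[C>T]G': 2, 'A[C>T]A': 1})
```

In this example, the G>A change in `CGA` is reported as C>T in `TCG` on the opposite strand. It therefore falls into the same bin as the first mutation.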
Methods of Data Generation
Populating these catalogues begins with collecting tumor samples from patients through biopsy or surgical resection. A matched normal sample, such as blood, is also collected from the same individual. Comparing the two is foundational, because it lets scientists separate mutations unique to the cancer from the inherited variants present in all of the person’s cells.
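At its simplest, the tumor-normal comparison is a set difference. Variants called in the tumor but not in the matched normal are candidate somatic mutations. The sketch below assumes each variant is keyed by a hypothetical `(chromosome, position, ref, alt)` tuple and uses invented coordinates.

```python
def somatic_candidates(tumor_variants, normal_variants):
    """Variants called in the tumor but absent from the matched normal."""
    normal = set(normal_variants)
    return sorted(set(tumor_variants) - normal)

# Invented coordinates for illustration.
tumor = [
    ("chr1", 1000, "C", "T"),
    ("chr2", 5000, "G", "A"),  # also in the normal: inherited
    ("chr3", 700, "T", "C"),
]
normal = [("chr2", 5000, "G", "A")]
print(somatic_candidates(tumor, normal))
# [('chr1', 1000, 'C', 'T'), ('chr3', 700, 'T', 'C')]
```

Exact matching of call sets is fragile. If low coverage in the normal sample causes a germline variant to go uncalled, that variant will slip through as "somatic." This is one reason real pipelines compare read-level evidence rather than call sets alone.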
The core technology is high-throughput DNA sequencing, also known as Next-Generation Sequencing (NGS). Whole Genome Sequencing (WGS) reads the entire genome of a sample, giving a comprehensive view of all genetic alterations. Whole Exome Sequencing (WES) is a more targeted approach that captures only the protein-coding regions of genes (roughly 1 to 2 percent of the genome). Most known disease-causing mutations have been found in these regions.
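WES only captures the targeted protein-coding regions, so downstream analysis usually restricts variant calls to those target intervals. These intervals are often supplied as a BED file. The sketch below shows that interval lookup with binary search. It assumes 0-based, half-open coordinates as in BED, and it assumes intervals on the same chromosome do not overlap. The function names and example intervals are hypothetical.

```python
from bisect import bisect_right

def build_target_index(intervals):
    """Group (chrom, start, end) intervals by chromosome, sorted by start.

    Coordinates are 0-based and half-open, as in BED files; intervals on
    the same chromosome are assumed not to overlap.
    """
    index = {}
    for chrom, start, end in sorted(intervals):
        starts, ends = index.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)
    return index

def in_targets(index, chrom, pos):
    """True if 0-based position `pos` falls inside a target interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]

# Invented exon-like targets.
targets = build_target_index([("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 10, 20)])
print(in_targets(targets, "chr1", 150))  # True
print(in_targets(targets, "chr1", 300))  # False
```

Overlapping targets would need to be merged first. Otherwise, checking only the last interval that starts before a position can miss an earlier, longer interval that still covers it.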
The final step is bioinformatic analysis. Computational pipelines align the raw sequencing reads to the human reference genome and identify differences from it. These algorithms must distinguish true somatic mutations from inherited germline variants and from sequencing errors. This filtering largely determines the accuracy of the data that ultimately populates the cancer mutation catalogues.
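The distinction between somatic mutations, germline variants and sequencing errors can be illustrated with read counts at a single site. A germline variant usually also shows up in the normal sample. A sequencing error is typically supported by only a few tumor reads. The sketch below applies fixed cutoffs to express both ideas. Every threshold and name in it is a hypothetical illustration. Production somatic callers, such as GATK Mutect2, use statistical models rather than fixed cutoffs like these.

```python
from dataclasses import dataclass

@dataclass
class SiteCounts:
    """Reads supporting the reference and alternate allele at one site."""
    tumor_ref: int
    tumor_alt: int
    normal_ref: int
    normal_alt: int

def vaf(alt, ref):
    """Variant allele fraction; 0.0 when there are no reads."""
    depth = alt + ref
    return alt / depth if depth else 0.0

def passes_somatic_filter(c, min_depth=10, min_alt_reads=4,
                          min_tumor_vaf=0.05, max_normal_vaf=0.02):
    """Hypothetical thresholds, chosen only to illustrate the logic."""
    if c.tumor_ref + c.tumor_alt < min_depth or c.normal_ref + c.normal_alt < min_depth:
        return False  # too little coverage to judge either sample
    if c.tumor_alt < min_alt_reads or vaf(c.tumor_alt, c.tumor_ref) < min_tumor_vaf:
        return False  # weak tumor evidence: possible sequencing error
    if vaf(c.normal_alt, c.normal_ref) > max_normal_vaf:
        return False  # allele also present in the normal: likely germline
    return True

print(passes_somatic_filter(SiteCounts(60, 20, 40, 0)))   # True
print(passes_somatic_filter(SiteCounts(40, 38, 20, 19)))  # False (germline-like)
print(passes_somatic_filter(SiteCounts(98, 2, 50, 0)))    # False (too few alt reads)
```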