What Is a CADD Score and How Is It Used?

A CADD score is the output of CADD (Combined Annotation Dependent Depletion), a computational tool that helps scientists and clinicians understand the potential impact of changes in an individual’s DNA. The score predicts how likely a genetic variant is to be deleterious, that is, harmful or disease-causing. It provides a standardized metric for comparing variants across the human genome.

The Science Behind CADD

CADD integrates information from many annotation sources into a single score for each genetic variant. One of these sources is evolutionary conservation, which measures how little a DNA sequence has changed across species. Highly conserved regions often have important biological functions, so variants in these regions are more likely to be impactful.

The tool also incorporates functional genomics data, such as information from the ENCODE project and UCSC genome browser tracks. This data covers aspects like regions of the genome that are open and accessible for gene regulation, or areas where transcription factors bind. CADD also considers protein-level effects, including predictions from other tools like SIFT and PolyPhen that estimate how an amino acid change might affect protein function.

A higher CADD score indicates a greater likelihood that a genetic variant is deleterious. Raw scores are converted into a “scaled” or “PHRED-like” score, typically ranging from 1 to 99, for easier interpretation. The scale is logarithmic and reflects the variant’s rank among all possible single-nucleotide substitutions in the human genome. For instance, a scaled score of 10 or greater means the variant is among the 10% most deleterious possible substitutions, a score of 20 or greater places it in the top 1%, and a score of 30 or greater in the top 0.1%.
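The relationship between a variant’s rank and its scaled score can be sketched in a few lines of Python. This is a minimal illustration, assuming the PHRED-like definition of -10 * log10(rank / total), where rank 1 is the most deleterious substitution. The function names phred_scaled and top_fraction are hypothetical and are not part of any CADD software.

```python
import math


def phred_scaled(rank: int, total: int) -> float:
    """Convert a rank among all scored substitutions into a PHRED-like score.

    Assumes rank 1 is the most deleterious substitution and `total` is the
    number of substitutions that were ranked.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return -10.0 * math.log10(rank / total)


def top_fraction(scaled: float) -> float:
    """Return the fraction of substitutions scoring at or above `scaled`."""
    return 10.0 ** (-scaled / 10.0)


if __name__ == "__main__":
    # Each 10-point step on the scaled score is a tenfold change in rank.
    for score in (10, 20, 30):
        print(f"scaled score {score}: top {top_fraction(score):.1%}")
```

Because the scale is logarithmic, the difference between scores of 10 and 20 is the same tenfold step in rank as the difference between 20 and 30.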

Practical Uses in Genetics

CADD scores are widely used in genetic research and clinical settings to prioritize genetic variants. When scientists perform large-scale sequencing studies, they often find thousands or millions of genetic variations in an individual. CADD helps researchers focus on variants most likely to be relevant to a disease by assigning a quantitative measure of potential impact. This prioritization saves time and resources, directing further investigation towards promising candidates.
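As a rough sketch of what this prioritization looks like in practice, the Python example below ranks a handful of variants by scaled CADD score and keeps the top few for follow-up. The Variant record, the prioritize function, and the example values are all made up for illustration; real pipelines typically read scores from annotated variant files and combine them with other filters.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record used only for this illustration."""
    chrom: str
    pos: int
    ref: str
    alt: str
    cadd_phred: float  # scaled (PHRED-like) CADD score


def prioritize(variants: Iterable[Variant],
               top_n: Optional[int] = None) -> List[Variant]:
    """Sort variants from highest to lowest scaled score, optionally truncating."""
    ranked = sorted(variants, key=lambda v: v.cadd_phred, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Made-up positions and scores, not real annotations.
example_variants = [
    Variant("1", 1000, "A", "G", 3.2),
    Variant("2", 2000, "C", "T", 27.5),
    Variant("7", 3000, "G", "A", 12.1),
    Variant("X", 4000, "T", "C", 21.0),
]

shortlist = prioritize(example_variants, top_n=2)

if __name__ == "__main__":
    for v in shortlist:
        print(f"{v.chrom}:{v.pos} {v.ref}>{v.alt} CADD={v.cadd_phred}")
```

Ranking rather than hard filtering keeps every variant available for later review while still putting the most promising candidates first.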

In clinical diagnostics, CADD scores are useful for interpreting “variants of unknown significance” (VUS). These are genetic changes whose impact on health is not yet clear. For rare diseases, where understanding the genetic basis is challenging, CADD helps clinicians assess a variant’s potential pathogenicity. For example, it can aid in identifying candidate genes for new diseases or refining existing diagnoses by providing an evidence-based prediction of a variant’s deleteriousness.

CADD scores have been applied in various contexts, including the study of highly penetrant contributors to severe Mendelian disorders and in genome-wide association studies (GWAS). The tool’s ability to provide a genome-wide assessment for both coding and non-coding variants makes it a versatile asset. It helps researchers and clinicians sift through complex genomic data to pinpoint variations that warrant closer examination.

Contextualizing CADD Scores

A CADD score is a prediction, not a definitive diagnosis. It provides an estimate of a variant’s deleteriousness based on computational models and integrated data. Genetic variant interpretation is a complex process that requires multiple lines of evidence.

CADD scores should always be interpreted in conjunction with other information. This includes clinical symptoms observed in a patient, their family history of disease, and results from functional studies conducted in a laboratory. Predictions from other computational tools may also be considered to build a comprehensive picture.

While commonly used “cutoff” or “threshold” scores exist to suggest pathogenicity, these are not absolute. For instance, a scaled CADD score of 15 or higher is often used to flag a potentially deleterious variant, while a stricter cutoff of 20 restricts attention to the top 1% of possible substitutions. However, these thresholds are arbitrary and require expert human interpretation within the specific clinical or research context. The CADD score is a valuable piece of the puzzle, but it is never the sole determinant in assessing the impact of a genetic variant.
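One way to keep such cutoffs from hardening into rules is to treat them as adjustable parameters that only sort variants into review tiers. The Python sketch below does this with the 15 and 20 cutoffs mentioned above as defaults. The function name flag_variant and the tier labels are hypothetical, and the output is a triage label, not a pathogenicity call.

```python
def flag_variant(cadd_phred: float,
                 lenient: float = 15.0,
                 strict: float = 20.0) -> str:
    """Assign a review tier from a scaled CADD score.

    The default cutoffs are conventions, not validated boundaries, so both
    are exposed as parameters that a study can tune to its own context.
    """
    if lenient > strict:
        raise ValueError("lenient cutoff must not exceed strict cutoff")
    if cadd_phred >= strict:
        return "high priority for review"
    if cadd_phred >= lenient:
        return "candidate for review"
    return "lower priority"


if __name__ == "__main__":
    # The same score can land in a different tier under different cutoffs.
    print(flag_variant(18.0))
    print(flag_variant(18.0, lenient=20.0, strict=25.0))
```

A variant in the lowest tier is not thereby shown to be benign; the label only indicates where it sits in the review queue before clinical, family, and functional evidence is weighed.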