How to Read Gene Mutation Nomenclature

Gene mutation nomenclature is a standardized system for describing changes in DNA or protein sequences. This universal language allows scientists, clinicians, and researchers worldwide to accurately communicate genetic findings. Understanding it is important for comprehending genetic diseases, informing diagnoses, and guiding new treatments.

Foundations of Nomenclature

The Human Genome Variation Society (HGVS) guidelines serve as the primary standard for gene mutation nomenclature. Every HGVS description is written relative to a reference sequence, and positions are numbered along that sequence. A prefix indicates the type of reference sequence: ‘c.’ for coding DNA, ‘g.’ for genomic DNA, and ‘p.’ for protein sequences.
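As a minimal sketch of how the prefix can be read programmatically, the snippet below maps the three prefixes named above to their reference sequence types. The function name `reference_type` and the `PREFIXES` table are illustrative choices, not part of any HGVS tooling.

```python
# Map the HGVS prefixes discussed above to the reference sequence type they name.
PREFIXES = {
    "c": "coding DNA",
    "g": "genomic DNA",
    "p": "protein",
}


def reference_type(description: str) -> str:
    """Return the reference sequence type named by a variant description's prefix."""
    # Drop an optional 'GENE:' part in front of the prefix, e.g. 'HBB:p.Glu6Val'.
    body = description.split(":")[-1]
    prefix, dot, _ = body.partition(".")
    if not dot or prefix not in PREFIXES:
        raise ValueError(f"unrecognized prefix in {description!r}")
    return PREFIXES[prefix]


print(reference_type("c.76A>T"))
print(reference_type("HBB:p.Glu6Val"))
```

Only the three prefixes listed in this section are handled; anything else raises a `ValueError` rather than being guessed at.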

Interpreting Common Mutation Types

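Understanding specific symbols and abbreviations is key to interpreting common gene mutation types. A substitution, where one base is replaced by another, is indicated at the DNA level by the ‘>’ symbol, as in `c.76A>T`. Protein-level descriptions do not use ‘>’; they list the original amino acid, its position, and the new amino acid, as in `p.Lys26Ter` (position 76 is the first base of codon 26, so an A>T change there can turn a lysine codon into a stop codon). Substitutions can lead to missense mutations like `p.Ala15Val`, where one amino acid is replaced by another. A nonsense mutation, written `p.Gln15Ter` or `p.Gln15*`, results in a premature stop codon.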
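The distinction between missense and nonsense changes can be sketched in a few lines. The example below assumes three-letter amino acid codes and accepts either `Ter` or the asterisk shorthand `*` for a stop codon; the function name and the regular expression are illustrative and cover only simple substitutions.

```python
import re

# Three-letter codes for the 20 standard amino acids.
AMINO_ACIDS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}

# Original residue, position, then the new residue or a stop ('Ter' or '*').
PROTEIN_SUB = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2}|\*)$")


def classify_protein_substitution(description: str) -> str:
    """Classify a simple protein substitution as 'missense' or 'nonsense'."""
    match = PROTEIN_SUB.match(description)
    if match is None:
        raise ValueError(f"not a simple protein substitution: {description!r}")
    ref, _position, alt = match.groups()
    if ref not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code: {ref}")
    if alt in ("Ter", "*"):
        return "nonsense"  # the codon now encodes a stop
    if alt not in AMINO_ACIDS or alt == ref:
        raise ValueError(f"not a substitution to a different amino acid: {alt}")
    return "missense"  # one amino acid replaced by another


print(classify_protein_substitution("p.Ala15Val"))
print(classify_protein_substitution("p.Gln15Ter"))
```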

Deletions are denoted by `del`, as seen in `c.123_126del`, indicating the removal of the four bases from position 123 through position 126. Insertions are represented by `ins`, for example, `c.123_124insG`, meaning a guanine (G) was inserted between positions 123 and 124. Duplications use `dup`, such as `c.123_126dup`, meaning a copy of positions 123 through 126 was inserted directly after the original segment.
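To make the position conventions concrete, here is a small sketch that applies a deletion, an insertion, and a duplication to a toy sequence. Positions are 1-based and ranges are inclusive, matching the notation above; the helper names and the ten-base toy sequence are made up for illustration.

```python
def apply_del(seq: str, start: int, end: int) -> str:
    """Remove positions start..end (1-based, inclusive), as in 'c.3_6del'."""
    return seq[:start - 1] + seq[end:]


def apply_ins(seq: str, left: int, inserted: str) -> str:
    """Insert bases between positions left and left + 1, as in 'c.3_4insG'."""
    return seq[:left] + inserted + seq[left:]


def apply_dup(seq: str, start: int, end: int) -> str:
    """Copy positions start..end and place the copy right after them, as in 'c.3_6dup'."""
    return seq[:end] + seq[start - 1:end] + seq[end:]


toy = "ACGTACGTAC"  # hypothetical 10-base sequence
print(apply_del(toy, 3, 6))    # drops G, T, A, C at positions 3-6
print(apply_ins(toy, 3, "G"))  # adds a G between positions 3 and 4
print(apply_dup(toy, 3, 6))    # repeats positions 3-6 immediately downstream
```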

Complex changes that combine a deletion and an insertion are described with `delins`, as in `c.123_128delinsG`, where positions 123 through 128 are deleted and replaced by a single G. Frameshift mutations, which alter the protein’s reading frame, are indicated by `fs`, optionally followed by `Ter` and the position of the new stop codon. For example, `p.Arg97ProfsTer23` signifies that arginine at position 97 is the first amino acid affected and is changed to proline, and that the shifted reading frame ends in a termination codon at position 23, counting the new proline as position 1.
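A frameshift description packs several facts into one token, so a small parser can help unpack it. The sketch below assumes the full `fsTer` form with three-letter codes (it also accepts `*` in place of `Ter`); the function name and the returned dictionary keys are illustrative.

```python
import re

# First affected residue, its position, the new residue, then 'fsTer<N>' or 'fs*<N>'.
FRAMESHIFT = re.compile(
    r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})fs(?:Ter|\*)(\d+)$"
)


def parse_frameshift(description: str) -> dict:
    """Split a description such as 'p.Arg97ProfsTer23' into its parts."""
    match = FRAMESHIFT.match(description)
    if match is None:
        raise ValueError(f"not a full frameshift description: {description!r}")
    ref, position, alt, ter = match.groups()
    position, ter = int(position), int(ter)
    return {
        "reference_residue": ref,
        "first_changed_position": position,
        "new_residue": alt,
        "stop_offset": ter,
        # The new residue counts as 1, so the stop codon is ter - 1 positions later.
        "stop_position": position + ter - 1,
    }


print(parse_frameshift("p.Arg97ProfsTer23"))
```

For `p.Arg97ProfsTer23` the stop codon therefore falls at position 119 of the altered protein, because the count of 23 starts at the new proline.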

Understanding Reference Sequences

The reference sequence determines how the location of a gene mutation is numbered. The ‘c.’ prefix indicates a change described relative to a coding DNA sequence. For coding DNA, numbering starts at 1 with the ‘A’ of the ATG start codon, and there is no position 0. Positive numbers count through the coding sequence, skipping introns, while negative numbers indicate positions upstream of the start codon in the 5' untranslated region.
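Because coding DNA numbering starts at the A of the start codon, a coding position maps directly onto a codon number. The following sketch shows the arithmetic, assuming the position lies inside the coding sequence (1 or greater, before the stop codon); `codon_of` is an illustrative helper name.

```python
from typing import Tuple


def codon_of(c_position: int) -> Tuple[int, int]:
    """Return (codon number, base within codon) for a coding DNA position.

    Both values are 1-based: position 1 is the first base of codon 1,
    which is the ATG start codon.
    """
    if c_position < 1:
        raise ValueError("positions upstream of the start codon are not in a codon")
    codon_number = (c_position - 1) // 3 + 1
    base_in_codon = (c_position - 1) % 3 + 1
    return codon_number, base_in_codon


print(codon_of(1))   # first base of the start codon
print(codon_of(76))  # first base of codon 26
```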

The ‘g.’ prefix denotes a genomic DNA sequence, often used for mutations in non-coding regions like introns or when the specific transcript is unknown. Genomic numbering starts at 1 with the first nucleotide of the genomic reference sequence, which for a chromosome-level reference is the first base of the chromosome. This distinction is important because genomic coordinates include introns and intergenic regions, unlike coding DNA coordinates.

When a mutation affects the protein, the ‘p.’ prefix describes the change at the amino acid level. Amino acid numbering begins with the initiator methionine as position 1. Less common prefixes include ‘m.’ for mitochondrial DNA and ‘r.’ for RNA sequences.

Decoding Full Mutation Names

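Interpreting a full gene mutation name involves breaking down its components. Consider `CFTR:c.1521_1523delCTT`. CFTR is the gene symbol. The ‘c.’ prefix indicates a change in the coding DNA sequence. The numbers `1521_1523` denote the first and last affected positions, and `delCTT` signifies a deletion of the nucleotides CTT. This means the three nucleotides CTT at positions 1521 through 1523 of the CFTR coding DNA sequence were deleted. Current HGVS recommendations omit the deleted bases and write `c.1521_1523del`; they also expect a versioned reference sequence accession, rather than a gene symbol alone, before the colon.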

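Another example is `HBB:p.Glu6Val`. HBB is the gene symbol. The ‘p.’ prefix specifies a change at the protein level. `Glu6Val` indicates that glutamic acid (Glu) at position 6 has been replaced by valine (Val). This is the traditional name of the sickle cell variant, and it counts from the first amino acid of the mature protein; with HGVS numbering, which counts the initiator methionine as position 1, the same change is written `p.Glu7Val`.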

Finally, `BRCA1:c.5266dupC` illustrates a duplication. BRCA1 identifies the gene. The ‘c.’ prefix refers to a coding DNA sequence variant. The number `5266` points to the position, and `dupC` means the nucleotide cytosine (C) at that position has been duplicated. This is equivalent to inserting an identical nucleotide immediately after the original one in the BRCA1 gene’s coding DNA sequence. Current HGVS recommendations omit the duplicated base and write `c.5266dup`.
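The breakdown used for these three examples can be sketched as a small decoder. It is deliberately narrow: it assumes a `GENE:prefix.change` layout and understands only `del`, `ins`, `dup`, and `delins` at the coding DNA level and simple substitutions at the protein level. The function name `decode` and the dictionary keys are illustrative, and this is not a validating HGVS parser.

```python
import re

# Coding DNA events: a position or a range, the event keyword, optional bases.
DNA_EVENT = re.compile(
    r"^(?P<start>\d+)(?:_(?P<end>\d+))?"
    r"(?P<event>delins|del|dup|ins)(?P<bases>[ACGT]*)$"
)
# Protein substitutions: original residue, position, new residue or '*'.
PROTEIN_SUB = re.compile(
    r"^(?P<ref>[A-Z][a-z]{2})(?P<pos>\d+)(?P<alt>[A-Z][a-z]{2}|\*)$"
)


def decode(name: str) -> dict:
    """Break a name such as 'CFTR:c.1521_1523delCTT' into its components."""
    gene, colon, rest = name.partition(":")
    prefix, dot, change = rest.partition(".")
    if not colon or not dot:
        raise ValueError(f"expected GENE:prefix.change, got {name!r}")
    result = {"gene": gene, "prefix": prefix}
    if prefix == "c":
        match = DNA_EVENT.match(change)
        if match is None:
            raise ValueError(f"unsupported coding DNA change: {change!r}")
        start = int(match.group("start"))
        # A single position such as '5266dupC' has no explicit range end.
        end = int(match.group("end")) if match.group("end") else start
        result.update(start=start, end=end,
                      event=match.group("event"), bases=match.group("bases"))
    elif prefix == "p":
        match = PROTEIN_SUB.match(change)
        if match is None:
            raise ValueError(f"unsupported protein change: {change!r}")
        result.update(position=int(match.group("pos")), ref=match.group("ref"),
                      alt=match.group("alt"), event="substitution")
    else:
        raise ValueError(f"unsupported prefix: {prefix!r}")
    return result


for example in ("CFTR:c.1521_1523delCTT", "HBB:p.Glu6Val", "BRCA1:c.5266dupC"):
    print(decode(example))
```

Trying `delins` before `del` in the pattern matters, since otherwise a `delins` event would be read as a deletion followed by stray text.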