The GTEx (Genotype-Tissue Expression) database is a foundational resource in human biology and genetics, launched by the National Institutes of Health (NIH) in 2010. Its primary goal is to enhance understanding of how genetic variations influence gene expression across various human tissues and individuals. It generates a comprehensive catalog of gene activity, offering insights into how inherited DNA differences relate to gene expression patterns throughout the body. The data collected helps researchers explore the functional consequences of genetic variations and their impact on complex human diseases.
What GTEx Contains
The GTEx database provides detailed information on gene expression profiles derived from RNA sequencing data. This includes measurements of gene activity across a diverse collection of human tissues, such as the brain, heart, muscle, liver, and skin. The dataset also incorporates genetic variation data, specifically genotyping information, from the same individuals who donated tissue samples. This integration allows scientists to establish connections between specific genetic differences and observed patterns of gene expression in various tissues.
The final GTEx dataset (V8) includes DNA data from 838 post-mortem donors and 17,382 RNA-seq samples collected from 54 tissue sites and two cell lines. Beyond genetic and expression data, the database also contains demographic and clinical information for each donor, such as medical histories and medications. This information helps researchers interpret the genetic and expression data in light of each donor's health status.
How GTEx Data is Collected
The data for the GTEx database comes from tissue samples collected from recently deceased donors, with strict adherence to ethical considerations and donor consent. The project set out to collect biospecimens from approximately 900 unique donors, with around 50 tissue samples typically obtained from each individual.
Following collection, certified pathologists dissect and examine each tissue sample for quality. These tissues are then preserved using specific fixatives before being processed in laboratories. Molecular techniques are employed to generate the data, including RNA sequencing to measure gene expression levels and genotyping to identify genetic variants present in each donor.
Significance in Biomedical Research
The GTEx database advances biomedical research by clarifying how genetic variations affect gene activity in different tissues. It helps scientists characterize the normal function of genes across organs and identify genes that are active only in specific tissues. It also enables researchers to pinpoint expression quantitative trait loci (eQTLs), which are genetic variants that influence how strongly a gene is expressed.
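To make the eQTL idea concrete, the sketch below regresses expression on genotype dosage (the number of copies of the alternate allele a donor carries: 0, 1, or 2) using ordinary least squares. The donor values are synthetic and the function name `eqtl_slope` is illustrative. Real eQTL mapping also adjusts for covariates and corrects for multiple testing, both of which this sketch leaves out.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope and intercept of expression regressed on genotype dosage."""
    n = len(dosages)
    if n != len(expression) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    # Spread of genotypes; zero means every donor has the same genotype.
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all donors share the same genotype; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic example (not GTEx data): six donors, one variant, one gene.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

slope, intercept = eqtl_slope(dosages, expression)
print(f"change in expression per allele copy: {slope:.2f}")
```

A slope far from zero suggests that each additional copy of the allele shifts expression of the gene, which is the signature of an eQTL. Repeating the same fit tissue by tissue is one way to see that an effect can be present in one tissue and absent in another.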
The data helps researchers interpret findings from genome-wide association studies (GWAS), which identify genetic variants linked to diseases but often do not explain the underlying mechanisms. By combining GWAS data with GTEx information, scientists can better understand how specific genetic variations contribute to disease susceptibility and progression. For example, GTEx data has been used to identify genes associated with diseases such as bipolar disorder, coronary artery disease, Crohn’s disease, rheumatoid arthritis, and type 1 diabetes. The database also aids in identifying potential drug targets by providing a reference for normal gene activity, which can be compared to gene expression in disease states like cancer.
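The simplest way to picture this combination is as a lookup: for each disease-associated variant from a GWAS, check whether the same variant is also an eQTL and, if so, which gene and tissue it affects. The sketch below shows only that lookup step. The variant IDs, gene names, and the function `annotate_gwas_hits` are all hypothetical placeholders, and published analyses typically rely on more careful statistical methods than an exact ID match.

```python
from collections import defaultdict


def annotate_gwas_hits(gwas_hits, eqtls):
    """Map each GWAS variant to the (gene, tissue) pairs for which it is an eQTL."""
    by_variant = defaultdict(list)
    for variant_id, gene, tissue in eqtls:
        by_variant[variant_id].append((gene, tissue))
    # Variants with no eQTL evidence map to an empty list.
    return {variant: by_variant.get(variant, []) for variant in gwas_hits}


# Hypothetical placeholder records, not real GTEx or GWAS results.
gwas_hits = ["var_A", "var_B", "var_C"]
eqtls = [
    ("var_A", "GENE1", "Liver"),
    ("var_A", "GENE1", "Whole Blood"),
    ("var_C", "GENE2", "Heart"),
    ("var_D", "GENE3", "Skin"),
]

annotated = annotate_gwas_hits(gwas_hits, eqtls)
for variant, targets in annotated.items():
    print(variant, targets or "no eQTL found")
```

A variant that maps to a gene in a disease-relevant tissue gives researchers a candidate mechanism to test, while a variant with no match remains unexplained by expression data alone.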
Accessing and Using the Database
Much of the GTEx data, including processed gene expression results, is publicly available, promoting widespread data sharing and collaborative research efforts. Researchers and the general public can access and explore this data through the GTEx Portal. The platform allows for visual exploration of gene expression across tissues without requiring advanced computational skills.
For more experienced users, raw data files are available for in-depth analysis. This controlled-access data, including genotypes and RNA-seq BAM files, is available to approved researchers through platforms like the National Center for Biotechnology Information’s database of Genotypes and Phenotypes (dbGaP) and the National Human Genome Research Institute’s (NHGRI) Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL). The GTEx Portal also provides an API for retrieving gene expression data.
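As an illustration of working with a downloaded expression matrix, the sketch below parses a small table in GCT layout: a version line, a dimensions line, then a tab-delimited header and one row per gene. That the file is in GCT format is an assumption here, not something stated above, and the gene IDs, tissue columns, and values are toy data. The helper name `read_gct` is likewise illustrative.

```python
import csv
import io


def read_gct(handle):
    """Parse a GCT v1.2 text stream into {row_name: {column: value}}."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unexpected GCT version line: {version!r}")
    # Second line holds the number of data rows and data columns.
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = header[2:]  # skip the Name and Description columns
    if len(columns) != n_cols:
        raise ValueError("column count does not match the dimensions line")
    table = {}
    for row in reader:
        if not row:
            continue
        name, _description, *values = row
        table[name] = dict(zip(columns, map(float, values)))
    if len(table) != n_rows:
        raise ValueError("row count does not match the dimensions line")
    return table


# Toy matrix for illustration only (not real GTEx values).
toy_gct = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tLiver\tLung\tWhole Blood\n"
    "ID_1\tGENE1\t12.5\t0.4\t3.0\n"
    "ID_2\tGENE2\t0.0\t8.1\t1.2\n"
)

matrix = read_gct(io.StringIO(toy_gct))
top_tissue = max(matrix["ID_1"], key=matrix["ID_1"].get)
print(top_tissue)
```

For a gzip-compressed download, the same function should work on a handle opened with `gzip.open(path, "rt")`, where `path` is wherever the file was saved. Full matrices are large, so loading them into a dictionary like this suits small subsets rather than genome-wide analysis.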