What Are CDEs? Common Data Elements in Research

CDEs, or Common Data Elements, are standardized questions paired with a specific set of allowable answers, designed so that different research teams collect data in exactly the same way. Think of them as a shared language for clinical research. When every study asks about a patient’s age, pain level, or diagnosis using the same wording and the same response options, the resulting data can be compared, combined, and reused far more easily than if each team invented its own questions from scratch.

How a CDE Actually Works

A single CDE has three parts: the question itself, the list of acceptable responses, and a layer of background information (metadata) describing what the element measures and how it should be used. For example, one of the NIH’s demographic CDEs is “Sex.” Rather than letting each research site phrase this differently or offer a different set of choices, the CDE specifies the exact question text and the exact response options every site must use. This eliminates the small inconsistencies that make it difficult to merge datasets later.
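The three-part structure described above can be sketched in code. This is a minimal illustration, not the repository's actual data model: the class name, field names, and the example element (including its question wording and response options) are all invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class CommonDataElement:
    """Hypothetical model of one CDE: question, allowed answers, metadata."""

    name: str
    question_text: str
    permissible_values: Tuple[str, ...]
    metadata: Dict[str, str] = field(default_factory=dict)

    def is_valid(self, response: str) -> bool:
        """Return True only if the response exactly matches an allowed answer."""
        return response in self.permissible_values


# Illustrative element only; wording and options are NOT from the NIH repository.
pain_severity = CommonDataElement(
    name="ExamplePainSeverity",
    question_text="How would you rate your pain right now?",
    permissible_values=("None", "Mild", "Moderate", "Severe"),
    metadata={"definition": "Self-reported current pain severity (made up for this sketch)"},
)

print(pain_severity.is_valid("Moderate"))  # True
print(pain_severity.is_valid("moderate"))  # False: case differs from the allowed answer
```

Requiring an exact match is a deliberate choice here: even a difference in capitalization is the kind of small inconsistency that complicates merging datasets later.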

The NIH maintains a core set of demographic CDEs used across both adult and pediatric studies. These cover basics like date of birth, age, sex, ethnicity, race, highest level of education, employment status, relationship status, annual household income, and pain duration. There are also CDEs for social determinants of health and geographic classification codes that distinguish rural from urban areas. These demographic elements form a baseline that most federally funded studies are expected to adopt.

Why Standardization Matters

Without CDEs, two stroke studies might record disability outcomes using completely different scales, making it nearly impossible to pool their results. CDEs solve this by ensuring consistency at the point of data collection, not after the fact. The practical benefits are significant. CDEs enable cross-study comparisons and large-scale meta-analyses, improve data quality by reducing ambiguity, and simplify training for research staff. They promote interoperability between different data systems, including patient registries and electronic health records. They also cut the time and cost of designing new studies, because teams don’t have to reinvent data collection tools from the ground up.
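To make the pooling point concrete, here is a minimal sketch of what combining two studies looks like when both used the same element. The field name, response options, and records are hypothetical and do not represent a real disability scale.

```python
from collections import Counter

# Hypothetical response options shared by both studies (not a real disability scale).
ALLOWED_RESPONSES = ("None", "Mild", "Moderate", "Severe")

# Made-up records; each study used the same field name and the same options.
study_a = [
    {"study": "A", "disability": "Mild"},
    {"study": "A", "disability": "Severe"},
]
study_b = [
    {"study": "B", "disability": "Mild"},
    {"study": "B", "disability": "None"},
]


def pool_studies(*studies):
    """Concatenate records, rejecting any value outside the shared options."""
    pooled = []
    for records in studies:
        for record in records:
            if record["disability"] not in ALLOWED_RESPONSES:
                raise ValueError(f"Unexpected response: {record['disability']!r}")
            pooled.append(record)
    return pooled


pooled = pool_studies(study_a, study_b)
counts = Counter(record["disability"] for record in pooled)
print(counts["Mild"])  # 2
```

Because both studies share one set of response options, pooling is a plain concatenation with a sanity check. Had each study used its own scale, a translation step between scales would be needed first, and it might not be lossless.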

For individual researchers, CDEs also function as a kind of expert shortcut. Instead of spending months deciding how to measure a particular outcome, a team can adopt a CDE that has already been vetted by specialists in the field. This is especially valuable for early-career investigators or smaller sites that may not have the resources to develop and validate their own instruments.

The NIH CDE Repository

The NIH CDE Repository is a free, publicly accessible database where researchers can search for and download standardized data elements. As of late 2024, the repository holds more than 24,000 CDEs in total, including 184 NIH-endorsed CDEs organized into four collections, along with 1,679 curated forms (pre-built sets of CDEs grouped together for specific study types). No account is needed to browse, search, or view any of the CDEs or forms. Signing in unlocks additional features, but the core content is open to anyone.

CDEs in the repository are organized by classifications, which are typically tied to a specific NIH institute or research initiative. This makes it straightforward to find elements relevant to a particular disease area or funding program.

Disease-Specific CDEs

Beyond the universal demographic elements, many CDEs are tailored to specific medical conditions. The National Institute of Neurological Disorders and Stroke, for instance, has developed dedicated CDE sets for more than 20 conditions, including stroke, traumatic brain injury, Parkinson’s disease, epilepsy, multiple sclerosis, ALS, Huntington’s disease, spinal cord injury, and sports-related concussion. Each set defines the exact questions and response formats that researchers studying that condition should use, covering everything from symptom severity scales to functional outcome measures.

Other NIH institutes have followed the same model for their own disease areas. The result is a growing ecosystem where studies of the same condition, even when conducted by different teams in different countries, produce data that can be directly compared and combined.

How CDEs Are Created

A new CDE doesn’t simply appear in the repository. The process involves assembling an expert panel that includes researchers, clinicians, and representatives from the relevant NIH institute. This group reviews existing literature, identifies gaps in current data collection practices, and drafts candidate elements. The proposed CDEs then go through a formal endorsement process where they’re evaluated against established criteria before being approved and added to the repository.

For newer disease areas, the NIH funds dedicated conferences and working groups specifically to build consensus around which data elements should be standardized. This was the approach taken recently for autoimmune disease research, where investigators were brought together with NIH program staff and data science experts to define, validate, and submit new CDEs. The entire lifecycle, from initial proposal to repository inclusion, is designed to ensure that each element reflects genuine expert agreement rather than one team’s preferences.

CDEs in Practice: Building Study Forms

Researchers use CDEs as building blocks when designing case report forms, the documents (paper or electronic) that capture patient data during a study. Rather than creating every field from scratch, teams pull standardized CDEs from the repository and assemble them into forms tailored to their protocol. This approach saves time, reduces errors, and produces data that is already formatted for comparison with other studies using the same elements.

Electronic case report forms built with CDEs offer additional advantages. The system can auto-populate repetitive fields like protocol ID and site code, enforce the correct response options through built-in validation checks, and flag inconsistencies in real time. Maintaining a library of CDE-based form templates is a recommended best practice because it makes launching new studies faster and more cost-effective.
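The auto-population and validation behavior described above might look something like the following. This is a sketch under stated assumptions rather than how any particular electronic data capture system works: the form definition, field names, response options, and the protocol and site identifiers are all hypothetical.

```python
# Hypothetical form definition: field name -> allowed responses.
FORM_FIELDS = {
    "pain_severity": ("None", "Mild", "Moderate", "Severe"),
    "employment_status": ("Employed", "Not employed"),
}


def complete_form(responses, protocol_id, site_code):
    """Return (record, problems) for one participant's responses."""
    # Auto-populate the repetitive fields once instead of asking for them per form.
    record = {"protocol_id": protocol_id, "site_code": site_code}
    problems = []
    for field_name, allowed in FORM_FIELDS.items():
        value = responses.get(field_name)
        if value is None:
            problems.append(f"{field_name}: missing")
        elif value not in allowed:
            problems.append(f"{field_name}: {value!r} is not an allowed response")
        else:
            record[field_name] = value
    return record, problems


record, problems = complete_form(
    {"pain_severity": "Mild", "employment_status": "Retired"},
    protocol_id="PROTO-001",  # placeholder identifier
    site_code="SITE-01",  # placeholder identifier
)
print(record)
print(problems)
```

In this run the pain response is accepted, while "Retired" is flagged because it is not among the options the form allows. Collecting problems in a list, instead of stopping at the first one, lets a coordinator see every issue on the form at once.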

CDEs and Electronic Health Records

One of the longer-term goals of CDE standardization is tighter integration with electronic health records. When CDEs are built using recognized medical vocabularies and coding systems, they can be mapped to the data structures that hospitals and clinics already use. This means that, in principle, some research data could be pulled directly from a patient’s medical record rather than entered manually by a research coordinator. The practical effect is less duplicate data entry, fewer transcription errors, and a faster path from clinical care to research-ready datasets.
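As a rough illustration of that mapping idea, the sketch below looks up a coded entry from a medical record and returns the matching CDE response. The coding system name, the codes, and the entry layout are placeholders invented for this example; a real integration would rely on an established vocabulary and the record system's own interfaces.

```python
# Hypothetical mapping from (coding system, code) to a CDE's allowed response.
# The system name and codes are placeholders, not entries from a real vocabulary.
CODE_TO_CDE_RESPONSE = {
    ("example-vocabulary", "X001"): "Mild",
    ("example-vocabulary", "X002"): "Moderate",
    ("example-vocabulary", "X003"): "Severe",
}


def response_from_ehr(entry):
    """Return the CDE response for a coded EHR entry, or None if unmapped."""
    return CODE_TO_CDE_RESPONSE.get((entry["system"], entry["code"]))


ehr_entries = [
    {"system": "example-vocabulary", "code": "X002"},
    {"system": "example-vocabulary", "code": "X999"},  # no mapping defined
]
prefilled = [response_from_ehr(entry) for entry in ehr_entries]
print(prefilled)  # ['Moderate', None]
```

Returning None for an unmapped code, instead of guessing, leaves that field for a research coordinator to enter manually.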