DCB Guidance for the NIH Data Management and Sharing (DMS) Policy
The NCI Division of Cancer Biology (DCB) provides information and guidance about the NIH Data Management and Sharing (DMS) Policy to researchers.
Introduction to the NIH DMS Policy
Aims of the NIH DMS Policy:
- To promote a culture in which Data Management and Sharing are an integral component of a biomedical research project, rather than an administrative or additive one.
- To ensure that data management and sharing practices are consistent with the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles.
- To ensure that all research grants that generate scientific data include a DMS Plan in which investigators have prospectively planned how to preserve and share scientific data with the scientific community.
Additional information about sharing data can be found on the NCI Office of Data Sharing webpage.
DCB Overview of the NIH DMS Policy
Every NIH grant applicant whose application has a receipt due date in or after January 2023 must provide a Data Management and Sharing (DMS) Plan. NIH expects that applicants will maximize appropriate data sharing.
A DMS Plan consists of six elements that address how scientific data will be collected, which of the collected data will be preserved and shared, what metadata and data standards will be used, where the data will be archived, how the shared data will be findable and searchable for reuse, and how data sharing will be managed throughout the grant period.
- Element 1: Data to be managed and shared
- Element 2: Documentation of software tools/codes
- Element 3: Documentation of data standards
- Element 4: Repositories, data access, and data sharing timelines
- Element 5: Data access, distribution, and reuse
- Element 6: Data management and oversight
The NIH Genomic Data Sharing (GDS) Policy is now harmonized with the DMS Policy, so a single DMS Plan should satisfy the requirements of both the GDS Policy and the DMS Policy if large-scale omics data are being generated.
DCB Guidance Related to the Elements of an NIH DMS Plan
Element 1A
Data types: List ALL data types (not only omics data) proposed in the Research Strategy section of the grant application.
Sample types: List the corresponding sample types (e.g., tumor tissue, cell lines, PDX, organoids, primary cell lines). Sample types must also include the species from which data are being generated.
Sample number: If any omics data are being generated, providing the sample number will help determine whether the GDS policy applies to the proposed research project.
If human genomic data are being generated and the GDS policy applies:
- An Institutional Certification (IC) must be submitted with the application or with Just-in-Time (JIT) documents.
- Human genomic data must be registered in dbGaP AND data must be deposited in an NIH-supported repository within 9 months of complete dataset collection and quality control.
If non-human genomic data are being generated and the GDS policy applies:
- Datasets should be available no later than the date of initial publication or end of the award, whichever comes first.
Sample volume: List the amount/volume of data that will be generated. This will help evaluate the budget required for the DMS Plan.
Additional information related to Sample Volume can be found in the DCB Guidance for Estimating the Volume of a Dataset in NIH DMS Plans.
NIH provides discounts on partner cloud services like Amazon Web Services (AWS), Google Cloud, and Microsoft Azure via the NIH STRIDES initiative for cloud computing, storage, and related services.
Element 1A can be presented in a tabular form that includes the data types, sample types, sample number, and sample volume of all the data to be generated in the proposed research project. An example table can be found in DCB Information Related to Element 1A in NIH DMS Plans.
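As an illustration only, the following Python sketch assembles an Element 1A style table and estimates the total data volume as sample number multiplied by an average file size per sample. The rows, the per-sample sizes, and the names `DataRow` and `format_table` are hypothetical placeholders, not DCB or NIH values; substitute figures from your own instruments and pipelines.

```python
from dataclasses import dataclass


@dataclass
class DataRow:
    data_type: str
    sample_type: str
    species: str
    n_samples: int
    gb_per_sample: float  # assumed average size of the files from one sample

    @property
    def total_gb(self) -> float:
        # Estimated volume for this data type: samples x average size per sample
        return self.n_samples * self.gb_per_sample


# Hypothetical entries for illustration; none of these numbers come from NIH or DCB.
rows = [
    DataRow("Bulk RNA-seq (FASTQ)", "Tumor tissue", "Human", 60, 5.0),
    DataRow("Microscopy imaging (TIFF)", "Organoids", "Mouse", 200, 0.5),
    DataRow("Flow cytometry (.fcs)", "Primary cell lines", "Human", 150, 0.02),
]


def format_table(rows):
    """Render the rows as a pipe-delimited table, one line per data type."""
    lines = [
        "Data Type | Sample Type | Species | Sample Number | Estimated Volume (GB) |",
        "---|---|---|---|---|",
    ]
    for r in rows:
        lines.append(
            f"{r.data_type} | {r.sample_type} | {r.species} | "
            f"{r.n_samples} | {r.total_gb:.1f} |"
        )
    return "\n".join(lines)


total_gb = sum(r.total_gb for r in rows)
print(format_table(rows))
print(f"Total estimated volume: {total_gb:.1f} GB")
```

Keeping the per-sample size as an explicit field makes the assumption behind each estimate visible, which can help when justifying the storage portion of the budget.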
Element 1B
List the data types generated in Element 1A that will be preserved and shared.
Provide justification for any data or data types that will not be preserved and shared. Ethical, legal, and technical factors should guide the extent of data preservation and sharing.
Principal Investigators may consider whether raw or processed data will be shared and in which commonly accepted/agreed upon data formats the data will be shared.
Element 1C
All data that are preserved and shared should be accompanied by their metadata and other associated relevant documentation.
Metadata are data about how a dataset or resource came about and how it is internally structured (e.g., the unit of analysis, collection method, sampling procedure, sample size, categories, and variables).
Researchers should gather metadata according to best practices in their research community and publish the metadata together with the data.
If no metadata standards are defined for the data types or research field, provide the minimum information that someone would need to work with the dataset without any further input from you. It is recommended to think like a consumer of the data, not the producer.
Examples of typical metadata elements |
---|
Biological material (e.g., species, genotypes, tissue type, age, health conditions) |
Biological context (e.g., specimen growth, entrainment, samples preparation) |
Experimental factors and conditions (e.g. drug treatments, stress factors) |
Primers, plasmid sequences, cell line information, plasmid construction |
Specifics of data acquisition |
Specifics of data processing and analysis |
Definition of variables |
LOT numbers |
Accompanying code, software used (version number), parameters applied, statistical tests used, seed for randomization |
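Where no community metadata standard exists, one possible approach is to store a small machine-readable metadata record alongside each dataset. The sketch below is an assumption-laden example: the field names, the example values, and the `missing_fields` helper are hypothetical and do not represent a required schema or an established standard.

```python
import json

# Hypothetical minimum set of fields, loosely following the elements listed above.
REQUIRED_FIELDS = [
    "biological_material",
    "experimental_conditions",
    "data_acquisition",
    "data_processing",
    "variable_definitions",
]

# Example record with placeholder values; replace with details of the real experiment.
metadata = {
    "biological_material": {"species": "Mus musculus", "tissue_type": "liver"},
    "experimental_conditions": {"treatment": "vehicle control"},
    "data_acquisition": {"instrument": "example sequencer", "date": "2025-01-15"},
    "data_processing": {"software": "example-pipeline", "version": "1.0.0"},
    "variable_definitions": {"count": "number of reads mapped to a gene"},
    "random_seed": 12345,
}


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in required if not record.get(field)]


problems = missing_fields(metadata)
if problems:
    print("Missing metadata fields:", ", ".join(problems))
else:
    # Serialize to JSON so the record can be deposited next to the data files.
    metadata_json = json.dumps(metadata, indent=2)
    print(metadata_json)
```

A check like this can be run before deposit so that incomplete records are caught while the people who generated the data are still available to fill the gaps.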
Element 2
State whether specialized tools, software, and/or code are needed to access or manipulate shared scientific data. If so, provide the name(s) of the needed tool(s) and software and specify how they can be accessed.
The use of open-source code and tools is highly encouraged.
Element 3
State what common data standards will be applied to the scientific data and associated metadata.
Data standards are pivotal for enabling interoperability of datasets and resources. A data standard is defined as a type of standard, which is an agreed upon approach to allow for consistent measurement, qualification or exchange of an object, process, or unit of information.
Widely accepted research standards should be used, and it is recommended to follow the data standard requirements of the established repositories where the data will be submitted.
If no consensus standards exist in the scientific field, this should be indicated.
Examples of some community data standards for various data types:
Data Type | Standards | File Formats |
---|---|---|
Sequencing (RNA, DNA, & next gen) | MINSEQE | BAM, FASTQ |
Microarray | MIAME | |
DNA hypersensitivity or methylation assays and immunoprecipitation (IP) of proteins followed by sequencing | ENCODE | |
Proteomic datasets | MIAPE | |
Flow cytometry | FCS | .fcs |
Imaging (Microscopy) | OME | PNG, TIFF |
Imaging (Electron Microscopy) | EMPIAR | |
Medical Imaging (CT, PET, Ultrasound, MRI) | DICOM | DICOM |
Element 4A
List the repository or repositories where scientific data and metadata generated will be archived. It is encouraged to preserve and share data through established repositories.
Here are lists of repositories where scientific data generated from an NIH-funded award can be deposited and archived:
NIH encourages the use of domain-specific repositories where possible; however, such repositories are not available for all datasets. When researchers cannot locate a repository for their discipline or the type of data they generate, a generalist repository (which accepts data regardless of data type, format, content, or disciplinary focus) can be a useful place to share data.
Desirable attributes of repositories where scientific data generated from an NIH-funded award can be deposited include:
- Unique persistent identifiers
- Long-term sustainability
- Metadata
- Curation and quality assurance
- Free and easy access
- Broad and measured reuse
- Security and integrity
- Confidentiality
- Provenance
- Retention policy
Human omics data that meet the GDS policy threshold of large-scale data should be archived in an NIH-supported data repository.
Non-human omics data that meet the GDS policy threshold of large-scale data can be archived in any established data repository.
Element 4B
Data archived in repositories must be findable and identifiable. An established repository will assign accession numbers, digital object identifiers (DOIs), or other unique persistent identifiers to deposited data.
Describe how the data will be findable and identifiable. The recommended format in publications is a citation of the repositories, trackable IDs, and associated URL locations where applicable.
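The recommended citation format (repository, trackable ID, and URL where applicable) can be sketched as a small helper that builds a data availability statement. Everything in this example is a placeholder: the repository names, identifiers, and URL are invented for illustration, and the wording of the statement is an assumption rather than NIH-prescribed text.

```python
from dataclasses import dataclass


@dataclass
class Deposit:
    repository: str
    identifier: str  # accession number, DOI, or other persistent identifier
    url: str = ""    # optional; include only where applicable


def availability_statement(deposits):
    """Combine repository, ID, and URL for each deposit into one sentence."""
    parts = []
    for d in deposits:
        text = f"{d.repository} (ID: {d.identifier})"
        if d.url:
            text += f", {d.url}"
        parts.append(text)
    return "Data generated in this study are available at: " + "; ".join(parts) + "."


# Placeholder deposits; these are not real repositories or identifiers.
deposits = [
    Deposit(
        "Example Genomics Repository",
        "EXAMPLE-ACC-0001",
        "https://repository.example.org/EXAMPLE-ACC-0001",
    ),
    Deposit("Example Generalist Repository", "doi:10.0000/example.0001"),
]

statement = availability_statement(deposits)
print(statement)
```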
Element 4C
State when and for how long the data will be shared. It is expected that data will be made available at the time of publication or before the end of the award, whichever comes first.
However, human omics data that meet the GDS policy threshold for large-scale omics data must be registered in dbGaP and deposited in an NIH-supported data repository within 9 months of all data collection and quality control (after an initial round of analysis or computation to clean the data and for quality control).
Repositories usually set time limits on data availability.
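The two timelines above can be worked through with simple date arithmetic. This sketch assumes that "within 9 months" is counted in calendar months from the date data collection and quality control are complete; that interpretation, the function names, and the example dates are assumptions for illustration, so confirm actual deadlines with your program official.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def sharing_deadline(publication: Optional[date], award_end: date) -> date:
    """General expectation: time of publication or end of award, whichever is first."""
    if publication is None:
        return award_end
    return min(publication, award_end)


def gds_human_deadline(qc_complete: date) -> date:
    """Large-scale human omics data: 9 months after collection and QC are complete."""
    return add_months(qc_complete, 9)


# Hypothetical dates for illustration only.
award_end = date(2026, 6, 30)
publication = date(2025, 3, 1)
qc_complete = date(2024, 5, 31)

print("General sharing deadline:", sharing_deadline(publication, award_end))
print("GDS human omics deadline:", gds_human_deadline(qc_complete))
```

The month-end clamp in `add_months` is a design choice for dates such as the 31st, where the target month may be shorter.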
Element 5A
Broad sharing of scientific data is highly encouraged.
List, if any, factors that will affect subsequent access, distribution or reuse of scientific data. Provide justification if broad data sharing is not possible.
Element 5B
State if the shared scientific data will be open access or controlled access.
Data generated from human subjects, including data from patient-derived xenografts, primary tumors, organoids, and primary cell lines, should be deposited with controlled access to protect patient identity, even if samples are de-identified.
Element 5C
If generating scientific data derived from humans, describe how the privacy, rights, and confidentiality of human research participants will be protected (e.g., through de-identification, Certificates of Confidentiality, and other protective measures).
It is highly encouraged to obtain informed consent from human subjects that includes explicit allowance of broad research use of biospecimens. Additional information can be found at the Considerations for Obtaining Informed Consent webpage.
Element 6
Describe how compliance with this DMS Plan will be monitored and managed at your institution and by whom (e.g., titles, roles).
Sharing DMS Activities in RPPRs and Requesting Revisions to DMS Plans
Details about reporting DMS activities in section C5.c of Research Performance Progress Reports (RPPRs) and processes for requesting revisions to an approved DMS Plan can be found in a document with DMS Policy Updates (as of Jan. 2025).
Additional information can also be found in the NIH Guide for Grants and Contracts:
- NOT-OD-24-175: Reporting Data Management and Sharing (DMS) Plan Activities in Research Performance Progress Reports (RPPRs) Submitted on or After October 1, 2024
- NOT-OD-24-176: Updated Processes for Requesting Revisions to an Approved Data Management and Sharing (DMS) Plan
Additional Resources Related to the NIH DMS Policy
- NIH Data Management and Sharing (DMS) Policy Website
- NCI Office of Data Sharing webpage on How to Write a Data Management and Sharing (DMS) Plan
- NCI Office of Data Sharing webpage on Data Sharing and Policy Guidance
- NIH Science and Technology Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative Website
DCB Contact for the NIH DMS Policy
If you have DCB-related questions about the NIH DMS Policy, please contact Dr. Soumya Korrapati.