Research Data Management

Good research starts with good data. This guide will help you understand how to organize, store, protect, and share your data throughout your research journey.

Why Data Sharing?

Data sharing is a core component of Research Data Management: it involves preparing, documenting and depositing your datasets in trusted repositories so that other researchers can discover, access and reuse them. By applying clear metadata standards, persistent identifiers (e.g., DOIs) and appropriate licensing, data sharing ensures that your work remains findable, interpretable and citable long after publication.

🔍

Enhanced Discoverability

Data sharing makes research findings more accessible and discoverable to the global community.

🔄

Increased Collaboration

Sharing data fosters collaboration among researchers, leading to more comprehensive studies.

📈

Improved Reproducibility

Shared data allows others to verify and reproduce research results, enhancing credibility.

Time Efficiency

Researchers save time by reusing existing data rather than collecting new datasets.

💰

Cost Savings

Sharing data reduces duplicated effort and the costs of data collection and storage.

🌍

Broader Impact

Shared data can be used across disciplines, maximizing the impact of research.

🔬

Accelerated Innovation

Open data accelerates scientific progress by enabling new analyses and discoveries.

📚

Educational Value

Shared datasets serve as valuable resources for teaching and training purposes.

What Data Should Be Shared?

When preparing to share data, prioritize transparency and reusability by including well-structured datasets, clear documentation, and reproducible code. However, always assess whether the data contain confidential, sensitive, or restricted elements, and take appropriate steps such as anonymization or controlled access to ensure compliance with ethical and legal standards.

✅ Data You Are Encouraged to Share

| Type of Data | Description | Purpose |
| --- | --- | --- |
| Raw Data | Original observations, measurements, or readings collected during research. | Enables replication, validation, or reanalysis by other researchers. |
| Processed/Analyzed Data | Cleaned, transformed, or aggregated data used in your publications. | Allows others to reproduce findings and perform secondary analyses. |
| Metadata | Descriptive information about the dataset (e.g., variables, units, data collection methods). | Facilitates data discovery, interpretation, and reuse. |
| Documentation | README files, codebooks, protocols, and lab notes that describe the dataset. | Ensures proper understanding and reuse of data by others. |
| Code & Scripts | Analytical scripts, software code, or workflows used for data processing and analysis. | Enhances reproducibility and transparency of research workflows. |
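
As an illustration of the Metadata and Documentation rows above, the following minimal sketch drafts a codebook from a CSV file: one entry per variable, with a description, an example value, and a count of missing values. The function name `build_codebook`, the sample data, and the variable descriptions are hypothetical and exist only for this example; adapt the fields to whatever your repository or discipline expects.

```python
import csv
import io


def build_codebook(csv_text, descriptions):
    """Return one codebook entry per CSV column.

    Each entry records the variable name, a human-written description,
    the first non-empty example value, and the number of missing values.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    codebook = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        non_empty = [v for v in values if v not in ("", None)]
        codebook.append({
            "variable": name,
            # Flag undocumented variables instead of guessing their meaning.
            "description": descriptions.get(name, "TODO: describe this variable"),
            "example": non_empty[0] if non_empty else "",
            "missing": len(values) - len(non_empty),
        })
    return codebook


# Hypothetical sample data, for illustration only.
SAMPLE_CSV = "site_id,temp_c,ph\nA1,21.5,7.2\nA2,,6.9\n"
DESCRIPTIONS = {
    "site_id": "Sampling site code",
    "temp_c": "Water temperature in degrees Celsius",
}

if __name__ == "__main__":
    for entry in build_codebook(SAMPLE_CSV, DESCRIPTIONS):
        print(entry)
```

A draft like this is only a starting point: the descriptions and units still have to be written by someone who knows the data.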

⚠️ Data You Should Not Share

| Type of Data | Risk/Concern | Recommendation |
| --- | --- | --- |
| Personally Identifiable Information (PII) | Risk of identifying individuals from the data. | Anonymize or de-identify the data before sharing. |
| Sensitive Data | May include health records, genetic data, or confidential social data. | Apply ethical review and data access controls. |
| Proprietary or Restricted Data | Owned by third parties or subject to license/contractual restrictions. | Obtain permission or provide summary data if sharing is blocked. |
| Data Under Embargo | Temporarily restricted due to publication or funding requirements. | Share after embargo period ends, if allowed. |
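
The de-identification recommendation above can be sketched roughly as follows. This example drops direct identifiers and replaces a participant ID with a keyed hash (HMAC-SHA256 from the Python standard library). The field names, the sample record, and the function `pseudonymize` are hypothetical. Note that this is pseudonymization, not full anonymization: remaining fields such as age or location may still allow re-identification, so treat it as one step alongside ethical review, not a replacement for it.

```python
import hashlib
import hmac


def pseudonymize(record, direct_identifiers, id_field, secret_key):
    """Drop direct identifiers and replace the ID with a keyed hash.

    The same ID and key always give the same pseudonym, so records for one
    participant stay linkable without exposing the original ID.
    """
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    digest = hmac.new(
        secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256
    )
    # Truncated for readability; keep the full digest if collisions matter.
    cleaned[id_field] = digest.hexdigest()[:16]
    return cleaned


# Hypothetical record and key, for illustration only.
SAMPLE_RECORD = {
    "participant_id": "P-0042",
    "name": "Jane Doe",
    "email": "jane@example.org",
    "age": 34,
    "score": 7.5,
}
SECRET_KEY = b"replace-with-a-key-stored-outside-the-dataset"

if __name__ == "__main__":
    print(pseudonymize(SAMPLE_RECORD, {"name", "email"}, "participant_id", SECRET_KEY))
```

The key is kept separate from the shared dataset; anyone holding it could recompute the pseudonyms from known IDs.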

Where to Share Data?

Choosing the right repository is essential for ensuring your research data is discoverable, accessible, citable, and preserved in the long term. Researchers should select a repository that aligns with their discipline, data type, or institutional policies.

| Repository Type | Description | Examples | Best For | Key Features |
| --- | --- | --- | --- | --- |
| General-purpose | Accepts data from any discipline. Easy to use and broadly accessible. | Zenodo, Figshare, Dryad, OSF (Open Science Framework) | Multidisciplinary projects or when no subject-specific repository exists | DOI assignment, version control, basic metadata support, easy sharing |
| Disciplinary | Tailored for specific research domains, often with community standards. | GenBank (genetics), ICPSR (social sciences), PANGAEA (earth sciences) | Discipline-specific data types and formats | Domain metadata, strong community uptake, citation metrics |
| Institutional | Hosted by universities or research institutes to support affiliated staff. | HKU Scholars Hub, DataHub@PolyU | Internal data sharing, policy compliance, institutional visibility | Authentication, institutional branding, data preservation policies |
| Journal/Publisher-linked | Connected to journal submissions; some require mandatory data deposit. | Elsevier’s Mendeley Data, Springer Nature’s figshare, Dryad (linked) | Journal articles with data availability policies | Integration with peer review, citation linking, curated datasets |
| Government/Funder Repositories | Required or recommended by funding agencies or government bodies. | NIH dbGaP, UK Data Service, European Open Science Cloud (EOSC) | Funded research, compliance with data mandates | Secure access control, compliance with legal/ethical frameworks, persistent archiving |

How to Make Your Data Easier to Share?

Well-described, well-organized data is easier to discover, reuse, and cite. Adopting established metadata standards helps you and others understand the structure, provenance, and context of your dataset, which makes sharing smoother and helps you meet institutional and funder requirements.

Key Metadata Standards

  • Dublin Core – A simple, widely adopted set of 15 core elements (e.g., title, creator, date) that provides a baseline for cross-domain interoperability.
  • DataCite Metadata Schema – Designed for research outputs, this schema supports rich descriptions (e.g., related identifiers, funding information) and DOI registration.
  • Domain-Specific Standards – Tailor your metadata to community best practices, such as MIAME for microarray experiments, ISO 19115 for geographic information, or other discipline-focused schemas.
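
To make the Dublin Core entry above more concrete, here is a minimal sketch that checks a metadata record against the 15 core element names and serializes it as XML using the standard `dc` namespace. The wrapper element `metadata`, the function name `to_dublin_core_xml`, and the sample record (including its placeholder DOI) are assumptions for illustration. Repositories usually generate this kind of record from a deposit form, so check your repository's own schema before relying on this layout.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_dublin_core_xml(record):
    """Serialize a dict of Dublin Core elements to an XML string.

    Values may be a single string or a list of strings, since every
    Dublin Core element is optional and repeatable.
    """
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name in DC_ELEMENTS:
        values = record.get(name, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record; the DOI below is a placeholder, not a real dataset.
SAMPLE_RECORD = {
    "title": "River water quality measurements",
    "creator": ["Doe, Jane", "Chan, Tai Man"],
    "date": "2024-01-15",
    "type": "Dataset",
    "identifier": "https://doi.org/10.1234/example",
    "rights": "CC BY 4.0",
}

if __name__ == "__main__":
    print(to_dublin_core_xml(SAMPLE_RECORD))
```

Rejecting unknown keys up front is a deliberate choice here: a misspelled element name fails loudly instead of silently producing a record that other systems cannot interpret.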