Guides: Research Data Management: Data Sharing

Why Data Sharing?

Data sharing is a core component of Research Data Management: it involves preparing, documenting and depositing your datasets in trusted repositories so that other researchers can discover, access and reuse them. Clear metadata standards, persistent identifiers (e.g., DOIs) and appropriate licensing help your work remain findable, interpretable and citable long after publication.

🔍

Enhanced Discoverability

Data sharing makes research findings more accessible and discoverable to the global community.

🔄

Increased Collaboration

Sharing data fosters collaboration among researchers, leading to more comprehensive studies.

📈

Improved Reproducibility

Shared data allows others to verify and reproduce research results, enhancing credibility.

⏳

Time Efficiency

Researchers save time by reusing existing data rather than collecting new datasets.

💰

Cost Savings

Sharing data reduces duplicated effort and the costs of collecting and storing data.

🌍

Broader Impact

Shared data can be used across disciplines, maximizing the impact of research.

🔬

Accelerated Innovation

Open data accelerates scientific progress by enabling new analyses and discoveries.

📚

Educational Value

Shared datasets serve as valuable resources for teaching and training purposes.

What Data Should Be Shared?

When preparing to share data, prioritize transparency and reusability by including well-structured datasets, clear documentation, and reproducible code. However, always assess whether the data contain confidential, sensitive, or restricted elements, and take appropriate steps such as anonymization or controlled access to ensure compliance with ethical and legal standards.

✅ Data You Are Encouraged to Share


Type of Data	Description	Purpose
Raw Data	Original observations, measurements, or readings collected during research.	Enables replication, validation, or reanalysis by other researchers.
Processed/Analyzed Data	Cleaned, transformed, or aggregated data used in your publications.	Allows others to reproduce findings and perform secondary analyses.
Metadata	Descriptive information about the dataset (e.g., variables, units, data collection methods).	Facilitates data discovery, interpretation, and reuse.
Documentation	README files, codebooks, protocols, and lab notes that describe the dataset.	Ensures proper understanding and reuse of data by others.
Code & Scripts	Analytical scripts, software code, or workflows used for data processing and analysis.	Enhances reproducibility and transparency of research workflows.
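
As an illustration of the Metadata and Documentation rows above, the following is a minimal sketch of how a simple codebook could be drafted from a tabular file. The column names, descriptions and units are hypothetical examples, and `build_codebook` is an illustrative helper, not part of any repository's tooling. Columns without a description are flagged so they can be documented before deposit.

```python
import csv
import io


def build_codebook(csv_text, descriptions):
    """Return one codebook entry per column: name, description, unit, example value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    codebook = []
    for name in reader.fieldnames or []:
        # Columns missing from `descriptions` are flagged rather than silently skipped.
        description, unit = descriptions.get(name, ("UNDOCUMENTED", ""))
        # Use the first non-empty value in the column as an example.
        example = next((row[name] for row in rows if row[name]), "")
        codebook.append(
            {"variable": name, "description": description, "unit": unit, "example": example}
        )
    return codebook


# Hypothetical dataset and variable descriptions, for illustration only.
SAMPLE_CSV = "site_id,temp_c,measured_on\nA1,21.5,2024-03-01\nA2,,2024-03-02\n"
DESCRIPTIONS = {
    "site_id": ("Sampling site code", ""),
    "temp_c": ("Air temperature", "degrees Celsius"),
}

codebook = build_codebook(SAMPLE_CSV, DESCRIPTIONS)
for entry in codebook:
    print(entry)
```

In this sketch the `measured_on` column comes out as `UNDOCUMENTED`, which is the prompt to add a description before the dataset is shared.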

⚠️ Data You Should Not Share Without Safeguards


Type of Data	Risk/Concern	Recommendation
Personally Identifiable Information (PII)	Risk of identifying individuals from the data.	Anonymize or de-identify the data before sharing.
Sensitive Data	May include health records, genetic data, or confidential social data.	Seek ethical review and share only under data access controls.
Proprietary or Restricted Data	Owned by third parties or subject to license/contractual restrictions.	Obtain permission or provide summary data if sharing is blocked.
Data Under Embargo	Temporarily restricted due to publication or funding requirements.	Share after embargo period ends, if allowed.
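
The de-identification recommendation in the table above can be sketched as follows. This is a minimal example under stated assumptions: the field names (`name`, `email`, `phone`, `participant_id`) are hypothetical, and the approach shown is pseudonymization with a keyed hash, which is weaker than full anonymization. Remaining fields such as age or location may still allow re-identification, so ethical review and access controls may still be needed.

```python
import hashlib
import hmac

# Hypothetical names of columns that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(records, secret_key, id_field="participant_id"):
    """Drop direct identifiers and replace the ID with a keyed hash (HMAC-SHA256)."""
    cleaned = []
    for record in records:
        kept = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        digest = hmac.new(
            secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        # The same original ID always maps to the same pseudonym, so records
        # belonging to one participant can still be linked after sharing.
        kept[id_field] = digest[:16]
        cleaned.append(kept)
    return cleaned


# Made-up records for illustration only.
SAMPLE = [
    {"participant_id": "P001", "name": "Example Person",
     "email": "person@example.org", "age_group": "30-39", "score": 12},
    {"participant_id": "P001", "name": "Example Person",
     "email": "person@example.org", "age_group": "30-39", "score": 15},
]

shared = pseudonymize(SAMPLE, secret_key=b"replace-with-a-private-key")
for row in shared:
    print(row)
```

The secret key should be stored separately from the shared dataset and never deposited with it; anyone holding the key could recompute the pseudonyms from known IDs.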

Where to Share Data?

Choosing the right repository is essential for ensuring your research data is discoverable, accessible, citable, and preserved in the long term. Researchers should select a repository that aligns with their discipline, data type, or institutional policies.

Repository Type	Description	Examples	Best For	Key Features
General-purpose	Accepts data from any discipline. Easy to use and broadly accessible.	Zenodo, Figshare, Dryad, OSF (Open Science Framework)	Multidisciplinary projects or when no subject-specific repository exists	DOI assignment, version control, basic metadata support, easy sharing
Disciplinary	Tailored for specific research domains, often with community standards.	GenBank (genetics), ICPSR (social sciences), PANGAEA (earth sciences)	Discipline-specific data types and formats	Domain metadata, strong community uptake, citation metrics
Institutional	Hosted by universities or research institutes to support affiliated staff.	HKU Scholars Hub, DataHub@PolyU	Internal data sharing, policy compliance, institutional visibility	Authentication, institutional branding, data preservation policies
Journal/Publisher-linked	Connected to journal submissions; some require mandatory data deposit.	Elsevier’s Mendeley Data, Springer Nature’s figshare, Dryad (linked)	Journal articles with data availability policies	Integration with peer review, citation linking, curated datasets
Government/Funder Repositories	Required or recommended by funding agencies or government bodies.	NIH dbGaP, UK Data Service, European Open Science Cloud (EOSC)	Funded research, compliance with data mandates	Secure access control, compliance with legal/ethical frameworks, persistent archiving

How to Make Your Data Easier to Share?

Well-described, well-organized data is easier to discover, reuse, and cite. Adopting established metadata standards helps you and others understand the structure, provenance, and context of your dataset, and makes it easier to meet institutional and funder requirements.

Key Metadata Standards

Dublin Core – A simple, widely adopted set of 15 core elements (e.g., title, creator, date) that provides a baseline for cross-domain interoperability.
DataCite Metadata Schema – Designed for research outputs, this schema supports rich descriptions (e.g., related identifiers, funding information) and DOI registration.
Domain-Specific Standards – Tailor your metadata to community best practices, such as MIAME for microarray experiments, ISO 19115 for geographic information, or other discipline-focused schemas.
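
To make the Dublin Core entry above more concrete, here is a minimal sketch that builds a small Dublin Core record as XML using Python's standard library. The dataset details and the identifier are placeholders, not a real DOI, and the `metadata` wrapper element is an assumption chosen for illustration. Repositories typically generate or collect this metadata through their own deposit forms, so treat this as a way to see what the elements look like, not as a required workflow.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core fields to an XML string."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for element, value in fields.items():
        # Elements are repeatable, so a list produces one tag per value.
        values = value if isinstance(value, list) else [value]
        for item in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = item
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration only.
record_xml = dublin_core_record({
    "title": "Example dataset",
    "creator": ["Researcher, A.", "Researcher, B."],
    "date": "2024-03-01",
    "identifier": "doi:10.0000/placeholder",
    "rights": "CC BY 4.0",
})
print(record_xml)
```

Richer schemas such as DataCite add fields like related identifiers and funding information, which this sketch does not cover.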