Research Data Management: Introduction

What is Research Data Management?

Good research starts with good data. This guide will help you understand how to organize, store, protect, and share your data throughout your research journey. Whether you're a student or a seasoned researcher, mastering Research Data Management (RDM) makes your work more efficient, credible, and reusable.

What is Research Data?

Research data refers to any information collected, observed, or created for analysis to produce original research results, including numbers, texts, images, and more.

What is Research Data Management?

Research Data Management (RDM) involves organizing, storing, and preserving data throughout the research lifecycle to ensure accessibility, reproducibility, and long-term value.

Why does RDM matter?

RDM ensures data integrity, supports collaboration, meets funder requirements, and maximizes the impact and reuse of research findings.

The Research Data Lifecycle

The Research Data Lifecycle describes the stages that research data go through from the beginning to the end of a research project (Hodge, 2000). It typically includes planning, collecting, processing, analyzing, preserving, sharing, and reusing data. Understanding this cycle helps researchers manage their data effectively, ensuring it remains accurate, secure, and accessible throughout and after the project.

Reference:

Hodge, G. M. (2000). Best Practices for Digital Archiving: An Information Life Cycle Approach. D-Lib Magazine, 6(1).

https://doi.org/10.1045/january2000-hodge

The FAIR principles

Want your research to have a bigger impact? The FAIR principles (Findable, Accessible, Interoperable, and Reusable) help ensure that your data can be discovered, understood, and reused by others (Wilkinson et al., 2016). This guide will show you simple steps to make your data FAIR and future-ready.

See the original paper (Wilkinson et al., 2016) for more details.

Reference:

Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), Article 160018.

https://doi.org/10.1038/sdata.2016.18

FAIR Data Principles

F - FINDABLE

Data should be easy to find for both humans and computers

Key Requirements:

F1: Data are assigned globally unique and persistent identifiers

F2: Data are described with rich metadata

F3: Metadata clearly and explicitly include the identifier of the data they describe

F4: Metadata are registered in searchable resources

Examples:

DOI Assignment

Research datasets assigned DOIs like "10.1234/example-dataset-2023"

Rich Metadata

Detailed descriptions including author, date, methodology, keywords

Repository Registration

Datasets indexed in services like DataCite, Google Dataset Search
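
As a rough illustration of F1 to F3, the Python sketch below builds a small metadata record whose identifier field carries the dataset's DOI and checks that a set of descriptive fields is present. The field names, the REQUIRED_FIELDS set, and both helper functions are assumptions made for this example, not any repository's actual schema, and the DOI is the placeholder from the example above.

```python
import re

# Loose pattern for the common "10.<registrant>/<suffix>" DOI shape.
# This is a simplification for illustration, not a full DOI validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Hypothetical minimum set of descriptive fields for this example.
REQUIRED_FIELDS = {"identifier", "title", "creators", "publication_year",
                   "description", "keywords"}


def doi_to_url(doi: str) -> str:
    """Return the resolvable https://doi.org/ form of a bare DOI."""
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI-shaped string: {doi!r}")
    return f"https://doi.org/{doi}"


def missing_fields(record: dict) -> set:
    """Return the required fields that are absent or empty in the record."""
    return {name for name in REQUIRED_FIELDS if not record.get(name)}


record = {
    "identifier": "10.1234/example-dataset-2023",  # placeholder DOI (F1, F3)
    "title": "Example survey dataset",
    "creators": ["Doe, Jane"],
    "publication_year": 2023,
    "description": "Illustrative record; all values are made up.",
    "keywords": ["survey", "example"],
}

print(doi_to_url(record["identifier"]))
print(sorted(missing_fields(record)))
```

A check like this only confirms that the fields exist; whether the metadata are rich enough is still a judgment for the researcher and the repository.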

A - ACCESSIBLE

Data should be accessible and retrievable by identifier using standardized protocols

Key Requirements:

A1: Data are retrievable by identifier using standardized protocols

A1.1: Protocol is open, free, and universally implementable

A1.2: Protocol allows for authentication and authorization

A2: Metadata are accessible even when data are no longer available

Examples:

HTTP/HTTPS Access

Data accessible via standard web protocols with proper authentication

API Endpoints

RESTful APIs with clear documentation and access controls

Persistent Metadata

Metadata remains available even after data removal
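
To make A1 more concrete, the sketch below prepares an HTTPS request that asks the doi.org resolver for machine-readable metadata by identifier. It relies on DOI content negotiation with the "application/vnd.citationstyles.csl+json" media type, which is supported for DOIs registered with Crossref or DataCite. The function name is an assumption for this example, and the request is only built, not sent, so the block runs without network access.

```python
import urllib.request


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) an HTTPS request for a DOI's metadata."""
    return urllib.request.Request(
        url=f"https://doi.org/{doi}",
        # Ask the resolver for citation metadata instead of the landing page.
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
        method="GET",
    )


# DOI of the FAIR principles paper cited in this guide.
request = build_metadata_request("10.1038/sdata.2016.18")
print(request.full_url)
print(request.get_header("Accept"))

# To actually retrieve the record (needs network access):
# import json
# with urllib.request.urlopen(request, timeout=30) as response:
#     metadata = json.load(response)
```

Because the protocol is plain HTTPS, the same identifier works in a browser, a script, or a harvesting service, and access controls can be layered on top where the data are restricted.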

I - INTEROPERABLE

Data should integrate with other data and work with applications or workflows for analysis, storage, and processing

Key Requirements:

I1: Data use a formal, accessible, shared, and broadly applicable language for knowledge representation

I2: Data use vocabularies that follow FAIR principles

I3: Data include qualified references to other data

Examples:

Standard Formats

JSON, XML, CSV formats with standardized schemas

Controlled Vocabularies

Using ontologies such as the Gene Ontology and other vocabularies that are themselves FAIR

Linked Data

RDF, semantic web technologies for data linking
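
One common way to combine a standard format, a shared vocabulary, and linked data is to describe a dataset in JSON-LD using schema.org terms. The sketch below is a minimal example under that assumption; the dataset name, keywords, and placeholder DOI are made up, and a real record would normally carry more properties.

```python
import json

# A schema.org "Dataset" description expressed as JSON-LD.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example survey dataset",
    "identifier": "https://doi.org/10.1234/example-dataset-2023",  # placeholder
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["survey", "example"],
    # Qualified reference to another resource (I3): the property name states
    # how the two are related instead of just listing a bare link.
    "citation": "https://doi.org/10.1038/sdata.2016.18",
}

json_ld = json.dumps(dataset, indent=2)
print(json_ld)
```

Because the property names come from a published vocabulary, other tools can interpret the record without knowing anything about the project that produced it.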

R - REUSABLE

Data should be well described so they can be replicated and combined in different settings

Key Requirements:

R1: Data are richly described with a plurality of accurate and relevant attributes

R1.1: Data are released with a clear and accessible data usage license

R1.2: Data are associated with detailed provenance

R1.3: Data meet domain-relevant community standards

Examples:

Open Licenses

Creative Commons licenses (e.g., CC BY, CC0) for data, and MIT or GPL licenses for accompanying software, with clear usage terms

Provenance Information

Detailed history of data creation, processing, and modifications

Community Standards

Following discipline-specific data formats and metadata standards
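
As a small illustration of R1.1 and R1.2, the sketch below records one provenance entry for a data file: its name, a SHA-256 checksum, the processing step applied, the license, and a timestamp. The entry layout and the function names are assumptions made for this example rather than a community standard; established provenance models such as W3C PROV are richer.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_entry(path: Path, action: str, license_url: str) -> dict:
    """Describe one processing step for a data file (hypothetical layout)."""
    return {
        "file": path.name,
        "sha256": sha256_of(path),  # lets others verify the file is unchanged
        "action": action,
        "license": license_url,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Demonstration on a throwaway file; the contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "survey.csv"
    data_file.write_bytes(b"id,score\n1,42\n")
    entry = provenance_entry(
        data_file,
        action="removed incomplete responses",
        license_url="https://creativecommons.org/licenses/by/4.0/",
    )

print(json.dumps(entry, indent=2))
```

Appending an entry like this each time a file changes gives later users a readable history of how the data reached their current state.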
