Mr. Isaac Tai
Assistant Librarian (User Services)
Email: cytai@eduhk.hk
Tel: (852) 2948-6681
We warmly welcome faculty members, researchers, and academic departments to partner with us in organizing tailored workshops that support students' research skills development and academic success.
Data processing in research data management involves the systematic transformation, cleaning, analysis, and preparation of raw research data to make it suitable for interpretation and reuse. It is a critical component that bridges the gap between data collection and meaningful research outcomes, ensuring data quality, integrity, and usability throughout the research lifecycle.
The Data Processing Lifecycle outlines the essential steps for managing and preparing research data for analysis and reuse. This structured approach ensures that data remains accurate, consistent, and meaningful throughout the research process.
1. Gathering raw data from various sources, such as experiments, surveys, sensors, databases, and data collection APIs.
2. Identifying and correcting errors, inconsistencies, and missing values to ensure data quality and reliability.
3. Converting data into suitable formats and combining multiple data sources for comprehensive analysis.
4. Applying statistical methods, machine learning algorithms, and analytical techniques to extract insights.
5. Creating comprehensive metadata and documentation, and ensuring long-term preservation of processed data.

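
The lifecycle steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended pipeline: it uses only the standard library, and the survey data, column names (`participant_id`, `age`, `score`, `group`), cleaning rules, and function names are all hypothetical. In practice a library such as pandas would typically handle the cleaning and merging steps.

```python
import csv
import io
import json
import statistics

# Hypothetical raw survey export: note the stray whitespace, the missing
# score, the repeated participant row, and the non-numeric age.
RAW_SURVEY = """participant_id,age,score
P01, 23 ,78
P02,31,
P03,27,85
P03,27,85
P04,abc,90
"""

# Hypothetical second source keyed on the same participant_id.
RAW_GROUPS = """participant_id,group
P01,control
P03,treatment
P04,treatment
"""


def read_rows(text):
    """Collection: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning: trim whitespace, skip repeated ids, reject invalid rows."""
    seen, kept, rejected = set(), [], []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if row["participant_id"] in seen:
            continue  # this participant id was already processed
        seen.add(row["participant_id"])
        try:
            kept.append({
                "participant_id": row["participant_id"],
                "age": int(row["age"]),
                "score": float(row["score"]),
            })
        except ValueError:
            # Missing or non-numeric value: record the id instead of guessing.
            rejected.append(row["participant_id"])
    return kept, rejected


def integrate(rows, groups):
    """Transformation and integration: join both sources on participant_id."""
    group_by_id = {g["participant_id"]: g["group"] for g in groups}
    return [
        {**row, "group": group_by_id[row["participant_id"]]}
        for row in rows
        if row["participant_id"] in group_by_id
    ]


def analyse(rows):
    """Analysis: a simple descriptive summary of the cleaned scores."""
    scores = [row["score"] for row in rows]
    return {"n": len(scores), "mean_score": statistics.mean(scores)}


def document(summary, rejected):
    """Documentation: describe the processing in a JSON metadata record."""
    return json.dumps({
        "source": "hypothetical survey export",
        "cleaning_rules": [
            "strip whitespace",
            "skip repeated participant ids",
            "reject missing or non-numeric values",
        ],
        "rejected_ids": rejected,
        "summary": summary,
    }, indent=2)


raw = read_rows(RAW_SURVEY)
cleaned, rejected = clean(raw)
merged = integrate(cleaned, read_rows(RAW_GROUPS))
summary = analyse(merged)
metadata = document(summary, rejected)
print(metadata)
```

Rejected rows are recorded in the metadata rather than silently dropped or filled in, so that anyone reusing the processed data can see which cleaning decisions were made.
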
Research data often comes in diverse forms — numbers, text, images, maps, and signals. Each type requires specific processing techniques to extract meaningful insights. By leveraging appropriate tools and methodologies, researchers can ensure the quality, reusability, and interpretability of their data throughout the research lifecycle.
Technique | Description | Common Tools/Methods | Applications in RDM |
---|---|---|---|
Statistical Processing | Use descriptive and inferential statistics to summarize and interpret data. | R, SPSS, Python (pandas, statsmodels), SAS | Data validation, hypothesis testing, estimating trends, regression modeling |
Machine Learning | Apply algorithms to learn patterns from data. Includes supervised, unsupervised, and deep learning. | scikit-learn, TensorFlow, Keras, Weka | Predictive analytics, anomaly detection, recommendation systems |
Text Processing | Extract and interpret information from textual datasets using Natural Language Processing (NLP). | NLTK, spaCy, Gensim, TextBlob | Literature reviews, survey analysis, sentiment analysis, research article mining |
Image Processing | Analyze and enhance visual data such as scans, X-rays, or satellite images. | OpenCV, ImageJ, MATLAB, TensorFlow (CV modules) | Biomedical imaging, remote sensing, historical manuscript digitization |
Spatial Analysis | Analyze data with geographic or spatial components using mapping and clustering techniques. | QGIS, ArcGIS, GeoPandas, Google Earth Engine | Environmental monitoring, urban planning, epidemiological tracking |
Signal Processing | Analyze time-series or sensor-generated data with mathematical transforms and filters. | MATLAB, SciPy, Audacity, EEGLAB | EEG/ECG analysis, environmental sensors, audio processing in linguistic research |
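
As one illustration of the statistical processing row in the table, the sketch below computes descriptive statistics and a least-squares trend line. It is a minimal example under stated assumptions: the yearly values are invented for illustration, the helper names `describe` and `linear_trend` are hypothetical, and only Python's standard library is used. The tools listed in the table (R, SPSS, pandas, statsmodels, SAS) offer far richer functionality.

```python
import statistics

# Hypothetical yearly measurements (illustrative values only).
years = [2019, 2020, 2021, 2022, 2023]
values = [10.0, 12.0, 14.0, 16.0, 18.0]


def describe(data):
    """Descriptive statistics: centre and spread of a sample."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
    }


def linear_trend(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_mean, y_mean = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


stats = describe(values)
slope, intercept = linear_trend(years, values)
print(stats)
print(f"estimated change per year: {slope:.2f}")
```

The slope estimates the average change per year, which corresponds to the "estimating trends" application listed in the table. A full inferential analysis would also report uncertainty, such as a confidence interval for the slope.
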