IBED data management guidelines for PhDs and postdocs

IBED encourages temporary scientific staff to practice good data management so that internal and external colleagues can build on the results of their work. Our aim is for good data management practices to become a regular part of how we work in the institute. To stimulate a culture of good data management, IBED offers a bonus of €1000 gross to departing postdocs and PhD candidates who have organized their data according to a set of minimum requirements. The bonus is available to contract and bursary PhD candidates, and occasionally to external PhD candidates after consultation with the supervisor and the institute manager. Postdocs can apply for the bonus if they have worked on a research project for at least one year. See the full list of currently submitted archives.

Well-managed data means that anyone can find a particular dataset together with all the information needed to readily use it. This prevents data loss and enhances reuse. We expect all temporary staff to organize their data according to the following requirements:

1. Overview

| Requirement | Description |
|---|---|
| 1. Well-organized data archive 📂 | Data are stored in a digital archive with a logical structure containing all the necessary information, including code and protocols, to reproduce the results of a research project. |
| 2. Metadata 🔎 | Datasets and chapters are documented with metadata so that the digital archive is self-explanatory and can be understood by collaborators and supervisors. |
| 3. Data publication 🌐 | Data are published in online repositories for accessibility and compliance with the FAIR principles. |

Since every project is different, the requirements are kept as generic as possible so that every project can fit them. To get started, please check the RDM minimum requirements explained and the best practices, including tips for 📝 writing a data management plan, 📂 file and directory naming, 🔢 spreadsheet formatting, 📄 file formats and 🔧 writing code. To help you meet the requirements for the bonus, a checklist is provided and several example archives are listed. Help and guidance can also be obtained by consulting IBED's data manager, currently Johannes De Groeve, at any stage of your research. It is also recommended to organize a meeting early in your appointment to get a head start in applying good data management practices.

You can apply for the bonus by organizing a meeting with your supervisor and IBED's data manager (mailto:j.degroeve@uva.nl) during which you demonstrate how you stored your data. Preferably make this appointment two months before you leave IBED, although later is also possible. The supervisor will evaluate the scientific completeness of the archive, and the data manager will evaluate its technical merits. If your data archive meets the requirements, the data manager will confirm this to the institute manager, who will then arrange the bonus payment. See the roadmap below for the steps towards a successful application.

Example archives

| Type | Source | Description |
|---|---|---|
| PhD data archive | de Nijs A. E. 2024 | A well-organized and documented data archive including data and code from various (un)published chapters. |
| PhD data archive | Sadia M. 2024 | Consistent publication of data and code alongside publications throughout the PhD is sufficient to obtain the bonus. |
| Publication archive | Salvatori, De Groeve et al. 2015 | A well-organized data archive including descriptions, data and code to reproduce the analysis linked to a publication. |

2. Roadmap

By following these steps, you can ensure that your data is organized properly and meets IBED’s requirements for the DM Bonus.

| Step | Action | Description |
|---|---|---|
| 1. | Consult IBED's data manager (optional) | Schedule an appointment with the data manager. This is optional but recommended to get guidance on good data management practices early in your project. The data manager can be consulted at any stage of your appointment for RDM-related issues, uncertainties and questions. |
| 2. | Review the RDM minimum requirements and best practices | Familiarize yourself with the RDM minimum requirements and the checklist. Inform yourself about RDM best practices and check the example archives for inspiration. |
| 3. | Keep your data and code organized | Create a digital archive, document metadata, and publish your data. Store your data in a logical structure, including code and protocols. Add metadata for clarity. Publish your data in an online repository, ensuring it meets the FAIR principles. Use the checklist ✅. |
| 4. | Prepare for evaluation | Schedule an appointment with your supervisor and the data manager, preferably at least two months before departure, although later is also possible. Your supervisor will evaluate scientific completeness, and the data manager will assess technical aspects. |
| 5. | Evaluation and confirmation | Your supervisor and the data manager confirm that your archive meets the requirements. If your data archive is in order, the data manager will confirm this to the institute manager. |
| 6. | Receive the bonus | The institute manager processes the bonus payment. After confirmation, you will receive the €1000 bonus. |

3. Minimum requirements explained

3.1. Set up a digital archive

📂 A digital archive is a structured directory tree containing all the material needed to reproduce the research. Such an archive should have a logical structure. The following web article is a good example of what a well-organized directory tree might look like. The author (De Cock T.) also provides an automatic folder structure generator based on the principles described in the article; if this directory structure matches your research workflow, you are free to use it (a minimal scaffolding sketch is also given after the examples below). See Salvatori, De Groeve et al. for an example of a well-organized digital archive, and check the best practices for file and directory naming for more ideas.

A digital archive should:

1. Follow a logical directory structure for easy navigation.
2. Include all essential datasets, protocols, scripts, and outputs to reproduce the research.
3. Use consistent and clear file and directory naming conventions.

Example chapter structures in an archive:

Example 1

.
├── README.md
├── code
│   ├── 0_data_preparation.Rmd
│   └── 1_analysis.Rmd
├── data
│   ├── input
│   └── output
├── docs
│   ├── manuscript.docx
│   ├── figs
│   └── tabs
└── project_name.Rproj

Example 2

.
├── README.md
├── code
│   ├── 0_data_preparation.Rmd
│   └── 1_analysis.Rmd
├── data
│   ├── raw
│   └── processed
├── docs
│   ├── manuscript.docx
│   ├── figs
│   └── tabs
└── project_name.Rproj
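
To set up such a structure quickly, a short script can create the directories and placeholder files for you. The sketch below (in R) scaffolds the layout of Example 1; the chapter name and file names are illustrative and should be adapted to your own workflow.

```r
# Minimal sketch: scaffold the chapter layout of Example 1.
# "chapter_1_example" is a hypothetical name; adapt it to your project.
chapter_dir <- "chapter_1_example"

subdirs <- c("code", "data/input", "data/output", "docs/figs", "docs/tabs")
for (d in file.path(chapter_dir, subdirs)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Placeholder files for the README and the numbered analysis scripts
file.create(file.path(chapter_dir, "README.md"))
file.create(file.path(chapter_dir, "code", "0_data_preparation.Rmd"))
file.create(file.path(chapter_dir, "code", "1_analysis.Rmd"))
```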

3.2. Metadata

🔎 Metadata provides essential details about datasets and chapters/projects (who, what, where, when, why, and how) to ensure they are understandable, reusable and interoperable. Researchers can use any method to document metadata, but data formats editable with a text editor (txt, json, markdown) are preferred. If no metadata standard is used within your field of study, it is recommended to use the attributes specified by the Dublin Core (DC) Element Set (see here for a description of the element set). Include metadata as separate METADATA files (<dataset>_metadata) or include dataset metadata in the chapter-level README. Please follow the links for example metadata files using the Dublin Core Element Set for (1) datasets (txt, json) and (2) chapters (txt, json).

Each chapter should include a README (a skeleton example follows this list):

1. Briefly describe the chapter, its content and its directory structure.
2. Describe which scripts and datasets are used for which analysis.
3. Document the units and column descriptions for tabular data if not provided elsewhere.
4. Specify analysis software, software versions and any dependencies.
5. If datasets cannot be published (e.g., privacy concerns, size), document the reasons and a reference contact person. If allowed, publish the metadata of the restricted data and store the actual data on institutional storage (see also below).
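
A chapter README does not need to be elaborate. The skeleton below covers the points above; all names and sections are illustrative.

```
# Chapter <n>: <short chapter title>

## Description
Brief description of the chapter, its content and its directory structure.

## Scripts and datasets
- code/0_data_preparation.Rmd: cleans data/input and writes data/output
- code/1_analysis.Rmd: runs the analysis and produces docs/figs and docs/tabs

## Data documentation
Units, column descriptions and abbreviations per dataset,
or a pointer to the <dataset>_metadata files.

## Software
Analysis software and versions (e.g. R and the packages used).

## Restricted data
If a dataset is not published, state the reason and a contact person.
```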

Each dataset should include the following metadata:

1. Who created the data.
2. When the data was created.
3. Where the data was created (e.g. the spatial context or study area of the samples, or the laboratory).
4. Who owns the data.
5. What the data represents (for tabular data, document column descriptions, units, missing values and abbreviations).
6. Why and how the data was collected.
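
As an illustration, a dataset metadata file based on the Dublin Core Element Set could look as follows; all values below are hypothetical.

```
# bird_counts_metadata.txt (hypothetical example)
Title:       Weekly bird counts, study area X, 2022-2023
Creator:     J. Doe (PhD candidate, IBED)
Date:        2023-11-15
Coverage:    Study area X, the Netherlands
Rights:      CC BY 4.0; owner: University of Amsterdam
Format:      text/csv
Description: Weekly point counts of breeding birds. Column descriptions,
             units and abbreviations are documented in the chapter README.
             Missing values are coded as NA.
```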

3.3. Publish

🌐 Publishing archives in repositories like Figshare, Zenodo or OSF (see more examples) ensures accessibility and compliance with the FAIR principles. To ensure that all data and code linked to a PhD can be found, including material that is already published, add the DOIs of already published resources to the data archive's metadata. In general-purpose repositories like Figshare and Zenodo, linked resources can be added as related resources (see the sketch after the list below).

1. Chapter data and code can be embargoed, made public, or stored as a private repository.
2. Chapters can be published all together (e.g. as a zip file) or chapter by chapter.
3. For the chapter-by-chapter approach, create a "Community" (Zenodo, see instructions) or a "Collection" (Figshare, see instructions) to organize the chapters.
4. Make sure that chapters/datasets that are already published are referenced in the data archive's description.
5. For privacy-sensitive datasets included in public repositories, make sure the data is anonymized or pseudonymized.
6. If datasets and chapters cannot be published (not even under embargo), publish the metadata of the restricted data and store the actual data on institutional storage (see also below). Don't forget to document the reasons for the restriction and a reference contact person.
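
For those who prefer to script their submissions, the sketch below creates a draft Zenodo deposition with a related identifier through Zenodo's REST API. It assumes a Zenodo personal access token stored in the ZENODO_TOKEN environment variable; the title, creator and DOI are placeholders.

```r
library(httr)

token <- Sys.getenv("ZENODO_TOKEN")  # hypothetical: your personal access token

metadata <- list(metadata = list(
  title       = "PhD data archive: Chapter 3",   # placeholder title
  upload_type = "dataset",
  description = "Data and code for Chapter 3.",
  creators    = list(list(name = "Doe, Jane")),
  # Reference the already published paper as a related resource
  related_identifiers = list(list(
    identifier = "10.1234/example.doi",          # placeholder DOI
    relation   = "isSupplementTo"
  ))
))

res <- POST(
  "https://zenodo.org/api/deposit/depositions",
  query  = list(access_token = token),
  body   = metadata,
  encode = "json"
)
content(res)$links$html  # URL of the draft deposition, to review and publish
```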

3.4. Guidelines for raw data

Raw data refers to unprocessed or minimally processed data obtained directly from research activities. While making raw data publicly available is encouraged, there may be reasons why this is not possible; a common issue is that raw data is too large to submit to public archives. When raw data cannot be made public, the minimum requirement is that its metadata is published and that the data is stored securely in institutional repositories. Such raw datasets can be submitted to an institutional Research Drive by making a request to the data manager; other long-term project-specific (e.g. radar, uva-bits) and more generic (lab server, Tape IBED) institutional infrastructures are also valid options.

1. Researchers are encouraged to make raw data, before pre-processing, public where possible.
2. Researchers are encouraged to use non-proprietary and/or open data formats (e.g. csv); see the sketch after this list.
3. When raw data cannot be published due to sensitivity, size or access restrictions, document the reasons clearly and provide metadata linking to the stored raw data.
4. Use appropriate metadata to describe raw data comprehensively, ensuring it is accessible and understandable for future use. Make sure the metadata clearly describes the link between the raw data and derived datasets.
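
As a small example of point 2 above, the sketch below converts an Excel spreadsheet to csv before archiving; the file names are hypothetical, and readxl is only one of several packages for reading Excel files.

```r
library(readxl)  # one option for reading Excel files

# Hypothetical file names: adapt to your own raw data
raw <- read_excel("data/raw/field_measurements.xlsx")

# Store an open, non-proprietary copy alongside the original
write.csv(raw, "data/raw/field_measurements.csv", row.names = FALSE)
```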

4. Checklist

✅ This checklist for FAIR compliance covers the minimum requirements; you can use it to prepare and assess your digital research data archive.

| Category | Requirement |
|---|---|
| General requirements | Logical directory structure for easy navigation. |
| | Include the datasets, protocols, scripts and outputs needed to reproduce the research. |
| | Use consistent and intuitive file naming. |
| Metadata | README files in each chapter provide detailed documentation. |
| | Metadata explains the who, what, where, when, why and how. |
| | Units, columns and abbreviations of datasets are clearly described. |
| Publish | Publish data in FAIR-compliant repositories with DOIs. |
| | Non-public data is justified, with access protocols documented. |
