Best practices

1. Data Management Plan

📝 Make a data management plan (DMP) from the start and you will be far ahead. Here are some essentials to think about when setting up a plan.

1. Types of data

  • What types of data will you be creating or capturing: experimental data, observational data, model simulations, retrieval of existing data?
  • How will you capture, create, and/or process the data? (e.g. instruments used, software, imaging)

2. Contextual details (Metadata) needed to make data meaningful to others

  • What will be the naming convention of your files?
  • What file formats will you be using?

3. Quality control

  • What will you do to ensure that the data are not erroneous? Consider checks during data generation/collection, during data entry, and during further data processing, and list the software or rules you use to check quality.
  • Who is responsible for quality control? (e.g. do you ask a collaborator or your supervisor to check the data?)

4. Storage, backup and security

  • What will be the URL where your data will be available?
  • What is your backup plan for the data?
  • Who will own the copyright or intellectual property rights to the data?
  • How, and among whom, will the data be shared during the project and after it is finished?

5. Protection and privacy

  • If relevant: how are you addressing any ethical or privacy issues? (e.g. limiting access, encryption, anonymization of data)?

2. Naming files and directories

📂 Start your project on the right foot! Organizing your files and directories effectively can save you countless hours and make collaboration a breeze. While there is no single "best" way to structure and name files, different methods may be more suitable for specific projects, depending on the nature of the work and personal preferences.

A common challenge faced by students and researchers alike is maintaining an organized project directory. The key to overcoming this is to start with a well-defined directory structure tailored to each individual project unit, such as a manuscript or thesis. This structure should be informed by your project proposal’s roadmap and Data Management Plan (DMP). While your project units and their requirements may evolve, having a clear starting point can provide a solid foundation for managing your data effectively.

Here, we provide practical guidelines to help you establish a consistent and efficient approach to naming files and directories. These examples are designed to be simple yet adaptable to suit your project's specific needs.

Key guidelines:

  • Organize by Relevant Metadata: Sort files into directories based on essential metadata such as compound, technique, date, or the person collecting the data to enhance findability and maintain consistency across your dataset.
  • Establish Clear Naming Conventions: Define a consistent and intuitive naming convention, and make sure collaborators and users understand the rationale behind it and follow it to maintain data integrity and usability.
  • Fully Describe File Contents in Filenames: Filenames should comprehensively identify their contents so that files can be located easily through search. Avoid abbreviations that may confuse others, and keep names clear for anyone accessing the data in the future.
  • Avoid Special Characters and Spaces: Do not use special characters such as @ or #, or spaces, in filenames. Use underscores (_) to separate words so filenames remain compatible across systems and applications.
  • Use the ISO 8601 Date Format: Always use the ISO 8601 date format (YYYYMMDD) in filenames to maintain consistent sortability. This format ensures that files are ordered chronologically in directory listings and simplifies future retrieval.
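Applied in code, these rules can be wrapped in a small helper. The sketch below is hypothetical (the function name and metadata fields are illustrative, not a prescribed tool): it builds filenames from project metadata, replaces spaces with underscores, and embeds an ISO 8601 date.

```python
from datetime import date

def make_filename(project, technique, collector, when=None, ext="csv"):
    """Build a filename like project_technique_collector_YYYYMMDD.ext.

    All parts are lowercased and spaces become underscores so the name
    stays portable across operating systems and applications.
    """
    when = when or date.today()
    parts = [project, technique, collector, when.strftime("%Y%m%d")]
    clean = [str(p).strip().lower().replace(" ", "_") for p in parts]
    return "_".join(clean) + "." + ext

# Files named this way sort chronologically because of the YYYYMMDD part.
print(make_filename("water quality", "NMR", "jdoe", date(2017, 5, 3)))
# water_quality_nmr_jdoe_20170503.csv
```

Generating names programmatically also guarantees the convention is applied the same way every time, rather than relying on each collaborator to remember it.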

3. Spreadsheets

🔢 Spreadsheets are a powerful tool for organizing, analyzing, and exploring data, but they come with risks if not handled properly. Following best practices ensures your data remains consistent, interpretable, and ready for future use. Key practices include maintaining raw data integrity, using clear and standardized formats, and ensuring tables are well-structured. For long-term storage, consider saving spreadsheets in a CSV (comma-separated values) format, complemented by metadata files that describe the dataset's structure and conventions. Adopting these habits will not only safeguard your data but also enhance its usability for future analyses and collaborations.

Below are essential guidelines to keep your spreadsheet data organized and reliable:

Key guidelines:

  • Keep Raw Data Raw: Always keep a copy of the raw data unchanged. Perform calculations and manipulations in a separate file to avoid corrupting the original data.
  • Use Single Rectangular Tables: Each spreadsheet should contain a single rectangular table with a single header line, with data entered consistently across rows and columns.
  • Avoid Empty Rows or Columns: Do not leave empty rows or columns in the table, so the data structure remains intact and software can read it without errors.
  • Use Descriptive Column Labels: Column labels should be clear and concise. Avoid spaces or special characters, using only letters, numbers, and underscores (_) for readability and compatibility.
  • Keep Columns Homogeneous: Ensure that each column contains a single data type or unit. For categorical data, use a consistent set of labels throughout the column.
  • Align Column Order Across Tables: When creating multiple tables, order similar columns consistently to make data easier to compare and merge later.
  • Standardize Missing and Exceptional Values: Choose a clear method to encode missing values, detection limits, and other exceptions, and apply it consistently across the dataset.
  • Ensure Consistent Formats and Spelling: Use consistent formats and spelling throughout the dataset. For example, use the same labels for categories like gender (M, F), and avoid switching languages.
  • Separate Date and Time Values: Store year, month, day, and time components (if relevant) in separate columns for clarity and compatibility with analysis tools.
  • Avoid Visual and Pop-Up Annotations: Do not use color-coding or pop-up notes in your data. Instead, include annotations as additional columns (e.g. notes) in the table for clarity and exportability.
  • Use Decimal Degrees for Spatial Data: Record spatial information as latitude and longitude in decimal degrees (WGS84 coordinate system) to ensure compatibility with GIS and mapping tools.
  • Include Metadata: Provide metadata that explains the meaning of column labels, measurement units, and other conventions. Store this in a separate sheet or a .txt file for easy reference (e.g. the dataset water_quality_2017_05.csv would be accompanied by the metadata file water_quality_2017_05_metadata.txt).
  • Use CSV for Long-Term Storage: For archival purposes, save data as .csv files.
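Several of these checks can be automated once the data are saved as CSV. As a minimal sketch using only Python's standard library (the function name and messages are illustrative), the script below flags empty rows, ragged rows, and columns that mix numbers and text:

```python
import csv

def check_table(path):
    """Flag common spreadsheet problems in a CSV file: empty rows,
    rows with the wrong number of columns, and columns that mix
    numeric and text values (non-homogeneous columns)."""
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh))
    header, body = rows[0], rows[1:]
    problems = []
    for i, row in enumerate(body, start=2):  # 1-based, counting the header line
        if not any(cell.strip() for cell in row):
            problems.append(f"row {i}: empty row")
        elif len(row) != len(header):
            problems.append(f"row {i}: expected {len(header)} columns, got {len(row)}")
    for j, name in enumerate(header):
        kinds = set()
        for row in body:
            if j < len(row) and row[j].strip():
                try:
                    float(row[j])
                    kinds.add("number")
                except ValueError:
                    kinds.add("text")
        if len(kinds) > 1:
            problems.append(f"column '{name}': mixed numbers and text")
    return problems
```

Running such a check before archiving catches structural slips (a stray blank row, a "high" typed into a numeric column) that are easy to miss by eye.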

4. File formats

📄 The use of standard file formats with consistent naming conventions is critical for maintaining data accessibility and usability in the long term. Thoughtful consideration of file formats can help ensure your data remains identifiable and usable by others in the future.

When selecting tools and formats for storing your data, pay close attention to the following key principles:

  • Preservation and Accessibility: Whenever possible, opt for open-standard formats that are widely recognized and easily reusable. For instance, saving documentation as plain text (.txt) is preferable to a harder-to-reuse format such as PDF for preservation purposes.
  • Software and Compatibility: Include information about any specific software or versions required to view your data, such as SPSS v.3 or Microsoft Excel 97-2003.
  • Version Control and Conversion Considerations: Clearly document version control practices and specify if data will transition between formats during its lifecycle. Highlight any features that could be lost during format conversion, such as system-specific labels.

Below, you’ll find detailed recommendations for preferred and acceptable formats across various data types. These guidelines draw on the UK Data Archive's best practices for managing and sharing data.

Data types with their preferred and other acceptable formats:

  • Documentation and Scripts
    Preferred: plain text (.txt); Markdown (.md); Open Document Text (.odt); Rich Text Format (.rtf); HTML (.htm, .html)
    Other acceptable: widely used proprietary formats, e.g. MS Word (.doc/.docx) or MS Excel (.xls/.xlsx); XML marked-up text (.xml) according to an appropriate DTD/schema (e.g. XHTML 1.0); PDF/A or PDF (.pdf)
  • Spectroscopic Data
    Preferred: JCAMP format (NMR, IR, Raman, UV, mass spectrometry)
  • Geospatial Data
    Preferred: GeoPackage; georeferenced TIFF (.tif, .tfw); GeoJSON; ESRI Shapefile (.shp, .shx, .dbf, .prj, .sbx, .sbn)
    Other acceptable: CAD data (.dwg); KML (.kml); ESRI Geodatabase format (.mdb); MapInfo Interchange Format (.mif)
  • Digital Image Data
    Preferred: TIFF version 6 uncompressed (.tif)
    Other acceptable: JPEG (.jpeg, .jpg); other TIFF versions (.tif, .tiff); JPEG 2000 (.jp2); PDF/A or PDF (.pdf)
  • Digital Video Data
    Preferred: MPEG-4 High Profile (.mp4)
    Other acceptable: JPEG 2000 (.mj2)
  • Digital Audio Data
    Preferred: FLAC (.flac); WAV (.wav); MP3 (spoken word only)
    Other acceptable: MP3 (general use); AIFF (.aif)
  • Qualitative (Textual) Data
    Preferred: XML text with appropriate DTD/schema (.xml); Rich Text Format (.rtf); plain text, UTF-8 (Unicode) (.txt)
    Other acceptable: plain text, ASCII (.txt); HTML (.html); MS Word (.doc/.docx); LaTeX (.tex)
  • Quantitative Data with Metadata
    Preferred: SPSS portable format (.por); delimited text with setup file (SPSS, Stata, SAS); structured text or marked-up metadata file (e.g. DDI XML)
    Other acceptable: MS Access (.mdb/.accdb)
  • Quantitative Tabular Data (Minimal)
    Preferred: comma-separated values (.csv); tab-delimited file (.tab)
    Other acceptable: delimited text using unique delimiters (.txt); proprietary formats such as MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dBase (.dbf), OpenDocument Spreadsheet (.ods)
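Analysis scripts can produce the preferred formats directly. As a minimal sketch (the file names and contents are hypothetical), the snippet below writes results as an open .csv file with a plain-text documentation file alongside it, instead of a binary spreadsheet:

```python
import csv

# Hypothetical results; in practice these would come from your analysis.
results = [
    {"sample": "A1", "ph": 7.1, "temp_c": 18.4},
    {"sample": "A2", "ph": 6.8, "temp_c": 19.0},
]

# Preferred: an open, plain-text data format (.csv) instead of .xlsx.
with open("results.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=["sample", "ph", "temp_c"])
    writer.writeheader()
    writer.writerows(results)

# Preferred: documentation as plain text (.txt) next to the data.
with open("results_readme.txt", "w", encoding="utf-8") as fh:
    fh.write("results.csv: pH and temperature (degrees C) per sample.\n")
    fh.write("Comma-delimited, UTF-8; created with the Python csv module.\n")
```

Exporting to open formats from the start means there is nothing to convert (and nothing to lose in conversion) when the data are archived later.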

5. Writing code

🔧 In research, writing clear and well-structured scripts is critical for ensuring reproducibility, transparency, and collaboration in scientific studies. Following coding best practices can save time, reduce errors, and make your analyses more accessible to others.

It is highly recommended to use code for all processing steps, from data preparation, filtering, and extraction to the final analysis. By writing scripts for these tasks, you ensure that the raw data remain untouched while creating a reproducible workflow. This approach also avoids relying on proprietary software, which may be convenient but often uses file formats and processing steps that are not easily replicated. By embracing coding, you align your workflows with open science principles, ensuring your data and results are accessible, transparent, and reusable, which ultimately helps others validate your research.

Learning to code may feel like a steep learning curve at first, but the effort is highly rewarding. Moreover, IBED's Computational Support Team provides coding support and resources to help you get started and overcome challenges. As you develop your skills, you'll gain the ability to automate tasks, handle large datasets more efficiently, and collaborate with others more effectively.

For those new to coding, simple steps such as organizing code by project, adding comments, and using clear names for variables and functions can significantly improve your scripts. As you become more comfortable, adopting tools like version control systems or writing modular code can further enhance your workflow and facilitate collaboration in larger projects. These best practices are designed to help you produce reliable, reusable, and efficient code for your research. For the real coding enthusiasts, please find here a step-by-step tutorial for making R packages and using git.

Below is an overview of these best practices, with links to resources to help you get started.

Key guidelines:

  • Use Projects to Organize Your Work: Organizing your code, data, and related materials into well-defined projects ensures a coherent structure that is easy to navigate and maintain. Each project should have a clear directory structure and naming conventions, a defined scope and objectives, and work broken down into smaller, manageable tasks. Projects should be self-contained, with clear dependencies and documentation on how to run and use them, which makes it easy to collaborate with others, track progress, and isolate changes. (Example: .Rproj in RStudio for R-based projects)
  • Write Readable Code: Choose descriptive names for variables, functions, and files. Add comments to explain complex logic and improve code readability for others and your future self.
  • Follow a Style Guide: Adhere to a style guide to maintain consistency in your code. For R, refer to Hadley Wickham's style guide; for Python, follow PEP 8.
  • Adopt Modular Design: Structure code into functions, classes, or modules for better organization, reusability, and maintenance.
  • Process Raw Data with Code: Write code to process raw data rather than manipulating it directly in software. This keeps the raw data untouched and ensures reproducibility.
  • Use Version Control: Use a version control system such as Git to track code changes. Install GitHub Desktop for a beginner-friendly interface. Commit changes often, write meaningful commit messages, and use branches for feature development. See here for setting up Git for users of R.
  • Collaborate Using Pull Requests: When collaborating with others, use pull requests to review changes before merging them into the main branch, ensuring code quality and consistency.
  • Tag Releases: Use tags (e.g. v1.0.0) in Git to identify and organize stable releases, making them easier to reference in future development.
  • Test Your Code: Write unit tests or integration tests to catch errors early and confirm that changes do not break existing functionality.
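Modular design and testing reinforce each other: a small, single-purpose function is easy to test in isolation. The sketch below is illustrative (the function name and the detection-limit substitution rule are hypothetical examples, not a prescribed method):

```python
def detection_limit_mean(values, limit, substitute_factor=0.5):
    """Mean of measurements where values below `limit` (a detection
    limit) are substituted with `substitute_factor * limit`.

    Keeping this logic in one small function makes it reusable across
    scripts and easy to test on its own.
    """
    if not values:
        raise ValueError("values must not be empty")
    adjusted = [v if v >= limit else substitute_factor * limit for v in values]
    return sum(adjusted) / len(adjusted)

# A simple unit test: run with pytest, or just execute the assertion.
def test_detection_limit_mean():
    # 0.1 is below the limit of 0.2, so it is replaced by 0.5 * 0.2 = 0.1.
    assert detection_limit_mean([0.1, 0.4, 0.6], limit=0.2) == (0.1 + 0.4 + 0.6) / 3
```

Once a rule like this lives in a tested function, changing it later (say, a different substitution factor) immediately shows whether existing analyses still behave as expected.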