baRcodeR with PyTrackDat: Open-source labelling and tracking of biological samples for repeatable science


Repeatable experiments with accurate data collection and reproducible analyses are fundamental to the scientific method but may be difficult to achieve in practice. Several flexible, open-source tools developed for the R and Python coding environments aid the reproducibility of data wrangling and analysis in scientific research. In contrast, analogous tools are generally lacking for earlier stages of research, such as systematic labelling and processing of field samples with hierarchical structure (e.g. time points of individuals from multiple lines or populations) or curating heterogeneous data collected by different researchers over several years. Such tools are critical for modern research given trends toward globally distributed collaborators using higher-throughput technologies. As a step toward improving the repeatability of methods for collecting biological samples and curating biological data, we introduce the R package baRcodeR and the PyTrackDat pipeline in Python. The baRcodeR package provides tools for generating biologically informative, hierarchical labels with digitally encoded 2D barcodes that can be printed and scanned using low-cost commercial hardware. The PyTrackDat pipeline integrates with baRcodeR output to build a web interface for sample management and tracking along with data collection and curation. We briefly describe the application of principles from baRcodeR and PyTrackDat in three large research projects, which demonstrate their value in (i) helping to document sampling methods, (ii) facilitating collaboration and (iii) reducing opportunities for human errors and omissions that could otherwise propagate through downstream data analysis to compromise biological inference.

In review.