When publishing data, merely providing file access is insufficient for a simple reason: data are not static. Data often continue to evolve, and they should: file formats change, bugs are fixed, new data are added, and derived data need to be integrated.
While version control systems are a de facto standard in open source software development, a similar level of tooling and culture is not present in the open data community.
DataLad builds on top of git-annex and extends it with an intuitive command-line interface. It enables users to operate on data using familiar concepts, such as files and directories, while transparently managing data access and authorization with underlying hosting providers.
A powerful and complete Python API is also provided to enable authors of data-centric applications to bring versioning and the fearless acquisition of data into continuous integration workflows.
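As a minimal sketch of what using the Python API from a script or continuous integration job might look like, the helper below clones a dataset and then fetches the content of selected files. It assumes DataLad is installed; the function name `fetch_files` and its parameters are illustrative and not part of DataLad, and any dataset URL passed to it is a placeholder chosen by the caller.

```python
def fetch_files(source, path, files):
    """Clone a DataLad dataset and retrieve the content of selected files.

    `source` is a dataset URL or local path, `path` is the clone target,
    and `files` is a list of paths relative to the dataset root.
    All three are supplied by the caller (hypothetical helper).
    """
    # Imported lazily so that defining this helper does not require DataLad.
    import datalad.api as dl

    # Cloning obtains the dataset's version history and file tree;
    # annexed file content is not downloaded at this point.
    ds = dl.clone(source=source, path=path)

    # Fetch the actual content only for the files that are needed.
    ds.get(path=files)
    return ds
```

A caller would invoke it as `fetch_files(<dataset URL>, "local-copy", ["data/file.csv"])`, where the file path is again only an example. Separating the clone from the content retrieval mirrors the idea that a dataset can be obtained cheaply and its data acquired on demand.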
So far, more than 30 individuals have contributed to the development of DataLad. See an up-to-date list of them on GitHub. There are also a number of DataLad extension packages; some of them, and their contributors, can be found on GitHub too. The DataLad Handbook, a versatile educational resource on (research) data management, has received contributions from more than 35 individuals.
How to cite DataLad?
When referring to DataLad in a publication, please cite:
Halchenko et al. (2021). DataLad: distributed system for joint management of code, data, and their relationship. Journal of Open Source Software, 6(63), 3262. https://doi.org/10.21105/joss.03262
DataLad development is funded as a US-German project on collaborative research in computational neuroscience (CRCNS):
- DataLad - a decentralized system for integrated discovery, management, and publication of digital objects of science (Halchenko / Hanke), co-funded by the US National Science Foundation (NSF 1912266) and the German Federal Ministry of Education and Research (BMBF 01GQ1905).
- DataGit: converging catalogues, warehouses, and deployment logistics into a federated "data distribution" (Halchenko / Hanke), co-funded by the US National Science Foundation (NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1411).
Additional support has been provided by:
- Center for Reproducible Neuroimaging Computation (formerly CRNC, now ReproNim) (Kennedy), funded by the US National Institute of Biomedical Imaging and Bioengineering (NIBIB) (NIH 1P41EB019936-01A1).
- European Union’s Horizon 2020 research and innovation programme under grant agreement no. 945539: Human Brain Project (SGA3).
- European Union’s Horizon 2020 research and innovation programme under grant agreement no. 826421: Virtual Brain Cloud.
- The German federal state of Saxony-Anhalt and the European Regional Development Fund (ERDF), Project: Center for Behavioral Brain Sciences.