When publishing data, merely providing file access is insufficient for a simple reason: data are not static. Released data often continue to evolve, and they should: file formats change, bugs are fixed, new data are added, and derived data need to be integrated.
While version control systems are a de facto standard for open source software development, a similar level of tooling and culture is not present in the open data community.
DataLad builds on top of git-annex and extends it with an intuitive command-line interface. It enables users to operate on data using familiar concepts, such as files and directories, while transparently managing data access and authorization with underlying hosting providers.
A powerful and complete Python API is also provided to enable authors of data-centric applications to bring versioning and the fearless acquisition of data into continuous integration workflows.
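As a rough illustration of how the Python API might be used from a script or a continuous integration job, the sketch below clones a dataset and then retrieves the content of a single file. It assumes that `datalad.api.clone` returns a dataset object with a `get` method; the helper name `fetch_versioned_file`, its parameters, and the optional `api` argument are hypothetical and exist only for this example.

```python
from pathlib import Path


def fetch_versioned_file(source_url, dataset_dir, relative_path, api=None):
    """Clone a dataset and retrieve the content of one file in it.

    `api` is a hypothetical hook for this sketch: any object offering a
    `clone(source=..., path=...)` call can be passed in. When it is left
    as None, the DataLad Python API is imported and used.
    """
    if api is None:
        # Imported lazily so the sketch can be read (and exercised with a
        # stand-in object) on machines where DataLad is not installed.
        import datalad.api as api

    # Cloning records the dataset's history and file identities; file
    # content is typically not transferred at this point.
    dataset = api.clone(source=source_url, path=str(dataset_dir))

    # Request the content of the one file that is actually needed.
    dataset.get(relative_path)

    return Path(dataset_dir) / relative_path
```

A call such as `fetch_versioned_file(url, "my-dataset", "README.md")` would then return the local path of the retrieved file, where `url` stands for whatever dataset location is being used.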
The following people have contributed to DataLad (in alphabetical order).
- Gergana Alteva
- Horea Christian
- Jason Gors
- Yaroslav Halchenko
- Michael Hanke
- Christian Olaf Häusler
- Benjamin Poldrack
- Debanjum Singh Solanky
- Alex Waite
More information on the DataLad development community is available on GitHub.
DataLad development is supported by a US-German collaboration in computational neuroscience (CRCNS) project, "DataGit: converging catalogues, warehouses, and deployment logistics into a federated 'data distribution'" (Halchenko / Hanke), co-funded by the US National Science Foundation (NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1411).