Why DataLad?

When publishing data, merely providing file access is insufficient for a simple reason: data are not static. Released data often continue to evolve, and they should: file formats change, bugs are fixed, new data are added, and derived data need to be integrated.

While version control systems are a de facto standard for open source software development, comparable tooling and culture are not yet present in the open data community.

DataLad builds on top of git-annex and extends it with an intuitive command-line interface. It enables users to operate on data using familiar concepts, such as files and directories, while transparently managing data access and authorization with underlying hosting providers.

DataLad also provides a Python API that lets authors of data-centric applications bring versioning and the fearless acquisition of data into continuous integration workflows.

Contributors

The following people have contributed to DataLad (in alphabetical order).

  • Gergana Alteva
  • Horea Christian
  • Jason Gors
  • Yaroslav Halchenko
  • Michael Hanke
  • Christian Olaf Häusler
  • Benjamin Poldrack
  • Debanjum Singh Solanky
  • Alex Waite

More information on the DataLad development community is available on GitHub.

From the Makers of

  • NeuroDebian
  • PyMVPA
  • Studyforrest

Acknowledgments

DataLad is developed as part of a US-German collaboration in computational neuroscience (CRCNS), the project "DataGit: converging catalogues, warehouses, and deployment logistics into a federated 'data distribution'" (Halchenko / Hanke), co-funded by the US National Science Foundation (NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1411).

NSF, Dartmouth, BMBF, OvGU