DataLad

distributed data management

free and open source


What is DataLad?

DataLad is a free and open source distributed data management system that keeps track of your data, creates structure, ensures reproducibility, supports collaboration, and integrates with widely used data infrastructure.

Install DataLad

Install DataLad and its dependencies, Git and git-annex, on all major operating systems using Python and the datalad-installer:

$ pip install datalad-installer
$ datalad-installer git-annex -m datalad/packages
$ pip install datalad

Depending on your operating system, other installation options are also possible. For detailed instructions on all installation procedures and further configuration, please visit the DataLad Handbook.

DataLad is part of the Debian and Ubuntu operating systems and is available on CentOS, Red Hat, Fedora, and similar systems. DataLad can be installed or upgraded via conda or apt:

Using conda:

$ conda install -c conda-forge datalad

Using apt:

$ sudo apt-get install datalad

Find out more about Linux installation in the DataLad Handbook.

DataLad is available via the Homebrew package manager for macOS or, alternatively, via conda:

Using conda:

$ conda install -c conda-forge datalad

Using Homebrew:

$ brew install datalad

Find out more about macOS installation in the DataLad Handbook.

On a Windows machine with Python, the best route for installing DataLad is to install its dependencies with the datalad-installer and then follow up with pip:

$ pip install datalad-installer
$ datalad-installer git-annex -m datalad/packages
$ pip install datalad

Find out more about Windows installation in the DataLad Handbook.
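
Whichever route you choose, it can help to confirm that DataLad and its two dependencies ended up on your PATH. The following is a minimal sketch, not part of the official instructions: the helper name check_tool is made up for illustration, and a POSIX shell is assumed.

```shell
#!/bin/sh
# Report whether a tool is on PATH; return non-zero if it is missing.
# check_tool is a hypothetical helper, not a DataLad command.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# DataLad needs Git and git-annex in addition to the datalad command itself.
missing=0
for tool in git git-annex datalad; do
    check_tool "$tool" || missing=1
done

# Print the DataLad version only when everything is available.
if [ "$missing" -eq 0 ]; then
    datalad --version
else
    echo "At least one required tool is not installed yet."
fi
```

If any tool is reported as missing, the installation steps for your operating system are the place to revisit.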

Keep Track

Building on top of Git and git-annex, DataLad allows you to version control arbitrarily large files in datasets, without the need for custom data structures, central infrastructure, or third-party services.

  •   Track changes to your data
  •   Revert to previous versions
  •   Capture full provenance records
  •   Ensure complete reproducibility
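
As an illustration of the points above, the sketch below saves a file, records a command together with its inputs and outputs using datalad run, and re-executes it with datalad rerun. The dataset name, the file names, and the tr command are placeholders; the function name track_demo is hypothetical; and the commands only run when datalad is found on the PATH.

```shell
#!/bin/sh
# Sketch only: all names below are placeholders chosen for illustration.
track_demo() {
    datalad create provenance-demo &&
    cd provenance-demo &&
    # Save a first version of a file.
    echo "raw,values" > input.csv &&
    datalad save -m "Add raw input" &&
    # Run a command; DataLad records the command, its inputs, and its outputs.
    datalad run -m "Uppercase the input" \
        --input input.csv --output result.csv \
        "tr a-z A-Z < input.csv > result.csv" &&
    # Every save and run is an ordinary Git commit in the dataset history.
    git --no-pager log --oneline &&
    # Re-execute the most recently recorded command.
    datalad rerun
}

if command -v datalad >/dev/null 2>&1; then
    # Work in a throwaway directory so nothing existing is touched.
    (cd "$(mktemp -d)" && track_demo) || echo "demo did not complete"
else
    echo "datalad not found; nothing was run"
fi
```

Because the history is plain Git, earlier versions of a file can be inspected or restored with the usual Git tooling.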

Create Structure

A DataLad dataset is a directory with files, managed by DataLad. You can link other datasets, known as subdatasets, and perform commands recursively across an arbitrarily deep hierarchy of datasets. This helps you to create structure while maintaining advanced provenance capture abilities, versioning, and actionable file retrieval.
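
A nested layout of this kind might be set up as in the following sketch. The names superds and inputs/raw are placeholders, nest_demo is a hypothetical function name, and the commands only run when datalad is found on the PATH.

```shell
#!/bin/sh
# Sketch only: dataset and file names are placeholders.
nest_demo() {
    datalad create superds &&
    cd superds &&
    # Create a new dataset inside the superdataset and register it as a subdataset.
    datalad create -d . inputs/raw &&
    echo "sample" > inputs/raw/data.txt &&
    # Save recursively: the file is saved in the subdataset, and the
    # superdataset records the subdataset's new state.
    datalad save -r -m "Add sample data" &&
    # List the subdatasets registered in the superdataset.
    datalad subdatasets
}

if command -v datalad >/dev/null 2>&1; then
    # Work in a throwaway directory so nothing existing is touched.
    (cd "$(mktemp -d)" && nest_demo) || echo "demo did not complete"
else
    echo "datalad not found; nothing was run"
fi
```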

Use DataLad

DataLad is a free and open source command line tool with a Python API and is compatible with all major operating systems. Use DataLad to:

  •   create new datasets locally
  •   clone other datasets
  •   get content on-demand
  •   save changes to datasets
  •   drop content as needed
  •   push changes to a remote location

... and much more!


$ datalad create my_dataset
$ datalad save -m "hello world"
$ datalad push --to location

$ datalad clone location
$ datalad get example.txt
$ datalad drop example.txt


Collaborate

DataLad lets you consume datasets provided by others, and collaborate with them. You can install existing datasets and update them from their sources, or create sibling datasets that you can publish updates to and pull updates from. The collaborative power of Git, for your data.
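
The consuming side of this workflow can be sketched locally as follows. A second local dataset, here called upstream, stands in for a dataset provided by someone else; all names are placeholders, collab_demo is a hypothetical function name, and the --how option of datalad update is assumed to be available (older releases used a different merge flag). The commands only run when datalad is found on the PATH.

```shell
#!/bin/sh
# Sketch only: "upstream" and "my_copy" are placeholder dataset names.
collab_demo() {
    # "upstream" stands in for a dataset provided by someone else.
    datalad create upstream &&
    echo "first note" > upstream/notes.txt &&
    datalad save -d upstream -m "Add notes" &&
    # Consume it: clone the dataset, then fetch file content on demand.
    datalad clone upstream my_copy &&
    datalad get my_copy/notes.txt &&
    # The source gains a new file...
    echo "second note" > upstream/more.txt &&
    datalad save -d upstream -m "Add more notes" &&
    # ...and the clone merges the update from its source sibling, "origin".
    datalad update -d my_copy -s origin --how merge &&
    # List the siblings the clone knows about.
    datalad siblings -d my_copy
}

if command -v datalad >/dev/null 2>&1; then
    # Work in a throwaway directory so nothing existing is touched.
    (cd "$(mktemp -d)" && collab_demo) || echo "demo did not complete"
else
    echo "datalad not found; nothing was run"
fi
```

Publishing in the other direction goes through a sibling as well, using datalad push with the name of the sibling to publish to.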

DataLad in the Wild

DataLad is integrated with a variety of hosting services and data management platforms, and is extended and used by a diverse community. Export datasets to third-party services such as GitHub or Figshare with built-in commands. Extend DataLad to be compatible with your preferred data supplier or workflow. Or use a multitude of other DataLad-compatible services such as Dropbox or Amazon S3. Search through all integrations, extensions, and use cases to find the right fit for your data!


Learn More

DataLad is not solely a data management system, but also an open source community of users, developers, and researchers all contributing to its growth. To support this community, DataLad maintains several important resources:

Install DataLad

Install DataLad and its dependencies on Linux, macOS, or Windows

DataLad Handbook

Become an expert DataLad user with this rich educational resource

Community Chat

Join the community on Matrix, say hi, and ask questions

Technical Forum

Get help from DataLad experts and users to solve your challenges

DataLad on GitHub

Contribute via GitHub by creating issues or sending a pull request

Developer Docs

Dive into the DataLad API with the developer documentation

DataLad Tutorials

Hands-on tutorials and videos to help you on your DataLad journey

Supporting DataLad

DataLad development is funded as a US-German project on collaborative research, with primary funding from the US National Science Foundation (NSF 1912266, NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1905, BMBF 01GQ1411). Additional support has been provided by the US National Institute of Biomedical Imaging and Bioengineering (NIH 1P41EB019936-01A1) via ReproNim, the European Union’s Horizon 2020 research and innovation programme under grant agreements 945539 and 826421, and the German federal state of Saxony-Anhalt and the European Regional Development Fund.


Citing DataLad

Please cite the following article when referring to DataLad in publications:

Yaroslav O. Halchenko, Kyle Meyer, Benjamin Poldrack, Debanjum Singh Solanky, Adina S. Wagner, Jason Gors, Dave MacFarlane, Dorian Pustina, Vanessa Sochat, Satrajit S. Ghosh, Christian Mönch, Christopher J. Markiewicz, Laura Waite, Ilya Shlyakhter, Alejandro de la Vega, Soichi Hayashi, Christian Olaf Häusler, Jean-Baptiste Poline, Tobias Kadelka, Kusti Skytén, Dorota Jarecka, David Kennedy, Ted Strauss, Matt Cieslak, Peter Vavra, Horea-Ioan Ioanas, Robin Schneider, Mika Pflüger, James V. Haxby, Simon B. Eickhoff, and Michael Hanke (2021). DataLad: distributed system for joint management of code, data, and their relationship. Journal of Open Source Software, 6(63), 3262, doi:10.21105/joss.03262
