DataLad can create datasets from any data files published on the web. A one-time import is rarely enough, however, so DataLad can also be automated to monitor such data sources and incorporate any changes made to them over time, which makes it easy to publish and maintain entire distributions of datasets.
Using this automated process, the DataLad team maintains data trackers for a number of popular public data portals. These datasets, some automatically generated and others manually created and curated, are collated into a DataLad super-dataset that is published in its entirety at http://datasets.datalad.org. This super-dataset constitutes the official DataLad data distribution and is available via the DataLad resource identifier ///. Some of these datasets (e.g. ///crcns) require authentication credentials, but apart from supplying those credentials, access to all resources is uniform regardless of the data's origin. DataLad also aggregates the relevant metadata for these datasets, so they can be discovered using DataLad's search.
At present, DataLad's super-dataset offers uniform access to over 10 TB of scientific data. This includes the following datasets, listed by their DataLad resource identifiers for use with the datalad clone command:
- OpenFMRI — ///openfmri
- NeuroVault — ///neurovault
- International Neuroimaging Data-sharing Initiative (INDI) — ///indi
- Healthy Brain Network Serial Scanning Initiative (HBN-SSI) — ///hbnssi
- Data sharing for Collaborative Research in Computational Neuroscience (CRCNS.org) — ///crcns
- several individual research labs — ///labs
- and many more ... — ///
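
To make the identifiers above concrete, here is a minimal Python sketch of how a `///` identifier might relate to the super-dataset address and to a `datalad clone` invocation. It is an illustration only: the helper names `resolve_identifier` and `clone_command` are hypothetical, and the mapping of `///` onto http://datasets.datalad.org is an assumption drawn from the text above, not DataLad's own resolution logic.

```python
import shlex

# Assumption: "///" is shorthand for the super-dataset root named in the text.
SUPERDATASET_URL = "http://datasets.datalad.org"


def resolve_identifier(rid, base=SUPERDATASET_URL):
    """Expand a '///name' identifier into a URL under the super-dataset root.

    Hypothetical helper for illustration; DataLad resolves these itself.
    """
    if not rid.startswith("///"):
        raise ValueError(f"not a DataLad resource identifier: {rid!r}")
    subpath = rid[3:].strip("/")
    root = base.rstrip("/")
    # A bare "///" refers to the super-dataset itself.
    return f"{root}/{subpath}" if subpath else f"{root}/"


def clone_command(rid):
    """Build the argument list for cloning one dataset by its identifier."""
    resolve_identifier(rid)  # reject anything that is not a '///' identifier
    return ["datalad", "clone", rid]


if __name__ == "__main__":
    for rid in ("///openfmri", "///crcns", "///"):
        # Print the shell command next to the address it is assumed to map to.
        print(shlex.join(clone_command(rid)), "->", resolve_identifier(rid))
```

With DataLad installed, the list returned by `clone_command` could be passed to `subprocess.run` to perform the actual clone. Datasets such as ///crcns would still ask for credentials at that point.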