What is a Dataset?

Dataset

A dataset is a collection of items (files) together with their metadata and annotations, and it is the basic unit for managing training sets.
Dataloop distinguishes between two types of datasets:

  • Master dataset
  • Cloned dataset

Dataset

A Dataset is the basic data storage and management unit in Dataloop, similar to the storage buckets found in AWS, Azure, and GCP.

Master Dataset

A Master Dataset holds the root storage of its items, so an item action performed on it (e.g. delete) can remove the item completely from the Dataloop platform.
In addition, Master Datasets let you maintain a file-system-like structure, i.e. folders, subfolders, etc.
Master Datasets are usually used to manage raw data for direct uploads.
Master Dataset is the default dataset type in Dataloop.
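
The two properties described above (a folder-like layout, and deletion that removes the item itself) can be illustrated with a minimal Python sketch. This is a hypothetical in-memory model, not the Dataloop SDK: the class `MasterDataset` and its methods `upload`, `list_folder`, and `delete` are names invented for this example.

```python
from pathlib import PurePosixPath


class MasterDataset:
    """Hypothetical model: owns the binaries and keeps a folder-like layout."""

    def __init__(self, name):
        self.name = name
        self._binaries = {}  # remote path -> file bytes (the root storage)

    def upload(self, remote_path, data):
        self._binaries[remote_path] = data

    def list_folder(self, folder):
        """Return the paths of items that sit directly inside `folder`."""
        return sorted(
            path for path in self._binaries
            if str(PurePosixPath(path).parent) == folder
        )

    def delete(self, remote_path):
        # Deleting from the master removes the binary itself.
        del self._binaries[remote_path]

    def __contains__(self, remote_path):
        return remote_path in self._binaries


master = MasterDataset("raw-uploads")
master.upload("/cats/a.jpg", b"\x00")
master.upload("/cats/kittens/b.jpg", b"\x01")
master.upload("/dogs/c.jpg", b"\x02")

cats = master.list_folder("/cats")  # only items directly under /cats
master.delete("/dogs/c.jpg")        # the binary is gone after this call
```

In this model, `list_folder("/cats")` does not descend into `/cats/kittens`, which mirrors browsing a single folder level.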

Advanced Topic

New users can skip this section

Cloned Dataset

A Cloned Dataset is a list of pointers that function as virtual items: cloning or copying does not replicate the binaries in the underlying storage.
Cloned Datasets do not support a file system structure; they are a flat list of items.
Cloned Datasets are mainly used for:

  • Golden training set management
  • Reproducibility (dataset training snapshots)
  • Experimentation (creating subsets of different kinds)
  • Task/Assignment management
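
The pointer idea can be sketched in a few lines of Python. This is a hypothetical model for illustration only, not the Dataloop SDK: `MasterDataset`, `ClonedDataset`, `item_ids`, and `read` are invented names, and the sketch assumes a clone resolves each pointer against the master's storage.

```python
class MasterDataset:
    """Hypothetical stand-in for a dataset that owns the binaries."""

    def __init__(self):
        self.binaries = {}  # item id -> file bytes


class ClonedDataset:
    """Hypothetical clone: a flat list of item ids pointing into a master."""

    def __init__(self, master, item_ids):
        self.master = master
        self.item_ids = list(item_ids)  # pointers only: no folders, no bytes

    def read(self, item_id):
        if item_id not in self.item_ids:
            raise KeyError(item_id)
        # Resolve the pointer against the master's storage.
        return self.master.binaries[item_id]


master = MasterDataset()
master.binaries = {
    "img-1": b"cat bytes",
    "img-2": b"dog bytes",
    "img-3": b"bird bytes",
}

# A training snapshot that points at a subset of the master's items.
snapshot = ClonedDataset(master, ["img-1", "img-3"])

# The clone returns the very same bytes object the master holds: nothing was copied.
same_object = snapshot.read("img-1") is master.binaries["img-1"]
```

Because the clone stores only ids, creating many subsets for experiments costs a list per subset rather than a copy of the data.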

Merged Dataset

You can merge multiple datasets into a single one to better organize your data.
Merging cloned datasets places the annotations from the different datasets on the same item.

For additional information, see the Clone & Merge Datasets page.
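
As a rough illustration of what merging means for annotations, the Python sketch below represents each dataset as a plain dictionary from item id to a list of annotation labels. The function `merge_datasets` and this data layout are assumptions made for the example; how the platform actually resolves duplicate or conflicting annotations is not covered here.

```python
from collections import defaultdict


def merge_datasets(*datasets):
    """Combine datasets so each item carries the annotations from all of them.

    Each dataset is a dict: item id -> list of annotation labels.
    """
    merged = defaultdict(list)
    for dataset in datasets:
        for item_id, annotations in dataset.items():
            merged[item_id].extend(annotations)
    return dict(merged)


# Two clones that share "img-1" but were annotated separately.
clone_a = {"img-1": ["cat"], "img-2": ["dog"]}
clone_b = {"img-1": ["whiskers"], "img-3": ["bird"]}

merged = merge_datasets(clone_a, clone_b)
```

After the merge, `img-1` appears once and holds the annotations from both clones, while `img-2` and `img-3` are carried over unchanged.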

Advanced Topic

New users may skip this section

Data Storage

The Dataloop platform has a flexible storage engine that lets you attach different binary storage devices, such as:

  • Cloud storage devices such as GCS, S3, and Elastifile
  • File system storage
  • Network drives
  • Databases, such as MongoDB GridFS

Each storage medium is supported through its own driver, and additional drivers are continuously being added.
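
A driver-based design of this kind is commonly built around one shared interface with one implementation per storage medium. The Python sketch below shows that general pattern only; it does not describe Dataloop's internal drivers, and `StorageDriver`, `InMemoryDriver`, `FileSystemDriver`, and `store_item` are hypothetical names.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageDriver(ABC):
    """Hypothetical common interface that every storage medium implements."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...


class InMemoryDriver(StorageDriver):
    def __init__(self):
        self._blobs = {}

    def write(self, key, data):
        self._blobs[key] = data

    def read(self, key):
        return self._blobs[key]


class FileSystemDriver(StorageDriver):
    def __init__(self, root):
        self.root = Path(root)

    def write(self, key, data):
        (self.root / key).write_bytes(data)

    def read(self, key):
        return (self.root / key).read_bytes()


def store_item(driver: StorageDriver, key: str, data: bytes) -> bytes:
    """Calling code depends only on the interface, not on the medium."""
    driver.write(key, data)
    return driver.read(key)


with tempfile.TemporaryDirectory() as tmp:
    drivers = (InMemoryDriver(), FileSystemDriver(tmp))
    results = [store_item(driver, "a.bin", b"abc") for driver in drivers]
```

Supporting a new medium in this pattern means adding one more `StorageDriver` subclass, while `store_item` stays unchanged.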

More in Storage.
