Data Versioning
  • 01 May 2023

Article Summary

Dataloop’s data versioning provides tools for managing your data across its lifecycle.
Clone, merge, and slice your files to create multiple versions for different applications. Sample use cases include:

  • Golden training set management
  • Reproducibility (dataset training snapshot)
  • Experimentation (creating subsets of different kinds)
  • Task or assignment management

Data Version "Snapshot"

Use the versioning feature to save a snapshot of your data (items, annotations, metadata, etc.) before any major process. If the process fails, the snapshot serves as a rollback point, so you can return to the original dataset without losing data.

Cloning Dataset Items

You can clone dataset items together with their annotations and/or metadata. Cloning does not copy the item status (approved, completed, discarded, etc.).

Important:
  1. Cloned datasets are created with the same recipe as the original ones.
  2. Do not make any changes to items while cloning is in progress, such as adding, editing, or deleting annotations, or moving items.
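
The cloning behavior described above can be pictured with a small in-memory model. This is an illustrative sketch only, not the Dataloop SDK: the `Item` class, its fields, and the `clone_item` function are hypothetical names invented for this example. It shows that annotations and metadata are copied only when requested, and that the item status is never carried over.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """Hypothetical stand-in for a dataset item (not an SDK class)."""
    name: str
    annotations: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    status: Optional[str] = None  # e.g. "approved", "completed", "discarded"


def clone_item(item: Item, with_annotations: bool = True,
               with_metadata: bool = True) -> Item:
    """Copy an item; annotations and metadata are optional, status is never copied."""
    return Item(
        name=item.name,
        annotations=copy.deepcopy(item.annotations) if with_annotations else [],
        metadata=copy.deepcopy(item.metadata) if with_metadata else {},
        status=None,  # assumption taken from the text: status is not cloned
    )


original = Item("cat.jpg", annotations=["box:cat"],
                metadata={"source": "camera-1"}, status="approved")
# Clone with annotations but without metadata.
clone = clone_item(original, with_annotations=True, with_metadata=False)
```

Here `clone` keeps the annotation list, has empty metadata, and has no status, while `original` is left unchanged.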

Cloning Entire Datasets

To clone an entire dataset, follow these steps:

  1. From the left portal menu, select Data Management > Datasets.
  2. Choose the Dataset from the list and click on the ellipsis icon.
  3. Click Clone Dataset from the list.
  4. In the Clone Dataset/Items window, choose the dataset to clone the items to:
    1. Existing Dataset:
      1. Select a dataset from the list.
      2. Search for and select the folder in the dataset that the items should be cloned to (root folder, subfolders, etc.).
    2. New Dataset: Enter a name for the new dataset.
  5. Choose the cloning options:
    1. Clone with item annotations
    2. Clone with item metadata
  6. Click Clone.

Cloning Items

You can also clone selected items into a target dataset.

Items can be cloned only between storage of the same kind: from internal storage (for example, Dataloop cloud storage) to internal storage, or from external storage (for example, S3) to external storage that uses the same storage driver (that is, the same integration secret and a storage driver pointing at the same location).
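
The storage restriction above amounts to a simple compatibility check. The following is a minimal sketch under stated assumptions, not platform code: the `Storage` class, the `driver_id` field, and `can_clone` are hypothetical names, and a single driver identifier stands in for "same integration secret and same location".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Storage:
    """Hypothetical description of where a dataset's items are stored."""
    external: bool
    driver_id: Optional[str] = None  # None for internal storage


def can_clone(source: Storage, target: Storage) -> bool:
    """Return True if items may be cloned from source to target storage."""
    if not source.external and not target.external:
        return True  # internal -> internal is allowed
    if source.external and target.external:
        # external -> external only when both use the same storage driver
        return source.driver_id is not None and source.driver_id == target.driver_id
    return False  # mixing internal and external storage is not allowed


internal = Storage(external=False)
s3_a = Storage(external=True, driver_id="s3-driver-a")
s3_b = Storage(external=True, driver_id="s3-driver-b")
```

For example, `can_clone(internal, internal)` and `can_clone(s3_a, s3_a)` are allowed, while `can_clone(s3_a, s3_b)` and `can_clone(internal, s3_a)` are rejected.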

To clone items, follow these steps:

  1. From the left portal menu, select Data Management > Datasets.
  2. Click a dataset in the list.
  3. Use any of the following options:
    1. Right-click one or more items and select Clone from the list.
    2. Select one or more items and click the Clone Dataset icon.
    3. Click the Clone Dataset icon to clone all items in the dataset.
  4. In the Clone Dataset/Items window, choose the dataset to clone the items to:
    1. Existing Dataset:
      1. Select a dataset from the list.
      2. Search for and select the folder in the dataset that the items should be cloned to (root folder, subfolders, etc.).
    2. New Dataset: Enter a name for the new dataset.
  5. Choose the cloning options:
    1. Clone with item annotations
    2. Clone with item metadata
  6. Click Clone.

Merge Datasets

The outcome of a merge depends on how the datasets are related:

  • Cloned datasets: Items, annotations, and metadata are merged, so annotations from the different datasets appear on the same item. Items from cloned datasets can be merged only if they were cloned from the same master item, i.e., the cloned items must point to the same reference.
  • Different datasets (not clones) with similar recipes: Items are summed up, so similar items appear as duplicates.
  • Datasets with different recipes: Datasets with different default recipes cannot be merged. To merge them, first use the Switch recipe option at the dataset level (ellipsis icon) to match the recipes between the datasets.
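
The three merge outcomes can be illustrated with a toy model. This is a hedged sketch, not the platform's merge implementation: `Item`, `Dataset`, `master_id`, and `merge` are hypothetical names, recipes are compared by simple equality, and two items are treated as clones of the same master when they share a `master_id`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    name: str
    annotations: list = field(default_factory=list)
    master_id: Optional[str] = None  # id of the master item this was cloned from


@dataclass
class Dataset:
    recipe: str
    items: list


def merge(a: Dataset, b: Dataset) -> Dataset:
    """Merge two datasets following the rules described in the text."""
    if a.recipe != b.recipe:
        raise ValueError("Datasets with different recipes cannot be merged")
    by_master = {}  # master_id -> item already placed in the result
    result = []
    for item in a.items + b.items:
        key = item.master_id
        if key is not None and key in by_master:
            # Same master item: combine annotations on a single item.
            by_master[key].annotations.extend(item.annotations)
            continue
        copied = Item(item.name, list(item.annotations), key)
        if key is not None:
            by_master[key] = copied
        result.append(copied)  # unrelated items are summed up (may duplicate)
    return Dataset(a.recipe, result)


clone_a = Dataset("recipe-1", [Item("cat.jpg", ["box:cat"], master_id="m1")])
clone_b = Dataset("recipe-1", [Item("cat.jpg", ["polygon:tail"], master_id="m1")])
unrelated = Dataset("recipe-1", [Item("cat.jpg", ["point:eye"])])
other_recipe = Dataset("recipe-2", [])

merged_clones = merge(clone_a, clone_b)       # one item, both annotations
merged_unrelated = merge(clone_a, unrelated)  # two separate "cat.jpg" items
```

Calling `merge(clone_a, other_recipe)` raises `ValueError` in this model, mirroring the rule that recipes must be matched before merging.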

To merge datasets, follow these steps:

  1. From the left portal menu, select Data Management > Datasets.
  2. Select the datasets you want to merge from the list.
  3. Click the Merge icon.
  4. In the Merge Datasets window, enter a Name for the newly merged dataset.
  5. Select whether to merge With Items Annotations and/or With Items Metadata (i.e., with information entered by annotators).

Once the merge completes successfully, the new dataset appears in the list with its Dataset type shown as Merge.

