Dataloop’s powerful data versioning provides its users with unique tools for data management.
Clone, merge, and slice & dice your files to create multiple versions for various applications. Sample use cases include:
- Golden training set management
- Reproducibility (dataset training snapshot)
- Experimentation (creating subsets of different kinds)
- Task/Assignment management
Data Version "Snapshot"
Use versioning to save data (items, annotations, and metadata) before any major process. For example, a saved version can serve as a rollback mechanism: if an error occurs, you can return to the original dataset without losing data.
Dataset versioning tools are limited to datasets with a maximum of 20,000 files. We’re working to expand this limit to 20M files in the near future.
The outcome of a dataset merge depends on how similar or different the datasets are:
- Cloned Datasets – items, annotations, and metadata will be merged. This means that you will see annotations from different datasets on the same item.
Merging items that belong to cloned datasets is only possible if the items being merged were cloned from the same master item, i.e., the cloned items should both point to the same reference.
- Different datasets (not clones) with similar recipes – items from all datasets are added together, so similar items are duplicated.
- Datasets with different recipes – Datasets with different default recipes cannot be merged. Use the ‘Switch recipe’ option at the dataset level (3-dot action button) to match the recipes between the datasets so that they can be merged.
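The three merge outcomes above can be sketched as a small toy model. This is only an illustration of the rules as described here, not the Dataloop SDK or the platform's actual implementation: the `Item`, `Dataset`, and `merge_datasets` names, and the `reference_id` and `recipe_id` fields, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Item:
    name: str
    # Hypothetical ID of the master item this item was cloned from
    # (None if the item is not a clone).
    reference_id: Optional[str] = None
    annotations: List[str] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    recipe_id: str  # hypothetical ID of the dataset's default recipe
    items: List[Item] = field(default_factory=list)


def merge_datasets(name: str, first: Dataset, second: Dataset) -> Dataset:
    """Toy model of the merge outcomes described above."""
    # Datasets with different default recipes cannot be merged.
    if first.recipe_id != second.recipe_id:
        raise ValueError("Recipes differ: switch the recipe on one dataset first")

    merged: List[Item] = []
    by_reference: Dict[str, Item] = {}
    for item in first.items + second.items:
        ref = item.reference_id
        if ref is not None and ref in by_reference:
            # Clones of the same master item collapse into a single item
            # that carries the annotations from both datasets.
            by_reference[ref].annotations.extend(item.annotations)
            continue
        copy = Item(item.name, ref, list(item.annotations))
        if ref is not None:
            by_reference[ref] = copy
        # Items without a shared reference are simply added, so
        # look-alike items from different datasets end up duplicated.
        merged.append(copy)
    return Dataset(name, first.recipe_id, merged)
```

In this sketch, two clones of the same master item merge into one item with both sets of annotations, while two unrelated datasets that happen to contain the same file name produce two separate items.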
To merge datasets, follow these steps:
- Go to your dataset list.
- Select the datasets you would like to merge.
- Click the "Merge" button on the right.
- In the dialog box, enter a name for the newly merged dataset and select whether you would like to merge with item annotations and/or metadata (i.e., with information entered by annotators).
Once the merge is completed successfully, you should see the new dataset added to the list with a "Merged" icon next to it.
Cloning Dataset / Items
Clone dataset items with their annotations and/or metadata. Item status (approved/completed/discarded) is not cloned.
- Cloned datasets are created with the same recipe as the original ones
- Refrain from changing items while cloning is in progress (e.g., don't add, edit, or delete annotations, or move items)
Cloning Entire Datasets
To clone an entire dataset, select the “Clone Dataset” option from the dataset's 3-dot action button. In the dialog box that opens, enter a name for the cloned dataset and select whether to clone items with their annotations and/or metadata.
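The cloning behavior described above (same recipe, optional annotations and metadata, item status never carried over) can be sketched as follows. This is a toy model under stated assumptions, not the Dataloop SDK: `Item`, `Dataset`, `clone_dataset`, and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Item:
    name: str
    annotations: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)
    status: Optional[str] = None  # e.g. "approved", "completed", "discarded"


@dataclass
class Dataset:
    name: str
    recipe_id: str  # hypothetical ID of the dataset's recipe
    items: List[Item] = field(default_factory=list)


def clone_dataset(
    source: Dataset,
    clone_name: str,
    with_annotations: bool = True,
    with_metadata: bool = True,
) -> Dataset:
    """Toy model of cloning a dataset as described above."""
    cloned_items = [
        Item(
            name=item.name,
            # Copy annotations and metadata only when requested.
            annotations=list(item.annotations) if with_annotations else [],
            metadata=dict(item.metadata) if with_metadata else {},
            # Item status is never cloned.
            status=None,
        )
        for item in source.items
    ]
    # The clone is created with the same recipe as the original.
    return Dataset(clone_name, source.recipe_id, cloned_items)
```

The two boolean parameters mirror the choices offered in the clone dialog; the source dataset is left unchanged.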
Items can also be cloned into target datasets.
Items can only be cloned from internal storage (i.e., the Dataloop cloud storage) to internal storage, or from external storage (for example, S3) to external storage that is based on the same storage driver (e.g., using the same integration secret and a storage driver pointing at the same location).
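The storage rule above can be expressed as a simple compatibility check. The sketch below is illustrative only: the `Storage` class, its `kind` and `driver_id` fields, and `can_clone_items` are hypothetical names, and a single driver ID stands in for "same integration secret and same location".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Storage:
    kind: str  # "internal" or "external"
    # Hypothetical identifier of the storage driver; None for internal storage.
    driver_id: Optional[str] = None


def can_clone_items(source: Storage, target: Storage) -> bool:
    """Return True if items may be cloned from source to target storage."""
    # Internal storage can only be cloned to internal storage.
    if source.kind == "internal" and target.kind == "internal":
        return True
    # External storage can only be cloned to external storage
    # that is based on the same storage driver.
    if source.kind == "external" and target.kind == "external":
        return source.driver_id is not None and source.driver_id == target.driver_id
    # Mixing internal and external storage is not allowed.
    return False
```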
To clone an item, follow these steps:
- Browse a dataset
- Use any of the following options:
  - Right-click a single item to clone it.
  - Select multiple items and use the Clone button.
  - Use the Clone button to clone all items (the entire dataset or the currently active query).
- Select the target dataset you wish the item to be cloned to.
- Select the folder in the dataset you wish the item to be cloned to (root folder, subfolders, etc.).
- Select whether you wish the item to be cloned with its annotations and metadata.
- Click “CLONE.”