Overview & Features
  • 18 Apr 2023

Article Summary

Dataloop brings enterprise-level performance to unstructured data management and versioning. It enables sub-second queries on millions of files by item attributes, item metadata, or user metadata.

Data Management Features

  • Cloud native: Ingest and sync from popular cloud storage providers, such as AWS, GCP, Azure, etc.
  • Dataloop storage: Optionally, upload file binaries to Dataloop to store them on its GCP-based storage.
  • Linked items: Create URL items without storing them on the Dataloop platform or even connecting to cloud storage.
  • Metadata layer: Every item has metadata that is populated automatically with item attributes when the item is added to a dataset. User metadata can be added at any time.
  • DQL: Dataloop Query Language allows querying by:
    • Item attributes: Mime type, file name, creation/update time, size, etc.
    • Item metadata: Annotations, labels & attributes added to items, users working on items, etc.
    • User metadata: Any context added to the item metadata, such as order-number, GEO location, camera number, etc.
  • Performance: Sub-second queries on millions of files by item attributes, item metadata, or user metadata.
  • Version control: Use Clone and Merge actions to version the data in line with the model version.
  • Privacy: Meet data privacy standards.
  • Browser: Browse data in a user-friendly interface that supports different view options and actions:
    • Thumbnails view with adjustable thumbnail size
    • List view with file details
    • Filters based on item attributes, item metadata, and user metadata
    • Direct DQL queries
    • Save and reuse DQL queries
    • Folders management: Create, rename, or delete folders
    • File management: Move between folders, clone, delete
    • Create models from selected data
    • Create annotation or QA tasks from selected data
    • Trigger a function (FaaS) or pipeline on the selected data
    • View item metadata
    • Item function executions log
    • Export data (Item JSON file)
    • Upload data (when using File system storage)
  • Developer tools: All data management actions, such as DQL filters, version control, import, and export, are available through the API and SDK interfaces.
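
The three query categories listed under DQL (item attributes, item metadata, user metadata) can be illustrated with a small in-memory model. This is only a sketch of the concept: it does not use the Dataloop SDK or real DQL syntax, and the `Item` class, its field names, and the `select` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Item:
    """Hypothetical stand-in for a dataset item (not an SDK class)."""
    name: str                                            # item attribute
    mimetype: str                                        # item attribute
    size_bytes: int                                      # item attribute
    labels: list[str] = field(default_factory=list)      # item metadata (annotation labels)
    user: dict[str, Any] = field(default_factory=dict)   # user metadata (free-form context)


def select(items: list[Item], *predicates: Callable[[Item], bool]) -> list[Item]:
    """Return the items that satisfy every predicate (a logical AND)."""
    return [it for it in items if all(p(it) for p in predicates)]


def is_image(it: Item) -> bool:
    # Filter on an item attribute: the MIME type.
    return it.mimetype.startswith("image/")


def has_car(it: Item) -> bool:
    # Filter on item metadata: a label added during annotation.
    return "car" in it.labels


def from_camera_3(it: Item) -> bool:
    # Filter on user metadata: a camera number attached by the user.
    return it.user.get("camera") == 3


items = [
    Item("a.jpg", "image/jpeg", 2_000_000, ["car"], {"camera": 3}),
    Item("b.png", "image/png", 500_000, ["person"], {"camera": 3}),
    Item("c.mp4", "video/mp4", 40_000_000, [], {"camera": 7}),
]

# Combine an item-attribute filter with a user-metadata filter.
print([it.name for it in select(items, is_image, from_camera_3)])  # ['a.jpg', 'b.png']
```

On the platform itself, the equivalent filtering is done server-side through DQL rather than by scanning items in memory, which is what makes sub-second queries on millions of files possible.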

Data Management Specifications

Specification | Recommended
Items in Dataset | Maximum 300,000 items
Annotations in Dataset | Maximum 200,000 items*
Items clone | Limited to 20,000 items
Items merge | Limited to 20,000 items
Storage | 20 GB
Filter items by annotation data (label/attribute) | Up to 200,000 annotations per dataset

* More than 200,000 annotations can be stored and queried in a dataset if the query targets the annotation entity only. The 200,000 limit applies when a query involves both item and annotation entities.
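
As a rough planning aid, the recommended values above can be encoded and checked before a dataset is built. The sketch below is an assumption-laden illustration: the dictionary keys and the `exceeded_limits` function are hypothetical names, and the numbers are copied from the specifications above rather than read from any Dataloop API.

```python
# Recommended values copied from the specifications above.
# The key names are hypothetical, chosen for this example only.
RECOMMENDED_LIMITS = {
    "items_in_dataset": 300_000,
    "annotations_in_dataset": 200_000,
    "items_per_clone": 20_000,
    "items_per_merge": 20_000,
    "storage_gb": 20,
}


def exceeded_limits(planned: dict[str, int]) -> list[str]:
    """Return one message per planned value that is above its recommended limit."""
    messages = []
    for key, value in planned.items():
        limit = RECOMMENDED_LIMITS.get(key)
        if limit is not None and value > limit:
            messages.append(f"{key}: {value:,} > {limit:,}")
    return messages


plan = {"items_in_dataset": 350_000, "items_per_clone": 20_000}
print(exceeded_limits(plan))  # ['items_in_dataset: 350,000 > 300,000']
```

Keys that are not in the table are ignored, so the same function can be called with a larger planning dictionary.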

Cloud Providers & Features

Cloud Provider | Resource Type | Integration Type
AWS | S3 Bucket | Cross Account
AWS | S3 Bucket | Access Key
AWS | S3 Bucket | STS
GCP | GCS Bucket | Private Key
Azure | Blob | Client Secret
Azure | Datalake Gen2 | Client Secret

  • Dataloop supports access scoped to specific sub-folders within a bucket, which offers security and versatility in managing your data.

Supported File Formats

Type | Recommended
Image | JPG, JPEG, PNG, TIFF
Video | WEBM, MP4
Audio | WAV, MP3, OGG, FLAC, M4A, AAC
Lidar | PCD
NLP / NER | TXT, JSON, EML, PDF

Image Item

Specification | Recommended
Size | 15 MB
Formats | See supported formats
Resolution | 24.0 MP
Total annotations | 1,000

Image EXIF setting

Browsers can align rotated images when you view them in the annotation studio. To avoid unwanted rotations, ensure the EXIF values on your images are as intended.

Valid EXIF values are 1 to 8. Values outside this range (for example: 0) are ignored.
For more information read here.
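
The image recommendations and the EXIF rule can be combined into a simple pre-upload check. This is a minimal sketch under stated assumptions: the function names are hypothetical, "15 MB" is read as decimal megabytes, and the EXIF values 1 to 8 are assumed to refer to the standard EXIF Orientation tag. It takes plain numbers as input, so reading them from a file (for example with an imaging library) is left to the caller.

```python
from typing import Optional

MAX_SIZE_BYTES = 15 * 1000 * 1000  # 15 MB, assuming decimal megabytes
MAX_MEGAPIXELS = 24.0
MAX_ANNOTATIONS = 1_000


def is_valid_exif_orientation(value: int) -> bool:
    """Valid values are 1 to 8; anything else (for example 0) is ignored."""
    return 1 <= value <= 8


def image_warnings(
    size_bytes: int,
    width: int,
    height: int,
    annotation_count: int,
    exif_orientation: Optional[int] = None,
) -> list[str]:
    """Return a warning for each recommendation the image does not meet."""
    warnings = []
    if size_bytes > MAX_SIZE_BYTES:
        warnings.append("file size is above 15 MB")
    megapixels = width * height / 1_000_000
    if megapixels > MAX_MEGAPIXELS:
        warnings.append(f"resolution is {megapixels:.1f} MP, above 24.0 MP")
    if annotation_count > MAX_ANNOTATIONS:
        warnings.append("more than 1,000 annotations")
    # None means the image carries no orientation value at all.
    if exif_orientation is not None and not is_valid_exif_orientation(exif_orientation):
        warnings.append(f"EXIF value {exif_orientation} is outside 1 to 8 and is ignored")
    return warnings


print(image_warnings(10_000_000, 6000, 4000, 100, exif_orientation=1))  # []
print(image_warnings(10_000_000, 8000, 4000, 100, exif_orientation=0))
```

The first call passes every check (6000 x 4000 pixels is exactly 24.0 MP); the second reports both the 32.0 MP resolution and the out-of-range EXIF value.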

Video Item

Specification | Recommended
Length | Under 5 minutes and 30 FPS
Recommended format | WEBM
Recommended storage | Dataloop Storage
Recommended video size | 50 MB
Total annotations | Up to 200

Storing videos on external cloud storage (S3/GCS/Azure) may introduce latency when serving the video to annotators. For optimal video annotation performance, store video files on Dataloop storage.
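
The video recommendations translate into a frame budget that is easy to compute. The sketch below reads "Under 5 minutes and 30 FPS" as a duration below 5 minutes at no more than 30 frames per second, which is an interpretation rather than something the table states explicitly; the function names are hypothetical and "50 MB" is taken as decimal megabytes.

```python
MAX_DURATION_S = 5 * 60            # recommended length: under 5 minutes
MAX_FPS = 30                       # assumed to be an upper bound on frame rate
MAX_SIZE_BYTES = 50 * 1000 * 1000  # 50 MB, assuming decimal megabytes
MAX_ANNOTATIONS = 200
RECOMMENDED_FORMAT = "webm"


def frame_budget() -> int:
    """Frame count of a video at the edge of the recommended length and FPS."""
    return MAX_DURATION_S * MAX_FPS


def video_warnings(
    duration_s: float, fps: float, size_bytes: int, extension: str, annotation_count: int
) -> list[str]:
    """Return a warning for each recommendation the video does not meet."""
    warnings = []
    if duration_s >= MAX_DURATION_S:
        warnings.append("length is not under 5 minutes")
    if fps > MAX_FPS:
        warnings.append("frame rate is above 30 FPS")
    if size_bytes > MAX_SIZE_BYTES:
        warnings.append("file size is above 50 MB")
    if extension.lower().lstrip(".") != RECOMMENDED_FORMAT:
        warnings.append("format is not the recommended WEBM")
    if annotation_count > MAX_ANNOTATIONS:
        warnings.append("more than 200 annotations")
    return warnings


print(frame_budget())                                       # 9000
print(video_warnings(120, 30, 20_000_000, ".webm", 50))     # []
```

Under this reading, a recommended video stays below 9,000 frames, which gives a quick way to decide whether a long recording should be split before upload.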

