Overview & Features
- Updated On 18 Apr 2023
Dataloop brings enterprise-level performance to unstructured data management and versioning. It enables sub-second queries on millions of files by item attributes, item metadata, or user metadata.
Data Management Features
- Cloud native: Ingest and sync from popular cloud storage providers, such as AWS, GCP, Azure, etc.
- Dataloop storage: Optionally, upload file binaries to Dataloop to store them on its GCP-based storage.
- Linked items: Create URL items without storing them on the Dataloop platform or even connecting to cloud storage.
- Metadata layer: Every item has metadata that is populated automatically with item attributes when the item is added to a dataset. User metadata can be added at any time.
- DQL: Dataloop Query Language allows querying by:
  - Item attributes: MIME type, file name, creation/update time, size, etc.
  - Item metadata: Annotations, labels and attributes added to items, users working on items, etc.
  - User metadata: Any context added to the item metadata, such as order number, geolocation, camera number, etc.
- Performance: Sub-second queries on millions of files by item attributes, item metadata, or user metadata.
- Version control: Clone and merge actions let you version data in step with the model version.
- Privacy: Meets data privacy standards.
- Browser: Browse data in a user-friendly interface that supports different view options and actions:
  - Thumbnails view with adjustable thumbnail size
  - List view with file details
  - Filters based on item attributes, item metadata, and user metadata
  - Direct DQL queries
  - Save and reuse DQL queries
  - Folder management: Create, rename, or delete folders
  - File management: Move between folders, clone, delete
  - Create models from selected data
  - Create annotation or QA tasks from selected data
  - Trigger a function (FaaS) or pipeline on the selected data
  - View item metadata
  - Item function executions log
  - Export data (item JSON file)
  - Upload data (when using file system storage)
- Developer tools: All data management actions are available through the API and SDK interfaces, including DQL filters, version control, import, export, etc.
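The three DQL query scopes listed above (item attributes, item metadata, user metadata) can be illustrated with a small sketch. It only builds a DQL-style JSON payload as a plain Python dictionary and does not call the Dataloop SDK. The `build_query` helper, the `$and` layout, and the field paths (`metadata.system.mimetype`, `annotated`, `metadata.user.cameraNumber`) are assumptions made for illustration, so check the DQL reference before relying on them.

```python
import json


def build_query(conditions, page_size=100):
    """Combine field/value pairs into one DQL-style payload (hypothetical helper)."""
    return {
        "resource": "items",
        # Every condition must match, so they are grouped under a single "$and".
        "filter": {"$and": [{field: value} for field, value in conditions.items()]},
        "page": 0,
        "pageSize": page_size,
    }


query = build_query({
    "metadata.system.mimetype": "image/jpeg",  # item attribute (assumed path)
    "annotated": True,                         # item metadata (assumed field)
    "metadata.user.cameraNumber": 7,           # user metadata (hypothetical key)
})

print(json.dumps(query, indent=2))
```

Keeping the payload as plain data makes it easy to save and reuse a query, which mirrors the "save and reuse DQL queries" option in the browser.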
Data Management Specifications
|Items in Dataset|Maximum 300,000 items|
|Annotations in Dataset|Maximum 200,000 annotations*|
|Items clone|Limited to 20,000 items|
|Items merge|Limited to 20,000 items|
|Filter items by annotation data (label/attribute)|Up to 200,000 annotations per dataset|
* More than 200,000 annotations can be stored and queried in a dataset if the query targets the annotation entity only. The 200,000 limit applies when a query involves both item and annotation entities.
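Because clone and merge are each limited to 20,000 items, a larger selection has to be handled in parts. The following is a minimal sketch of that batching, assuming the item IDs are available as a Python list. `chunk_items` and the ID format are hypothetical, no Dataloop API is called, and whether several clone or merge operations can be chained this way depends on your workflow.

```python
CLONE_MERGE_LIMIT = 20_000  # per-operation limit stated in the specifications


def chunk_items(item_ids, limit=CLONE_MERGE_LIMIT):
    """Yield consecutive slices of item_ids, each holding at most `limit` IDs."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for start in range(0, len(item_ids), limit):
        yield item_ids[start:start + limit]


# Made-up IDs standing in for a 45,000-item selection.
item_ids = [f"item-{i}" for i in range(45_000)]
batches = list(chunk_items(item_ids))

print([len(batch) for batch in batches])  # [20000, 20000, 5000]
```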
Cloud Providers & Features
|Cloud Provider|Resource Type|Integration Type|
|---|---|---|
|AWS|S3 Bucket|Cross Account|
|AWS|S3 Bucket|Access Key|
|GCP|GCS Bucket|Private Key|
|Azure|Datalake Gen2|Client Secret|
- Dataloop supports subfolder-specific access in buckets, which adds security and flexibility in managing your data.
Supported File Formats
|Image|JPG, JPEG, PNG, TIFF|
|Audio|WAV, MP3, OGG, FLAC, M4A, AAC|
|NLP / NER|TXT, JSON, EML, PDF|
|Formats|See supported formats|
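As a rough illustration of how the format table might be used before an upload, the sketch below maps a file name to one of the listed categories by its extension. The `classify` helper is hypothetical and only knows the extensions named in the table, so anything else (including video files, whose formats are not listed here) returns `None`.

```python
from pathlib import Path

# Extensions copied from the table, lower-cased for comparison.
SUPPORTED_FORMATS = {
    "Image": {"jpg", "jpeg", "png", "tiff"},
    "Audio": {"wav", "mp3", "ogg", "flac", "m4a", "aac"},
    "NLP / NER": {"txt", "json", "eml", "pdf"},
}


def classify(filename):
    """Return the table category for `filename`, or None if its extension is not listed."""
    extension = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_FORMATS.items():
        if extension in extensions:
            return category
    return None


for name in ("scan_001.TIFF", "call.m4a", "notes.txt", "clip.mov"):
    print(name, "->", classify(name))
```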
Browsers can align rotated images when you view them in the annotation studio. To avoid unwanted rotations, ensure the EXIF orientation values on your images are as intended.
Valid EXIF orientation values are 1 to 8. Values outside this range (for example, 0) are ignored.
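The validity rule for EXIF values can be expressed as a short check. This is a sketch under the assumption that the value has already been read from the image as an integer; `effective_orientation` is a hypothetical helper, and reading the tag from a file is left to whatever image library you use.

```python
def effective_orientation(exif_value):
    """Return the orientation if it is a valid value (1 to 8), else None (ignored)."""
    is_integer = isinstance(exif_value, int) and not isinstance(exif_value, bool)
    if is_integer and 1 <= exif_value <= 8:
        return exif_value
    return None  # out-of-range or malformed values are treated as "no orientation"


for value in (0, 1, 6, 8, 9, None):
    print(value, "->", effective_orientation(value))
```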
|Length|Under 5 minutes and 30 FPS|
|Recommended storage|Dataloop Storage|
|Recommended video size|50 MB|
|Total annotations|Up to 200|
Storing videos on external cloud storage (S3/GCS/Azure) may introduce latency when serving the video to annotators. For optimal video annotation performance, store video files on Dataloop storage.
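The video guidelines can be turned into a simple pre-upload check. The sketch below assumes the video properties are already known and interprets the table conservatively: length strictly under 5 minutes, 30 FPS and 50 MB as upper bounds, and at most 200 annotations. `VideoInfo` and `check_video` are hypothetical names, and those interpretations are assumptions rather than documented rules.

```python
from dataclasses import dataclass

MAX_DURATION_S = 5 * 60     # "under 5 minutes"
MAX_FPS = 30                # assumed to be an upper bound
RECOMMENDED_SIZE_MB = 50    # assumed to be an upper bound
MAX_ANNOTATIONS = 200


@dataclass
class VideoInfo:
    duration_s: float
    fps: float
    size_mb: float
    annotations: int = 0


def check_video(video):
    """Return a list of warnings; an empty list means every guideline is met."""
    warnings = []
    if video.duration_s >= MAX_DURATION_S:
        warnings.append("length is not under 5 minutes")
    if video.fps > MAX_FPS:
        warnings.append("frame rate exceeds 30 FPS")
    if video.size_mb > RECOMMENDED_SIZE_MB:
        warnings.append("size exceeds the recommended 50 MB")
    if video.annotations > MAX_ANNOTATIONS:
        warnings.append("more than 200 annotations")
    return warnings


print(check_video(VideoInfo(duration_s=240, fps=30, size_mb=42)))
print(check_video(VideoInfo(duration_s=600, fps=60, size_mb=120, annotations=250)))
```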