- 08 May 2023
WebM and Frame-Accurate Annotation
- Updated On 08 May 2023
Challenges of Frame-Accurate Video Annotations
Dataloop’s Video Tool brings pixel-accurate frame annotations to videos.
Non-Streamable Video Formats
Currently, there are many different video compression formats and video containers. Some are non-streamable, so annotators have to wait until the entire video downloads to their browser. Others are loosely encoded: decoding a frame requires referencing a different (earlier or later) frame, as is the case with I, P, and B frames.
Time-Based vs Frame-Based
The HTML5 video component (the component browsers use to play video files) is a time-based component, and it lacks the functionality to locate a specific frame from a time specification.
Dataloop bypasses these challenges with the following simple equation:
- Frames = Duration * FPS
- FrameOfSecondX = SecondX * FPS
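The two relations above can be sketched in a few lines of Python. This is a minimal illustration only, not Dataloop's implementation: the function names are hypothetical, and the sketch assumes a constant frame rate and a start time of 0.

```python
import math


def total_frames(duration_s: float, fps: float) -> int:
    """Frames = Duration * FPS, rounded to a whole number of frames."""
    return round(duration_s * fps)


def frame_at(second: float, fps: float) -> int:
    """FrameOfSecondX = SecondX * FPS, floored to the frame shown at that time.

    The small epsilon guards against floating-point results such as
    6.999999999 when the exact answer is 7.
    """
    return math.floor(second * fps + 1e-9)


def time_of(frame: int, fps: float) -> float:
    """Inverse mapping: the timestamp (in seconds) at which a frame starts."""
    return frame / fps


print(total_frames(10, 30))  # number of frames in a 10 s clip at 30 FPS
print(frame_at(2.5, 24))     # frame index shown at 2.5 s in a 24 FPS clip
```

The mapping is only reliable when every second of the video really contains FPS frames.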
We recognize that the above equation does not work in several cases:
FPS changes between seconds:
- Videos in which a specific second has one frame, whereas the other seconds have the average FPS. Usually, these cases can be seen in corrupted videos.
- Videos in which the FPS is unstable and changes between seconds. Usually, these cases can be seen in live stream/low-quality or re-converted videos.
The number of frames is different from Duration * FPS:
- Videos where the start time is negative. Usually, these are videos that rely on I/P frames that cannot be located in the video (i.e., are not in the 0 seconds to end time range). This can be related to cutting videos into sub-videos with loose encodings, such as MP4.
- Videos where the number of frames written in the header is wrong. This is usually due to bad format conversions.
We also found that browsers behave differently: because of B frames, different browsers start at different frames for the same time.
Dataloop Uses WEBM
To overcome these problems and provide frame-accurate annotations, Dataloop endorses converting videos into the WebM-VP8 video compression format.
WebM media file format: The WebM format is an audiovisual media format. It offers a royalty-free alternative format that can be used in HTML5 video and audio elements. The format supports streaming and VP8 coding formats.
VP8 compression format: The VP8 format features a pure intra-mode, i.e., using only independently coded frames without temporal prediction, to enable random access in applications like video editing. VP8 enables the use of decoder implementations with a relatively small memory footprint.
Ensuring Frame-Accurate Annotations
Dataloop ensures frame-accurate annotations only on videos that are WebM-VP8 encoded. Users can upload any video format to our platform for data management purposes. Annotation accuracy is best achieved on WebM-VP8 videos.
The videos need to meet these requirements:
- Number of frames (nb_read_frames & nb_frames) = Duration * FPS
- Start time = 0
- Average frame rate = frame rate (avg_frame_rate = r_frame_rate)
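As a rough sketch, the three requirements above could be checked against stream metadata such as the JSON output of ffprobe. The field names below follow ffprobe's naming (nb_read_frames is only reported when frames are counted explicitly); the function names and the shape of the input dictionary are assumptions made for illustration, and this is not the check Dataloop itself runs.

```python
from fractions import Fraction


def parse_rate(rate: str) -> Fraction:
    """Parse a frame rate written as 'num/den' (for example '30/1')."""
    num, _, den = rate.partition("/")
    return Fraction(int(num), int(den or 1))


def check_video(info: dict) -> list:
    """Return a list of violated requirements (empty when all pass).

    `info` is assumed to hold string values, as in ffprobe JSON output.
    """
    problems = []
    avg = parse_rate(info["avg_frame_rate"])
    real = parse_rate(info["r_frame_rate"])
    if avg != real:
        problems.append(f"avg_frame_rate {avg} != r_frame_rate {real}")
    if Fraction(info["start_time"]) != 0:
        problems.append(f"start_time is {info['start_time']}, expected 0")
    # Expected frame count: Duration * FPS, using exact rational arithmetic.
    expected = round(Fraction(info["duration"]) * avg)
    for key in ("nb_frames", "nb_read_frames"):
        if int(info[key]) != expected:
            problems.append(f"{key} is {info[key]}, expected {expected}")
    return problems


sample = {
    "nb_frames": "300", "nb_read_frames": "300", "duration": "10.000000",
    "start_time": "0.000000", "avg_frame_rate": "30/1", "r_frame_rate": "30/1",
}
print(check_video(sample))  # an empty list means every requirement holds
```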
WEBM Conversion Details
- Conversion to WEBM is not applied automatically to videos uploaded to the platform. It is an optional step (Default: ON) when creating annotation tasks with video items.
- Proceeding with the WEBM conversion when creating a task will result in the installation of the WEBM conversion service in the project. Conversion-compute costs are assumed by and limited to the project's account.
- As a service deployed in the project, privileged users (owners, developers) can configure its resources according to expected load, such as setting auto-scalers, changing instance types, monitoring the service execution log, and more.
- The WEBM converter is shared on Dataloop's Git, and is free to fork and adapt to project-specific needs.
Once the conversion is complete, a new replace-Modality file is created and used by the Video Annotation Studio in the annotation process. The default configuration can typically handle video files smaller than 1.07 GB. For larger video files, consider upgrading the instance type to a stronger machine.
Files that fail to pass the conversion process cannot be annotated and are effectively blocked in the annotation studio, with a corresponding message explaining the reason for the situation.
Frames and FPS Difference
WEBM files that have frame or FPS differences compared with the original file will also show an alert in the studio and will be effectively blocked for annotations.
The threshold for alerting on frames and FPS difference can be adjusted, e.g. developers can allow for 1 or 2 frames difference, or X FPS difference, understanding the possible consequences on frame-accurate annotations.
To adjust the Frame and FPS difference:
- From the Project Overview, click on the Setting icon and select Config.
- Select the Media tab.
- Adjust the values and save.
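The effect of these thresholds can be pictured with a small comparison function. This is an illustrative sketch only: the function and parameter names are hypothetical and do not correspond to actual platform settings or to the platform's internal logic.

```python
def within_thresholds(
    orig_frames: int,
    webm_frames: int,
    orig_fps: float,
    webm_fps: float,
    max_frame_diff: int = 0,
    max_fps_diff: float = 0.0,
) -> bool:
    """Return True when the converted file is close enough to the original.

    With the default thresholds of 0, any frame or FPS difference fails.
    """
    frame_ok = abs(orig_frames - webm_frames) <= max_frame_diff
    fps_ok = abs(orig_fps - webm_fps) <= max_fps_diff
    return frame_ok and fps_ok


# A one-frame difference is rejected by default, accepted with a threshold of 1.
print(within_thresholds(300, 299, 30.0, 30.0))
print(within_thresholds(300, 299, 30.0, 30.0, max_frame_diff=1))
```

Loosening a threshold trades strict frame accuracy for the ability to annotate files that would otherwise be blocked.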
WEBM for Linked Items
Linked items cannot be converted to WEBM, since their binaries are not intended to be on the Dataloop platform. However, the same frame-level-accuracy concerns apply for linked items that aren't in WebM format, and the corresponding messaging is shown for users in Annotation Studio.
Project managers can choose to permanently hide such WebM format warnings for linked items in their project by enabling the option "Disable WEBM format warning when using linked items" in Project settings.
Training Your Model
After uploading your video to the Dataloop platform and annotating it, follow these steps to use the frame-accurate annotations to train your model. Note that the annotations are only accurate for the WebM file, and NOT for the original file uploaded.
- Download the JSON file containing your video's annotations from Dataloop.
- In the JSON file you downloaded, you will find the WebM file item ID (“ref”) and the URL to stream the WebM file of your video.
- Use the annotations and the WebM file to train your model.
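To locate the WebM item ID in the downloaded JSON, one option is to search for the "ref" key without relying on where it sits in the file. The sketch below does this recursively. The sample JSON is invented purely for illustration and does not reflect the actual layout of Dataloop's export, and find_refs is a hypothetical helper name.

```python
import json


def find_refs(node):
    """Yield every value stored under a "ref" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "ref":
                yield value
            else:
                yield from find_refs(value)
    elif isinstance(node, list):
        for child in node:
            yield from find_refs(child)


# Invented sample; a real export will have a different structure.
sample_text = '{"name": "clip.mp4", "extra": {"links": [{"ref": "webm-item-id"}]}}'
sample = json.loads(sample_text)
print(list(find_refs(sample)))
```

For a real export, load the file with json.load and pass the result to find_refs, then confirm which of the returned values is the WebM item before using it for training.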