WebM and Frame-Accurate Annotation
Dataloop’s Video Tool brings pixel-accurate frame annotations to videos.
Many video compression formats and containers are in use today. Some are non-streamable, so annotators must wait until the entire video downloads to their browser. Others are loosely encoded, so decoding a frame requires referencing a different (previous or later) frame (e.g., I, P, and B frames).
The HTML5 video component (the component browsers use to play video files) is time-based: it seeks by timestamp and offers no way to seek to a specific frame number.
Dataloop bypasses these challenges with the following simple equation:
Frames = Duration * FPS
FrameOfSecondX = SecondX * FPS
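The two equations above can be sketched in a few lines of Python. This is a minimal illustration, not Dataloop's implementation: the function names are hypothetical, and the sketch assumes a constant frame rate and a start time of 0.

```python
import math


def total_frames(duration_s: float, fps: float) -> int:
    """Frames = Duration * FPS (rounded to the nearest whole frame)."""
    return round(duration_s * fps)


def frame_at_time(second: float, fps: float) -> int:
    """FrameOfSecondX = SecondX * FPS (index of the frame shown at that time)."""
    # The small epsilon guards against floating-point error just below an integer.
    return math.floor(second * fps + 1e-9)


def time_of_frame(frame: int, fps: float) -> float:
    """Inverse mapping: the timestamp at which a given frame starts."""
    return frame / fps


print(total_frames(10, 25))     # 250
print(frame_at_time(2.0, 25))   # 50
```

Because the HTML5 video component seeks by time, a frame-based tool would use a mapping like `time_of_frame` to seek and `frame_at_time` to translate the current time back to a frame number.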
We recognize that this equation does not work in several cases:
FPS changes between seconds:
a. Videos in which a specific second has only 1 frame, while the other seconds have the average FPS. These cases are usually seen in corrupted videos.
b. Videos in which the FPS is unstable and changes between seconds. Usually these cases can be seen in live stream/low quality or re-converted videos.
The number of frames differs from Duration * FPS:
a. Videos where the start time is negative. Usually these are videos that rely on I/P frames that cannot be located in the video (i.e., are not in the 0 seconds to end time range). This can be related to cutting videos into sub videos with loose encoding, such as mp4.
b. Videos where the number of frames written in the header is wrong. This is usually due to bad format conversions.
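To make the first failure case concrete, here is a small sketch with made-up data. The timestamp list describes a hypothetical 3-second clip in which the middle second contains a single frame, and the helper names are illustrative only. It compares the frame index predicted by the equation with the frame that is actually on screen according to the timestamps.

```python
from bisect import bisect_right


def frame_by_equation(second: float, fps: float) -> int:
    """Frame index predicted by FrameOfSecondX = SecondX * FPS."""
    return int(second * fps + 1e-9)


def frame_by_timestamps(second: float, timestamps: list) -> int:
    """Index of the last frame whose presentation time is <= second."""
    return max(bisect_right(timestamps, second) - 1, 0)


# Hypothetical clip: 4 frames in second 0, 1 frame in second 1, 4 frames in second 2.
timestamps = [0.0, 0.25, 0.5, 0.75, 1.0, 2.0, 2.25, 2.5, 2.75]
duration = 3.0
avg_fps = len(timestamps) / duration  # 9 frames over 3 seconds

print(frame_by_equation(2.0, avg_fps))       # 6
print(frame_by_timestamps(2.0, timestamps))  # 5
```

The two results disagree, so an annotation placed "at frame 6" by the equation would land on a different picture than the one the annotator saw.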
We also found that browsers behave differently from one another. See the known Chrome bug in which, because of B frames, different browsers display different frames at the same timestamp.
To avoid these problems and provide frame-accurate annotations, we convert videos into WebM-VP8 video compression format.
WebM media file format – The WebM format is an audiovisual media format. It is primarily intended to offer a royalty-free alternative that can be used in the HTML5 video and audio elements. The format supports streaming and the VP8 coding format.
VP8 compression format – The VP8 format features a pure intra-mode, i.e., using only independently-coded frames without temporal prediction, to enable random access in applications like video editing. VP8 enables the use of decoder implementations with a relatively small memory footprint.
Ensuring Frame-Accurate Annotations
Dataloop can ensure frame-accurate annotations only on videos that are WebM-VP8 encoded. While users can upload any video format to our platform for the use of data management, annotation accuracy can be best achieved on WebM-VP8 videos.
The videos need to meet these requirements:
- Number of frames (nb_read_frames & nb_frames) = Duration * FPS
- Start time = 0
- Average frame rate = frame rate (avg_frame_rate = r_frame_rate)
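The three requirements above can be checked against FFprobe's JSON output for the video stream. The following is a rough sketch, not the validator script Dataloop provides: the function names are hypothetical, and it assumes the stream dictionary contains `duration`, `start_time`, `nb_frames`, `nb_read_frames`, `avg_frame_rate`, and `r_frame_rate` as strings (some containers omit the per-stream `duration`, in which case it would have to be taken from elsewhere).

```python
from fractions import Fraction
from typing import List, Optional


def parse_rate(rate: str) -> Optional[Fraction]:
    """Parse a rate string such as '30000/1001'; return None if undefined (e.g. '0/0')."""
    try:
        return Fraction(rate)
    except (ValueError, ZeroDivisionError):
        return None


def validate_stream(stream: dict) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    avg = parse_rate(stream["avg_frame_rate"])
    real = parse_rate(stream["r_frame_rate"])
    if avg is None or real is None:
        return ["frame rate is undefined"]

    problems = []
    # Requirement: avg_frame_rate = r_frame_rate
    if avg != real:
        problems.append(f"avg_frame_rate {avg} != r_frame_rate {real}")
    # Requirement: start time = 0
    if float(stream["start_time"]) != 0.0:
        problems.append(f"start_time is {stream['start_time']}, expected 0")
    # Requirement: number of frames = Duration * FPS
    expected = round(float(stream["duration"]) * avg)
    for key in ("nb_frames", "nb_read_frames"):
        if int(stream[key]) != expected:
            problems.append(f"{key} is {stream[key]}, expected {expected}")
    return problems


# Made-up example values for illustration.
good = {"avg_frame_rate": "25/1", "r_frame_rate": "25/1", "start_time": "0.000000",
        "duration": "10.000000", "nb_frames": "250", "nb_read_frames": "250"}
bad = dict(good, start_time="-0.040000", nb_frames="249")

print(validate_stream(good))      # []
print(len(validate_stream(bad)))  # 2
```

Frame rates are compared as exact fractions rather than floats so that values such as `30000/1001` are not affected by rounding.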
We offer the following FFmpeg scripts for you to convert your data before uploading it to the system:
- FFprobe script – get the data as JSON.
- Validator script – using the FFprobe output, validate correctness of video.
- FFmpeg convert to WebM script – with FFprobe output & file path, convert to WebM.
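For orientation, the sketch below shows what the probe and convert steps might look like when driven from Python. These are not the Dataloop scripts themselves: the function names are hypothetical, the sketch assumes `ffprobe` and `ffmpeg` (built with `libvpx`) are on the PATH, and the conversion uses default quality settings that you would likely want to tune.

```python
import json
import subprocess
from typing import List


def ffprobe_cmd(path: str) -> List[str]:
    """Command that prints the first video stream's metadata as JSON."""
    return ["ffprobe", "-v", "error",
            "-select_streams", "v:0",   # first video stream only
            "-count_frames",            # decode to fill in nb_read_frames
            "-show_streams", "-of", "json", path]


def ffmpeg_webm_cmd(src: str, dst: str, fps: str) -> List[str]:
    """Command that re-encodes src to VP8 at a constant output frame rate."""
    return ["ffmpeg", "-i", src,
            "-c:v", "libvpx",           # VP8 encoder
            "-r", fps,                  # force a constant output frame rate
            dst]


def probe(path: str) -> dict:
    """Run ffprobe and return the video stream dictionary."""
    result = subprocess.run(ffprobe_cmd(path), check=True,
                            capture_output=True, text=True)
    return json.loads(result.stdout)["streams"][0]


def convert(src: str, dst: str) -> None:
    """Probe src, then convert it to WebM-VP8 using its reported frame rate."""
    stream = probe(src)
    subprocess.run(ffmpeg_webm_cmd(src, dst, stream["r_frame_rate"]), check=True)
```

A call such as `convert("input.mp4", "output.webm")` would run both steps. Probing the result again and checking it against the requirements listed earlier is a sensible follow-up before uploading.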
Dataloop supports a variety of video file formats and automatically converts video files that are not already in WebM format, provided they are no larger than 1.07 GB. Video files over 1.07 GB are not converted and are uploaded in their original format.
If you are uploading files over 1.07 GB, please read our article on Manually Converting Large Files to WebM.
Once the conversion is complete, a new modality file will be created and used by Dataloop’s Video Annotation Studio in the annotation process.
Files that fail the conversion process cannot be annotated and are blocked in the annotation studio, which shows a message explaining the reason for the failure.
Training Your Model
Annotations are accurate only with respect to the WebM file, NOT the original file you uploaded. After uploading your video to the Dataloop platform and annotating it, follow these steps to use the frame-accurate annotations to train your model:
- Download your video file JSON with the annotations from Dataloop.
- In the JSON file you downloaded, find the WebM file item ID (“ref”) and the URL to stream the WebM file of your video.
- Use the annotations and the WebM file to train your model.