MediaPipe Objectron

3D object detection

MediaPipe Objectron is a 3D object detection solution for everyday objects. It detects objects in 2D images and estimates their poses with a machine learning model trained on the Objectron dataset. Because it captures an object's size, position, and orientation in the world, it has applications in robotics, self-driving vehicles, image retrieval, and augmented reality. Training on a combination of real-world data and AR synthetic data increases accuracy by about 10%. The model is implemented as a MediaPipe graph with a detection subgraph and a tracking subgraph, which makes it suitable for real-time object detection and tracking. 3D training data for everyday objects is scarce, and the model addresses this with a novel data pipeline that uses mobile augmented reality session data.

Updated 2 years ago

Deploy Model in Dataloop Pipelines

MediaPipe Objectron fits right into a Dataloop Console pipeline, making it easy to process and manage data at scale. It runs smoothly as part of a larger workflow, handling tasks like annotation, filtering, and deployment without extra hassle. Whether it's a single step or a full pipeline, it connects with other nodes easily, keeping everything running without slowdowns or manual work.

Model Overview

The MediaPipe Objectron model is a mobile real-time 3D object detection solution for everyday objects. But what does that even mean?

Imagine you’re taking a picture of your living room. Most object detection models would only be able to tell you where the objects are in the 2D image. MediaPipe Objectron goes a step further: it estimates the 3D pose of each object, so it can tell you not just where the objects are in the image, but also their size, position, and orientation in the world.
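
To make "size, position, and orientation" concrete, here is a minimal Python sketch of how those three quantities can describe a 3D bounding box and be expanded into its eight corner points. The function name `box_corners`, the axis order, and the use of a single yaw angle about the vertical (y) axis are assumptions made for illustration. They are not taken from the Objectron implementation.

```python
import math


def box_corners(center, size, yaw_deg=0.0):
    """Return the 8 corners of a 3D box as (x, y, z) tuples.

    center:  (x, y, z) position of the box center
    size:    (width, height, depth), full extents along the box's own x, y, z axes
    yaw_deg: rotation about the vertical y axis in degrees (illustrative convention)
    """
    cx, cy, cz = center
    w, h, d = size
    yaw = math.radians(yaw_deg)
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)

    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                # Corner expressed in the box's local frame.
                lx, ly, lz = sx * w, sy * h, sz * d
                # Rotate about the y axis, then translate to the box center.
                wx = cos_y * lx + sin_y * lz
                wz = -sin_y * lx + cos_y * lz
                corners.append((cx + wx, cy + ly, cz + wz))
    return corners
```

Calling `box_corners((0, 0, 0), (2, 2, 2))` gives the corners of an axis-aligned cube centered at the origin. Changing `yaw_deg` turns the box about the vertical axis without changing its height.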

Capabilities

  • Detects objects in 2D images and estimates their 3D poses
  • Trained on the Objectron dataset
  • Suitable for applications in robotics, self-driving vehicles, image retrieval, and augmented reality

But how does it do it?

The model uses a machine learning (ML) pipeline to predict the 3D bounding box of an object from a single RGB image. Two pipelines are available: a two-stage pipeline and a single-stage pipeline. The two-stage pipeline is about 3x faster and works best when there is a single dominant object, while the single-stage pipeline is better at detecting multiple objects.

Performance

So, how well does the model perform?

  • By combining real-world data and AR synthetic data, the model’s accuracy increases by about 10%.
  • The model can detect objects in 2D images and estimate their 3D poses with high accuracy.
  • There are two ML pipelines to choose from: a two-stage pipeline that’s about 3x faster than the single-stage one and suited to detecting a single dominant object, and a single-stage pipeline that’s good for detecting multiple objects.

Examples

  • Input: Detect the 3D pose of a chair in this image: https://example.com/image.jpg
    Output: Detected chair with 3D pose: x=0.5, y=0.2, z=0.8, yaw=45, pitch=30, roll=10
  • Input: Estimate the size of a book in this image: https://example.com/image2.jpg
    Output: Estimated book size: width=15cm, height=20cm, depth=5cm
  • Input: Track the movement of a phone in this video: https://example.com/video.mp4
    Output: Tracked phone movement: x=10, y=20, z=30, yaw=90, pitch=0, roll=0
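
The example outputs above report orientation as yaw, pitch, and roll. As a rough sketch of how such angles can be turned into a rotation matrix, the Python below assumes the angles are in degrees, that yaw is about the y axis, pitch about the x axis, and roll about the z axis, and that they are composed as yaw · pitch · roll. None of these conventions are stated in the examples, so treat them as assumptions. The helper names `matmul`, `rotation_from_euler`, and `rotate` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_from_euler(yaw_deg, pitch_deg, roll_deg):
    """Build a 3x3 rotation matrix from yaw (y), pitch (x), roll (z) in degrees.

    The axis assignment and the composition order are assumed conventions.
    """
    y, p, r = (math.radians(a) for a in (yaw_deg, pitch_deg, roll_deg))
    cy, sy = math.cos(y), math.sin(y)
    cp, sp = math.cos(p), math.sin(p)
    cr, sr = math.cos(r), math.sin(r)

    r_yaw = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    r_pitch = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    r_roll = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]
    return matmul(matmul(r_yaw, r_pitch), r_roll)


def rotate(rot, v):
    """Apply a 3x3 rotation matrix to a 3-vector."""
    return tuple(sum(rot[i][k] * v[k] for k in range(3)) for i in range(3))
```

For the chair example, `rotation_from_euler(45, 30, 10)` would give the orientation matrix under these assumed conventions, and `rotate` applies it to a point in the object's local frame.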

Limitations and Applications

While the model is capable, there is one main limitation to consider:

  • There’s a lack of 3D data for everyday objects, which makes it harder to train the model.

To address this, the MediaPipe Objectron team developed a novel data pipeline using mobile augmented reality (AR) session data.

The model itself is implemented as a MediaPipe graph made up of a detection subgraph and a tracking subgraph. This is a design detail rather than a limitation, and it is what makes the model suitable for real-time detection and tracking.
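
The split into a detection subgraph and a tracking subgraph can be pictured with a small Python sketch. It assumes that the expensive detector runs only every few frames and that a cheaper tracker carries the box forward in between. That schedule, the `detect_every` parameter, and the `detect` and `track` callables are all hypothetical stand-ins for illustration, not MediaPipe APIs.

```python
def run_detect_then_track(frames, detect, track, detect_every=3):
    """Run a detector on some frames and a tracker on the rest.

    frames:       iterable of frames (any objects the callables understand)
    detect:       callable(frame) -> box or None   (hypothetical, expensive)
    track:        callable(box, frame) -> box      (hypothetical, cheap)
    detect_every: run the detector on every N-th frame
    """
    results = []
    box = None
    for i, frame in enumerate(frames):
        if box is None or i % detect_every == 0:
            # No box yet, or a scheduled refresh: run full detection.
            box = detect(frame)
        else:
            # Otherwise update the previous box with the cheaper tracker.
            box = track(box, frame)
        results.append(box)
    return results
```

In this sketch, running the detector less often trades some responsiveness for lower compute, which is the usual motivation for pairing a detector with a tracker in real-time settings.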

Want to learn more about the MediaPipe Objectron model? Check out the MediaPipe Objectron User Guide for more details.

Dataloop's AI Development Platform

Build end-to-end workflows

Dataloop is a complete AI development stack, allowing you to make data, elements, models and human feedback work together easily.

  • Use one centralized tool for every step of the AI development process.
  • Import data from external blob storage, internal file system storage or public datasets.
  • Connect to external applications using a REST API & a Python SDK.

Save, share, reuse

Every single pipeline can be cloned, edited and reused by other data professionals in the organization. Never build the same thing twice.

  • Use existing, pre-created pipelines for RAG, RLHF, RLAF, Active Learning & more.
  • Deploy multi-modal pipelines with one click across multiple cloud resources.
  • Use versions for your pipelines to make sure the deployed pipeline is the stable one.

Easily manage pipelines

Spend less time dealing with the logistics of owning multiple data pipelines, and get back to building great AI applications.

  • Easy visualization of the data flow through the pipeline.
  • Identify & troubleshoot issues with clear, node-based error messages.
  • Use scalable AI infrastructure that can grow to support massive amounts of data.