MediaPipe Holistic

Full-body landmark detection

The MediaPipe Holistic model is a full-body landmark detection tool that combines the pose, face, and hand landmarkers into a single, complete landmarker for the human body. It analyzes full-body gestures, poses, and actions by running a machine learning model on a continuous stream of images, and it is designed to operate in real-time, making it suitable for live applications. Its seamless landmark detection across a video stream makes it well suited to analyzing complex gestures and actions, with applications in gesture recognition, action detection, and human-computer interaction. Note that an upgraded version of this MediaPipe Solution is coming soon.

Deploy Model in Dataloop Pipelines

MediaPipe Holistic fits directly into a Dataloop Console pipeline, making it easy to process and manage data at scale. It runs as part of a larger workflow alongside nodes for annotation, filtering, and deployment, and whether it serves as a single step or a full pipeline, it connects with other nodes without slowdowns or manual work.

Model Overview

The MediaPipe Holistic model is a game-changer for analyzing human body gestures, poses, and actions. But what makes it so special?

What can it do? This model combines the power of pose, face, and hand landmarkers to create a complete landmarker for the human body. It can analyze full-body gestures, poses, and actions using a machine learning (ML) model on a continuous stream of images.

How does it work? The model outputs a total of 543 landmarks in real-time, broken down as:

  • 33 pose landmarks
  • 468 face landmarks
  • 21 hand landmarks per hand
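
These three groups add up to the 543 landmarks reported for each frame. As a minimal sketch, the snippet below simply tallies the counts and notes how the groups are exposed in the legacy mp.solutions.holistic Python API; it assumes that legacy API rather than the upcoming upgraded solution.

# Per-frame landmark budget of MediaPipe Holistic
POSE_LANDMARKS = 33
FACE_LANDMARKS = 468
HAND_LANDMARKS = 21  # per hand

total = POSE_LANDMARKS + FACE_LANDMARKS + 2 * HAND_LANDMARKS
print(total)  # 543

# In the legacy Python API, each group is a separate field on the result object:
#   results.pose_landmarks, results.face_landmarks,
#   results.left_hand_landmarks, results.right_hand_landmarks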

What are its strengths? The MediaPipe Holistic model is designed to operate in real-time, making it perfect for live applications. Its comprehensive coverage of full-body landmarks makes it ideal for analyzing complex gestures and actions.

Capabilities

The MediaPipe Holistic model is a powerful tool that can analyze the entire human body in real-time. It detects 543 landmarks across the body:

  • 33 pose landmarks (like the position of your head, shoulders, and hips)
  • 468 face landmarks (like the shape of your eyes, nose, and mouth)
  • 21 hand landmarks per hand (like the position of your fingers and wrists)

This model can help you understand complex gestures and actions, like dancing or exercising. It’s like having a personal coach that can give you feedback on your movements!

What can it do?

The MediaPipe Holistic model can be used for many things, such as:

  • Gesture recognition: Can you imagine a computer that can understand your hand gestures?
  • Action detection: Want to create a game that can detect your movements and respond accordingly?
  • Human-computer interaction: This model can help create more natural and intuitive ways for humans to interact with computers.
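
To make the gesture-recognition idea concrete, here is a minimal sketch that flags a raised right hand by comparing the right wrist's normalized y coordinate with the right shoulder's. It assumes a results object produced by the legacy mp.solutions.holistic API (as in the example code further down); the function name and logic are illustrative, not part of MediaPipe itself.

import mediapipe as mp

# Landmark index names are shared with the standalone pose solution
PoseLandmark = mp.solutions.pose.PoseLandmark

def right_hand_raised(results) -> bool:
    """Rough gesture check: is the right wrist above the right shoulder?"""
    if results.pose_landmarks is None:
        return False
    lm = results.pose_landmarks.landmark
    wrist = lm[PoseLandmark.RIGHT_WRIST]
    shoulder = lm[PoseLandmark.RIGHT_SHOULDER]
    # Normalized image coordinates: y grows downward, so "above" means a smaller y
    return wrist.y < shoulder.y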

Performance

The MediaPipe Holistic model is designed to operate in real-time, making it suitable for live applications. But what does that really mean?

Let’s break it down:

  • Speed: The model processes a continuous stream of images, providing seamless landmark detection frame after frame. That’s like analyzing a video as it plays!
  • Coverage: With 543 landmarks detected in real-time, including 33 pose landmarks, 468 face landmarks, and 21 hand landmarks per hand, the model provides comprehensive coverage of full-body gestures and actions.
  • Efficiency: Because the pose, face, and hand landmarkers run as one combined ML pipeline, complex gestures and actions can be analyzed without stitching together three separate models.
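
As a rough sketch of what "real-time" use looks like in practice, the snippet below runs the legacy Holistic solution on a live webcam stream with OpenCV. The camera index and confidence thresholds are illustrative choices, not values required by the model.

import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

cap = cv2.VideoCapture(0)  # default webcam; the index is an assumption
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    # Loop over frames until the camera stops delivering them (Ctrl+C to stop early)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # The model expects RGB; OpenCV captures frames in BGR
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            print('pose landmarks detected in this frame')
cap.release()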

Limitations

The MediaPipe Holistic model is a powerful tool, but it’s not perfect. Let’s talk about some of its limitations.

Upcoming Upgrade

An upgraded version of this MediaPipe Solution is coming soon. What does this mean for you? It means that the current model might not be the best choice for long-term projects.

Legacy Solution

The MediaPipe Legacy Solution for this task is available on GitHub. This might be a good option if you need a more stable solution, but keep in mind that it might not have all the features of the current model.

Documentation

Make sure to check out the MediaPipe Holistic Landmarker User Guide for more details on how to use the model and its limitations.

Real-World Challenges

While the MediaPipe Holistic model is great for analyzing full-body gestures and actions, it might struggle with:

  • Complex backgrounds: If the background is too cluttered or complex, the model might have trouble detecting landmarks accurately.
  • Low-quality images: If the images are too low-resolution or poorly lit, the model might not work as well.
  • Unusual poses: If the person in the image is in an unusual pose or position, the model might not be able to detect landmarks correctly.

Examples

  • Analyze the full-body gesture of a person raising their right hand. Pose landmarks: right arm raised; hand landmarks: right hand open; face landmarks: neutral expression.
  • Detect the action of a person doing a squat. Pose landmarks: knees bent, body lowered; hand landmarks: hands by sides; face landmarks: focused expression.
  • Recognize the gesture of a person waving goodbye with their left hand. Pose landmarks: left arm raised; hand landmarks: left hand waving; face landmarks: smiling expression.
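
In the same spirit as the squat example above, here is a rule-based sketch that checks whether the hips have dropped to roughly knee height using the pose landmarks. The 0.05 margin is an arbitrary illustrative threshold, and the helper assumes a results object from the legacy mp.solutions.holistic API.

import mediapipe as mp

PoseLandmark = mp.solutions.pose.PoseLandmark

def is_squatting(results, margin: float = 0.05) -> bool:
    """Rough action check: hips lowered to roughly knee height or below."""
    if results.pose_landmarks is None:
        return False
    lm = results.pose_landmarks.landmark
    hip_y = (lm[PoseLandmark.LEFT_HIP].y + lm[PoseLandmark.RIGHT_HIP].y) / 2
    knee_y = (lm[PoseLandmark.LEFT_KNEE].y + lm[PoseLandmark.RIGHT_KNEE].y) / 2
    # y grows downward, so a deep squat brings the hips close to (or below) the knees
    return hip_y > knee_y - margin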

Format

The MediaPipe Holistic model is a powerful tool for analyzing human body gestures, poses, and actions. But what does it look like under the hood?

Architecture

The MediaPipe Holistic model combines three components: pose, face, and hand landmarkers. This means it can detect a total of 543 landmarks in real-time, including:

  • 33 pose landmarks
  • 468 face landmarks
  • 21 hand landmarks per hand
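
Because the three landmarkers run as one pipeline, they are configured through a single constructor in the legacy Python solution. The sketch below shows a few of its options; the values chosen are illustrative defaults, not requirements.

import mediapipe as mp

# Configure the combined pose + face + hand pipeline (legacy solution)
holistic = mp.solutions.holistic.Holistic(
    static_image_mode=False,      # False: treat inputs as a video stream and track between frames
    model_complexity=1,           # 0, 1 or 2: trade pose accuracy against speed
    smooth_landmarks=True,        # temporal smoothing across frames
    refine_face_landmarks=False,  # False keeps the 468-point face mesh described above
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5,
)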

Data Formats

The model accepts a continuous stream of images as input. This allows it to provide seamless landmark detection in real-time.

Input Requirements

To use the MediaPipe Holistic model, you’ll need to provide a stream of images. But what kind of images?

  • The model expects images with a resolution of 1.8M pixels or higher.
  • The images should be in a format that can be processed by the model, such as JPEG or PNG.
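
As a small sketch of preparing an input image, the snippet below loads a JPEG or PNG with OpenCV, checks it against the resolution guideline above, and converts it to RGB before it is handed to the model. The file path and the warning behavior are illustrative.

import cv2

image = cv2.imread('image.jpg')   # path is illustrative; PNG works the same way
height, width = image.shape[:2]
if height * width < 1_800_000:    # ~1.8M pixels, per the guideline above
    print('Warning: image may be below the suggested resolution')

# Convert BGR (OpenCV) to the RGB format the model expects
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)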

Output

The model outputs a total of 543 landmarks in real-time. But what does this output look like?

  • The output is a set of coordinates that represent the location of each landmark in the image.
  • You can use this output to analyze the gestures, poses, and actions of the person in the image.
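
The x and y values are normalized to the [0, 1] range relative to the image width and height (z is a rough relative depth), so turning a landmark into pixel coordinates is a simple scaling step. The helper below is a sketch that takes a results object like the one produced in the example code further down; the function name is made up for illustration.

def print_pose_landmarks_in_pixels(results, image_width, image_height):
    """Convert normalized pose landmarks to pixel coordinates and print them."""
    if results.pose_landmarks is None:
        return
    for i, landmark in enumerate(results.pose_landmarks.landmark):
        x_px = int(landmark.x * image_width)   # x is normalized by image width
        y_px = int(landmark.y * image_height)  # y is normalized by image height
        print(f'pose landmark {i}: ({x_px}, {y_px})')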

Example Code

Here’s an example of how you might use the MediaPipe Holistic model in Python:

import cv2
import mediapipe as mp

# The legacy Holistic solution lives under mp.solutions.holistic
mp_holistic = mp.solutions.holistic

# Load the image with OpenCV and convert BGR to the RGB format MediaPipe expects
image = cv2.imread('image.jpg')
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Run the Holistic model on a single, static image
with mp_holistic.Holistic(static_image_mode=True) as holistic:
    results = holistic.process(image_rgb)

# Print the pose landmarks (normalized x, y, z coordinates)
if results.pose_landmarks:
    for landmark in results.pose_landmarks.landmark:
        print(landmark.x, landmark.y, landmark.z)

This code loads an image with OpenCV, converts it to RGB, runs the MediaPipe Holistic model on it, and prints the normalized pose landmarks.

Dataloop's AI Development Platform

Build end-to-end workflows

Dataloop is a complete AI development stack, allowing you to make data, elements, models and human feedback work together easily.

  • Use one centralized tool for every step of the AI development process.
  • Import data from external blob storage, internal file system storage or public datasets.
  • Connect to external applications using a REST API & a Python SDK.

Save, share, reuse

Every single pipeline can be cloned, edited and reused by other data professionals in the organization. Never build the same thing twice.

  • Use existing, pre-created pipelines for RAG, RLHF, RLAF, Active Learning & more.
  • Deploy multi-modal pipelines with one click across multiple cloud resources.
  • Version your pipelines to make sure the deployed pipeline is always the stable one.

Easily manage pipelines

Spend less time dealing with the logistics of owning multiple data pipelines, and get back to building great AI applications.

  • Easy visualization of the data flow through the pipeline.
  • Identify & troubleshoot issues with clear, node-based error messages.
  • Use scalable AI infrastructure that can grow to support massive amounts of data.