
How to Use MediaPipe for Pose Estimation in Computer Vision

Use mediapipe.solutions.pose to load the pose estimation model, then call Pose.process() on each image or video frame to get body landmarks. You can draw these landmarks or feed them into applications such as fitness tracking or gesture recognition.

📐

Syntax

The main steps to use MediaPipe Pose are:

Import mediapipe and cv2 for image processing.
Create a Pose object with desired parameters.
Call pose.process(image) on each RGB image or frame to get pose landmarks.
Read results.pose_landmarks, which holds the detected keypoints or None if no person was found.

This workflow works for both images and video streams.

python

import mediapipe as mp
import cv2

mp_pose = mp.solutions.pose
pose = mp_pose.Pose(static_image_mode=False, min_detection_confidence=0.5)

# To process an image (pass RGB, not OpenCV's default BGR):
# results = pose.process(image)
# landmarks = results.pose_landmarks
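
The video example in the next section covers streams, so here is a minimal sketch of the single-image case. It assumes the landmark list is reachable as results.pose_landmarks.landmark and that each entry carries x and y normalized to the image width and height. The function names to_pixel_points and detect_pose_in_image are hypothetical helpers, not part of MediaPipe, and the demo values at the bottom are made up so the conversion can be tried without a camera or an image file.

```python
from types import SimpleNamespace


def to_pixel_points(landmarks, width, height):
    """Convert normalized landmarks (x, y in 0..1) to integer pixel coordinates."""
    return [(int(lm.x * width), int(lm.y * height)) for lm in landmarks]


def detect_pose_in_image(path):
    """Run MediaPipe Pose once on an image file; return pixel points or []."""
    # Imported here so the helper above stays usable without these packages.
    import cv2
    import mediapipe as mp

    frame = cv2.imread(path)  # BGR array, or None if the file cannot be read
    if frame is None:
        raise FileNotFoundError(path)
    height, width = frame.shape[:2]

    # static_image_mode=True runs full detection, which suits unrelated images.
    with mp.solutions.pose.Pose(static_image_mode=True) as pose:
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    if results.pose_landmarks is None:
        return []
    return to_pixel_points(results.pose_landmarks.landmark, width, height)


# Stand-in landmark (hypothetical values) to show the conversion on its own.
demo = [SimpleNamespace(x=0.5, y=0.25)]
print(to_pixel_points(demo, 640, 480))  # [(320, 120)]
```

Calling detect_pose_in_image('person.jpg') with a real file path (the file name here is only a placeholder) would return one (x, y) pair per detected landmark, or an empty list when no person is found.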

💻

Example

This example captures video from your webcam, runs pose estimation on each frame, and draws the detected landmarks on the video in real time.

python

import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_pose = mp.solutions.pose

cap = cv2.VideoCapture(0)

with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # Convert the BGR image to RGB
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False

        # Process the image and find pose landmarks
        results = pose.process(image)

        # Convert back to BGR for rendering
        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

        # Draw pose landmarks on the image
        if results.pose_landmarks:
            mp_drawing.draw_landmarks(
                image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)

        cv2.imshow('MediaPipe Pose', image)

        if cv2.waitKey(5) & 0xFF == 27:  # Press ESC to exit
            break

cap.release()
cv2.destroyAllWindows()

Output

A window opens showing the webcam video with pose landmarks drawn on the person in real time.
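
For an application such as fitness tracking, the landmarks are usually turned into joint angles. The sketch below is one possible approach, not a MediaPipe function: joint_angle is a hypothetical helper that measures the angle at a middle point from three 2D points. In MediaPipe Pose the left shoulder, elbow, and wrist are landmarks 11, 13, and 15; the pixel coordinates used here are made up for illustration. Using pixel coordinates rather than the normalized values avoids distorting the angle on non-square images.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    # Report the inner angle, which is always between 0 and 180 degrees.
    return 360 - ang if ang > 180 else ang


# Hypothetical pixel positions for a bent left arm.
shoulder, elbow, wrist = (200, 100), (200, 200), (300, 200)
print(round(joint_angle(shoulder, elbow, wrist)))  # 90
```

Tracking this angle over successive frames is a simple way to count repetitions, for example by detecting when it drops below one threshold and later rises above another.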

⚠️

Common Pitfalls

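Not converting image color: MediaPipe expects RGB images, but OpenCV loads and captures frames as BGR. Skipping the conversion swaps the red and blue channels and degrades detection accuracy.
Not setting static_image_mode properly: For video, keep it False so the model tracks landmarks between frames instead of running full detection on every frame; for unrelated single images, set it to True.
Not checking for missing detections: results.pose_landmarks is None when no person is found, so always check it before using landmarks.
Not releasing resources: Always release the video capture and destroy windows when you are done; otherwise the camera may stay in use by your process and windows may linger.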

python

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
pose = mp_pose.Pose()

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # WRONG: Not converting BGR to RGB
    results = pose.process(frame)  # This will give poor results

    # CORRECT:
    # image_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # results = pose.process(image_rgb)

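    if results.pose_landmarks:
        print('Pose landmarks detected')

    # Show the frame: cv2.waitKey() only receives key presses while a window is open
    cv2.imshow('MediaPipe Pose', frame)
    if cv2.waitKey(5) & 0xFF == 27:  # Press ESC to exit
        break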

pose.close()  # Free the model, since it was created without a "with" block
cap.release()
cv2.destroyAllWindows()
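
The check for missing detections can be wrapped in a small helper. This is a sketch under stated assumptions: visible_landmarks is a hypothetical function, the 0.5 visibility threshold is an arbitrary choice, and the SimpleNamespace objects are stand-ins that only mimic the pose_landmarks.landmark layout and the visibility field of a real result.

```python
from types import SimpleNamespace


def visible_landmarks(results, min_visibility=0.5):
    """Return {index: landmark} for landmarks at or above the visibility threshold."""
    # pose_landmarks is None when no person was detected in the frame.
    if results.pose_landmarks is None:
        return {}
    return {
        i: lm
        for i, lm in enumerate(results.pose_landmarks.landmark)
        if lm.visibility >= min_visibility
    }


# Stand-in results (hypothetical values): one frame with no person, one with two landmarks.
empty = SimpleNamespace(pose_landmarks=None)
found = SimpleNamespace(
    pose_landmarks=SimpleNamespace(
        landmark=[SimpleNamespace(visibility=0.9), SimpleNamespace(visibility=0.2)]
    )
)
print(visible_landmarks(empty))          # {}
print(sorted(visible_landmarks(found)))  # [0]
```

Returning an empty dictionary instead of raising keeps the per-frame loop simple: downstream code can just skip frames where the landmarks it needs are absent.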

📊

Quick Reference

Key parameters and methods for MediaPipe Pose:

Parameter / Method	Description
mp.solutions.pose.Pose()	Creates pose estimation model instance
static_image_mode	Set True for single images, False for video stream
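min_detection_confidence	Minimum confidence (0 to 1, default 0.5) for the person detection to count as successful
min_tracking_confidence	Minimum confidence (0 to 1, default 0.5) to keep tracking landmarks; below it, detection runs again on the next frame
pose.process(image)	Run pose estimation on an RGB image
results.pose_landmarks	Detected body landmarks, or None if no pose is found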
mp.solutions.drawing_utils.draw_landmarks()	Draw landmarks and connections on image

✅

Key Takeaways

Always convert images from BGR to RGB before processing with MediaPipe Pose.

Use static_image_mode=False for video streams to improve performance.

Check if pose landmarks are detected before using them to avoid errors.

Use MediaPipe's drawing utilities to visualize pose landmarks easily.

Release video capture and close windows properly to prevent resource leaks.