
Hand and face landmark detection in Computer Vision

Introduction

Hand and face landmark detection locates key points on hands and faces, such as fingertips, knuckles, eye corners, and lip contours. From these points a program can interpret gestures and facial expressions.

To control a game using hand gestures without touching anything.
To add fun face filters like sunglasses or hats in a video chat app.
To help robots recognize human emotions by reading facial expressions.
To count how many fingers are raised for a sign language app.
To track hand movements for virtual reality or fitness coaching.
Syntax
Computer Vision
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_face_mesh = mp.solutions.face_mesh

with mp_hands.Hands() as hands, mp_face_mesh.FaceMesh() as face_mesh:
    results_hands = hands.process(image_rgb)
    results_face = face_mesh.process(image_rgb)

This example uses the MediaPipe library, which has ready-made models for hand and face landmarks.

OpenCV loads images in BGR channel order, but the MediaPipe models expect RGB, so convert the image with cv2.cvtColor(image, cv2.COLOR_BGR2RGB) before calling process().
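
To make the channel-order point concrete, here is a toy sketch of what a BGR to RGB conversion does to each pixel. It works on plain nested lists so it runs without any libraries; the function name bgr_to_rgb is made up for this illustration, and real code would use cv2.cvtColor instead.

```python
def bgr_to_rgb(image_bgr):
    """Return a copy of the image with each (b, g, r) pixel reordered to (r, g, b).

    image_bgr is a list of rows, each row a list of 3-tuples. This is a
    teaching stand-in for cv2.cvtColor(image, cv2.COLOR_BGR2RGB).
    """
    return [[(r, g, b) for (b, g, r) in row] for row in image_bgr]


# A 1x2 image: pure blue and pure red, written in BGR order.
tiny_bgr = [[(255, 0, 0), (0, 0, 255)]]
tiny_rgb = bgr_to_rgb(tiny_bgr)
print(tiny_rgb)
```

Skipping this step does not change the image shape, so nothing visibly breaks; the model simply sees red and blue swapped.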

Examples
Detect hand landmarks in a single image.
Computer Vision
import mediapipe as mp
mp_hands = mp.solutions.hands
with mp_hands.Hands() as hands:
    results = hands.process(image_rgb)
Detect face landmarks in a single image.
Computer Vision
import mediapipe as mp
mp_face_mesh = mp.solutions.face_mesh
with mp_face_mesh.FaceMesh() as face_mesh:
    results = face_mesh.process(image_rgb)
Detect both hand and face landmarks in the same image.
Computer Vision
import mediapipe as mp
mp_hands = mp.solutions.hands
mp_face_mesh = mp.solutions.face_mesh
with mp_hands.Hands() as hands, mp_face_mesh.FaceMesh() as face_mesh:
    results_hands = hands.process(image_rgb)
    results_face = face_mesh.process(image_rgb)
Sample Program

This program loads an image, detects hand and face landmarks, and prints how many were found.

Computer Vision
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_face_mesh = mp.solutions.face_mesh
mp_drawing = mp.solutions.drawing_utils

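# Load an example image (OpenCV reads it in BGR order and returns None on failure)
image = cv2.imread('hand_face.jpg')
if image is None:
    raise FileNotFoundError('Could not read hand_face.jpg')
# MediaPipe expects RGB input
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)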

with mp_hands.Hands(static_image_mode=True, max_num_hands=2) as hands, \
     mp_face_mesh.FaceMesh(static_image_mode=True) as face_mesh:
    results_hands = hands.process(image_rgb)
    results_face = face_mesh.process(image_rgb)

    # Print number of hands detected
    num_hands = len(results_hands.multi_hand_landmarks) if results_hands.multi_hand_landmarks else 0
    print(f'Hands detected: {num_hands}')

    # Print number of face landmarks detected
    num_face_landmarks = len(results_face.multi_face_landmarks[0].landmark) if results_face.multi_face_landmarks else 0
    print(f'Face landmarks detected: {num_face_landmarks}')
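
Once hand landmarks are available, a common next step is counting raised fingers. The sketch below is one simple heuristic, not MediaPipe's own method: it assumes an upright hand facing the camera, ignores the thumb, and treats a finger as raised when its tip is above its middle (PIP) joint in the image. The function name count_raised_fingers is made up for this example; the landmark indices follow MediaPipe's 21-point hand layout.

```python
from typing import Sequence, Tuple

# MediaPipe hand layout: tip and PIP joint indices for index, middle, ring, pinky.
FINGER_TIPS = (8, 12, 16, 20)
FINGER_PIPS = (6, 10, 14, 18)


def count_raised_fingers(points: Sequence[Tuple[float, float]]) -> int:
    """Count non-thumb fingers whose tip lies above its PIP joint.

    points holds 21 (x, y) pairs in image coordinates, where y grows downward,
    so "above" means a smaller y value. Assumes an upright hand.
    """
    if len(points) != 21:
        raise ValueError("expected 21 hand landmarks")
    return sum(
        1
        for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)
        if points[tip][1] < points[pip][1]
    )
```

With a real detection, the input could be built as [(lm.x, lm.y) for lm in hand.landmark], where hand is one entry of results_hands.multi_hand_landmarks.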
Important Notes

Make sure your input image is clear and well-lit for better detection.

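MediaPipe returns each landmark with x, y, and z coordinates. x and y are normalized to the range 0 to 1 by the image width and height. z is a relative depth value, where smaller means closer to the camera, and it is not limited to the range 0 to 1.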
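
Because x and y are normalized, they need scaling before they can index pixels or be drawn by hand. A minimal sketch follows; the helper name to_pixel is hypothetical. It clamps the result because a landmark can land slightly outside the frame when part of the hand or face is cut off.

```python
def to_pixel(x_norm: float, y_norm: float, width: int, height: int):
    """Convert normalized landmark coordinates to integer pixel coordinates.

    Values are clamped so the result is always a valid pixel index.
    """
    px = min(max(int(x_norm * width), 0), width - 1)
    py = min(max(int(y_norm * height), 0), height - 1)
    return px, py


# A landmark at the center of a 640x480 image.
print(to_pixel(0.5, 0.5, 640, 480))  # (320, 240)
```

For an image loaded with OpenCV, width and height come from image.shape[1] and image.shape[0].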

You can draw landmarks on images using MediaPipe's drawing utilities for visualization.
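
As a rough sketch of that drawing step, the helper below overlays every detected hand on a BGR image. The name draw_hands is made up for this example; draw_landmarks and HAND_CONNECTIONS come from MediaPipe's drawing_utils and hands modules. MediaPipe is imported inside the function only so the helper can be defined on a machine without it.

```python
def draw_hands(image_bgr, results_hands):
    """Draw all detected hands onto image_bgr in place and return it.

    image_bgr is a 3-channel OpenCV image; results_hands is the object
    returned by Hands.process(). Nothing is drawn when no hand was found.
    """
    if not results_hands.multi_hand_landmarks:
        return image_bgr

    import mediapipe as mp  # lazy import, see note above

    mp_hands = mp.solutions.hands
    mp_drawing = mp.solutions.drawing_utils
    for hand_landmarks in results_hands.multi_hand_landmarks:
        mp_drawing.draw_landmarks(
            image_bgr, hand_landmarks, mp_hands.HAND_CONNECTIONS
        )
    return image_bgr
```

The annotated image can then be saved with cv2.imwrite or shown with cv2.imshow. Draw on the original BGR image rather than the RGB copy so the colors look right in OpenCV.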

Summary

Hand and face landmark detection finds key points on hands and faces in images or videos.

This helps computers understand gestures and expressions for many fun and useful apps.

MediaPipe is a popular tool that makes it easy to detect these landmarks with just a few lines of code.

Practice

(1/5)
1. What is the main purpose of hand and face landmark detection in computer vision?
easy
A. To compress video files
B. To increase image resolution
C. To change the color of images
D. To find key points on hands and faces in images or videos

Solution

  1. Step 1: Understand the goal of landmark detection

    Landmark detection identifies important points on hands and faces to understand their shape and position.
  2. Step 2: Compare options with the goal

    Only option D, finding key points on hands and faces in images or videos, matches this goal. The other options describe compression, resolution enhancement, and color editing.
  3. Final Answer:

    To find key points on hands and faces in images or videos -> Option D
  4. Quick Check:

    Landmark detection = key points detection [OK]
Hint: Landmark detection means finding important points [OK]
Common Mistakes:
  • Confusing landmark detection with image enhancement
  • Thinking it changes image colors
  • Mixing it up with video compression
2. Which of the following is the correct way to import MediaPipe's hand landmark detection module in Python?
easy
A. import mediapipe.solutions.hands as mp_hands
B. import mediapipe.hands as mp_hands
C. import mediapipe as mp mp.solutions.hands
D. from mediapipe import hands

Solution

  1. Step 1: Recall MediaPipe import syntax

    MediaPipe modules are imported from mediapipe.solutions, e.g., mediapipe.solutions.hands.
  2. Step 2: Check each option

    Option A imports the hands module from mediapipe.solutions under the alias mp_hands. The other options use incorrect or incomplete paths.
  3. Final Answer:

    import mediapipe.solutions.hands as mp_hands -> Option A
  4. Quick Check:

    Correct import = mediapipe.solutions.hands [OK]
Hint: MediaPipe modules come from mediapipe.solutions [OK]
Common Mistakes:
  • Using incorrect import paths
  • Trying to import submodules directly without solutions
  • Confusing alias names
3. Given the following Python code using MediaPipe for hand landmarks detection, what will be printed?
import mediapipe as mp
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=True)
results = hands.process(image_rgb)
print(len(results.multi_hand_landmarks))
Assuming image_rgb contains one clear hand.
medium
A. 1
B. Error
C. None
D. 0

Solution

  1. Step 1: Understand the code flow

    The code processes an RGB image with one hand using MediaPipe Hands in static mode.
  2. Step 2: Interpret the output

    Since one hand is present, results.multi_hand_landmarks will contain one set of landmarks, so its length is 1.
  3. Final Answer:

    1 -> Option A
  4. Quick Check:

    One hand detected = length 1 [OK]
Hint: Length of landmarks list equals number of detected hands [OK]
Common Mistakes:
  • Assuming zero when hand is present
  • Confusing None with empty list
  • Expecting error without checking input
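
The snippet in this question calls len() directly on results.multi_hand_landmarks. That works when a hand is found, but the attribute is None when nothing is detected, so len() would raise a TypeError. A small guard avoids this; the helper name count_detections is hypothetical, and SimpleNamespace stands in for a real MediaPipe result so the sketch runs on its own.

```python
from types import SimpleNamespace


def count_detections(multi_landmarks) -> int:
    """Return the number of detected hands or faces, treating None as zero."""
    return len(multi_landmarks) if multi_landmarks else 0


# Stand-ins for MediaPipe results: one with a single hand, one with none.
one_hand = SimpleNamespace(multi_hand_landmarks=["hand_0"])
no_hand = SimpleNamespace(multi_hand_landmarks=None)

print(count_detections(one_hand.multi_hand_landmarks))  # 1
print(count_detections(no_hand.multi_hand_landmarks))   # 0
```
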
4. You wrote this code to detect face landmarks, but detection fails or returns poor results:
import mediapipe as mp
mp_face = mp.solutions.face_mesh
face_mesh = mp_face.FaceMesh()
results = face_mesh.process(image_bgr)
print(results.multi_face_landmarks)
What is the likely cause of the error?
medium
A. Missing import for cv2
B. FaceMesh class does not exist
C. Input image should be RGB, not BGR
D. process() method requires grayscale image

Solution

  1. Step 1: Check input image format for MediaPipe FaceMesh

    MediaPipe expects RGB images, but the code uses image_bgr (BGR format).
  2. Step 2: Understand error cause

    Passing a BGR image usually does not raise an exception, but the red and blue channels are swapped, so detection becomes unreliable or fails.
  3. Final Answer:

    Input image should be RGB, not BGR -> Option C
  4. Quick Check:

    MediaPipe needs RGB input images [OK]
Hint: Always convert BGR to RGB before MediaPipe processing [OK]
Common Mistakes:
  • Passing BGR images directly
  • Assuming FaceMesh class is missing
  • Thinking grayscale is required
5. You want to build a gesture recognition app using hand landmarks. Which approach best improves accuracy when hands are rotated or partially hidden?
hard
A. Only train on perfectly centered and clear hand images
B. Use data augmentation with rotated and occluded hand images during training
C. Ignore landmarks and use raw images directly
D. Use grayscale images instead of color

Solution

  1. Step 1: Understand challenges in gesture recognition

    Hands can appear rotated or partly hidden, so the model must handle these variations.
  2. Step 2: Choose best method to improve robustness

    Data augmentation with rotated and occluded images teaches the model to recognize gestures despite these changes.
  3. Final Answer:

    Use data augmentation with rotated and occluded hand images during training -> Option B
  4. Quick Check:

    Augmentation improves model robustness [OK]
Hint: Augment training data to handle rotations and occlusions [OK]
Common Mistakes:
  • Training only on perfect images
  • Ignoring landmarks reduces accuracy
  • Using grayscale loses important info
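
The answer to question 5 is about augmenting training images. When a gesture classifier is trained on landmark coordinates rather than raw images, a similar effect can be approximated by rotating the landmark points themselves. The sketch below shows that variant only; it is a swapped-in simplification, not the image augmentation described in the answer. The function name rotate_points is made up, and rotating normalized coordinates this way assumes a square image (otherwise convert to pixels first).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rotate_points(
    points: Sequence[Point], angle_deg: float, center: Point = (0.5, 0.5)
) -> List[Point]:
    """Rotate 2D points around center by angle_deg degrees.

    With image coordinates (y pointing down), a positive angle appears
    clockwise on screen.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return rotated


# Make a few rotated copies of one hand's (x, y) landmarks for training.
hand = [(0.5, 0.5), (1.0, 0.5)]
augmented = [rotate_points(hand, angle) for angle in (-30, -15, 15, 30)]
```

Occlusion could be simulated in the same spirit by masking or dropping some landmarks, depending on what the downstream classifier accepts.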