Jump into concepts and practice - no test required
or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Recall & Review
beginner
What is depth estimation in computer vision?
Depth estimation is the process of determining the distance of objects from the camera or observer in an image or video.
Click to reveal answer
beginner
Name two common methods used for depth estimation.
Two common methods are stereo vision (using two cameras) and monocular depth estimation (using one camera with AI models).
Click to reveal answer
intermediate
Why is stereo vision effective for depth estimation?
Stereo vision uses two cameras to capture images from slightly different angles, allowing the system to calculate depth by comparing differences between the images, similar to how human eyes work.
Click to reveal answer
intermediate
What role do AI models play in monocular depth estimation?
AI models learn patterns from many images with known depths to predict depth from a single image, even without multiple camera views.
Click to reveal answer
beginner
What is a common output format of depth estimation models?
The output is usually a depth map, which is a grayscale image where each pixel's brightness represents the distance of that point from the camera.
Click to reveal answer
What does a depth map represent?
AColor of objects in the image
BSpeed of moving objects
CDistance of objects from the camera
DTexture details of objects
✗ Incorrect
A depth map shows how far each point in the image is from the camera.
Which method uses two cameras to estimate depth?
AStereo vision
BSemantic segmentation
CMonocular depth estimation
DImage classification
✗ Incorrect
Stereo vision uses two cameras to compare images and calculate depth.
Why is monocular depth estimation challenging?
ABecause it uses two cameras
BBecause it only works outdoors
CBecause it needs 3D sensors
DBecause it requires AI to guess depth from one image
✗ Incorrect
Monocular depth estimation must predict depth from a single image, which is harder and needs AI learning.
What is the main input for depth estimation?
AImages or videos
BAudio signals
CText data
DTemperature readings
✗ Incorrect
Depth estimation uses images or videos as input to find distances.
Which of these is NOT a use of depth estimation?
A3D modeling
BSpeech recognition
CRobot navigation
DAugmented reality
✗ Incorrect
Speech recognition does not use depth estimation; it processes audio.
Explain how stereo vision helps in estimating depth.
Think about how your two eyes help you see depth.
You got /4 concepts.
Describe the challenges and solutions of monocular depth estimation.
How can AI guess depth from just one photo?
You got /4 concepts.
Practice
(1/5)
1. What is the main goal of depth estimation in computer vision?
easy
A. To find how far objects are from the camera in an image
B. To detect colors in an image
C. To recognize faces in a photo
D. To increase image resolution
Solution
Step 1: Understand depth estimation purpose
Depth estimation aims to measure distance from the camera to objects in an image.
Step 2: Compare options to definition
Only To find how far objects are from the camera in an image matches this goal; others describe different tasks.
Final Answer:
To find how far objects are from the camera in an image -> Option A
Quick Check:
Depth estimation = distance measurement [OK]
Hint: Depth estimation = measuring distance in images [OK]
Common Mistakes:
Confusing depth estimation with object detection
Thinking it finds colors or faces
Mixing it with image enhancement
2. Which of the following is the correct way to represent a depth map in Python using NumPy?
easy
A. depth_map = np.array([[0.5, 1.2], [2.3, 0.7]])
B. depth_map = np.array(["near", "far"])
C. depth_map = np.array([["red", "blue"], ["green", "yellow"]])
D. depth_map = np.array([True, False])
Solution
Step 1: Identify valid depth map data type
Depth maps store distances as numbers (floats), so arrays with floats are correct.
Step 2: Check options for numeric arrays
depth_map = np.array([[0.5, 1.2], [2.3, 0.7]]) uses floats in a 2D array, suitable for depth maps. Others use strings or booleans, which are incorrect.
Final Answer:
depth_map = np.array([[0.5, 1.2], [2.3, 0.7]]) -> Option A
Quick Check:
Depth map = numeric 2D array [OK]
Hint: Depth maps store numbers, not words or booleans [OK]
Common Mistakes:
Using strings instead of numbers for depth values
Confusing color or label arrays with depth maps
Using 1D arrays instead of 2D for images
3. Given this Python code snippet using a depth estimation model, what will be the shape of the output depth map?
Assuming the model outputs a depth map matching input image size but single channel.
medium
A. (480, 640, 3)
B. (3, 480, 640)
C. (640, 480)
D. (480, 640)
Solution
Step 1: Understand input and output shapes
The input is a color image with shape (480, 640, 3). The model outputs a depth map with one channel per pixel, so shape should be (480, 640).
Step 2: Match output shape to depth map format
Depth maps usually have height and width only, no color channels, so (480, 640) is correct.
Final Answer:
(480, 640) -> Option D
Quick Check:
Depth map shape = height x width [OK]
Hint: Depth maps have one channel, so shape drops color dimension [OK]
Common Mistakes:
Assuming output keeps 3 color channels
Swapping height and width dimensions
Confusing channel order in output
4. You run a depth estimation model but get an error: ValueError: input must be 4D tensor. What is the most likely cause?
medium
A. Model weights are not loaded
B. Output depth map has wrong shape
C. Input image is missing batch dimension
D. Input image has wrong color format
Solution
Step 1: Understand model input requirements
Many deep learning models expect input as 4D tensors: (batch_size, height, width, channels).
Step 2: Identify cause of ValueError
If input is a single image (3D), missing batch dimension causes this error.
Final Answer:
Input image is missing batch dimension -> Option C
Quick Check:
4D input = batch + image dims [OK]
Hint: Add batch dimension to input shape before model call [OK]
Common Mistakes:
Ignoring batch dimension requirement
Blaming model weights or output shape
Confusing color format with tensor shape
5. You want to improve depth estimation accuracy for a robot navigating indoors. Which approach is best?
hard
A. Use a single camera and increase image resolution only
B. Use stereo cameras and combine their images for depth
C. Use random noise as input to the model
D. Ignore depth and rely on color detection
Solution
Step 1: Consider methods to improve depth accuracy
Stereo cameras capture two views, allowing better depth calculation by comparing images.
Step 2: Evaluate options for robot navigation
Use stereo cameras and combine their images for depth uses stereo vision, which is proven to improve depth accuracy indoors. Increasing resolution alone (B) helps little. Noise input (C) and ignoring depth (D) are ineffective.
Final Answer:
Use stereo cameras and combine their images for depth -> Option B
Quick Check:
Stereo vision = better depth accuracy [OK]
Hint: Stereo cameras give real depth by comparing two views [OK]