import torch
import torch.nn as nn

class SimpleDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        # One 3x3 convolution mapping 3 input channels (RGB) to 1 output channel (depth)
        self.conv = nn.Conv2d(3, 1, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

model = SimpleDepthNet()
input_tensor = torch.randn(1, 3, 256, 256)  # (batch, channels, height, width)
output = model(input_tensor)
output.shape  # torch.Size([1, 1, 256, 256])

Practice

1. What is the main goal of depth estimation in computer vision?

easy

A. To find how far objects are from the camera in an image

B. To detect colors in an image

C. To recognize faces in a photo

D. To increase image resolution

Solution

Step 1: Understand depth estimation purpose
Depth estimation aims to measure distance from the camera to objects in an image.
Step 2: Compare options to definition
Only option A, "To find how far objects are from the camera in an image", matches this goal; the other options describe different tasks.
Final Answer:
To find how far objects are from the camera in an image -> Option A
Quick Check:
Depth estimation = distance measurement [OK]

Hint: Depth estimation = measuring distance in images [OK]

Common Mistakes:

Confusing depth estimation with object detection
Thinking it finds colors or faces
Mixing it with image enhancement

2. Which of the following is the correct way to represent a depth map in Python using NumPy?

easy

A. depth_map = np.array([[0.5, 1.2], [2.3, 0.7]])

B. depth_map = np.array(["near", "far"])

C. depth_map = np.array([["red", "blue"], ["green", "yellow"]])

D. depth_map = np.array([True, False])

Solution

Step 1: Identify valid depth map data type
Depth maps store distances as numbers (floats), so arrays with floats are correct.
Step 2: Check options for numeric arrays
depth_map = np.array([[0.5, 1.2], [2.3, 0.7]]) uses floats in a 2D array, suitable for depth maps. Others use strings or booleans, which are incorrect.
Final Answer:
depth_map = np.array([[0.5, 1.2], [2.3, 0.7]]) -> Option A
Quick Check:
Depth map = numeric 2D array [OK]

Hint: Depth maps store numbers, not words or booleans [OK]

Common Mistakes:

Using strings instead of numbers for depth values
Confusing color or label arrays with depth maps
Using 1D arrays instead of 2D for images
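
As a small illustration of the answer to question 2, a depth map can be held in a 2D floating-point NumPy array with one distance per pixel. The values and the unit (metres) below are made up for the example.

```python
import numpy as np

# Hypothetical 2x2 depth map; each entry is a distance in metres (assumed unit).
depth_map = np.array([[0.5, 1.2], [2.3, 0.7]], dtype=np.float32)

print(depth_map.shape)  # (rows, columns), i.e. (height, width)
print(depth_map.dtype)  # floating-point distances

# (row, column) index of the pixel closest to the camera.
nearest = np.unravel_index(np.argmin(depth_map), depth_map.shape)
print(nearest)
```

Because the array is numeric, operations such as finding the nearest pixel work directly, which would not be possible with string labels like "near" and "far".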

3. Given this Python code snippet using a depth estimation model, what will be the shape of the output depth map?

import numpy as np
input_image = np.zeros((480, 640, 3))  # RGB image
output_depth = model.predict(input_image)
print(output_depth.shape)

Assume the model outputs a single-channel depth map with the same height and width as the input image.

medium

A. (480, 640, 3)

B. (3, 480, 640)

C. (640, 480)

D. (480, 640)

Solution

Step 1: Understand input and output shapes
The input is a color image with shape (480, 640, 3). The model outputs one depth value per pixel with no color channels, so the shape should be (480, 640).
Step 2: Match output shape to depth map format
Depth maps usually have height and width only, no color channels, so (480, 640) is correct.
Final Answer:
(480, 640) -> Option D
Quick Check:
Depth map shape = height x width [OK]

Hint: Depth maps have one channel, so shape drops color dimension [OK]

Common Mistakes:

Assuming output keeps 3 color channels
Swapping height and width dimensions
Confusing channel order in output
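
The `model` in question 3 is not defined in the snippet, so the sketch below uses a hypothetical stand-in, `fake_depth_model`, to show the shape bookkeeping only. It assumes the model returns an explicit single channel, (height, width, 1), which is then squeezed to (height, width).

```python
import numpy as np

def fake_depth_model(image):
    """Hypothetical stand-in for a depth model: averages the color channels.

    Returns an array of shape (height, width, 1). A real model would
    predict distances rather than average colors.
    """
    return image.mean(axis=2, keepdims=True)

input_image = np.zeros((480, 640, 3))          # RGB image: (height, width, channels)
raw_output = fake_depth_model(input_image)     # single channel kept: (480, 640, 1)
output_depth = np.squeeze(raw_output, axis=2)  # drop the channel axis

print(raw_output.shape)
print(output_depth.shape)
```

Whether a real model returns (480, 640) or (480, 640, 1) depends on the library, so checking `.shape` before further processing is a reasonable habit.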

4. You run a depth estimation model but get an error: ValueError: input must be 4D tensor. What is the most likely cause?

medium

A. Model weights are not loaded

B. Output depth map has wrong shape

C. Input image is missing batch dimension

D. Input image has wrong color format

Solution

Step 1: Understand model input requirements
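Many deep learning models expect a 4D input tensor: (batch_size, height, width, channels) in channels-last frameworks such as TensorFlow/Keras, or (batch_size, channels, height, width) in PyTorch.
Step 2: Identify cause of ValueError
A single image is a 3D array, so passing it without a batch dimension triggers this error.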
Final Answer:
Input image is missing batch dimension -> Option C
Quick Check:
4D input = batch + image dims [OK]

Hint: Add batch dimension to input shape before model call [OK]

Common Mistakes:

Ignoring batch dimension requirement
Blaming model weights or output shape
Confusing color format with tensor shape
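
A minimal sketch of the fix for question 4, assuming the image is a NumPy array in (height, width, channels) order. Which 4D layout a given model wants is an assumption here, so both common layouts are shown.

```python
import numpy as np

image = np.zeros((480, 640, 3), dtype=np.float32)  # single image: (H, W, C)

# Add a leading batch axis: (1, H, W, C), the channels-last layout.
batch_nhwc = np.expand_dims(image, axis=0)

# Channels-first frameworks expect (N, C, H, W) instead.
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))

print(batch_nhwc.shape)
print(batch_nchw.shape)
```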

5. You want to improve depth estimation accuracy for a robot navigating indoors. Which approach is best?

hard

A. Use a single camera and increase image resolution only

B. Use stereo cameras and combine their images for depth

C. Use random noise as input to the model

D. Ignore depth and rely on color detection

Solution

Step 1: Consider methods to improve depth accuracy
Stereo cameras capture two views, allowing better depth calculation by comparing images.
Step 2: Evaluate options for robot navigation
Option B uses stereo vision, which recovers depth geometrically from the difference between two views and is generally more reliable than a single-camera estimate. Increasing resolution alone (A) adds detail but no new depth cue. Noise input (C) and ignoring depth (D) do not help.
Final Answer:
Use stereo cameras and combine their images for depth -> Option B
Quick Check:
Stereo vision = better depth accuracy [OK]

Hint: Stereo cameras give real depth by comparing two views [OK]

Common Mistakes:

Thinking higher resolution alone improves depth
Using noise as input to improve model
Ignoring depth for color detection
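
To make "comparing two views" in question 5 concrete: for a rectified stereo pair, the standard pinhole relation is depth = focal_length * baseline / disparity, where disparity is the horizontal pixel shift of a point between the left and right images. The sketch below applies that formula; the function name and the camera numbers are illustrative assumptions, not values from a real rig.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres for one pixel of a rectified stereo pair.

    disparity_px: horizontal shift between left and right views, in pixels
    focal_length_px: camera focal length, in pixels
    baseline_m: distance between the two camera centres, in metres
    """
    if disparity_px <= 0:
        # Zero disparity means the point is effectively at infinity.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px

# Assumed example rig: 700 px focal length, 10 cm baseline.
depth_m = depth_from_disparity(35.0, focal_length_px=700.0, baseline_m=0.1)
print(depth_m)  # roughly 2 metres
```

Nearby objects produce large disparities and distant ones small disparities, so depth resolution degrades with distance. At the short ranges typical of indoor navigation this is usually less of a problem.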

Depth estimation basics in Computer Vision - Practice Problems & Coding Challenges
