We turn images into numbers so computers can understand and work with them. This helps us teach machines to see and recognize things.
Image as numerical data (pixels, channels) in Computer Vision
image_array = [
    [[R, G, B], [R, G, B], ...],  # row 1
    [[R, G, B], [R, G, B], ...],  # row 2
    ...
]
# R, G, B are numbers from 0 to 255 representing colors
Color images are typically stored as 3D arrays: height x width x channels.
Each pixel has values for Red, Green, and Blue channels.
pixel = [255, 0, 0] # bright red pixel
image = [[[0, 0, 0], [255, 255, 255]], [[128, 128, 128], [64, 64, 64]]] # 2x2 image
import numpy as np
image_np = np.zeros((100, 100, 3), dtype=np.uint8)  # 100x100 black image
This code creates a small 3x3 image with different colors. It shows how to check the image size, get a pixel's color, and find the average color of the whole image.
import numpy as np

# Create a 3x3 image with 3 color channels (RGB)
image = np.array([
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],        # red, green, blue
    [[255, 255, 0], [0, 255, 255], [255, 0, 255]],  # yellow, cyan, magenta
    [[0, 0, 0], [128, 128, 128], [255, 255, 255]]   # black, gray, white
], dtype=np.uint8)

# Print shape of image
print(f"Image shape: {image.shape}")

# Access pixel at row 1, column 2
pixel = image[1, 2]
print(f"Pixel at (1,2): {pixel}")

# Calculate average color of the image
avg_color = image.mean(axis=(0, 1))
print(f"Average color (RGB): {avg_color.astype(int)}")
In 8-bit images, pixel values range from 0 to 255 for each color channel.
Images can have more channels, like an alpha channel for transparency.
Converting images to numbers is the first step before feeding them to machine learning models.
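The notes above can be sketched in NumPy. This is a minimal sketch, assuming an 8-bit image with a fourth alpha channel; the names rgba, rgb, and scaled are illustrative, and scaling to the range 0 to 1 is one common preprocessing convention rather than a requirement.

```python
import numpy as np

# Hypothetical 2x2 image with 4 channels: red, green, blue, alpha.
rgba = np.zeros((2, 2, 4), dtype=np.uint8)
rgba[..., 0] = 255   # full red in every pixel
rgba[..., 3] = 128   # alpha value of 128 in every pixel

# Drop the alpha channel to get a plain RGB image.
rgb = rgba[..., :3]

# Convert 0-255 integers to floats between 0 and 1 (assumed model input range).
scaled = rgb.astype(np.float32) / 255.0

print(rgba.shape, rgb.shape)        # (2, 2, 4) (2, 2, 3)
print(scaled.dtype, scaled.max())   # float32 1.0
```

Converting to float before dividing avoids integer arithmetic on the uint8 values.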
Images are stored as numbers in arrays with height, width, and color channels.
Each pixel has values for red, green, and blue colors.
Understanding this helps us prepare images for machine learning tasks.
Practice
Solution
Step 1: Understand pixel representation in color images
Each pixel stores values for red, green, and blue channels to show color.
Step 2: Compare options to pixel data
Only "A set of numbers for red, green, and blue colors" correctly describes pixels as sets of RGB numbers.
Final Answer:
A set of numbers for red, green, and blue colors -> Option D
Quick Check:
Pixel = RGB values [OK]
- Thinking pixels store text labels
- Confusing pixel with brightness only
- Assuming pixels represent sound
Solution
Step 1: Recall numpy zeros syntax
np.zeros requires a single tuple argument for shape, like (3, 3, 3).
Step 2: Check each option's syntax
image = np.zeros((3, 3, 3)) uses correct tuple and function call syntax. Others have syntax errors or missing np.
Final Answer:
image = np.zeros((3, 3, 3)) -> Option A
Quick Check:
np.zeros((3,3,3)) creates 3x3 RGB image [OK]
- Passing multiple arguments instead of a tuple
- Using square brackets instead of parentheses
- Forgetting np. prefix
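A short sketch of the correct call, with one detail the quiz does not cover: np.zeros defaults to a float data type, so 8-bit image data needs an explicit dtype. The variable name image_u8 is illustrative.

```python
import numpy as np

# The shape is passed as one tuple: (height, width, channels).
image = np.zeros((3, 3, 3))
print(image.shape, image.dtype)   # (3, 3, 3) float64

# For 8-bit image data, pass dtype explicitly.
image_u8 = np.zeros((3, 3, 3), dtype=np.uint8)
print(image_u8.dtype)             # uint8
```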
import numpy as np
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 0]]])
print(image.shape)
What is the output?
Solution
Step 1: Analyze the array structure
The array has 2 rows, each with 2 pixels, and each pixel has 3 color values (RGB).
Step 2: Determine shape order
Shape is (height=2, width=2, channels=3), so (2, 2, 3).
Final Answer:
(2, 2, 3) -> Option C
Quick Check:
Shape = (rows, cols, channels) = (2, 2, 3) [OK]
- Mixing up dimensions order
- Counting channels as first dimension
- Assuming square shape without checking
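A non-square image makes the dimension order easier to see. The sketch below assumes a hypothetical 2-row, 4-column RGB image and unpacks its shape into named values.

```python
import numpy as np

# Hypothetical non-square image: 2 rows, 4 columns, 3 channels.
image = np.zeros((2, 4, 3), dtype=np.uint8)

# Shape order is (rows, columns, channels).
height, width, channels = image.shape
print(height, width, channels)   # 2 4 3
```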
green_channel = image[:, :, 1:2]
Solution
Step 1: Understand slicing with 1:2
Slicing with 1:2 keeps the channel dimension, returning shape (height, width, 1).
Step 2: Compare with expected 2D array
To get a 2D array, use index 1 without a slice, like image[:, :, 1].
Final Answer:
It returns a 3D array instead of 2D -> Option A
Quick Check:
Slicing with 1:2 keeps channel dim [OK]
- Using a slice where an index is needed, which keeps an extra dimension
- Confusing channel indices
- Assuming it changes original image
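The difference between a slice and an integer index can be checked directly. This is a minimal sketch on a hypothetical 4x5 image; green_3d and green_2d are illustrative names.

```python
import numpy as np

image = np.zeros((4, 5, 3), dtype=np.uint8)
image[:, :, 1] = 200          # fill the green channel

green_3d = image[:, :, 1:2]   # a slice keeps the channel axis
green_2d = image[:, :, 1]     # an integer index drops it

print(green_3d.shape)   # (4, 5, 1)
print(green_2d.shape)   # (4, 5)

# Both results are views: reading them leaves the original image unchanged.
print(np.shares_memory(green_2d, image))   # True
```

Because the results are views, writing into green_2d would change image; call .copy() when an independent array is needed.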
Solution
Step 1: Understand the goal
We want to create a 3D array where each pixel's grayscale value repeats in 3 channels.
Step 2: Check each method
rgb_image = np.stack([gray_image]*3, axis=2) stacks the grayscale image 3 times along a new channel axis, which is the required result.
rgb_image = np.repeat(gray_image, 3) has no axis argument, so it flattens the data and returns a 1D array with the wrong shape.
rgb_image = gray_image.reshape(100, 100, 3) only rearranges existing elements and cannot add channel data, so it raises an error.
rgb_image = np.concatenate(gray_image, 3) misuses the function: np.concatenate expects a sequence of arrays, and axis 3 is out of bounds here.
Final Answer:
rgb_image = np.stack([gray_image]*3, axis=2) -> Option B
Quick Check:
Stack repeats grayscale across channels [OK]
- Using np.repeat without axis
- Reshaping without adding channel dimension
- Wrong function syntax for concatenation
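The comparison above can be reproduced on a small array. This sketch assumes a hypothetical 2x3 grayscale image instead of the quiz's larger one so the shapes are easy to read; the last lines show that np.repeat also works once a channel axis is added and an axis is given.

```python
import numpy as np

# Hypothetical 2x3 grayscale image with values 0..5.
gray_image = np.arange(6, dtype=np.uint8).reshape(2, 3)

# Stack three copies along a new last axis: (2, 3) -> (2, 3, 3).
rgb_image = np.stack([gray_image] * 3, axis=2)
print(rgb_image.shape)   # (2, 3, 3)

# Without an axis, np.repeat flattens the input first.
flat = np.repeat(gray_image, 3)
print(flat.shape)        # (18,)

# Alternative: add a channel axis, then repeat along it.
alt = np.repeat(gray_image[:, :, np.newaxis], 3, axis=2)
print(np.array_equal(alt, rgb_image))   # True
```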
