
Image properties (shape, dtype, size) in Computer Vision - Model Metrics & Evaluation

Which metric matters for Image properties and WHY

When working with images in machine learning, knowing the image's shape, data type (dtype), and size is key. These properties help us understand the input data before training a model.

Shape tells us the image dimensions (height, width, color channels). This is important because models expect inputs of a certain size.

Dtype shows the type of data stored (like integers or floats). This affects how the image data is processed and stored in memory.

Size is the total number of elements (pixels times channels). It helps us know how much data the image holds.

Checking these properties ensures the model gets the right input format and helps avoid errors during training or prediction.
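
As a minimal sketch of the check described above, the snippet below reads the three properties from a NumPy array. The all-zero array is only a stand-in for a real loaded image, and the channel-last layout (height, width, channels) is an assumption.

```python
import numpy as np

# Synthetic 128x128 RGB image; a placeholder for an image loaded from disk.
img = np.zeros((128, 128, 3), dtype=np.uint8)

# Channel-last layout assumed: (height, width, channels).
height, width, channels = img.shape

print("shape:", img.shape)  # dimensions of the array
print("dtype:", img.dtype)  # type of each stored element
print("size:", img.size)    # total number of elements
```

Here `img.size` equals `height * width * channels`, which is the element count described above rather than the file size on disk.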

Confusion matrix or equivalent visualization

For image properties, a confusion matrix does not apply. Instead, we summarize the image's shape, dtype, and size like this:

    Image shape: (height, width, channels) = (128, 128, 3)
    Data type: uint8 (unsigned 8-bit integer)
    Size (total elements): 128 * 128 * 3 = 49,152
    

This simple summary helps us confirm the image data is as expected.

Tradeoff: Image size vs model performance

Larger images (greater height and width, and therefore a larger size) hold more detail but need more memory and time to process.

Smaller images are faster but may lose important details, hurting model accuracy.

Choosing the right image size is a balance: enough detail for the model to learn, but not so big it slows training.

Also, dtype matters: float32 allows more precise calculations, but each element takes 4 bytes instead of the 1 byte used by uint8, so the same image needs four times the memory.
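
To make the memory side of this tradeoff concrete, here is a small sketch that computes the uncompressed in-memory footprint of one image. The helper name `image_bytes` is hypothetical, and the two resolutions are just the examples used in this lesson.

```python
import numpy as np

def image_bytes(height, width, channels, dtype):
    """Bytes needed to hold one uncompressed image of this shape and dtype."""
    return height * width * channels * np.dtype(dtype).itemsize

small_u8 = image_bytes(128, 128, 3, np.uint8)
large_u8 = image_bytes(224, 224, 3, np.uint8)
large_f32 = image_bytes(224, 224, 3, np.float32)

print(small_u8)   # 49152
print(large_u8)   # 150528
print(large_f32)  # 602112
```

The same figure is available on an existing array as `arr.nbytes`. Multiplying by the batch size gives a rough lower bound on the memory one input batch occupies.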

What "good" vs "bad" image properties look like

Good:

  • Shape matches model input (e.g., 224x224x3 for color images)
  • Dtype is consistent (e.g., float32 after normalization)
  • Size is manageable for your hardware

Bad:

  • Shape mismatch causing errors (e.g., grayscale image when model expects color)
  • Dtype mismatch causing wrong calculations (e.g., integers when floats needed)
  • Too large size causing memory errors or slow training
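
The good and bad cases above can be sketched as a small validation helper. The function name `check_image` and its default expected values are assumptions chosen for illustration, not part of any library.

```python
import numpy as np

def check_image(img, expected_shape=(224, 224, 3), expected_dtype=np.float32):
    """Return a list of problems; an empty list means the image matches."""
    problems = []
    if img.shape != expected_shape:
        problems.append(f"shape {img.shape} != expected {expected_shape}")
    if img.dtype != expected_dtype:
        problems.append(f"dtype {img.dtype} != expected {np.dtype(expected_dtype)}")
    return problems

good = np.zeros((224, 224, 3), dtype=np.float32)
bad = np.zeros((224, 224), dtype=np.uint8)  # grayscale and integer-valued

print(check_image(good))  # no problems reported
print(check_image(bad))   # reports both a shape and a dtype mismatch
```

Running a check like this before training tends to surface mismatches earlier and with a clearer message than an error raised deep inside a model.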
Common pitfalls with image properties
  • Ignoring shape differences leads to model errors or poor predictions.
  • Not converting dtype properly can cause unexpected results or crashes.
  • Assuming all images have the same size without resizing causes batch processing issues.
  • Overlooking the number of channels (e.g., some images have an alpha channel) can confuse the model.
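
One way to guard against the channel pitfall is to normalize every image to three channels before batching. The sketch below assumes channel-last arrays with alpha stored as the last channel; the helper name `to_three_channels` is hypothetical.

```python
import numpy as np

def to_three_channels(img):
    """Return an (H, W, 3) array from a grayscale, RGB, or RGBA input."""
    if img.ndim == 2:                        # grayscale (H, W)
        return np.stack([img, img, img], axis=-1)
    if img.ndim == 3 and img.shape[2] == 1:  # grayscale (H, W, 1)
        return np.repeat(img, 3, axis=2)
    if img.ndim == 3 and img.shape[2] == 3:  # already three channels
        return img
    if img.ndim == 3 and img.shape[2] == 4:  # RGBA: drop the alpha channel
        return img[:, :, :3]
    raise ValueError(f"unsupported image shape: {img.shape}")

gray = np.zeros((64, 64), dtype=np.uint8)
rgba = np.zeros((64, 64, 4), dtype=np.uint8)

print(to_three_channels(gray).shape)  # (64, 64, 3)
print(to_three_channels(rgba).shape)  # (64, 64, 3)
```

Dropping alpha simply discards transparency. Whether that is acceptable depends on the dataset, so treat it as one possible choice rather than a rule.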
Self-check question

Your model expects images of shape (224, 224, 3) with dtype float32. You feed it images of shape (128, 128, 3) with dtype uint8. Is this good? Why or why not?

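Answer: No, this is not good. The shape (128, 128, 3) does not match the expected (224, 224, 3), so the model may raise an error or perform poorly; the images should be resized to 224 x 224 first. The dtype is uint8 but the model expects float32, so the data should be converted (and normalized, if the model expects normalized inputs) before use.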
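
A rough sketch of the fix for this scenario, using only NumPy: a nearest-neighbor resize followed by conversion to float32 in the range 0 to 1. The helper names `resize_nearest` and `preprocess` are hypothetical, and scaling to the 0 to 1 range is an assumption; some models expect a different normalization. In practice an image library's resize function with better interpolation would usually be preferred.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of an (H, W, C) array via index lookup."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return img[rows[:, None], cols]

def preprocess(img, out_h=224, out_w=224):
    """Resize to the model's input size, then convert uint8 to float32 in [0, 1]."""
    resized = resize_nearest(img, out_h, out_w)
    return resized.astype(np.float32) / 255.0

raw = np.full((128, 128, 3), 255, dtype=np.uint8)  # stand-in for a real image
ready = preprocess(raw)

print(ready.shape)  # (224, 224, 3)
print(ready.dtype)  # float32
```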

Key Result
Checking image shape, dtype, and size helps ensure the correct input format and efficient model training.

Practice

1. What does the shape property of an image represent?
easy
A. The file size of the image in bytes
B. The data type of the pixel values
C. The dimensions and number of color channels of the image
D. The compression level of the image

Solution

  1. Step 1: Understand what shape means in images

    The shape of an image is a tuple that shows its height, width, and number of color channels.
  2. Step 2: Differentiate shape from other properties

    File size and data type are different properties; shape specifically refers to dimensions and channels.
  3. Final Answer:

    The dimensions and number of color channels of the image -> Option C
  4. Quick Check:

    Shape = dimensions + channels [OK]
Hint: Shape shows height, width, and (for color images) channels [OK]
Common Mistakes:
  • Confusing shape with file size
  • Mixing up data type with shape
  • Thinking shape shows compression
2. Which of the following is the correct way to get the data type of an image stored in a NumPy array named img?
easy
A. img.dtype
B. img.type()
C. img.data_type
D. img.get_dtype()

Solution

  1. Step 1: Recall NumPy syntax for data type

    In NumPy, the data type of an array is accessed using the dtype attribute.
  2. Step 2: Check each option

    Only img.dtype is a real NumPy array attribute; img.type(), img.data_type, and img.get_dtype() do not exist and raise an AttributeError.
  3. Final Answer:

    img.dtype -> Option A
  4. Quick Check:

    Use .dtype to get data type [OK]
Hint: Use .dtype attribute for NumPy array data type [OK]
Common Mistakes:
  • Using parentheses like a function
  • Trying non-existent attributes
  • Confusing dtype with type() function
3. Given the following code:
    import numpy as np
    img = np.zeros((100, 200, 3), dtype=np.uint8)
    print(img.size)

What will be the output?
medium
A. 3
B. 60000
C. 200
D. 100

Solution

  1. Step 1: Understand the shape and size

    The image shape is (100, 200, 3). Size is total number of elements = 100 * 200 * 3 = 60000.
  2. Step 2: Confirm what .size returns

    The size attribute returns the total number of array elements, that is, pixels multiplied by channels.
  3. Final Answer:

    60000 -> Option B
  4. Quick Check:

    Size = height * width * channels = 60000 [OK]
Hint: Multiply all shape dimensions for size [OK]
Common Mistakes:
  • Using only height or width as size
  • Ignoring color channels in size
  • Confusing size with shape
4. Consider this code snippet:
    import numpy as np
    img = np.array([[255, 128], [64, 0]])
    print(img.shape)
    print(img.dtype)

What is the error in this code if the goal is to represent a color image?
medium
A. The array values are out of range for images
B. The dtype should be float instead of int
C. The shape attribute is called incorrectly
D. The array shape lacks a color channel dimension

Solution

  1. Step 1: Check the array shape

    The array shape is (2, 2), meaning 2 rows and 2 columns, no color channels.
  2. Step 2: Understand color image requirements

    A color image needs 3 dimensions: height, width, and channels (usually 3 for RGB).
  3. Final Answer:

    The array shape lacks a color channel dimension -> Option D
  4. Quick Check:

    Color images need 3D shape [OK]
Hint: Color images need 3D shape (height, width, channels) [OK]
Common Mistakes:
  • Thinking dtype must be float for images
  • Assuming shape attribute is wrong
  • Believing pixel values are out of range
5. You have a grayscale image loaded as a NumPy array with shape (256, 256), dtype float32, and values in the range 0 to 1. You want to convert it to an 8-bit unsigned integer image suitable for display. Which code snippet correctly does this?
hard
A. img_uint8 = (img * 255).astype(np.uint8)
B. img_uint8 = img.astype(np.uint8)
C. img_uint8 = img / 255
D. img_uint8 = img.astype(np.float64)

Solution

  1. Step 1: Understand dtype conversion needs

    Converting from float32 (0 to 1 range) to uint8 (0 to 255) requires scaling by 255.
  2. Step 2: Check each option

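    Option A, img_uint8 = (img * 255).astype(np.uint8), scales and converts correctly. Option B, img_uint8 = img.astype(np.uint8), converts without scaling, so values between 0 and 1 are truncated to 0 or 1. Option C divides instead of multiplying and leaves a float array. Option D converts to float64, not uint8.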
  3. Final Answer:

    img_uint8 = (img * 255).astype(np.uint8) -> Option A
  4. Quick Check:

    Scale float to 255 then convert to uint8 [OK]
Hint: Multiply floats by 255 before uint8 conversion [OK]
Common Mistakes:
  • Skipping scaling before type conversion
  • Using wrong dtype conversion
  • Dividing instead of multiplying
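
The difference between these conversions can be sketched on a tiny array, assuming float values in the range 0 to 1. The rounding and clipping in the last line are an optional extra not covered by the question: they avoid truncation bias and guard against values slightly outside the range.

```python
import numpy as np

img = np.array([[0.0, 0.25], [0.5, 1.0]], dtype=np.float32)

naive = img.astype(np.uint8)            # no scaling: fractions truncate toward 0
scaled = (img * 255).astype(np.uint8)   # the approach from option A
safe = np.clip(np.rint(img * 255), 0, 255).astype(np.uint8)  # round, then clip

print(naive.tolist())   # [[0, 0], [0, 1]]
print(scaled.tolist())  # [[0, 63], [127, 255]]
print(safe.tolist())    # [[0, 64], [128, 255]]
```

Without scaling, almost the whole image collapses to 0, which displays as black.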