Practice


1. Which of the following is a main advantage of using a GPU over a CPU for machine learning inference?

easy

A. Lower power consumption for small tasks

B. Cheaper hardware cost

C. Better performance on single-threaded tasks

D. Faster processing for large batches of data

Solution

Step 1: Understand GPU design for parallelism
GPUs have many cores designed to handle many operations at once, making them faster for large data batches.
Step 2: Compare CPU and GPU strengths
CPUs are better for single-threaded or small tasks, but GPUs excel at parallel processing, speeding up large inference jobs.
Final Answer:
Faster processing for large batches of data -> Option D
Quick Check:
GPU parallelism = Faster large batch inference [OK]

Hint: GPUs excel at many tasks at once, CPUs at few tasks fast [OK]

Common Mistakes:

Thinking GPUs always use less power
Assuming GPU hardware costs less than CPU hardware
Confusing single-threaded speed with parallel speed

2. Which command correctly runs TensorFlow model inference on the CPU only, ignoring GPUs?

easy

A. CUDA_VISIBLE_DEVICES=0 python inference.py

B. CUDA_VISIBLE_DEVICES='' python inference.py

C. CUDA_VISIBLE_DEVICES=-1 python inference.py

D. CUDA_VISIBLE_DEVICES=all python inference.py

Solution

Step 1: Understand CUDA_VISIBLE_DEVICES usage
Setting CUDA_VISIBLE_DEVICES to an empty string disables GPU visibility, forcing CPU usage.
Step 2: Check each option's effect
CUDA_VISIBLE_DEVICES='' hides every GPU, so TensorFlow falls back to the CPU. The other options either expose a GPU or rely on values that are not device indices.
Final Answer:
CUDA_VISIBLE_DEVICES='' python inference.py -> Option B
Quick Check:
Empty CUDA_VISIBLE_DEVICES disables GPU [OK]

Hint: Empty CUDA_VISIBLE_DEVICES means no GPU used [OK]

Common Mistakes:

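Using 0, which exposes GPU 0 rather than hiding GPUs
Treating -1 as a GPU selector: it is not a valid device index, and because CUDA only exposes the devices listed before the first invalid index, -1 also hides all GPUs in practice; the empty string is the explicit form this question expects
Confusing CUDA_VISIBLE_DEVICES with NVIDIA_VISIBLE_DEVICES, where 'all' exposes every GPU to a container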
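The same CPU-only setting can be applied from a Python launcher. This is a minimal sketch: `cpu_only_env` and `run_cpu_only` are hypothetical helper names, and `inference.py` is the script from the question. The variable must be set before the CUDA runtime initializes, which is why the sketch starts a child process with a copied environment instead of changing the current one.

```python
import os
import subprocess
import sys


def cpu_only_env() -> dict:
    """Return a copy of the current environment with all CUDA GPUs hidden."""
    env = os.environ.copy()  # copy so the parent process is left untouched
    # An empty value makes no CUDA device visible, so TensorFlow falls back to CPU.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_cpu_only(script: str) -> int:
    """Run a Python script (e.g. inference.py) in a child process with GPUs hidden."""
    result = subprocess.run([sys.executable, script], env=cpu_only_env())
    return result.returncode
```

For example, `run_cpu_only("inference.py")` runs the script with no visible GPU. Inside a single TensorFlow process, `tf.config.set_visible_devices([], "GPU")` has a similar effect, provided it is called before TensorFlow initializes the GPUs.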

3. Given this Python snippet for inference timing:

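import time

start = time.perf_counter()  # monotonic, high-resolution clock for elapsed time
# Run model inference here
end = time.perf_counter()
print(round(end - start, 2))  # elapsed seconds, rounded to 2 decimals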

If GPU inference takes 0.05 seconds and CPU inference takes 0.5 seconds, what will be printed when running on CPU?

medium

A. 0.05

B. 50.0

C. 0.5

D. 5.0

Solution

Step 1: Understand timing code output
The code prints the elapsed time rounded to 2 decimals, so it shows seconds taken.
Step 2: Match CPU inference time to output
CPU inference takes 0.5 seconds, so the printed output is 0.5.
Final Answer:
0.5 -> Option C
Quick Check:
CPU time = 0.5 seconds printed [OK]

Hint: Printed time matches actual elapsed seconds rounded [OK]

Common Mistakes:

Confusing milliseconds with seconds
Choosing GPU time instead of CPU time
Misreading rounding precision
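A single measurement like the one above is noisy, so timings are usually averaged over several calls after a warm-up run. A minimal sketch, where `time_inference` and `fake_cpu_inference` are hypothetical names and the fake model simply sleeps for the 0.5 seconds assumed in the question. GPU frameworks usually queue work asynchronously, so when timing on a GPU the callable should wait for the result (for example, by copying it back to host memory) before returning; otherwise the clock may stop before the computation finishes.

```python
import time
from typing import Callable


def time_inference(run: Callable[[], object], repeats: int = 5) -> float:
    """Mean wall-clock seconds per call of `run`, rounded to 2 decimals."""
    run()  # warm-up call: the first run often includes one-off setup cost
    start = time.perf_counter()
    for _ in range(repeats):
        run()
    elapsed = time.perf_counter() - start
    return round(elapsed / repeats, 2)


def fake_cpu_inference() -> None:
    """Hypothetical stand-in for a CPU model call that takes about 0.5 s."""
    time.sleep(0.5)
```

Calling `time_inference(fake_cpu_inference, repeats=3)` should report roughly 0.5, matching the CPU case in the question.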

4. You run inference on a GPU but notice it is slower than CPU. Which fix is most likely to improve GPU inference speed?

medium

A. Increase batch size to better use GPU parallelism

B. Reduce batch size to avoid GPU overload

C. Disable GPU and force CPU usage

D. Use single-threaded CPU mode

Solution

Step 1: Identify GPU performance factors
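With small batches, fixed per-call costs such as host-to-device data transfer and kernel launches dominate, and most GPU cores sit idle. Larger batches spread that overhead over more inputs and keep the cores busy.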
Step 2: Evaluate options for improving GPU speed
Increasing batch size improves GPU throughput; reducing batch size or disabling GPU lowers performance.
Final Answer:
Increase batch size to better use GPU parallelism -> Option A
Quick Check:
GPU speed improves with larger batches [OK]

Hint: Bigger batches = better GPU use [OK]

Common Mistakes:

Thinking smaller batches speed up GPU
Disabling GPU to fix GPU slowness
Using single-thread CPU instead of GPU
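Raising the batch size in practice means grouping incoming inputs before each model call. A minimal sketch, assuming the model accepts a list of inputs per call; `batched` is a hypothetical helper name (Python 3.12 and later ship a similar `itertools.batched` that yields tuples).

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to `batch_size` items; the last may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final partial batch, if any
        yield batch
```

A serving loop would then call the model once per batch, for example `for batch in batched(requests, 64): outputs = model_fn(batch)`, where `requests` and `model_fn` are hypothetical. Larger batches raise throughput, but each request may wait longer for its batch to fill.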

5. You have a small model and low input volume but a tight budget. Which inference setup is best to minimize cost while maintaining reasonable speed?

hard

A. Use CPU inference with small batch sizes

B. Use GPU inference with large batch sizes

C. Use GPU inference with small batch sizes

D. Use CPU inference with large batch sizes

Solution

Step 1: Analyze model size and input volume impact
Small models and low input volume do not benefit much from GPU parallelism, so the extra GPU cost is hard to justify.
Step 2: Consider budget and batch size tradeoffs
CPU inference with small batches reduces cost and matches low volume needs without GPU overhead.
Final Answer:
Use CPU inference with small batch sizes -> Option A
Quick Check:
Small model + low volume + budget = CPU small batch [OK]

Hint: Small model + low volume = CPU for cost savings [OK]

Common Mistakes:

Choosing GPU despite low volume and budget
Using large batches with low traffic, which makes requests wait for a batch to fill
Ignoring cost when selecting GPU
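The cost reasoning can be written as a small selection rule: among the options that can handle the expected load, pick the cheapest. A minimal sketch, assuming one instance per option; `Backend`, `cheapest_backend`, and all prices and capacities are hypothetical, illustrative values, not real cloud figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    name: str
    hourly_cost: float  # price per instance-hour (illustrative units)
    max_rps: float      # sustained requests/second one instance can serve


def cheapest_backend(options: List[Backend], required_rps: float) -> Backend:
    """Return the lowest-cost backend whose capacity covers `required_rps`."""
    feasible = [b for b in options if b.max_rps >= required_rps]
    if not feasible:
        raise ValueError("no backend meets the required throughput")
    return min(feasible, key=lambda b: b.hourly_cost)
```

With a hypothetical CPU option at 0.10 per hour serving 20 requests/second and a GPU option at 1.00 per hour serving 500, a load of 5 requests/second selects the CPU, while 100 requests/second selects the GPU.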

Input size (n)	Batches needed (batch size b)	Relative inference time
10	ceil(10/b)	fast
100	ceil(100/b)	moderate
1000	ceil(1000/b)	longer
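The batch counts in the table follow from ceil(n / b). The sketch below computes them exactly with integer arithmetic and adds a deliberately simplified timing model, assuming each batch carries a fixed overhead (for example, data transfer and launch cost) plus a constant cost per input. `num_batches`, `estimated_seconds`, and the parameter values used below are hypothetical, not measurements.

```python
def num_batches(n: int, batch_size: int) -> int:
    """Number of batches needed to cover n inputs: ceil(n / batch_size)."""
    return (n + batch_size - 1) // batch_size  # integer ceiling, no float rounding


def estimated_seconds(n: int, batch_size: int,
                      per_batch_overhead: float, per_item: float) -> float:
    """Simplified linear model: fixed cost per batch plus a cost per input item."""
    return num_batches(n, batch_size) * per_batch_overhead + n * per_item
```

With a hypothetical overhead of 0.01 s per batch and 0.001 s per input, 1000 inputs take 11.0 s at batch size 1 but about 1.16 s at batch size 64, because the fixed overhead is paid 16 times instead of 1000.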

GPU vs CPU inference tradeoffs in MLOps - Performance Comparison

