
GPU vs CPU inference tradeoffs in MLOps - Practice Questions

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding latency differences between GPU and CPU inference

Which statement best explains why GPU inference can have higher latency for small batch sizes compared to CPU inference?

A. GPUs have higher clock speeds than CPUs, causing delays in small tasks.
B. CPUs use more cores than GPUs, making them faster for all batch sizes.
C. GPUs require data transfer overhead and kernel launch time, which dominate at small batch sizes.
D. GPUs cannot run inference on neural networks with small input sizes.
💡 Hint

Think about the extra steps GPUs need before starting computation.
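The tradeoff behind this question can be sketched with a toy latency model. The constants below are illustrative assumptions, not measurements: the GPU pays a fixed per-call cost (host-to-device copy plus kernel launch) before any compute, while the CPU starts immediately but processes each item more slowly.

```python
# Toy latency model (made-up constants) illustrating why GPU inference
# can be slower than CPU inference at small batch sizes: the GPU's fixed
# per-call overhead dominates when there is little work to amortize it over.

def gpu_latency_ms(batch_size, transfer_ms=2.0, launch_ms=0.5, per_item_ms=0.05):
    """Fixed transfer + kernel-launch overhead, then very fast per-item compute."""
    return transfer_ms + launch_ms + per_item_ms * batch_size

def cpu_latency_ms(batch_size, per_item_ms=1.0):
    """No transfer or launch overhead, but slower per-item compute."""
    return per_item_ms * batch_size

for batch in (1, 8, 64):
    print(f"batch={batch}: cpu={cpu_latency_ms(batch):.2f} ms, "
          f"gpu={gpu_latency_ms(batch):.2f} ms")
```

With these assumed numbers the CPU wins at batch size 1 and the GPU wins at batch size 64, which is the crossover behavior option C describes.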

💻 Command Output
intermediate
Comparing CPU and GPU memory usage during inference

Given the command outputs below showing memory usage during inference, which output corresponds to GPU inference?

Observed outputs:

    CPU memory usage: 2.5 GB
    GPU memory usage: 6.8 GB

A. CPU memory usage: 2.5 GB
B. GPU memory usage: 2.5 GB
C. CPU memory usage: 6.8 GB
D. GPU memory usage: 6.8 GB
💡 Hint

GPUs usually allocate more dedicated memory for model weights and activations.
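In practice you would collect a reading like the one above with `nvidia-smi --query-gpu=memory.used --format=csv`. Here is a small parsing sketch; the sample string is a hypothetical capture (roughly 6.8 GB), not real output from this exercise.

```python
# Sketch: parse `nvidia-smi --query-gpu=memory.used --format=csv`-style
# output to monitor GPU memory during inference. The sample below is a
# hypothetical capture, hardcoded so the sketch is self-contained.

def parse_memory_used_mib(csv_output: str) -> list[int]:
    """Return the memory.used value in MiB for each listed GPU."""
    lines = csv_output.strip().splitlines()[1:]  # skip the CSV header row
    return [int(line.split()[0]) for line in lines]

sample = """memory.used [MiB]
6963 MiB
"""

print(parse_memory_used_mib(sample))  # 6963 MiB, i.e. ~6.8 GB
```

The larger footprint is typical of GPU inference: model weights, activations, and framework workspace buffers all live in dedicated device memory.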

🔀 Workflow
advanced
Optimizing inference deployment for mixed CPU/GPU environments

You have a service that must handle both low-latency single requests and high-throughput batch requests. Which deployment strategy best balances GPU and CPU usage?

A. Use CPU only to avoid GPU overhead and simplify deployment.
B. Use CPU for single requests and GPU for batch requests to optimize latency and throughput.
C. Route all requests to GPU to maximize throughput, ignoring latency.
D. Use GPU only for single requests and CPU for batch requests.
💡 Hint

Consider the strengths of CPU and GPU for different request sizes.
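The mixed strategy can be sketched as a simple size-based router: small requests go to a CPU pool (no transfer/launch overhead), large batches go to a GPU pool (throughput). The threshold value is an assumption that you would tune from measured latency curves for your own model and hardware.

```python
# Minimal routing sketch: send single/small requests to a CPU pool and
# large batches to a GPU pool. GPU_BATCH_THRESHOLD is a hypothetical
# cutoff, not a universal constant.

GPU_BATCH_THRESHOLD = 8  # assumed crossover point; measure to tune

def route(batch_size: int) -> str:
    """Pick a backend pool for a request of the given batch size."""
    if batch_size >= GPU_BATCH_THRESHOLD:
        return "gpu-pool"   # batch work: amortizes GPU overhead
    return "cpu-pool"       # single/small: lowest latency on CPU

print(route(1))    # cpu-pool
print(route(32))   # gpu-pool
```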

Troubleshoot
advanced
Diagnosing slow GPU inference despite high GPU utilization

Your GPU inference shows high GPU utilization but slow overall response time. What is the most likely cause?

A. Data transfer between CPU and GPU is a bottleneck causing delays.
B. The model is too small to benefit from GPU acceleration.
C. GPU drivers are outdated causing incorrect utilization reporting.
D. The batch size is too large causing GPU memory overflow.
💡 Hint

High GPU usage but slow response often means waiting on data movement.
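One way to confirm this diagnosis is to split end-to-end time into transfer versus compute and look at the ratio. The timings below are hypothetical stand-ins for values you would record with a profiler (e.g. CUDA events); the helper itself is just arithmetic.

```python
# Sketch: attribute end-to-end latency to host<->device transfer vs. kernel
# compute. If transfer dominates, the GPU can look "busy" while the real
# bottleneck is data movement. The numbers below are hypothetical.

def transfer_fraction(transfer_ms: float, compute_ms: float) -> float:
    """Fraction of total measured time spent moving data to/from the device."""
    return transfer_ms / (transfer_ms + compute_ms)

# Hypothetical measurements: 12 ms copying inputs/outputs, 3 ms of kernels.
frac = transfer_fraction(12.0, 3.0)
print(f"{frac:.0%} of time in transfer")
if frac > 0.5:
    print("transfer-bound: consider pinned memory, larger batches, "
          "or keeping intermediate tensors on-device")
```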

Best Practice
expert
Choosing hardware for cost-effective inference at scale

You manage a cloud deployment for ML inference with fluctuating demand. Which approach best balances cost and performance?

A. Use CPU instances for low demand and scale GPU instances only when batch sizes increase.
B. Always use GPU instances to maximize speed regardless of cost.
C. Use only CPU instances to minimize complexity and cost.
D. Use GPU instances for all requests to simplify autoscaling.
💡 Hint

Think about matching hardware to workload patterns to save money.
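The demand-matching idea can be sketched as a scaling rule: serve baseline traffic on cheap CPU instances and add GPU instances only in proportion to queued batch work. The prices and throughput figure below are placeholder assumptions, not real cloud rates.

```python
# Cost-matching sketch: zero GPU instances when there is no batch backlog,
# and scale GPUs with the amount of queued batch work. All constants are
# hypothetical placeholders to be replaced with your provider's numbers.

CPU_HOURLY = 0.10   # assumed $/hr for a CPU instance
GPU_HOURLY = 1.50   # assumed $/hr for a GPU instance

def desired_gpu_instances(queued_batch_items: int,
                          items_per_gpu_hour: int = 100_000) -> int:
    """Ceiling-divide the backlog by per-GPU throughput; 0 when idle."""
    if queued_batch_items <= 0:
        return 0
    return -(-queued_batch_items // items_per_gpu_hour)  # ceil division

print(desired_gpu_instances(0))        # low demand: CPU-only, no GPU cost
print(desired_gpu_instances(250_000))  # large backlog: scale out GPUs
```

Keeping expensive GPU instances off during low demand and spinning them up only for batch backlogs is what makes option A the cost-effective choice.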