GPU infrastructure planning in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding GPU Memory Requirements

You are planning GPU resources for training a deep learning model. Each training sample requires 12GB of GPU memory, and you want to train with a batch size of 64. How much total GPU memory is needed to run the training on a single GPU without running out of memory?

A. 12 GB
B. 768 GB
C. 768 MB
D. 7680 GB
💡 Hint

Multiply the memory per sample by the batch size to get the total memory needed.

Model Choice
intermediate
Choosing GPUs for Parallel Training

You want to speed up training by using multiple GPUs in parallel. Which GPU setup is best for minimizing communication overhead between GPUs?

A. GPUs connected over a standard Ethernet network
B. Multiple GPUs connected via PCIe on the same motherboard
C. GPUs in separate machines connected via Wi-Fi
D. GPUs connected via USB hubs
💡 Hint

Consider the speed and latency of connections between GPUs.

Hyperparameter
advanced
Adjusting Batch Size for GPU Memory Limits

You have a GPU with 24GB of memory. Your model uses 8GB for a batch of size 32. You want to increase the batch size without exceeding GPU memory. Assuming memory use scales linearly with batch size, what is the maximum batch size you can use?

A. 96
B. 64
C. 72
D. 48
💡 Hint

Calculate memory per sample from 8GB per 32 samples, then max batch size = 24GB / mem_per_sample.
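
The hint's arithmetic can be sketched as a small helper. This is a minimal sketch that assumes memory use grows linearly with batch size, which real training only approximates because weights and optimizer state take a fixed amount regardless of batch size. The function name `max_batch_size` and the example figures (a 16GB GPU, 4GB measured at batch size 16) are illustrative and not taken from the question.

```python
import math


def max_batch_size(gpu_memory_gb, measured_memory_gb, measured_batch_size):
    """Largest batch size that fits on the GPU.

    Assumption: memory use is proportional to batch size, so one sample
    costs measured_memory_gb / measured_batch_size.
    """
    if measured_memory_gb <= 0 or measured_batch_size <= 0:
        raise ValueError("measured memory and batch size must be positive")
    # Equivalent to gpu_memory_gb / memory_per_sample, rounded down.
    return math.floor(gpu_memory_gb * measured_batch_size / measured_memory_gb)


# Hypothetical example: 4 GB observed at batch size 16 on a 16 GB GPU.
print(max_batch_size(16, 4, 16))  # 64
```

In practice it is safer to leave some headroom below the computed value, since frameworks and CUDA itself reserve part of the device memory.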

Metrics
advanced
Evaluating GPU Utilization Metrics

You monitor GPU utilization during training and see it averages 30%. What does this indicate about your GPU usage?

A. GPU is fully utilized; training is optimal
B. GPU is overheating; reduce workload
C. GPU is underutilized; training could be faster with optimization
D. GPU memory is full; reduce batch size
💡 Hint

Consider what low GPU utilization means for training speed.
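
One way to obtain a utilization figure like the one in this question is to query `nvidia-smi`. The sketch below assumes an NVIDIA driver is installed and takes a single snapshot; the helper names `parse_utilization` and `sample_gpu_utilization` are illustrative. It returns an empty list when `nvidia-smi` is missing or fails, so it is safe to run on a machine without a GPU.

```python
import shutil
import subprocess


def parse_utilization(csv_text):
    """Parse one utilization percentage per line, skipping non-numeric lines."""
    values = []
    for line in csv_text.splitlines():
        token = line.strip()
        if token.isdigit():  # ignores blanks and entries such as "[N/A]"
            values.append(int(token))
    return values


def sample_gpu_utilization():
    """Return the current utilization (%) of each GPU, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_utilization(result.stdout)


readings = sample_gpu_utilization()
if readings:
    average = sum(readings) / len(readings)
    print(f"Average utilization across {len(readings)} GPU(s): {average:.0f}%")
else:
    print("nvidia-smi not available; no readings taken")
```

A single snapshot can be misleading, so an average over a training run would come from calling `sample_gpu_utilization()` repeatedly at a fixed interval.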

🔧 Debug
expert
Diagnosing Training Slowdown on Multi-GPU Setup

You set up training on 4 GPUs but notice training is slower than on a single GPU. Which is the most likely cause?

A. High communication overhead between GPUs causing delays
B. Each GPU has insufficient memory causing crashes
C. The model is too small to benefit from multiple GPUs
D. The GPUs are running at full utilization
💡 Hint

Think about what slows down multi-GPU training besides memory or utilization.

Practice

1. Why is it important to plan GPU infrastructure before starting a GenAI project?
easy
A. To reduce the size of the AI model automatically
B. To ensure the GPU has enough memory and speed for the AI model
C. Because GPUs are always cheaper than CPUs
D. To avoid using any GPUs and rely only on CPUs

Solution

  1. Step 1: Understand GPU role in AI projects

    GPUs speed up AI model training, and they need enough memory to hold the model and the data batches it processes.
  2. Step 2: Importance of matching GPU specs to model needs

    A GPU with too little memory can cause out-of-memory failures, and one with too little compute slows training down.
  3. Final Answer:

    To ensure the GPU has enough memory and speed for the AI model -> Option B
  4. Quick Check:

    GPU specs must match AI needs = B [OK]
Hint: Match GPU memory and speed to your AI model size [OK]
Common Mistakes:
  • Thinking CPUs can replace GPUs for heavy AI tasks
  • Assuming all GPUs have the same performance
  • Ignoring GPU memory limits
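
Matching GPU memory to model size can start from a rough estimate based on parameter count. The sketch below is an assumption-heavy rule of thumb, not a figure from this page: it counts 16 bytes per parameter for full-precision training with the Adam optimizer (4 for weights, 4 for gradients, 8 for the two optimizer moments) and ignores activations, which add more on top. The function name and the 1-billion-parameter example are hypothetical.

```python
def estimate_training_memory_gb(num_parameters, bytes_per_parameter=16):
    """Rough lower bound on training memory in decimal GB.

    Assumption: fp32 weights (4 B) + fp32 gradients (4 B) + Adam moments
    (8 B) per parameter. Activation memory is NOT included.
    """
    if num_parameters < 0 or bytes_per_parameter <= 0:
        raise ValueError("parameter count and bytes per parameter must be valid")
    return num_parameters * bytes_per_parameter / 1e9


# Hypothetical model with one billion parameters.
print(estimate_training_memory_gb(1_000_000_000))  # 16.0
```

Because activations are excluded, a GPU whose memory only just equals this estimate should be treated as too small.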
2. Which of the following is the correct way to check GPU memory using Python's PyTorch library?
easy
A. torch.cuda.memory_size()
B. torch.gpu.memory.total()
C. torch.cuda.get_device_properties(0).total_memory
D. torch.device.memory()

Solution

  1. Step 1: Recall PyTorch GPU memory query syntax

    The correct method is torch.cuda.get_device_properties(device_id).total_memory.
  2. Step 2: Check each option for correctness

    Only torch.cuda.get_device_properties(0).total_memory uses the correct PyTorch function and attribute.
  3. Final Answer:

    torch.cuda.get_device_properties(0).total_memory -> Option C
  4. Quick Check:

    Correct PyTorch GPU memory call = C [OK]
Hint: Use torch.cuda.get_device_properties(0).total_memory to check GPU memory [OK]
Common Mistakes:
  • Using non-existent PyTorch functions
  • Confusing device and memory functions
  • Missing the device index argument
3. Given this Python code snippet using PyTorch, what will be printed?
import torch
if torch.cuda.is_available():
    mem = torch.cuda.get_device_properties(0).total_memory
    print(mem > 8_000_000_000)
else:
    print(False)
medium
A. True if GPU memory is more than 8GB, else False
B. Always True
C. Always False
D. Raises an error if no GPU

Solution

  1. Step 1: Understand the code logic

    The code checks if a GPU is available, then compares its memory to 8GB (8 billion bytes).
  2. Step 2: Determine output based on GPU memory

    If GPU memory is greater than 8GB, it prints True; otherwise, False. If no GPU, prints False.
  3. Final Answer:

    True if GPU memory is more than 8GB, else False -> Option A
  4. Quick Check:

    GPU memory check > 8GB = A [OK]
Hint: Check GPU memory size condition to predict output [OK]
Common Mistakes:
  • Assuming always True regardless of GPU
  • Expecting error if no GPU instead of False
  • Confusing bytes with gigabytes
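
The bytes-versus-gigabytes mistake above is easy to see with plain arithmetic. This short sketch mirrors the comparison in the snippet, using hypothetical byte counts rather than a real device: a value of exactly 8 GiB (binary) exceeds the 8,000,000,000-byte threshold, while exactly 8 GB (decimal) does not, because the comparison is strict.

```python
GB = 10 ** 9    # decimal gigabyte
GIB = 2 ** 30   # binary gibibyte


def exceeds_8gb_threshold(total_memory_bytes):
    """Same test as the snippet: strictly more than 8,000,000,000 bytes."""
    return total_memory_bytes > 8 * GB


print(exceeds_8gb_threshold(8 * GIB))  # True  (8,589,934,592 bytes)
print(exceeds_8gb_threshold(8 * GB))   # False (not strictly greater)
```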
4. Identify the error in this GPU memory check code and select the fix:
import torch
if torch.cuda.is_available():
    mem = torch.cuda.get_device_properties().total_memory
    print(mem)
else:
    print('No GPU')
medium
A. Add device index 0 in get_device_properties: get_device_properties(0)
B. Replace torch.cuda.is_available() with torch.has_cuda()
C. Use torch.cuda.memory_allocated() instead of get_device_properties()
D. No error, code is correct

Solution

  1. Step 1: Check get_device_properties usage

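    The call passes no device argument. PyTorch releases in which this argument is required raise a TypeError here; passing an explicit index, e.g., 0 for the first GPU, works on every release.
  2. Step 2: Identify the fix

    Adding the index, get_device_properties(0), fixes the call. torch.has_cuda() is not a valid function call, and torch.cuda.memory_allocated() reports memory currently occupied by tensors, not the device's total capacity.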
  3. Final Answer:

    Add device index 0 in get_device_properties: get_device_properties(0) -> Option A
  4. Quick Check:

    Missing device index causes error = A [OK]
Hint: Always provide device index to get_device_properties() [OK]
Common Mistakes:
  • Omitting device index argument
  • Using non-existent torch.has_cuda()
  • Confusing memory functions
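
To keep the memory functions apart, the sketch below prints both figures for every visible GPU, always passing an explicit device index: `total_memory` is the card's capacity, while `torch.cuda.memory_allocated()` is what tensors occupy right now. The `summarize` helper is an illustrative name, and the PyTorch import is guarded so the script still runs where PyTorch or a GPU is absent.

```python
def summarize(index, name, total_bytes, allocated_bytes):
    """Format capacity and current tensor usage for one GPU in GiB."""
    gib = 1024 ** 3
    return (f"GPU {index} ({name}): {allocated_bytes / gib:.2f} GiB allocated "
            f"of {total_bytes / gib:.2f} GiB total")


try:
    import torch
except ImportError:  # PyTorch not installed
    torch = None

if torch is not None and torch.cuda.is_available():
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)  # explicit index
        allocated = torch.cuda.memory_allocated(index)   # bytes held by tensors
        print(summarize(index, props.name, props.total_memory, allocated))
else:
    print("No CUDA GPU visible to PyTorch")
```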
5. You plan to train a large GenAI model requiring 24GB GPU memory. Your local GPUs have 16GB each. Which is the best GPU infrastructure planning choice?
hard
A. Ignore memory limits and expect training to succeed
B. Reduce the model size to fit 16GB GPU and train locally
C. Train on CPU only to avoid GPU memory limits
D. Use multiple GPUs with model parallelism or switch to cloud GPUs with 24GB+ memory

Solution

  1. Step 1: Analyze GPU memory requirement vs available hardware

    The model needs 24GB, but local GPUs have only 16GB, so one GPU is insufficient.
  2. Step 2: Consider solutions for insufficient GPU memory

    Using multiple GPUs with model parallelism or cloud GPUs with enough memory solves the problem effectively.
  3. Final Answer:

    Use multiple GPUs with model parallelism or switch to cloud GPUs with 24GB+ memory -> Option D
  4. Quick Check:

    Match GPU memory to model needs with parallelism or cloud = D [OK]
Hint: Use multi-GPU or cloud GPUs for models needing more memory [OK]
Common Mistakes:
  • Trying to train large models on insufficient GPU memory
  • Ignoring cloud GPU options
  • Assuming CPU can replace GPU for large models
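
The comparison in Step 1 extends to a quick estimate of how many GPUs model parallelism would need. This is a minimal sketch under the optimistic assumption that the model splits evenly across devices; the `usable_fraction` parameter is a hypothetical knob for reserving headroom and does not come from the question.

```python
import math


def min_gpus_needed(model_memory_gb, gpu_memory_gb, usable_fraction=1.0):
    """Lower bound on GPU count, assuming the model splits evenly.

    usable_fraction is the share of each GPU's memory assumed available
    to the model (hypothetical headroom setting).
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    usable_gb = gpu_memory_gb * usable_fraction
    return math.ceil(model_memory_gb / usable_gb)


print(min_gpus_needed(24, 16))                       # 2
print(min_gpus_needed(24, 16, usable_fraction=0.7))  # 3
```

The result is a floor on the hardware required, since communication buffers and uneven layer sizes usually push the real requirement higher.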