Prompt Engineering / GenAI · ~20 mins

Latency optimization in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual (intermediate)
Understanding Latency in Model Inference

Which factor most directly affects the latency of a machine learning model during inference?

A. The number of layers and parameters in the model
B. The size of the training dataset
C. The type of optimizer used during training
D. The number of epochs used during training
💡 Hint

Think about what happens when the model makes a prediction.
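To make the hint concrete, here is a minimal sketch (with illustrative, made-up layer and parameter counts) of why inference cost is driven by model size. The `forward_pass_cost` helper and its FLOP estimate are assumptions for illustration, not a real profiler:

```python
def forward_pass_cost(num_layers, params_per_layer, flops_per_param=2):
    """Rough FLOP estimate for one inference forward pass.
    Inference cost depends on the model's size, not on how it was trained."""
    return num_layers * params_per_layer * flops_per_param

small = forward_pass_cost(num_layers=4, params_per_layer=10_000)
large = forward_pass_cost(num_layers=50, params_per_layer=1_000_000)

# The deeper, wider model needs far more compute per prediction, while
# training-set size, optimizer, and epoch count never enter the formula.
print(large / small)  # -> 1250.0
```

Notice that dataset size, optimizer, and epochs affect only training; once the weights are fixed, the forward pass is what you pay for at inference time.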

Predict Output (intermediate)
Effect of Batch Size on Latency

What is the output of the following code that simulates latency for different batch sizes?

import time

def simulate_latency(batch_size):
    base_time = 0.01  # seconds per sample
    total_time = base_time * batch_size
    time.sleep(total_time)
    return total_time

latencies = {b: simulate_latency(b) for b in [1, 5, 10]}
print(latencies)
A. {1: 0.01, 5: 0.05, 10: 0.1}
B. {1: 0.01, 5: 0.01, 10: 0.01}
C. {1: 0.1, 5: 0.5, 10: 1.0}
D. {1: 0.01, 5: 0.1, 10: 0.5}
💡 Hint

Latency scales linearly with batch size in this simulation.
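To check your prediction after answering, here is a sleep-free version of the same simulation; it computes the dictionary directly instead of actually waiting (the `round` call is added to keep floating-point noise out of the printed values):

```python
def simulate_latency(batch_size, base_time=0.01):
    # Latency grows linearly: every sample adds base_time seconds.
    return base_time * batch_size

latencies = {b: round(simulate_latency(b), 4) for b in [1, 5, 10]}
print(latencies)  # -> {1: 0.01, 5: 0.05, 10: 0.1}
```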

Model Choice (advanced)
Choosing a Model for Low Latency

You need to deploy a model on a device with limited processing power and require very low latency. Which model architecture is best suited?

A. A deep convolutional neural network with 50 layers
B. A small decision tree model
C. A large transformer model with billions of parameters
D. A recurrent neural network with multiple LSTM layers
💡 Hint

Think about model size and computation needed for fast predictions.
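A rough back-of-the-envelope comparison can sharpen the hint. The operation counts below are illustrative assumptions, not measurements, but they show the orders of magnitude involved:

```python
def tree_ops(depth):
    # A decision tree does roughly one comparison per level
    # on the root-to-leaf path for each prediction.
    return depth

def cnn_ops(layers, macs_per_layer):
    # Each convolutional layer performs millions of
    # multiply-accumulate operations (MACs).
    return layers * macs_per_layer

tree = tree_ops(depth=10)                            # ~10 comparisons
cnn = cnn_ops(layers=50, macs_per_layer=5_000_000)   # ~250M MACs
print(cnn // tree)  # -> 25000000
```

On a device with limited compute, a gap of seven orders of magnitude per prediction is the difference between sub-millisecond and multi-second latency.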

Hyperparameter (advanced)
Hyperparameter Impact on Latency

Which hyperparameter adjustment is most likely to reduce inference latency without retraining the model?

A. Reducing the number of training epochs
B. Increasing the learning rate
C. Lowering the batch size during inference
D. Changing the activation function to ReLU
💡 Hint

Consider what happens when you process fewer samples at once.
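A simplified serving model illustrates the hint. Assume (illustratively) a fixed per-sample cost and synchronous batching, where a request must wait for its whole batch to finish; this sketch deliberately ignores the throughput gains that hardware parallelism can give larger batches:

```python
def request_latency(batch_size, per_sample_ms=10.0):
    """With synchronous batching, one request waits for the whole batch."""
    return batch_size * per_sample_ms

# Serving a single request returns in ~10 ms;
# batching 32 requests makes each one wait ~320 ms.
print(request_latency(1))   # -> 10.0
print(request_latency(32))  # -> 320.0
```

Note the trade-off: smaller inference batches cut per-request latency, while larger batches usually improve total throughput. Training-time settings (epochs, learning rate) cannot change latency after the fact without retraining.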

🔧 Debug (expert)
Debugging Latency Bottleneck in Code

Given the code below, which line is the main cause of increased latency during inference?

def predict(model, data):
    results = []
    for sample in data:
        processed = preprocess(sample)
        output = model(processed)
        results.append(output)
    return results

# preprocess is slow due to heavy image resizing
# model is optimized and fast
A. Line 3: for sample in data:
B. Line 6: results.append(output)
C. Line 5: output = model(processed)
D. Line 4: processed = preprocess(sample)
💡 Hint

Focus on which step is described as slow.
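In practice you would confirm the bottleneck by timing each step rather than guessing. Here is a runnable sketch of the loop instrumented with time.perf_counter; the preprocess and model bodies are stand-ins (with made-up sleep durations) for the slow resizing step and the fast model described above:

```python
import time

def preprocess(sample):
    # Stand-in for the slow image-resizing step from the problem.
    time.sleep(0.02)
    return sample

def model(processed):
    # Stand-in for the optimized, fast model.
    time.sleep(0.001)
    return processed * 2

def predict_timed(data):
    pre_total = model_total = 0.0
    results = []
    for sample in data:
        t0 = time.perf_counter()
        processed = preprocess(sample)
        t1 = time.perf_counter()
        output = model(processed)
        t2 = time.perf_counter()
        pre_total += t1 - t0      # time spent preprocessing
        model_total += t2 - t1    # time spent in the model
        results.append(output)
    return results, pre_total, model_total

results, pre_total, model_total = predict_timed([1, 2, 3])
# Preprocessing dominates total latency, pointing at line 4.
print(pre_total > model_total)  # -> True
```

Once profiling confirms the preprocessing line is the bottleneck, typical fixes include caching resized images, batching the resize step, or moving it to a faster library.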