Challenge - 5 Problems
Self-hosted LLM Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
Intermediate · 2:00 remaining
Understanding Model Size Impact on Self-hosted LLMs
Which of the following best explains how increasing the number of parameters in a self-hosted LLM like Llama or Mistral affects its performance and resource requirements?
💡 Hint
Think about how bigger models usually behave in terms of accuracy and hardware needs.
✗ Incorrect
Larger models generally capture more complex patterns, improving accuracy, but they require more memory and compute, which slows inference.
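The memory side of this trade-off is easy to estimate: weight storage alone scales linearly with parameter count. A minimal sketch, using illustrative parameter counts (7B/13B/70B are common self-hosted sizes, not exact figures for any specific checkpoint):

```python
# Rough memory needed just to hold model weights at a given precision.
# Excludes activations and KV cache, so real usage is higher.

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Weight footprint in GB; 2 bytes/param corresponds to fp16/bf16."""
    return num_params * bytes_per_param / 1e9

for params in (7e9, 13e9, 70e9):
    print(f"{params / 1e9:.0f}B params @ fp16: ~{weight_memory_gb(params):.0f} GB")
```

This is why a 70B model needs multi-GPU or aggressive quantization where a 7B model fits on a single consumer card.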
❓ Predict Output
Intermediate · 2:00 remaining
Output of Token Generation with Temperature in LLM
Given the following pseudocode for generating tokens from a self-hosted LLM with temperature=0.0, what is the expected behavior of the output tokens?
Prompt Engineering / GenAI
tokens = model.generate(input_ids, temperature=0.0, max_length=5)
print(tokens)
💡 Hint
Temperature controls randomness in token selection.
✗ Incorrect
A temperature of 0.0 means the model picks the highest probability token every time, making output deterministic.
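The mechanism behind this can be sketched with plain Python. The function below is a toy sampler (not any real library's API): at temperature 0.0 it degenerates to argmax (greedy decoding), while higher temperatures flatten the softmax distribution and reintroduce randomness:

```python
import math
import random

def sample_token(logits, temperature):
    """Pick a token index from raw logits; temperature=0.0 means greedy."""
    if temperature == 0.0:
        # Greedy: always take the highest-logit token -> deterministic output.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise scale logits, softmax, and sample.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(l - m) for l in scaled]
    return random.choices(range(len(logits)), weights=weights)[0]

logits = [1.0, 3.5, 0.2]
print(sample_token(logits, 0.0))  # always 1: the highest-logit token
```

Running the temperature-0.0 call any number of times yields the same index, which is exactly why the quiz's pseudocode prints the same tokens on every run.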
❓ Model Choice
Advanced · 2:00 remaining
Choosing a Self-hosted LLM for Low-latency Applications
You want to deploy a self-hosted LLM for a chatbot that requires fast responses on limited hardware. Which model choice is best?
💡 Hint
Consider model size and quantization effects on speed and memory.
✗ Incorrect
Smaller models with quantization reduce memory use and speed up inference, making them suitable for low-latency deployment on limited hardware.
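Quantization's memory savings follow directly from bits per parameter. A back-of-envelope sketch, using a 7B parameter count as an illustrative small-model size:

```python
# Weight footprint as a function of quantization precision.

def model_footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Weight storage in GB for a given bit width per parameter."""
    return num_params * bits_per_param / 8 / 1e9

params = 7e9  # illustrative 7B-class model
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{model_footprint_gb(params, bits):.1f} GB")
```

Going from fp16 to 4-bit cuts weight memory by 4x, which both fits the model on smaller GPUs and reduces the memory bandwidth per token, a common bottleneck in inference latency.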
❓ Metrics
Advanced · 2:00 remaining
Evaluating Self-hosted LLM Output Quality
Which metric is most appropriate to evaluate the quality of text generated by a self-hosted LLM like Llama or Mistral on a language generation task?
💡 Hint
Think about metrics used in language modeling tasks.
✗ Incorrect
Perplexity measures how surprised the model is by the next token, indicating prediction quality in language generation.
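Concretely, perplexity is the exponential of the average negative log-likelihood per token. A minimal sketch computing it from per-token log-probabilities (the probability values are made up for illustration):

```python
import math

def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood over the tokens."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

confident = [math.log(0.9)] * 4   # model assigns 90% to each true token
unsure = [math.log(0.25)] * 4     # model assigns 25% to each true token
print(perplexity(confident))  # ~1.11: low perplexity, strong predictions
print(perplexity(unsure))     # 4.0: the model is "choosing among 4" each step
```

Lower perplexity means the model was less surprised by the actual text; a perplexity of N loosely corresponds to hesitating uniformly among N candidates per token.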
🔧 Debug
Expert · 3:00 remaining
Debugging Memory Error in Self-hosted LLM Inference
You try to run inference on a Llama 13B model but get a CUDA out-of-memory error. Which action will most likely fix this issue?
💡 Hint
Think about how to reduce GPU memory usage during inference.
✗ Incorrect
Reducing batch size or quantizing the model reduces memory needed, preventing out-of-memory errors during inference.
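Both fixes can be seen in a back-of-envelope memory model: weights scale with bytes per parameter, while the KV cache scales with batch size and sequence length. The layer count and hidden size below are illustrative values in the range of a 13B-class model, not exact figures for any specific checkpoint:

```python
# Rough GPU memory model for transformer inference: weights + KV cache.

def inference_memory_gb(num_params, bytes_per_param, batch_size,
                        seq_len, n_layers, hidden_size, kv_bytes=2):
    weights = num_params * bytes_per_param
    # KV cache: K and V tensors per layer, per position, per batch element.
    kv_cache = 2 * n_layers * hidden_size * seq_len * batch_size * kv_bytes
    return (weights + kv_cache) / 1e9

# Illustrative 13B-class configuration: 40 layers, hidden size 5120.
fp16_batch8 = inference_memory_gb(13e9, 2, 8, 2048, 40, 5120)
fp16_batch1 = inference_memory_gb(13e9, 2, 1, 2048, 40, 5120)
int4_batch1 = inference_memory_gb(13e9, 0.5, 1, 2048, 40, 5120)
print(f"fp16, batch 8: ~{fp16_batch8:.0f} GB")
print(f"fp16, batch 1: ~{fp16_batch1:.0f} GB")
print(f"int4, batch 1: ~{int4_batch1:.0f} GB")
```

Dropping the batch size shrinks the KV cache term, and 4-bit quantization shrinks the weight term; together they can bring a 13B model from beyond a 24 GB card down to well within it.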