Prompt Engineering / GenAI · ~20 mins

Self-hosted LLMs (Llama, Mistral) in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️ Self-hosted LLM Mastery: get all five challenges correct to earn this badge!
🧠 Conceptual · intermediate
Understanding Model Size Impact on Self-hosted LLMs
Which of the following best explains how increasing the number of parameters in a self-hosted LLM like Llama or Mistral affects its performance and resource requirements?
A. Increasing parameters reduces accuracy but speeds up inference and lowers memory use.
B. Increasing parameters decreases both accuracy and resource requirements.
C. Increasing parameters has no effect on accuracy but increases training time only.
D. Increasing parameters improves model accuracy but requires more memory and slower inference.
💡 Hint
Think about how bigger models usually behave in terms of accuracy and hardware needs.
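A quick back-of-the-envelope sketch of the scaling the hint points at. The fp16 figure of 2 bytes per parameter is a common rule of thumb, not a measured value, and it covers weights only (KV cache and activations add more):

```python
def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Rough weight-only memory estimate; fp16 stores 2 bytes per parameter."""
    return num_params * bytes_per_param / 1e9

# Ten times the parameters means roughly ten times the weight memory,
# and proportionally more compute per generated token.
print(weight_memory_gb(7e9))   # -> 14.0  (a 7B model, fp16)
print(weight_memory_gb(70e9))  # -> 140.0 (a 70B model, fp16)
```

This is why bigger checkpoints tend to be more accurate yet need more memory and run slower, matching option D.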
Predict Output · intermediate
Output of Token Generation with Temperature in LLM
Given the following pseudocode for generating tokens from a self-hosted LLM with temperature=0.0, what is the expected behavior of the output tokens?
tokens = model.generate(input_ids, temperature=0.0, max_length=5)
print(tokens)
A. The model outputs tokens only from a fixed vocabulary subset.
B. The model outputs the most likely tokens deterministically.
C. The model outputs random tokens with equal probability.
D. The model outputs tokens with high randomness and diversity.
💡 Hint
Temperature controls randomness in token selection.
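A minimal sketch of temperature scaling over raw logits. The logit values are made up for illustration; the point is that as temperature approaches zero the sampler degenerates to greedy argmax, which is why the output becomes deterministic:

```python
import math

def sample_token(logits, temperature):
    """Temperature 0 degenerates to greedy argmax; higher T flattens the distribution."""
    if temperature == 0.0:
        # Greedy decoding: always pick the single most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax over temperature-scaled logits (numerically stabilised).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    # A real sampler would draw from these probabilities; returned here for inspection.
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(sample_token(logits, 0.0))  # -> 0  (always the argmax, fully deterministic)
```

With temperature=0.0 every run yields the same most-likely tokens, i.e. option B.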
Model Choice · advanced
Choosing a Self-hosted LLM for Low-latency Applications
You want to deploy a self-hosted LLM for a chatbot that requires fast responses on limited hardware. Which model choice is best?
A. A smaller Mistral 7B model quantized to 4-bit precision.
B. A large Llama 70B model with full precision weights.
C. A large Mistral 30B model with no quantization.
D. A medium Llama 13B model with float32 precision.
💡 Hint
Consider model size and quantization effects on speed and memory.
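A rough comparison of the options' weight footprints. The bit widths are the nominal storage precisions; real quantized runtimes add some overhead, so treat these as order-of-magnitude estimates:

```python
def model_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Weight-only memory at a given storage precision."""
    return num_params * bits_per_param / 8 / 1e9

print(model_memory_gb(7e9, 4))    # -> 3.5   (option A: Mistral 7B, 4-bit)
print(model_memory_gb(70e9, 16))  # -> 140.0 (option B: Llama 70B, fp16)
print(model_memory_gb(13e9, 32))  # -> 52.0  (option D: Llama 13B, float32)
```

At ~3.5 GB of weights, the 4-bit 7B model fits on modest hardware and has the fewest parameters to compute per token, so it gives the lowest latency.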
Metrics · advanced
Evaluating Self-hosted LLM Output Quality
Which metric is most appropriate to evaluate the quality of text generated by a self-hosted LLM like Llama or Mistral on a language generation task?
A. Perplexity measuring how well the model predicts the next token.
B. Accuracy measuring exact token matches with ground truth.
C. Mean Squared Error between predicted and actual token embeddings.
D. F1 score measuring classification correctness.
💡 Hint
Think about metrics used in language modeling tasks.
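Perplexity is the exponential of the average negative log-likelihood the model assigns to the reference tokens. A minimal sketch on made-up per-token probabilities:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-likelihood of the target tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that assigns probability 0.5 to every correct token
# has perplexity 2: it is "as confused as" a fair coin flip per token.
print(round(perplexity([0.5, 0.5, 0.5, 0.5]), 6))  # -> 2.0
```

Lower perplexity means the model predicts the next token better, which is why it is the standard intrinsic metric for language generation (option A).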
🔧 Debug · expert
Debugging Memory Error in Self-hosted LLM Inference
You try to run inference on a Llama 13B model but get a CUDA out-of-memory error. Which action will most likely fix this issue?
A. Add more layers to the model to distribute memory load.
B. Increase learning rate to speed up training and reduce memory.
C. Reduce batch size or use model quantization to lower memory use.
D. Disable GPU and run inference on CPU only.
💡 Hint
Think about how to reduce GPU memory usage during inference.
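A crude model of where the memory goes, to show why option C works. The 0.5 GB-per-sample activation figure is an illustrative assumption, not a measured value; real activation and KV-cache costs depend on sequence length and architecture:

```python
def inference_memory_gb(num_params: float, bits_per_param: int,
                        batch_size: int,
                        activation_gb_per_sample: float = 0.5) -> float:
    """Crude estimate: weight memory plus per-sample activation memory
    (0.5 GB/sample is an illustrative assumption)."""
    weights_gb = num_params * bits_per_param / 8 / 1e9
    return weights_gb + batch_size * activation_gb_per_sample

# Llama 13B in fp16 with batch size 8: ~30 GB -> OOM on a 24 GB GPU.
print(inference_memory_gb(13e9, 16, 8))  # -> 30.0
# Same model 4-bit quantized with batch size 1: ~7 GB -> fits comfortably.
print(inference_memory_gb(13e9, 4, 1))   # -> 7.0
```

Both levers in option C (smaller batch, lower-precision weights) shrink the two dominant terms; the other options either add memory or sacrifice the GPU entirely.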