Challenge - 5 Problems

Self-hosted LLM Mastery

🧠 Conceptual

intermediate

Understanding Model Size Impact on Self-hosted LLMs

Which of the following best explains how increasing the number of parameters in a self-hosted LLM like Llama or Mistral affects its performance and resource requirements?

A. Increasing parameters reduces accuracy but speeds up inference and lowers memory use.

B. Increasing parameters decreases both accuracy and resource requirements.

C. Increasing parameters has no effect on accuracy but increases training time only.

D. Increasing parameters generally improves model accuracy but requires more memory and slows inference.
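
The link between parameter count and memory can be made concrete with a back-of-the-envelope estimate. The sketch below assumes that weights dominate memory use and ignores activations and the KV cache; the function name `weight_memory_gib` and the dtype table are illustrative, not part of any library.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption: weights dominate; activations and KV cache are ignored.

BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str = "float16") -> float:
    """Return approximate weight memory in GiB for a given parameter count."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 3)


if __name__ == "__main__":
    # Parameter counts are nominal sizes, used only for illustration.
    for name, params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
        print(f"{name}: {weight_memory_gib(params):.1f} GiB in float16")
```

Memory grows linearly with parameter count at a fixed precision, which is why larger models need more hardware before any latency effects are considered.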

❓ Predict Output

intermediate

Output of Token Generation with Temperature in LLM

Given the following pseudocode for generating tokens from a self-hosted LLM with temperature=0.0, what is the expected behavior of the output tokens?

tokens = model.generate(input_ids, temperature=0.0, max_length=5)
print(tokens)

A. The model outputs tokens only from a fixed vocabulary subset.

B. The model outputs the most likely tokens deterministically.

C. The model outputs random tokens with equal probability.

D. The model outputs tokens with high randomness and diversity.
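
To see how temperature shapes token selection, here is a minimal sketch of temperature sampling over raw logits. It assumes that a temperature of 0 is treated as greedy decoding (argmax); that is a common convention, but some libraries instead require a strictly positive temperature and expose greedy decoding through a separate switch. The function `sample_token` is hypothetical and is not the `model.generate` API from the pseudocode.

```python
import math
import random


def sample_token(logits, temperature, rng=random):
    """Pick a token index from raw logits.

    Assumption: temperature == 0 means greedy decoding (always the argmax).
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Divide logits by the temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [1.0, 3.0, 2.0]
    print([sample_token(logits, 0) for _ in range(5)])    # same index every time
    print([sample_token(logits, 5.0) for _ in range(5)])  # varies between runs
```

Lower temperatures concentrate probability on the highest-scoring token, and higher temperatures flatten the distribution.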

❓ Model Choice

advanced

Choosing a Self-hosted LLM for Low-latency Applications

You want to deploy a self-hosted LLM for a chatbot that requires fast responses on limited hardware. Which model choice is best?

A. A smaller Mistral 7B model quantized to 4-bit precision.

B. A large Llama 70B model with full precision weights.

C. A large Mistral 30B model with no quantization.

D. A medium Llama 13B model with float32 precision.
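
As a rough illustration of what quantizing to 4-bit precision means, the sketch below maps floating-point weights to small integers using one shared scale. It is a simplified symmetric absmax scheme chosen for clarity; production 4-bit formats typically use per-group scales and pack two values per byte, and the function names here are hypothetical.

```python
def quantize_int4(weights):
    """Symmetric absmax quantization to integers in [-7, 7]."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 7 if absmax > 0 else 1.0
    # Round to the nearest integer level and clamp to the 4-bit signed range used here.
    quantized = [max(-7, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int4(quantized, scale):
    """Recover approximate floating-point weights."""
    return [v * scale for v in quantized]


if __name__ == "__main__":
    original = [0.7, -0.2, 0.0, 0.1]
    q, scale = quantize_int4(original)
    restored = dequantize_int4(q, scale)
    print(q, restored)
```

Each weight is stored in 4 bits instead of 16 or 32, at the cost of a rounding error of at most half a quantization step per weight.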

❓ Metrics

advanced

Evaluating Self-hosted LLM Output Quality

Which metric is most appropriate to evaluate the quality of text generated by a self-hosted LLM like Llama or Mistral on a language generation task?

A. Perplexity measuring how well the model predicts the next token.

B. Accuracy measuring exact token matches with ground truth.

C. Mean Squared Error between predicted and actual token embeddings.

D. F1 score measuring classification correctness.
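
For reference, perplexity is the exponential of the average negative log-probability the model assigns to each actual next token. The sketch below assumes you already have per-token natural-log probabilities from your model; how they are obtained depends on the inference library and is not shown.

```python
import math


def perplexity(token_log_probs):
    """exp of the average negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every correct token.
    print(perplexity([math.log(0.25)] * 4))
```

Lower values mean the model is less "surprised" by the reference text; a perplexity of 1 would mean every token was predicted with certainty.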

🔧 Debug

expert

Debugging Memory Error in Self-hosted LLM Inference

You try to run inference on a Llama 13B model but get a CUDA out-of-memory error. Which action will most likely fix this issue?

A. Add more layers to the model to distribute memory load.

B. Increase learning rate to speed up training and reduce memory.

C. Reduce batch size or use model quantization to lower memory use.

D. Disable GPU and run inference on CPU only.
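
One way to reason about inference memory beyond the weights is to estimate the key/value cache, which grows with batch size and sequence length. The sketch below assumes standard multi-head attention and uses default layer and hidden-size figures (40 layers, hidden size 5120) commonly cited for a 13B Llama-style model; treat them as assumptions and check your model's configuration.

```python
def kv_cache_gib(batch_size, seq_len, num_layers=40, hidden_size=5120,
                 bytes_per_value=2):
    """Approximate KV-cache size in GiB for standard multi-head attention.

    The factor 2 accounts for storing both keys and values in every layer.
    bytes_per_value=2 corresponds to float16.
    """
    total_bytes = (2 * num_layers * hidden_size * seq_len
                   * batch_size * bytes_per_value)
    return total_bytes / (1024 ** 3)


if __name__ == "__main__":
    for batch in (1, 4, 8):
        print(f"batch={batch}: {kv_cache_gib(batch, 2048):.2f} GiB of KV cache")
```

Under these assumptions the cache scales linearly with batch size, so halving the batch roughly halves this part of the footprint, while quantization shrinks the weights themselves.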

Practice

1. What is the main advantage of using self-hosted LLMs like Llama or Mistral?

easy

A. You keep full control and privacy over your data

B. They always run faster than cloud models

C. They require no installation or setup

D. They provide unlimited free internet access

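Solution

Step 1: Understand self-hosted LLMs purpose
Self-hosting means running the model on hardware you control, so prompts and outputs need not be sent to a third-party service.

Step 2: Compare options
Speed depends on your hardware, so B does not hold in general. Self-hosting requires installation and setup, so C is wrong. D is unrelated to what an LLM provides.

Final Answer:
A. You keep full control and privacy over your data

Quick Check:
Self-hosted means the data stays on your own infrastructure, which matches A.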
