Practice

1. What do LLM scaling laws primarily describe in language model training?

easy

A. The syntax rules for writing code in AI frameworks

B. How model size, data amount, and compute resources affect performance

C. The best way to label data for supervised learning

D. How to deploy models on mobile devices

Solution

Step 1: Understand the purpose of scaling laws
LLM scaling laws describe how model performance (usually measured as loss) changes with model size, dataset size, and training compute.
Step 2: Match the description to options
Only option B, "How model size, data amount, and compute resources affect performance", describes this relationship; the other options cover unrelated topics (code syntax, data labeling, deployment).
Final Answer:
How model size, data amount, and compute resources affect performance -> Option B
Quick Check:
Scaling laws = model size, data, compute impact [OK]

Hint: Look for the keywords model size, data, and compute

Common Mistakes:

Confusing scaling laws with coding syntax
Thinking scaling laws are about data labeling
Assuming scaling laws relate to deployment

2. Which of the following is the correct formula representing a simplified LLM scaling law for loss L as a function of model parameters N and dataset size D?

easy

A. L = a / (N + D)

B. L = a + b * N + c * D

C. L = a * log(N) + b * log(D)

D. L = a * N^(-b) + c * D^(-d)

Solution

Step 1: Recall the typical scaling law form
Empirical scaling laws typically model loss as a sum of power-law terms that shrink as model size N and dataset size D grow, e.g. L = a * N^(-b) + c * D^(-d) with positive constants a, b, c, d.
Step 2: Compare options to this form
Only option D matches the power-law form. Option A lumps N and D into a single reciprocal, option B is linear, and option C is logarithmic.
Final Answer:
L = a * N^(-b) + c * D^(-d) -> Option D
Quick Check:
Loss decreases as power laws of N and D [OK]

Hint: Look for the power-law form (N and D raised to negative exponents)

Common Mistakes:

Choosing linear formulas instead of power laws
Confusing logarithmic with power law forms
Ignoring the negative exponents for loss decrease
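
One way to tell a power law apart from the linear and logarithmic options is that a power-law term plots as a straight line on log-log axes, with slope equal to the exponent. The sketch below checks this for the model-size term alone; the function name model_term and the constants a=1.0, b=0.5 are illustrative assumptions, not fitted values.

```python
import math

def model_term(N, a=1.0, b=0.5):
    # Single power-law term a * N^(-b); a and b are illustrative values.
    return a * N ** (-b)

sizes = [10**3, 10**4, 10**5, 10**6]
losses = [model_term(N) for N in sizes]

# Slope between consecutive points in log-log space.
# For a pure power law this is constant and equals -b.
slopes = [
    (math.log(losses[i + 1]) - math.log(losses[i]))
    / (math.log(sizes[i + 1]) - math.log(sizes[i]))
    for i in range(len(sizes) - 1)
]

for N, term in zip(sizes, losses):
    print(f"N={N:>8}  loss term={term:.5f}")
print("log-log slopes:", [round(s, 3) for s in slopes])
```

A linear or logarithmic formula would not give a constant slope on these axes, which is why the exponent form is the distinguishing feature.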

3. Consider this Python code simulating a simplified LLM loss calculation:

def loss(N, D, a=1.0, b=0.5, c=1.0, d=0.3):
    return a * N**(-b) + c * D**(-d)

print(round(loss(1000, 10000), 4))

What is the output?

medium

A. 0.0947

B. 0.1265

C. 0.0316

D. 1.0000

Solution

Step 1: Calculate each term separately
N=1000, b=0.5: 1000**(-0.5) = 1/sqrt(1000) ≈ 0.0316
D=10000, d=0.3: 10000**(-0.3) = 10**(-1.2) ≈ 0.0631
Step 2: Sum the terms and round to 4 decimals
1.0 * 0.0316 + 1.0 * 0.0631 = 0.0947
Final Answer:
0.0947 -> Option A
Quick Check:
N**(-0.5) + D**(-0.3) ≈ 0.0316 + 0.0631 = 0.0947 [OK]

Hint: Calculate each power term separately, then sum

Common Mistakes:

Calculating only one term instead of sum
Mixing up exponents or signs
Rounding intermediate terms too early, which distorts the final sum
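
The term-by-term calculation can be checked with a short script. This is a minimal sketch that reuses the loss function from the question; the variable names model_term, data_term, and early_rounded are introduced here only for illustration.

```python
def loss(N, D, a=1.0, b=0.5, c=1.0, d=0.3):
    # Same simplified scaling law as in the question.
    return a * N ** (-b) + c * D ** (-d)

# Evaluate each term on its own before summing.
model_term = 1.0 * 1000 ** (-0.5)   # 1 / sqrt(1000)
data_term = 1.0 * 10000 ** (-0.3)   # 10 ** (-1.2)
total = loss(1000, 10000)

print("model term:", round(model_term, 4))
print("data term: ", round(data_term, 4))
print("total:     ", round(total, 4))

# Rounding each term to 2 decimals first loses precision in the sum.
early_rounded = round(model_term, 2) + round(data_term, 2)
print("rounded too early:", round(early_rounded, 4))
```

The last line illustrates the rounding mistake listed above: rounding each term first gives a visibly different total from rounding once at the end.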

4. The following code aims to compute loss using LLM scaling laws but has a bug:

def loss(N, D, a=1.0, b=0.5, c=1.0, d=0.3):
    return a * N**b + c * D**d

print(round(loss(1000, 10000), 4))

What is the main error?

medium

A. Function should return a tuple, not a single value

B. Missing multiplication operator between variables

C. Exponents should be negative to show loss decreases with size

D. Parameters a and c should be integers only

Solution

Step 1: Identify the intended formula
The simplified scaling law is L = a * N^(-b) + c * D^(-d): loss decreases as model size and data increase, so the exponents must be negative.
Step 2: Check the code exponents
The code uses positive exponents (N**b and D**d), so the computed loss grows with N and D instead of shrinking.
Final Answer:
Exponents should be negative to show loss decreases with size -> Option C
Quick Check:
Negative exponents mean loss decreases as size grows [OK]

Hint: Remember loss decreases, so exponents must be negative

Common Mistakes:

Thinking multiplication is missing
Believing return type must be tuple
Assuming parameter types must be integers
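
To see the effect of the sign error, the buggy and corrected versions can be run side by side. This is a sketch under the question's own default constants; the names loss_buggy and loss_fixed are introduced here for comparison only.

```python
def loss_buggy(N, D, a=1.0, b=0.5, c=1.0, d=0.3):
    # Positive exponents: the value grows with N and D (the bug).
    return a * N ** b + c * D ** d

def loss_fixed(N, D, a=1.0, b=0.5, c=1.0, d=0.3):
    # Negative exponents: the value shrinks as N and D grow.
    return a * N ** (-b) + c * D ** (-d)

# Compare a smaller and a larger configuration.
for N, D in [(1000, 10000), (10000, 100000)]:
    print(
        f"N={N:>6} D={D:>7}  "
        f"buggy={loss_buggy(N, D):.4f}  fixed={loss_fixed(N, D):.4f}"
    )
```

With the fix, the larger configuration reports the lower loss; with the bug, it reports the higher one.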

5. You want to reduce the loss of a large language model efficiently. According to LLM scaling laws, which strategy is best if you have limited compute but can increase data or model size?

hard

A. Increase dataset size moderately while keeping model size fixed

B. Increase model size drastically without adding data

C. Keep both model size and data fixed and train longer

D. Reduce dataset size to speed up training

Solution

Step 1: Understand compute constraints and scaling laws
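In the simplified law L = a * N^(-b) + c * D^(-d), loss falls with both model size and data, but training compute grows roughly in proportion to N * D, so a drastic increase in model size quickly exhausts a limited budget.
Step 2: Choose strategy fitting limited compute
A moderate data increase costs far less compute than a drastic model increase, and it shrinks the data term c * D^(-d), which growing N alone leaves untouched. Training longer with both fixed (option C) changes neither N nor D in this law, and reducing data (option D) raises the data term. Option A is therefore the best choice.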
Final Answer:
Increase dataset size moderately while keeping model size fixed -> Option A
Quick Check:
With limited compute, a moderate data increase beats a drastic model increase [OK]

Hint: With limited compute, grow data before model size

Common Mistakes:

Thinking bigger model always better regardless of compute
Ignoring compute limits and training time
Assuming less data speeds up training without hurting performance
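
The trade-off can be made concrete by comparing two ways of spending the same extra compute under the simplified law. This is a sketch under stated assumptions, not a general result: the constants are the illustrative defaults from the earlier questions, compute_proxy assumes training cost is proportional to N * D, and budget_factor is a hypothetical budget of twice the baseline compute.

```python
def loss(N, D, a=1.0, b=0.5, c=1.0, d=0.3):
    # Simplified scaling law; constants are illustrative, not fitted.
    return a * N ** (-b) + c * D ** (-d)

def compute_proxy(N, D):
    # Assumption: training cost grows in proportion to N * D.
    return N * D

base_N, base_D = 1000, 10000
budget_factor = 2  # hypothetical: twice the baseline compute is affordable

candidates = {
    "baseline": (base_N, base_D),
    "more data (2x D)": (base_N, base_D * budget_factor),
    "bigger model (2x N)": (base_N * budget_factor, base_D),
}

results = {name: loss(N, D) for name, (N, D) in candidates.items()}
for name, (N, D) in candidates.items():
    rel_cost = compute_proxy(N, D) / compute_proxy(base_N, base_D)
    print(f"{name:<20} cost x{rel_cost:.0f}  loss={results[name]:.4f}")
```

With these particular constants the data term is the larger of the two at the baseline, so doubling D lowers loss more than doubling N for the same cost. Different constants or a different starting point could reverse the ranking, which is why real budgets are usually planned from fitted scaling curves rather than a fixed rule.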

LLM scaling laws in Prompt Engineering / GenAI - Model Pipeline Trace

Epoch   Loss ↓   Accuracy ↑   Observation
1       5.0      10%          Model starts with high loss and low accuracy
5       3.2      25%          Loss decreases and accuracy improves as model learns
10      2.5      40%          Model shows steady improvement
15      2.1      50%          Loss continues to decrease, accuracy rises
20      1.9      55%          Training converges with better performance
