
LLM scaling laws in Prompt Engineering / GenAI - Cheat Sheet & Quick Revision

Recall & Review
beginner
What are LLM scaling laws?
LLM scaling laws describe how the performance of large language models, usually measured as loss, improves predictably (typically as a power law) as model size, training data, or compute increases.
beginner
Why does increasing model size help LLMs perform better?
Bigger models can learn more patterns and details from data, which helps them understand and generate language more accurately.
beginner
What three main factors do LLM scaling laws relate to?
They relate to model size (parameters), amount of training data, and compute power used for training.
intermediate
How does training data size affect LLM performance according to scaling laws?
More training data generally improves performance, but with diminishing returns: each further increase in data size yields a smaller gain than the last.
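The diminishing returns can be illustrated with a minimal sketch. It assumes the simplified power-law loss L = a * N^(-b) + c * D^(-d) with toy constants (a = c = 1, b = 0.5, d = 0.3) that are chosen only for illustration and are not fitted to any real model; the name `toy_loss` is hypothetical.

```python
def toy_loss(n_params, n_tokens, a=1.0, b=0.5, c=1.0, d=0.3):
    """Simplified power-law loss; all constants are illustrative assumptions."""
    return a * n_params ** (-b) + c * n_tokens ** (-d)

N_FIXED = 1_000  # model size held constant
data_sizes = [10_000 * 2 ** k for k in range(5)]  # repeated doublings of D
losses = [toy_loss(N_FIXED, D) for D in data_sizes]

# Absolute improvement gained by each successive doubling of the data.
gains = [prev - cur for prev, cur in zip(losses, losses[1:])]

for D, loss_value in zip(data_sizes, losses):
    print(f"D={D:>7}  loss={loss_value:.4f}")
print("gain per doubling:", [round(g, 4) for g in gains])
```

Under these assumptions every doubling still lowers the loss, but each gain is smaller than the one before it.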
intermediate
What is a practical takeaway from LLM scaling laws for building better models?
To improve LLMs, balance increasing model size, training data, and compute rather than focusing on just one.
Which factor is NOT part of LLM scaling laws?
A. Training data size
B. User interface design
C. Compute power
D. Model size
What happens to LLM performance when you double the model size, keeping other factors fixed?
A. Performance improves predictably but does not necessarily double
B. Performance stays the same
C. Performance decreases
D. Performance doubles exactly
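To see why doubling the model size does not double performance, here is a minimal sketch. It assumes the simplified power-law loss L = a * N^(-b) + c * D^(-d) with toy constants (a = c = 1, b = 0.5, d = 0.3); the constants and the name `toy_loss` are illustrative assumptions, not measured values.

```python
def toy_loss(n_params, n_tokens, a=1.0, b=0.5, c=1.0, d=0.3):
    """Simplified power-law loss; all constants are illustrative assumptions."""
    return a * n_params ** (-b) + c * n_tokens ** (-d)

D_FIXED = 10_000  # dataset size held constant
base = toy_loss(1_000, D_FIXED)
doubled = toy_loss(2_000, D_FIXED)  # same data, twice the parameters

ratio = doubled / base
print(f"loss ratio after doubling N: {ratio:.3f}")
```

With these toy constants the model-size term shrinks by a factor of 2^(-0.5) while the data term is unchanged, so the total loss falls by much less than half.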
According to scaling laws, what is the effect of greatly increasing the training data size?
A. Performance improves linearly forever
B. Performance gets worse
C. Performance improves but with diminishing returns
D. No effect on performance
Which is a key insight from LLM scaling laws for training models?
A. Only increase model size, ignore data and compute
B. Train on small data sets repeatedly
C. Use less compute to save money
D. Balance model size, data, and compute for best results
LLM scaling laws help predict how performance changes when you change what?
A. Model size, data amount, and compute
B. Training environment temperature
C. Model architecture only
D. User feedback
Explain in your own words what LLM scaling laws are and why they matter.
Think about how bigger models and more data help language models get better.
Describe how balancing model size, data, and compute can lead to better LLM performance.
Imagine you want to bake a cake: you need the right amount of ingredients, oven heat, and time.

      Practice

      1. What do LLM scaling laws primarily describe in language model training?
      easy
      A. The syntax rules for writing code in AI frameworks
      B. How model size, data amount, and compute resources affect performance
      C. The best way to label data for supervised learning
      D. How to deploy models on mobile devices

      Solution

      1. Step 1: Understand the purpose of scaling laws

        LLM scaling laws explain the relationship between model size, data, and compute with model performance.
      2. Step 2: Match the description to options

        Only option B, "How model size, data amount, and compute resources affect performance", describes this relationship; the other options cover unrelated topics.
      3. Final Answer:

        How model size, data amount, and compute resources affect performance -> Option B
      4. Quick Check:

        Scaling laws = model size, data, compute impact [OK]
      Hint: Focus on model size, data, and compute impact keywords [OK]
      Common Mistakes:
      • Confusing scaling laws with coding syntax
      • Thinking scaling laws are about data labeling
      • Assuming scaling laws relate to deployment
      2. Which of the following is the correct formula representing a simplified LLM scaling law for loss L as a function of model parameters N and dataset size D?
      easy
      A. L = a / (N + D)
      B. L = a + b * N + c * D
      C. L = a * log(N) + b * log(D)
      D. L = a * N^(-b) + c * D^(-d)

      Solution

      1. Step 1: Recall the typical scaling law form

        Scaling laws often show loss decreases as power laws of model size and data, like L = a * N^(-b) + c * D^(-d).
      2. Step 2: Compare options to this form

        L = a * N^(-b) + c * D^(-d) matches the power law form; the other options use a reciprocal-of-sum, linear, or logarithmic form, none of which is a sum of power laws.
      3. Final Answer:

        L = a * N^(-b) + c * D^(-d) -> Option D
      4. Quick Check:

        Loss decreases as power laws of N and D [OK]
      Hint: Look for power law (exponent) form in the formula [OK]
      Common Mistakes:
      • Choosing linear formulas instead of power laws
      • Confusing logarithmic with power law forms
      • Ignoring the negative exponents for loss decrease
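One way to tell a power law from a linear or logarithmic form is that a power law becomes a straight line on log-log axes, with slope equal to the negative exponent. The sketch below recovers the exponent b from two points of the model-size term a * N^(-b); the constants a = 1 and b = 0.5 and the name `model_term` are illustrative assumptions.

```python
import math

def model_term(n_params, a=1.0, b=0.5):
    """Model-size term of the simplified loss; constants are illustrative."""
    return a * n_params ** (-b)

n1, n2 = 1_000, 100_000

# Slope of log(term) versus log(N); for a pure power law this equals -b.
slope = (math.log(model_term(n2)) - math.log(model_term(n1))) / (
    math.log(n2) - math.log(n1)
)
recovered_b = -slope
print(round(recovered_b, 6))
```

A linear or logarithmic formula would not give a constant slope on these axes, which is why the exponent form is the one to look for.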
      3. Consider this Python code simulating a simplified LLM loss calculation:
      def loss(N, D, a=1.0, b=0.5, c=1.0, d=0.3):
          return a * N**(-b) + c * D**(-d)
      
      print(round(loss(1000, 10000), 4))

      What is the output?
      medium
      A. 0.0947
      B. 0.1265
      C. 0.0316
      D. 1.0000

      Solution

      1. Step 1: Calculate each term separately

        N=1000, b=0.5: 1000**(-0.5) = 1/sqrt(1000) ≈ 0.0316
        D=10000, d=0.3: 10000**(-0.3) ≈ 0.0631
      2. Step 2: Sum the terms and round to 4 decimals

        1.0 * 0.0316 + 1.0 * 0.0631 = 0.0947
      3. Final Answer:

        0.0947 -> Option A
      4. Quick Check:

        N**(-0.5) + D**(-0.3) ≈ 0.0316 + 0.0631 = 0.0947 [OK]
      Hint: Calculate each power term separately, then sum [OK]
      Common Mistakes:
      • Calculating only one term instead of sum
      • Mixing up exponents or signs
      • Rounding too early causing errors
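The step-by-step calculation above can be checked with a short sketch that returns the two terms separately before summing them. The helper name `loss_terms` is hypothetical; the constants are the defaults from the question.

```python
def loss_terms(N, D, a=1.0, b=0.5, c=1.0, d=0.3):
    """Return the model-size term and the data term separately."""
    return a * N ** (-b), c * D ** (-d)

model_term, data_term = loss_terms(1000, 10000)
total = model_term + data_term  # sum first, round only at the end

print(f"model term: {model_term:.4f}")
print(f"data term:  {data_term:.4f}")
print(f"total:      {round(total, 4)}")
```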
      4. The following code aims to compute loss using LLM scaling laws but has a bug:
      def loss(N, D, a=1.0, b=0.5, c=1.0, d=0.3):
          return a * N**b + c * D**d
      
      print(round(loss(1000, 10000), 4))

      What is the main error?
      medium
      A. Function should return a tuple, not a single value
      B. Missing multiplication operator between variables
      C. Exponents should be negative to show loss decreases with size
      D. Parameters a and c should be integers only

      Solution

      1. Step 1: Identify the intended formula

        LLM scaling laws show loss decreases as model size and data increase, so exponents must be negative.
      2. Step 2: Check the code exponents

        The code uses positive exponents (N**b and D**d), which incorrectly increase loss with size.
      3. Final Answer:

        Exponents should be negative to show loss decreases with size -> Option C
      4. Quick Check:

        Negative exponents mean loss decreases as size grows [OK]
      Hint: Remember loss decreases, so exponents must be negative [OK]
      Common Mistakes:
      • Thinking multiplication is missing
      • Believing return type must be tuple
      • Assuming parameter types must be integers
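A quick way to confirm the diagnosis is to compare the buggy function with a corrected one and check which direction the loss moves as N and D grow. This is a minimal sketch; the names `buggy_loss` and `fixed_loss` are hypothetical, and the constants are the defaults from the question.

```python
def buggy_loss(N, D, a=1.0, b=0.5, c=1.0, d=0.3):
    # Positive exponents: the result grows as N and D grow.
    return a * N**b + c * D**d

def fixed_loss(N, D, a=1.0, b=0.5, c=1.0, d=0.3):
    # Negative exponents: the result shrinks as N and D grow.
    return a * N**(-b) + c * D**(-d)

small, large = (1000, 10000), (2000, 20000)

print("buggy loss rises with scale:", buggy_loss(*small) < buggy_loss(*large))
print("fixed loss falls with scale:", fixed_loss(*small) > fixed_loss(*large))
```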
      5. You want to reduce the loss of a large language model efficiently. According to LLM scaling laws, which strategy is best if you have limited compute but can increase data or model size?
      hard
      A. Increase dataset size moderately while keeping model size fixed
      B. Increase model size drastically without adding data
      C. Keep both model size and data fixed and train longer
      D. Reduce dataset size to speed up training

      Solution

      1. Step 1: Understand compute constraints and scaling laws

        Scaling laws show that loss improves with both model size and data, but a limited compute budget rules out a large increase in model size.
      2. Step 2: Choose strategy fitting limited compute

        Increasing data moderately costs far less compute than drastically increasing model size, so option A, "Increase dataset size moderately while keeping model size fixed", is the best choice.
      3. Final Answer:

        Increase dataset size moderately while keeping model size fixed -> Option A
      4. Quick Check:

        Limited compute favors data increase over big model growth [OK]
      Hint: With limited compute, grow data before model size [OK]
      Common Mistakes:
      • Thinking a bigger model is always better regardless of compute
      • Ignoring compute limits and training time
      • Reducing data to speed up training, which harms performance
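The trade-off in this question can be sketched numerically. The example below rests on two assumptions that are not part of the question: training compute is taken to grow in proportion to N * D, and the loss uses the toy constants from the earlier practice questions (a = c = 1, b = 0.5, d = 0.3). All function names are hypothetical.

```python
def toy_loss(N, D, a=1.0, b=0.5, c=1.0, d=0.3):
    """Simplified power-law loss; all constants are illustrative assumptions."""
    return a * N ** (-b) + c * D ** (-d)

def relative_cost(N, D):
    """Assumed cost model: training compute grows in proportion to N * D."""
    return N * D

def gain_per_extra_compute(base, candidate):
    """Loss reduction divided by the additional (relative) compute spent."""
    loss_drop = toy_loss(*base) - toy_loss(*candidate)
    extra_cost = relative_cost(*candidate) - relative_cost(*base)
    return loss_drop / extra_cost

base = (1_000, 10_000)
more_data = (1_000, 20_000)      # option A: double the data, same model
bigger_model = (10_000, 10_000)  # option B: 10x the model, same data

gain_a = gain_per_extra_compute(base, more_data)
gain_b = gain_per_extra_compute(base, bigger_model)
print("moderate data increase is more compute-efficient:", gain_a > gain_b)
```

The ranking depends entirely on the toy constants and the assumed cost model; with different exponents or starting sizes the comparison can come out the other way, which is why scaling laws recommend balancing the factors.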