Introduction
Building large language models is expensive and complex. Understanding how model size, training data, and compute each affect performance helps guide smarter development choices.
Imagine training for a marathon. You need good shoes (model size), enough practice runs (training data), and time to train (compute power). Having only one without the others won't prepare you well for the race.
┌───────────────┐
│ Model Size    │
└──────┬────────┘
       │
┌──────▼────────┐
│ Training Data │
└──────┬────────┘
       │
┌──────▼────────┐
│ Compute Power │
└──────┬────────┘
       │
┌──────▼────────┐
│ Balanced      │
│ Scaling       │
└───────────────┘

The loss follows a power law, L = a * N^(-b) + c * D^(-d), where N is the model size and D is the amount of training data. Linear or logarithmic forms do not describe this relationship correctly.

def loss(N, D, a=1.0, b=0.5, c=1.0, d=0.3):
    # The exponents must be negative so that loss falls as N and D grow.
    # Writing a * N**b + c * D**d instead would make loss rise with scale.
    return a * N**(-b) + c * D**(-d)

print(round(loss(1000, 10000), 4))  # prints 0.0947
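
To make the marathon analogy concrete, here is a minimal sketch that compares three ways of spending the same 100x increase in resources. It reuses the power-law form with the same illustrative constants (a=1.0, b=0.5, c=1.0, d=0.3), which are placeholders rather than fitted values. It also assumes that compute grows roughly in proportion to N × D, a simplification introduced here only for the comparison.

```python
def loss(N, D, a=1.0, b=0.5, c=1.0, d=0.3):
    """Illustrative power-law loss; the constants are placeholders, not fitted values."""
    return a * N ** (-b) + c * D ** (-d)

# Baseline model size (N) and dataset size (D).
base = loss(1_000, 10_000)

# Three ways to spend a 100x larger N * D budget (assumed proxy for compute).
only_model = loss(100_000, 10_000)    # 100x model size, same data
only_data = loss(1_000, 1_000_000)    # same model size, 100x data
balanced = loss(10_000, 100_000)      # 10x model size and 10x data

# Scaling N alone can never push loss below the data term c * D**(-d).
data_floor = 1.0 * 10_000 ** (-0.3)

for name, value in [
    ("baseline", base),
    ("only model", only_model),
    ("only data", only_data),
    ("balanced", balanced),
    ("data floor", data_floor),
]:
    print(f"{name:<11} {value:.4f}")
```

With these placeholder constants the balanced run ends up with the lowest loss, and the model-only run stays above the floor set by the unchanged dataset. Different constants would change the exact ranking, so the output illustrates the shape of the trade-off rather than a general result.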