LLM scaling laws in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - LLM scaling laws

Problem: You want to understand how increasing the size of a large language model (LLM) affects its performance on a text prediction task.

Current Metrics: A model with ~0.1 million parameters achieves ~60% accuracy on validation data.

Issue: The model is too small to reach the desired accuracy. You want to see how scaling up the parameter count improves results.

Your Task

Train LLMs of increasing sizes (~0.1M, ~0.7M, ~3M parameters) and observe how validation accuracy improves. Target: validation accuracy >85% with larger models.

Use the same dataset and training procedure for all models.

Only change model size (number of parameters).

Keep training epochs and batch size fixed.

Solution

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, random_split
import matplotlib.pyplot as plt

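# Fix the random seed so the data and the train/validation split are reproducible
torch.manual_seed(0)

# Dummy dataset: random token IDs standing in for tokenized text
X = torch.randint(0, 1000, (1000, 50))  # 1000 samples, 50 tokens each
Y = (X.sum(dim=1) % 2).long()           # Binary label: parity of the token-ID sum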
dataset = TensorDataset(X, Y)
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=32, shuffle=False)

class SimpleLLM(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_layers, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, dim_feedforward=hidden_dim)
            for _ in range(num_layers)
        ])
        self.classifier = nn.Linear(embed_dim, 2)

    def forward(self, x):
        x = self.embedding(x)  # (batch, seq_len, embed_dim)
        x = x.permute(1, 0, 2)  # Transformer expects (seq_len, batch, embed_dim)
        for layer in self.layers:
            x = layer(x)
        x = x.mean(dim=0)  # average over sequence length
        out = self.classifier(x)
        return out

# Training function
def train_model(model, dataloader, epochs=5):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    model.train()
    for epoch in range(epochs):
        for xb, yb in dataloader:
            optimizer.zero_grad()
            preds = model(xb)
            loss = criterion(preds, yb)
            loss.backward()
            optimizer.step()
    return model

# Evaluation function
def evaluate_model(model, dataloader):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for xb, yb in dataloader:
            preds = model(xb)
            predicted = preds.argmax(dim=1)
            correct += (predicted == yb).sum().item()
            total += yb.size(0)
    return correct / total * 100

# Parameter counting (returns the total in millions)
def count_params(model):
    return sum(p.numel() for p in model.parameters()) / 1e6

# Model sizes to test
model_configs = [
    {"num_layers": 2, "hidden_dim": 128, "embed_dim": 64},  # ~0.1M params
    {"num_layers": 4, "hidden_dim": 256, "embed_dim": 128}, # ~0.7M params
    {"num_layers": 6, "hidden_dim": 512, "embed_dim": 256}  # ~3M params
]

results = []
for config in model_configs:
    model = SimpleLLM(vocab_size=1000, embed_dim=config["embed_dim"], num_layers=config["num_layers"], hidden_dim=config["hidden_dim"])
    num_params = count_params(model)
    model = train_model(model, train_loader, epochs=5)
    acc = evaluate_model(model, val_loader)
    results.append({"params_approx": f"{num_params:.1f}M", "accuracy": acc})

# Print results
for r in results:
    print(f"Model size approx: {r['params_approx']} params, Validation accuracy: {r['accuracy']:.2f}%")

# Plot
sizes = [float(r["params_approx"][:-1]) for r in results]
accs = [r["accuracy"] for r in results]
plt.plot(sizes, accs, marker='o')
plt.xscale('log')  # model sizes span more than an order of magnitude
plt.xlabel('Model size (millions of params, log scale)')
plt.ylabel('Validation Accuracy (%)')
plt.title('LLM Scaling Law: Accuracy vs Model Size')
plt.grid(True)
plt.show()

Increased the number of layers from 2 to 6 to scale model size.

Increased the feed-forward hidden dimension (128 to 512) and the embedding size (64 to 256) to add parameters.

Kept training epochs (5) and batch size (32) fixed to isolate the effect of model size.
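To see where the parameters come from, the count for each configuration can be estimated by hand. The sketch below is an illustration, not part of the original solution: the helper names `encoder_layer_params` and `simple_llm_params` are hypothetical, and the formula assumes PyTorch's default `nn.TransformerEncoderLayer` layout (biases on every projection, two layer norms per layer). The `count_params` function applied to the real model remains the authoritative number.

```python
def encoder_layer_params(embed_dim, hidden_dim):
    """Estimate parameters in one encoder layer (assumes PyTorch defaults)."""
    # Self-attention: Q, K, V and output projections, each embed_dim x embed_dim, plus biases
    attention = 4 * embed_dim * embed_dim + 4 * embed_dim
    # Feed-forward: embed_dim -> hidden_dim -> embed_dim, plus both biases
    feedforward = 2 * embed_dim * hidden_dim + hidden_dim + embed_dim
    # Two layer norms, each with a weight and a bias vector
    layer_norms = 2 * 2 * embed_dim
    return attention + feedforward + layer_norms


def simple_llm_params(vocab_size, embed_dim, num_layers, hidden_dim, num_classes=2):
    """Estimate total parameters: embedding + encoder stack + classifier head."""
    embedding = vocab_size * embed_dim
    classifier = embed_dim * num_classes + num_classes
    return embedding + num_layers * encoder_layer_params(embed_dim, hidden_dim) + classifier


# The three configurations used in the experiment
configs = [
    {"num_layers": 2, "hidden_dim": 128, "embed_dim": 64},
    {"num_layers": 4, "hidden_dim": 256, "embed_dim": 128},
    {"num_layers": 6, "hidden_dim": 512, "embed_dim": 256},
]

estimates = [simple_llm_params(vocab_size=1000, **cfg) for cfg in configs]
for cfg, total in zip(configs, estimates):
    print(f"{cfg} -> {total / 1e6:.2f}M parameters")
```

Under these assumptions the three models come out at roughly 0.13M, 0.66M, and 3.4M parameters. In the smallest model the embedding table accounts for about half of the total, while in the largest model the encoder layers dominate.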

Results Interpretation

Before scaling: 0.1M params model accuracy ~60%
After scaling: 3M params model accuracy ~85%
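When comparing these accuracies, it helps to know how much a single measurement can wobble. The validation split in this experiment holds 200 samples (20% of 1000), so the sketch below estimates the sampling noise on an accuracy figure using the binomial standard error. The helper name `accuracy_std_error` is hypothetical, and the calculation assumes validation samples are independent.

```python
import math


def accuracy_std_error(accuracy_pct, n_samples):
    """Binomial standard error of an accuracy estimate, in percentage points."""
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_samples)


n_val = 200  # validation set size: 20% of the 1000 dummy samples
for acc in (60.0, 85.0):
    se = accuracy_std_error(acc, n_val)
    print(f"accuracy {acc:.0f}% -> standard error about {se:.1f} points")
```

With 200 validation samples, the standard error is about 3.5 points at 60% accuracy and about 2.5 points at 85%, so differences of only a few points between two model sizes may not be meaningful.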

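With the dataset and training settings held fixed, validation accuracy rises as the parameter count grows. This is consistent with LLM scaling laws, which describe how performance tends to improve as model size increases, provided data and compute are not the bottleneck. Because the dataset is small and randomly generated, exact accuracies vary from run to run.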
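Scaling laws are commonly written as a power law in parameter count, for example loss ≈ a · N^(−α). The sketch below shows one way to fit such a curve by linear regression in log-log space. It is an illustration only: the helper name `fit_power_law` is hypothetical, the power-law form is an assumption rather than something measured here, and the loss values are synthetic (generated from a known curve), not results from this experiment.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-alpha) by least squares on log(size) vs log(loss)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, alpha)


# Approximate parameter counts of the three model configurations
sizes = [131_074, 658_178, 3_419_138]

# SYNTHETIC losses generated from a known power law, used only to check the fit
true_a, true_alpha = 5.0, 0.1
losses = [true_a * n ** (-true_alpha) for n in sizes]

a, alpha = fit_power_law(sizes, losses)
print(f"fitted a = {a:.3f}, alpha = {alpha:.3f}")
```

To apply this to the experiment, replace the synthetic `losses` with the measured validation loss for each model size. Three points are too few for a reliable exponent, so adding intermediate model sizes would make the fit more trustworthy.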

Bonus Experiment

Try adding dropout layers to the largest model to reduce overfitting and see if validation accuracy improves further.

💡 Hint

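Add nn.Dropout layers after the Transformer layers and tune the dropout rate between 0.1 and 0.3. Note that nn.TransformerEncoderLayer already applies dropout internally (default 0.1), so its dropout argument can be tuned as well.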
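One possible way to set up the bonus experiment is sketched below. The class name `SimpleLLMWithDropout` is hypothetical, and the rate of 0.2 is an arbitrary starting point within the suggested range. The sketch mirrors the solution's model, passes the same rate to each encoder layer's built-in `dropout` argument, and adds an `nn.Dropout` after every layer.

```python
import torch
import torch.nn as nn


class SimpleLLMWithDropout(nn.Module):
    """Variant of the solution's SimpleLLM with a tunable dropout rate (hypothetical)."""

    def __init__(self, vocab_size, embed_dim, num_layers, hidden_dim, dropout=0.2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(
                d_model=embed_dim,
                nhead=4,
                dim_feedforward=hidden_dim,
                dropout=dropout,  # dropout inside the attention and feed-forward blocks
            )
            for _ in range(num_layers)
        ])
        self.dropout = nn.Dropout(dropout)  # extra dropout applied after each layer
        self.classifier = nn.Linear(embed_dim, 2)

    def forward(self, x):
        x = self.embedding(x)   # (batch, seq_len, embed_dim)
        x = x.permute(1, 0, 2)  # encoder layers default to (seq_len, batch, embed_dim)
        for layer in self.layers:
            x = self.dropout(layer(x))
        x = x.mean(dim=0)       # average over sequence length
        return self.classifier(x)


# Largest configuration from the experiment, with dropout enabled
model = SimpleLLMWithDropout(vocab_size=1000, embed_dim=256, num_layers=6, hidden_dim=512)
model.eval()  # dropout is active only in training mode
with torch.no_grad():
    logits = model(torch.randint(0, 1000, (4, 50)))
print(logits.shape)
```

Dropout is disabled by `model.eval()`, so the existing `evaluate_model` function works unchanged. To compare fairly, train this variant with the same epochs, batch size, and learning rate as the other models.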

Practice

(1/5)

1. What do LLM scaling laws primarily describe in language model training?

easy

A. The syntax rules for writing code in AI frameworks

B. How model size, data amount, and compute resources affect performance

C. The best way to label data for supervised learning

D. How to deploy models on mobile devices

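Solution

Step 1: Understand the purpose of scaling laws

Scaling laws describe how a language model's performance changes as model size, training data, and compute grow.

Step 2: Match the description to options

Only option B names these three factors. The other options concern code syntax, data labeling, and deployment.

Final Answer: B

Quick Check: Scaling laws relate model size, data, and compute to performance.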
