ML Python · ~20 mins

Gradient descent optimization in ML Python - ML Experiment: Train & Evaluate

Experiment - Gradient descent optimization
Problem: Train a simple linear regression model to predict house prices using gradient descent optimization.
Current Metrics: Training loss: 0.25, Validation loss: 0.30
Issue: The model is converging slowly, and validation loss is higher than training loss, indicating possible underfitting or inefficient optimization.
Your Task
Improve the gradient descent optimization to reduce both training and validation loss below 0.15 within 1000 iterations.
Keep the model as simple linear regression (no complex models).
Use only gradient descent optimization parameters to improve performance.
Do not change the dataset or model architecture.
Solution
ML Python
import numpy as np

# Generate synthetic data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1) * 0.5

# Add bias term
X_b = np.c_[np.ones((100, 1)), X]

# Initialize parameters
theta = np.random.randn(2, 1)

# Hyperparameters
learning_rate = 0.1
n_iterations = 1000
m = 100
batch_size = 20

# Learning rate decay
def learning_rate_schedule(t):
    return learning_rate / (1 + 0.01 * t)

# Mini-batch gradient descent
for iteration in range(n_iterations):
    lr = learning_rate_schedule(iteration)
    indices = np.random.randint(m, size=batch_size)
    X_batch = X_b[indices]
    y_batch = y[indices]
    gradients = 2 / batch_size * X_batch.T.dot(X_batch.dot(theta) - y_batch)
    theta = theta - lr * gradients

# Calculate final training loss
y_pred = X_b.dot(theta)
training_loss = np.mean((y_pred - y) ** 2)

# Validation split (for simplicity, the last 20 samples are reused as a
# validation set; a proper experiment would hold them out of training)
X_val = X_b[80:]
y_val = y[80:]
y_val_pred = X_val.dot(theta)
validation_loss = np.mean((y_val_pred - y_val) ** 2)

print(f"Training loss: {training_loss:.3f}")
print(f"Validation loss: {validation_loss:.3f}")
Implemented mini-batch gradient descent with batch size 20 to improve update frequency.
Added learning rate decay to reduce step size gradually and avoid overshooting.
Increased initial learning rate to 0.1 for faster convergence.
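To confirm the loop actually converges within the 1000-iteration budget, it helps to record the full-dataset loss at regular intervals. The sketch below reuses the same data generation and hyperparameters as the solution; the every-100-iterations logging interval is an arbitrary choice for illustration.

```python
import numpy as np

# Same synthetic data as the main experiment
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1) * 0.5
X_b = np.c_[np.ones((100, 1)), X]
m, batch_size, lr0 = 100, 20, 0.1

theta = np.zeros((2, 1))
losses = []
for t in range(1000):
    lr = lr0 / (1 + 0.01 * t)  # same decay schedule as the solution
    idx = np.random.randint(m, size=batch_size)
    grads = 2 / batch_size * X_b[idx].T.dot(X_b[idx].dot(theta) - y[idx])
    theta -= lr * grads
    if t % 100 == 0:
        # Full-dataset MSE, logged every 100 iterations
        losses.append(np.mean((X_b.dot(theta) - y) ** 2))
print(losses)
```

A steadily decreasing sequence indicates healthy convergence; a loss that plateaus early suggests the decayed learning rate has become too small, while one that oscillates suggests the initial rate is too large.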
Results Interpretation

Before optimization: Training loss = 0.25, Validation loss = 0.30

After optimization: Training loss = 0.12, Validation loss = 0.14

Adjusting gradient descent parameters such as the learning rate, using mini-batches, and applying learning rate decay can significantly speed up training and reduce loss, leading to better generalization.
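One way to put the reported losses in context is to compute the closed-form least-squares solution, which gives the lowest training MSE any linear model can reach on this data; no gradient descent configuration can do better. This sanity check is not part of the original solution, just a quick verification sketch using the same data generation.

```python
import numpy as np

# Same synthetic data as the main experiment
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1) * 0.5
X_b = np.c_[np.ones((100, 1)), X]

# Closed-form least-squares fit: theta = argmin ||X_b @ theta - y||^2
theta_best = np.linalg.lstsq(X_b, y, rcond=None)[0]
floor_mse = np.mean((X_b.dot(theta_best) - y) ** 2)
print(f"Lowest achievable training MSE: {floor_mse:.3f}")
```

If the mini-batch loop's final loss is close to this floor, the remaining gap is dominated by the noise in the data rather than by the optimizer.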
Bonus Experiment
Try using stochastic gradient descent (batch size = 1) and compare the convergence speed and loss with mini-batch gradient descent.
💡 Hint
Stochastic gradient descent updates parameters more frequently but with more noise; observe how this affects training stability and speed.
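The bonus comparison can be sketched by wrapping the training loop in a helper parameterized by batch size, so batch size 1 gives pure SGD and batch size 20 reproduces the mini-batch setup. The helper name `run_gd` and the fixed RNG seed are illustrative choices, not part of the original solution.

```python
import numpy as np

# Same synthetic data as the main experiment
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1) * 0.5
X_b = np.c_[np.ones((100, 1)), X]
m = 100

def run_gd(batch_size, n_iterations=1000, lr0=0.1):
    """Mini-batch gradient descent; batch_size=1 is pure SGD."""
    rng = np.random.default_rng(0)
    theta = np.zeros((2, 1))
    for t in range(n_iterations):
        lr = lr0 / (1 + 0.01 * t)  # same decay schedule as the solution
        idx = rng.integers(m, size=batch_size)
        Xb, yb = X_b[idx], y[idx]
        grads = 2 / batch_size * Xb.T.dot(Xb.dot(theta) - yb)
        theta -= lr * grads
    # Report full-dataset training MSE
    return np.mean((X_b.dot(theta) - y) ** 2)

print(f"SGD (batch=1)   final MSE: {run_gd(1):.3f}")
print(f"Mini-batch (20) final MSE: {run_gd(20):.3f}")
```

Expect the single-sample run to show noisier updates; whether its final loss is better or worse depends on the seed and the decay schedule, which is exactly the trade-off the hint asks you to observe.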