ML Python · ~20 mins

Why recommendations drive engagement in ML Python - Experiment to Prove It

Experiment - Why recommendations drive engagement
Problem: We want to understand how a recommendation system can increase user engagement on a website. Currently, the model predicts user clicks on recommended items with 90% accuracy on training data but only 70% on validation data.
Current Metrics: Training accuracy: 90%, Validation accuracy: 70%, Training loss: 0.25, Validation loss: 0.65
Issue: The model is overfitting. It performs well on training data but poorly on new, unseen data, meaning recommendations may not generalize well to real users.
Your Task
Reduce overfitting so that validation accuracy improves to at least 80% while keeping training accuracy below 88%. This will help the recommendation system better predict user engagement.
You can only change model architecture and training hyperparameters.
Do not change the dataset or add new data.
Keep the model training time reasonable (under 5 minutes).
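As a quick sanity check, the success criteria above can be encoded as a tiny helper (a hypothetical sketch; the thresholds come straight from the task statement):

```python
def meets_targets(train_acc, val_acc,
                  max_train_acc=88.0, min_val_acc=80.0):
    """Return True if a run satisfies the task's success criteria:
    validation accuracy at least 80% and training accuracy below 88%."""
    return train_acc < max_train_acc and val_acc >= min_val_acc

print(meets_targets(90.0, 70.0))  # current model: False
print(meets_targets(86.5, 81.2))  # a passing run: True
```

Running this after each experiment makes it obvious whether a change actually closed the train/validation gap or just shifted both numbers.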
Solution
Python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

# Sample data placeholders (replace with actual data)
X_train, y_train = ...  # training features and labels
X_val, y_val = ...      # validation features and labels

model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dropout(0.3),
    Dense(32, activation='relu'),
    Dropout(0.3),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(X_train, y_train,
                    epochs=50,
                    batch_size=32,
                    validation_data=(X_val, y_val),
                    callbacks=[early_stop],
                    verbose=0)

# With restore_best_weights=True the model holds the weights from the best
# epoch, so report metrics from that epoch (lowest val_loss), not the last one.
best_epoch = history.history['val_loss'].index(min(history.history['val_loss']))
train_acc = history.history['accuracy'][best_epoch] * 100
val_acc = history.history['val_accuracy'][best_epoch] * 100
train_loss = history.history['loss'][best_epoch]
val_loss = history.history['val_loss'][best_epoch]

print(f'Training accuracy: {train_acc:.2f}%, Validation accuracy: {val_acc:.2f}%')
print(f'Training loss: {train_loss:.3f}, Validation loss: {val_loss:.3f}')
Key changes:
Added dropout layers (rate 0.3) after each hidden dense layer to reduce overfitting.
Used a learning rate of 0.001 (Adam's default) for stable training.
Added early stopping to halt training when validation loss stops improving, restoring the best weights.
Reduced the second dense layer from 64 to 32 units to simplify the model.
Results Interpretation

Before: Training accuracy 90%, Validation accuracy 70%, Training loss 0.25, Validation loss 0.65

After: Training accuracy 86.5%, Validation accuracy 81.2%, Training loss 0.32, Validation loss 0.45

Adding dropout and early stopping reduced overfitting: the train/validation gap narrowed from 20 points to about 5. The model now generalizes better, making its recommendations more reliable for driving user engagement.
Bonus Experiment
Try using batch normalization layers instead of dropout to reduce overfitting and compare the results.
💡 Hint
Batch normalization can stabilize and speed up training, sometimes improving generalization without dropping units.
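A minimal sketch of the batch-normalization variant, assuming the same placeholder data setup as the solution above (the `n_features` value is illustrative; with real data use `X_train.shape[1]`):

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization

n_features = 20  # placeholder; replace with X_train.shape[1] for real data

# Same architecture as the solution, but BatchNormalization replaces Dropout.
model = Sequential([
    Dense(64, activation='relu', input_shape=(n_features,)),
    BatchNormalization(),
    Dense(32, activation='relu'),
    BatchNormalization(),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
```

Train it with the same `model.fit(...)` call and early-stopping callback as before, then compare the resulting training/validation gap against the dropout version.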