
First neural network in TensorFlow - ML Experiment: Train & Evaluate

Experiment - First neural network
Problem: Build a simple neural network to classify handwritten digits from the MNIST dataset.
Current Metrics: Training accuracy: 98%, Validation accuracy: 85%
Issue: The model is overfitting: training accuracy is very high, but validation accuracy is 13 points lower.
Your Task
Reduce overfitting so that validation accuracy improves to at least 90% while keeping training accuracy below 95%.
You can only change the model architecture and training parameters.
Do not change the dataset or preprocessing steps.
Solution
TensorFlow
import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST dataset
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values
X_train, X_test = X_train / 255.0, X_test / 255.0

# Build model with dropout and smaller hidden layer
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(10, activation='softmax')
])

# Compile model with Adam; learning_rate=0.001 is also the Keras default
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Use early stopping callback
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Train model
history = model.fit(
    X_train, y_train,
    epochs=30,
    batch_size=64,
    validation_split=0.2,
    callbacks=[early_stop],
    verbose=0
)

# Evaluate on test data
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)

print(f'Test accuracy: {test_acc * 100:.2f}%')
Added a Dropout layer with rate 0.3 after the first Dense layer to reduce overfitting.
Reduced the number of neurons in the hidden Dense layer from 128 to 64.
Set the Adam learning rate to 0.001 for stable training. This matches the Keras default, so it only lowers the rate if the original model used a larger value.
Added early stopping to stop training when validation loss stops improving.
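The early-stopping rule in the last point can be sketched in plain Python. This is a simplified model of the behaviour, not Keras's implementation: the function name `early_stopping_epoch` and the loss values are hypothetical, and options such as `min_delta` are ignored.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) as 0-based epoch indices.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every epoch.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses, one per epoch.
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30]
stop, best = early_stopping_epoch(losses, patience=3)
print(stop, best)  # 5 2
```

Training stops at epoch 5 even though epoch 6 would have been better, which is the trade-off that patience controls. With restore_best_weights=True, the model keeps the weights from the best epoch (epoch 2 here) rather than the last one.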
Results Interpretation

Before: Training accuracy: 98%, Validation accuracy: 85%

After: Training accuracy: 93%, Validation accuracy: 91%

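Dropout, a smaller hidden layer, and early stopping all limit how closely the model can fit the training set. Together with a moderate learning rate, they narrow the gap between training and validation accuracy from 13 points to 2 and lift validation accuracy above the 90% target.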
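One way to quantify the before/after comparison is the gap between training and validation accuracy. The sketch below assumes dictionaries shaped like `history.history` from the solution (keys `accuracy` and `val_accuracy`); the helper name `generalization_gap` is hypothetical, and the values are the accuracies quoted above, not a fresh training run.

```python
def generalization_gap(history_dict):
    """Final-epoch training accuracy minus final-epoch validation accuracy."""
    return history_dict["accuracy"][-1] - history_dict["val_accuracy"][-1]


# Accuracies quoted in the experiment, written as fractions.
before = {"accuracy": [0.98], "val_accuracy": [0.85]}
after = {"accuracy": [0.93], "val_accuracy": [0.91]}

print(f"before: {generalization_gap(before):.2f}")  # before: 0.13
print(f"after:  {generalization_gap(after):.2f}")   # after:  0.02
```

A large positive gap suggests overfitting; a gap near zero suggests the model generalizes about as well as it fits.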
Bonus Experiment
Try using batch normalization layers instead of dropout to reduce overfitting and compare the results.
💡 Hint
Insert batch normalization layers after Dense layers and observe if validation accuracy improves.
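A possible starting point for the bonus experiment is sketched below. It assumes the same 28x28 input and 64-unit hidden layer as the solution and swaps Dropout for BatchNormalization; the function name `build_batchnorm_model` is hypothetical, and `tf.keras.Input` is used here to declare the input shape instead of `input_shape=` on Flatten. Whether this beats dropout on validation accuracy has to be measured by training it.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_batchnorm_model(hidden_units=64):
    """Solution model with BatchNormalization in place of Dropout."""
    return models.Sequential([
        tf.keras.Input(shape=(28, 28)),
        layers.Flatten(),
        layers.Dense(hidden_units, activation='relu'),
        layers.BatchNormalization(),  # normalizes the hidden activations per batch
        layers.Dense(10, activation='softmax'),
    ])


bn_model = build_batchnorm_model()
bn_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)
print(bn_model.output_shape)
```

Training it with the same fit call and early-stopping callback as the solution keeps the comparison fair, since only the regularization layer differs.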

Practice

1. What is the main purpose of the compile method in a TensorFlow neural network model?
easy
A. To set the optimizer, loss function, and metrics for training
B. To add layers to the model
C. To train the model on data
D. To make predictions on new data

Solution

  1. Step 1: Understand the role of compile

    The compile method prepares the model for training by specifying how it learns, including the optimizer, loss function, and metrics.
  2. Step 2: Differentiate from other methods

    Adding layers is done before compiling, training is done with fit, and predictions use predict.
  3. Final Answer:

    To set the optimizer, loss function, and metrics for training -> Option A
  4. Quick Check:

    compile sets training details = A [OK]
Hint: Compile sets how the model learns before training [OK]
Common Mistakes:
  • Confusing compile with fit (training)
  • Thinking compile adds layers
  • Mixing compile with prediction
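The division of labour between adding layers, compile, fit, and predict can be seen in a minimal sketch. The random toy data, layer sizes, and variable names are assumptions made for illustration only.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 32 samples, 4 features, integer labels in {0, 1, 2}.
rng = np.random.default_rng(0)
x = rng.random((32, 4)).astype("float32")
y = rng.integers(0, 3, size=32)

# 1. Define the architecture. Layers are added before compiling.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# 2. compile: choose optimizer, loss, and metrics. No training happens here.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# 3. fit: this is where the weights are actually updated.
history = model.fit(x, y, epochs=1, batch_size=8, verbose=0)

# 4. predict: run the model on inputs to get class probabilities.
probs = model.predict(x[:5], verbose=0)
print(probs.shape)  # (5, 3)
```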
2. Which of the following is the correct way to add a dense hidden layer with 10 neurons and ReLU activation in TensorFlow?
easy
A. model.add(tf.keras.Dense(10, activation='relu'))
B. model.add(Dense(activation='relu', 10))
C. model.add(tf.keras.layers.Dense(10, activation='relu'))
D. model.add(tf.layers.Dense(activation='relu', units=10))

Solution

  1. Step 1: Recall correct TensorFlow syntax for adding layers

    The correct way is to use tf.keras.layers.Dense with units first, then activation as a named argument.
  2. Step 2: Check each option

    Option C, model.add(tf.keras.layers.Dense(10, activation='relu')), matches the correct syntax. Option A, model.add(tf.keras.Dense(10, activation='relu')), omits layers from the path. Option B, model.add(Dense(activation='relu', 10)), places a positional argument after a keyword argument, which is a Python syntax error. Option D, model.add(tf.layers.Dense(activation='relu', units=10)), uses the deprecated tf.layers API.
  3. Final Answer:

    model.add(tf.keras.layers.Dense(10, activation='relu')) -> Option C
  4. Quick Check:

    Correct layer syntax = C [OK]
Hint: Use tf.keras.layers.Dense(units, activation='relu') [OK]
Common Mistakes:
  • Wrong argument order in Dense layer
  • Using deprecated tf.layers instead of tf.keras.layers
  • Missing 'layers' in the import path
3. What will be the output shape of the model after adding these layers?
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(5, input_shape=(3,), activation='relu'))
model.add(tf.keras.layers.Dense(2, activation='softmax'))
print(model.output_shape)
medium
A. (None, 5)
B. (None, 2)
C. (None, 3)
D. (3, 2)

Solution

  1. Step 1: Understand input and output shapes

    The input shape is (3,), first layer outputs 5 units, second layer outputs 2 units.
  2. Step 2: Determine final output shape

    The model output shape is (None, 2) where None is batch size, 2 is output units.
  3. Final Answer:

    (None, 2) -> Option B
  4. Quick Check:

    Output units = 2 means shape (None, 2) [OK]
Hint: Output shape matches last layer units with batch size None [OK]
Common Mistakes:
  • Confusing input shape with output shape
  • Ignoring batch size dimension None
  • Mixing layer units and input dimensions
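The shape reasoning above can be reproduced by hand. The sketch below is plain Python, not a Keras API: the helper names `dense_output_shape` and `dense_param_count` are hypothetical, and they assume standard Dense layers with a bias term.

```python
def dense_output_shape(input_shape, units):
    """A Dense layer replaces the last dimension with its number of units."""
    return input_shape[:-1] + (units,)


def dense_param_count(input_dim, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return input_dim * units + units


shape = (None, 3)  # None is the batch dimension, 3 is the input size
total_params = 0
for units in (5, 2):  # the two Dense layers from the question
    total_params += dense_param_count(shape[-1], units)
    shape = dense_output_shape(shape, units)

print(shape, total_params)  # (None, 2) 32
```

The batch dimension passes through every layer unchanged, which is why only the last layer's unit count shows up in the output shape.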
4. Identify the error in this code snippet for creating a simple neural network:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(10, activation='relu'))
model.compile(optimizer='adam', loss='mse')
model.summary()
model.fit(x_train, y_train, epochs=5)
medium
A. Optimizer 'adam' is not supported
B. Loss function 'mse' is invalid
C. fit method requires batch_size argument
D. Missing input shape in the first layer

Solution

  1. Step 1: Check layer definition

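    The first Dense layer has no input shape, so the model is still unbuilt when model.summary() is called: Keras cannot create the weights until it knows the input dimensions. Depending on the Keras version, summary() on an unbuilt model either raises an error or lists the layers without parameter counts.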
  2. Step 2: Verify other parts

    Loss 'mse' and optimizer 'adam' are valid. Batch size is optional in fit.
  3. Final Answer:

    Missing input shape in the first layer -> Option D
  4. Quick Check:

    Input shape needed in first layer = D [OK]
Hint: Always specify input shape in first layer [OK]
Common Mistakes:
  • Skipping input_shape in first layer
  • Thinking batch_size is mandatory in fit
  • Confusing loss and optimizer names
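The effect of declaring an input shape can be checked through the model's `built` attribute. This is a minimal sketch: the 4-feature input is an arbitrary assumption, and `tf.keras.Input` is one way to declare the shape (passing `input_shape=` to the first layer is another).

```python
import tensorflow as tf

# No input shape: Keras cannot create the weights yet, so the model is unbuilt.
unbuilt_model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu"),
])
print(unbuilt_model.built)

# Declaring the input shape lets Keras build the model immediately.
built_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),  # hypothetical: 4 input features
    tf.keras.layers.Dense(10, activation="relu"),
])
print(built_model.built)
print(built_model.count_params())  # 4 * 10 weights + 10 biases = 50
```

An unbuilt model is also built automatically the first time it is called on data, for example by fit.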
5. You want to build a neural network to classify images into 3 categories. Which model setup is best?
model = tf.keras.Sequential([
  tf.keras.layers.Flatten(input_shape=(28,28)),
  tf.keras.layers.Dense(64, activation='relu'),
  tf.keras.layers.Dense(3, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
hard
A. Correct setup for multi-class classification
B. Use sigmoid activation in last layer instead of softmax
C. Use mean squared error loss for classification
D. Missing Flatten layer before Dense layers

Solution

  1. Step 1: Analyze model layers

    Flatten converts 2D image to 1D, Dense with 64 units and ReLU is hidden layer, final Dense with 3 units and softmax outputs class probabilities.
  2. Step 2: Check compile settings

    Optimizer 'adam' is good, loss 'sparse_categorical_crossentropy' fits multi-class with integer labels, metrics include accuracy.
  3. Final Answer:

    Correct setup for multi-class classification -> Option A
  4. Quick Check:

    Softmax + sparse_categorical_crossentropy = A [OK]
Hint: Use softmax and sparse_categorical_crossentropy for multi-class [OK]
Common Mistakes:
  • Using sigmoid for multi-class output
  • Using MSE loss for classification
  • Skipping Flatten for image input
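To see why softmax pairs with sparse_categorical_crossentropy, both can be written out in NumPy. This is an illustrative sketch of the underlying arithmetic, not TensorFlow's implementation; the logits and labels are made-up values for a 3-class problem.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 per row."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sparse_categorical_crossentropy(labels, probs):
    """Negative log-probability of the true class, given integer labels."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked)


# Hypothetical raw outputs for two samples and their integer class labels.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 0.5]])
labels = np.array([0, 2])

probs = softmax(logits)
losses = sparse_categorical_crossentropy(labels, probs)
print(probs.sum(axis=1))  # each row sums to 1
print(losses)
```

The second sample has equal scores for all three classes, so its loss is ln(3), the loss of a uniform guess. "Sparse" refers to the labels being integers such as 0 and 2 rather than one-hot vectors.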