
Prediction and evaluation in TensorFlow - ML Experiment: Train & Evaluate

Experiment - Prediction and evaluation
Problem: You have trained a simple neural network to classify handwritten digits from the MNIST dataset. The model achieves good training accuracy, but you want to check how well it predicts on new data and evaluate its performance using accuracy and loss.
Current Metrics: Training accuracy: 98%, Training loss: 0.05
Issue: The model has not yet been used for prediction or evaluated on test data, so we don't know how well it generalizes.
Your Task
Use the trained model to predict labels on the test dataset and evaluate the model's accuracy and loss on this unseen data.
Use TensorFlow and Keras APIs only.
Do not retrain or change the model architecture.
Use the MNIST test dataset provided by TensorFlow.
Solution
import tensorflow as tf

# Load MNIST dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

# Normalize images to [0,1]
test_images = test_images.astype('float32') / 255.0

# Expand dims to add channel dimension
# Model expects shape (batch, 28, 28, 1)
test_images = test_images[..., tf.newaxis]

# Define the same model architecture used for training
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28,28,1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile so that evaluate() knows which loss and metrics to report
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# If you have saved weights from the original training run, restore them here
# and skip the fit() call below:
# model.load_weights('path_to_weights')

# No saved weights are provided with this experiment, so one short epoch of
# training stands in for the already-trained model
train_images = train_images.astype('float32') / 255.0
train_images = train_images[..., tf.newaxis]
model.fit(train_images, train_labels, epochs=1, batch_size=64, verbose=0)

# Predict on test data
predictions = model.predict(test_images)

# Convert each row of class probabilities to the most likely digit (0-9)
predicted_labels = predictions.argmax(axis=1)
print("First 10 predicted labels:", predicted_labels[:10])

# Evaluate model on test data
loss, accuracy = model.evaluate(test_images, test_labels, verbose=0)

print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy*100:.2f}%")
Loaded and preprocessed the MNIST test dataset.
Used model.predict() to get predictions on test images.
Used model.evaluate() to compute loss and accuracy on test data.
Printed test loss and accuracy to assess model performance.
Normalized and reshaped train_images the same way as the test images before the brief stand-in training, so both share the scale and shape the model expects.
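To make the link between model.predict() and the accuracy reported by model.evaluate() concrete, here is a minimal NumPy sketch. The probability rows and labels are made-up stand-ins for `predictions` and `test_labels`, not real MNIST output. It only illustrates that the accuracy metric is the fraction of argmax labels that match the true labels.

```python
import numpy as np

# Hypothetical stand-in for model.predict() output: 4 samples, 3 classes.
# Each row holds class probabilities that sum to 1.
predictions = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
true_labels = np.array([1, 0, 2, 1])  # made-up ground truth

# Same conversion as in the solution: pick the most probable class per row
predicted_labels = predictions.argmax(axis=1)

# Accuracy is the share of samples whose predicted label is correct
manual_accuracy = float(np.mean(predicted_labels == true_labels))

print("Predicted labels:", predicted_labels)
print(f"Manual accuracy: {manual_accuracy:.2f}")
```

Applied to the real `predicted_labels` and `test_labels`, the same comparison should agree with the accuracy value returned by model.evaluate().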
Results Interpretation

Before: Only training accuracy (98%) and loss (0.05) were known.

After: Test accuracy is about 95% and test loss about 0.15 (exact values vary from run to run). The model generalizes well, although it scores slightly lower on the test set than on the training data.

Evaluating a model on unseen test data with model.predict() and model.evaluate() gives a realistic estimate of how it will perform in practice, which training metrics alone cannot provide.
Bonus Experiment
Try using the model to predict on a few individual test images and display the image alongside the predicted label.
💡 Hint
Use matplotlib to show images and model.predict() on single samples. Remember to preprocess the image before prediction.
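One possible starting point for the bonus experiment is sketched below. The helper names `predict_single` and `show_prediction` are hypothetical, the model is an untrained placeholder with the solution's architecture, and random pixels stand in for a raw (unnormalized, 28x28) test image. In the experiment itself you would pass the trained `model` and an image from the MNIST test set.

```python
import numpy as np
import tensorflow as tf


def predict_single(model, image):
    """Return (predicted_label, probabilities) for one raw 28x28 grayscale image."""
    # Preprocess like the test set: float32 in [0, 1] plus batch and channel axes
    x = image.astype("float32") / 255.0
    x = x[np.newaxis, ..., np.newaxis]  # (28, 28) -> (1, 28, 28, 1)
    probs = model.predict(x, verbose=0)[0]
    return int(probs.argmax()), probs


def show_prediction(image, label):
    """Display the image with its predicted label (requires matplotlib)."""
    import matplotlib.pyplot as plt
    plt.imshow(image, cmap="gray")
    plt.title(f"Predicted: {label}")
    plt.axis("off")
    plt.show()


# Placeholder: same architecture as the solution, but with untrained weights
demo_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Random pixels stand in for a raw test image
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

label, probs = predict_single(demo_model, fake_image)
print("Predicted label:", label)
print("Probabilities shape:", probs.shape)
```

Calling `show_prediction(fake_image, label)` would then display the image with its label. Note that model.predict() always expects a batch, which is why a single image needs the extra leading axis.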

Practice

1. What does the model.predict() function do in TensorFlow?
easy
A. It saves the model to a file
B. It trains the model on the data
C. It deletes the model from memory
D. It gives the model's guesses on new data

Solution

  1. Step 1: Understand the purpose of model.predict()

    This function is used to get the model's output predictions for new input data after training.
  2. Step 2: Differentiate from other functions

    Training uses model.fit() and saving uses model.save(). Removing a model from memory is left to Python (for example, del model). None of these is what predict() does.
  3. Final Answer:

    It gives the model's guesses on new data -> Option D
  4. Quick Check:

    model.predict() = model guesses [OK]
Hint: Predict means guess output for new inputs [OK]
Common Mistakes:
  • Confusing predict() with fit() for training
  • Thinking predict() saves the model
  • Assuming predict() deletes the model
2. Which of the following is the correct way to evaluate a TensorFlow model on test data stored in X_test and y_test?
easy
A. model.score(X_test, y_test)
B. model.evaluate(X_test, y_test)
C. model.fit(X_test, y_test)
D. model.predict(X_test, y_test)

Solution

  1. Step 1: Identify the evaluation function

    TensorFlow uses model.evaluate() to measure performance on test data.
  2. Step 2: Check other options

    model.predict() makes predictions and takes no labels, model.fit() trains, and model.score() belongs to scikit-learn estimators, not to Keras models.
  3. Final Answer:

    model.evaluate(X_test, y_test) -> Option B
  4. Quick Check:

    Evaluate = measure performance [OK]
Hint: Use evaluate() to check model accuracy on test data [OK]
Common Mistakes:
  • Using predict() instead of evaluate() for metrics
  • Trying to train with evaluate()
  • Using scikit-learn's model.score(), which Keras models do not have
3. What will be the output of the following code snippet?
import tensorflow as tf
import numpy as np

model = tf.keras.Sequential([
  tf.keras.layers.Dense(1, input_shape=(1,))
])
model.compile(optimizer='sgd', loss='mse')

X = np.array([1, 2, 3, 4], dtype=float)
y = np.array([2, 4, 6, 8], dtype=float)

model.fit(X, y, epochs=10, verbose=0)
predictions = model.predict(np.array([5.0]))
print(predictions)
medium
A. A numpy array close to [[1.0]]
B. A numpy array close to [[5.0]]
C. A numpy array close to [[10.0]]
D. An error because input shape is wrong

Solution

  1. Step 1: Understand the model and data

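    The model is a single Dense unit, a linear function w*x + b, trained on data that follows y = 2*x.
  2. Step 2: Predict for input 5.0

    Training moves the prediction toward 2*5 = 10. After only 10 epochs of SGD the value can still be noticeably off and varies between runs, but [[10.0]] is the closest of the listed options. The result is a 2-D array of shape (1, 1), not a scalar.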
  3. Final Answer:

    A numpy array close to [[10.0]] -> Option C
  4. Quick Check:

    Prediction for 5 ≈ 10 [OK]
Hint: Model learns y=2x, predict(5) ≈ 10 [OK]
Common Mistakes:
  • Expecting exact 10 instead of approximate
  • Assuming the 1-D input shape causes an error
  • Expecting a scalar when predict() returns a 2-D array
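Why the answer is "close to" 10 rather than exactly 10 can be illustrated with a plain NumPy re-implementation of the training loop. This is a sketch under stated assumptions, not what Keras does internally: it starts from w = 0 and b = 0 (Keras initializes the kernel randomly), uses one full-batch gradient step per epoch, and uses a learning rate of 0.01, the default for the Keras SGD optimizer.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])


def train_linear(steps, lr=0.01):
    """Full-batch gradient descent on MSE for the model w * x + b."""
    w, b = 0.0, 0.0  # assumption: zero start; Keras draws the kernel randomly
    for _ in range(steps):
        error = w * X + b - y
        w -= lr * 2.0 * np.mean(error * X)  # gradient of MSE with respect to w
        b -= lr * 2.0 * np.mean(error)      # gradient of MSE with respect to b
    return w, b


def predict(w, b, x):
    return w * x + b


w10, b10 = train_linear(10)
w_long, b_long = train_linear(1000)

pred_10 = predict(w10, b10, 5.0)
pred_long = predict(w_long, b_long, 5.0)
print(f"After 10 steps:   {pred_10:.2f}")
print(f"After 1000 steps: {pred_long:.2f}")
```

Under these assumptions the prediction after 10 steps is still short of 10, while a much longer run ends up very near it. The quiz answer describes the value the model is converging to.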
4. You run model.evaluate(X_test, y_test) but get a ValueError about mismatched shapes. What is the most likely cause?
medium
A. The shapes of X_test and y_test do not match the model's expected input and output shapes
B. The model was not compiled before evaluation
C. The model.predict() function was called instead of evaluate()
D. The optimizer was set incorrectly

Solution

  1. Step 1: Understand the error cause

    A ValueError about shape mismatch usually means input or output data shapes don't match what the model expects.
  2. Step 2: Check other options

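    Evaluating an uncompiled model raises a different error that points to the missing compile step. Calling predict() instead of evaluate() does not produce this error. The optimizer is not used during evaluation at all, so it cannot cause a shape error.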
  3. Final Answer:

    The shapes of X_test and y_test do not match the model's expected input and output shapes -> Option A
  4. Quick Check:

    Shape mismatch causes ValueError in evaluate() [OK]
Hint: Check input/output shapes match model before evaluate() [OK]
Common Mistakes:
  • Ignoring shape mismatch and blaming optimizer
  • Confusing predict() with evaluate() errors
  • Not compiling model but blaming shape error
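A quick shape check before calling evaluate() can catch this kind of mismatch early. The sketch below uses a hypothetical helper, `check_shapes`, and zero-filled arrays as stand-ins for MNIST-like data. For a real Keras model, the expected per-sample shape could be read from `model.input_shape[1:]` instead of being written by hand.

```python
import numpy as np


def check_shapes(x, y, input_shape):
    """Hypothetical helper: return a list of shape problems (empty if none)."""
    problems = []
    if x.shape[1:] != tuple(input_shape):
        problems.append(
            f"x has per-sample shape {x.shape[1:]}, expected {tuple(input_shape)}"
        )
    if x.shape[0] != y.shape[0]:
        problems.append(f"x has {x.shape[0]} samples but y has {y.shape[0]}")
    return problems


# Stand-ins for MNIST-like data: 8 grayscale images without a channel axis
x_test = np.zeros((8, 28, 28), dtype="float32")
y_test = np.zeros((8,), dtype="int64")
expected = (28, 28, 1)  # per-sample input shape of a Conv2D model for MNIST

print(check_shapes(x_test, y_test, expected))   # reports the missing channel axis

# Adding the channel axis resolves the mismatch
x_fixed = x_test[..., np.newaxis]
print(check_shapes(x_fixed, y_test, expected))  # no problems reported
```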
5. You trained a model and want to compare its performance on two test sets: X_test1, y_test1 and X_test2, y_test2. Which approach correctly compares their accuracy using TensorFlow?
hard
A. Use model.evaluate() on both test sets separately and compare the returned loss or accuracy values
B. Use model.predict() on both test sets and compare the raw predictions directly
C. Train the model again on X_test2, y_test2 and compare training losses
D. Use model.fit() on both test sets and compare the final epoch losses

Solution

  1. Step 1: Understand evaluation for performance

    model.evaluate() returns loss and metrics on test data without training, ideal for comparing performance.
  2. Step 2: Why other options are incorrect

    Comparing raw predictions is not a direct accuracy measure; retraining or fitting on test sets changes the model and is not a fair comparison.
  3. Final Answer:

    Use model.evaluate() on both test sets separately and compare the returned loss or accuracy values -> Option A
  4. Quick Check:

    Evaluate test sets separately for fair comparison [OK]
Hint: Evaluate test sets separately, compare metrics [OK]
Common Mistakes:
  • Comparing raw predictions without metrics
  • Retraining on test data for comparison
  • Using fit() on test data instead of evaluate()
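The comparison in option A might look like the following sketch. The data is synthetic and the model is an untrained placeholder, so the printed numbers are meaningless. The names `test_sets` and `results` are illustrative only. In practice you would use your trained model with X_test1, y_test1 and X_test2, y_test2.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two test sets (4 features, 3 classes)
test_sets = {
    "test1": (rng.normal(size=(32, 4)).astype("float32"), rng.integers(0, 3, size=32)),
    "test2": (rng.normal(size=(48, 4)).astype("float32"), rng.integers(0, 3, size=48)),
}

# Untrained placeholder model; in practice this is your trained model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

weights_before = [w.copy() for w in model.get_weights()]

results = {}
for name, (X, y) in test_sets.items():
    # return_dict=True labels each value instead of returning a bare list
    results[name] = model.evaluate(X, y, verbose=0, return_dict=True)
    print(name, results[name])

# evaluate() does not update weights, so both sets are scored by the same model
weights_after = model.get_weights()
unchanged = all(np.array_equal(a, b) for a, b in zip(weights_before, weights_after))
print("Weights unchanged:", unchanged)
```

Because the weights stay the same between the two calls, any difference in the metrics reflects the data, not the model. That is what makes the comparison fair.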