Imagine you want to teach a computer to recognize pictures of cats. What role does the training data play in this process?
Think about what the model needs to see to learn.
Training data is the set of labeled examples a model learns from. By seeing many examples, the model learns which input features are associated with the target outcome.
Consider this simple code that simulates training a model by counting how many times a value appears in training data.
```python
training_data = ['cat', 'dog', 'cat', 'bird', 'dog', 'cat']
model = {}
for item in training_data:
    model[item] = model.get(item, 0) + 1
print(model)
```
Count how many times each animal appears in the list.
The code counts occurrences of each item in the training data: 'cat' appears 3 times, 'dog' twice, and 'bird' once, so the printed dictionary is {'cat': 3, 'dog': 2, 'bird': 1}.
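The same tally can be written more idiomatically with the standard library's collections.Counter, which builds the frequency dictionary in one step (a sketch equivalent to the loop above):

```python
from collections import Counter

training_data = ['cat', 'dog', 'cat', 'bird', 'dog', 'cat']

# Counter tallies occurrences of each element in one pass
model = Counter(training_data)
print(model)  # Counter({'cat': 3, 'dog': 2, 'bird': 1})
```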
You have 1000 images, each resized to 28x28 pixels and converted to grayscale. The images are stored as a NumPy array for training.
What will be the shape of the training data array?
Think about flattening each 28x28 image into a single row vector.
Each 28x28 image has 784 pixels. Flattening means each image becomes a 1D array of length 784. With 1000 images, the shape is (1000, 784).
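A quick NumPy sketch (using a placeholder array of zeros in place of real images) confirms the flattened shape:

```python
import numpy as np

# Placeholder for 1000 grayscale images of 28x28 pixels
images = np.zeros((1000, 28, 28))

# Flatten each image into a 784-element row vector;
# -1 tells NumPy to infer 28 * 28 = 784
flattened = images.reshape(1000, -1)
print(flattened.shape)  # (1000, 784)
```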
You trained a model for 10 epochs and recorded accuracy after each epoch. Which plot type best visualizes this change over time?
Think about how to show change over a sequence of steps.
A line plot is ideal to show how accuracy changes over epochs because it connects points in order and shows trends clearly.
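A minimal matplotlib sketch of such a line plot, using made-up accuracy values for the 10 epochs:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

epochs = list(range(1, 11))
# Hypothetical accuracy recorded after each epoch
accuracy = [0.62, 0.71, 0.76, 0.80, 0.83, 0.85, 0.86, 0.87, 0.88, 0.88]

# Connect the points in order so the trend over epochs is visible
plt.plot(epochs, accuracy, marker='o')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Training accuracy per epoch')
plt.savefig('accuracy.png')
```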
Look at this code snippet where features and labels have different lengths.
```python
from sklearn.linear_model import LogisticRegression

features = [[1, 2], [3, 4], [5, 6]]  # 3 samples
labels = [0, 1]                      # only 2 labels

model = LogisticRegression()
model.fit(features, labels)  # raises ValueError: inconsistent sample counts
```
Check if the number of feature rows matches the number of labels.
Scikit-learn requires the number of samples in features and labels to be the same. Here, features has 3 samples but labels has 2, causing a ValueError.
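A sketch demonstrating the mismatch and the fix of supplying one label per feature row (the exact error message may vary by scikit-learn version):

```python
from sklearn.linear_model import LogisticRegression

features = [[1, 2], [3, 4], [5, 6]]  # 3 samples
labels = [0, 1]                      # only 2 labels

model = LogisticRegression()
try:
    model.fit(features, labels)
except ValueError as e:
    print('Mismatch:', e)

# Fix: one label per feature row
labels = [0, 1, 0]
model.fit(features, labels)
print(model.predict([[2, 3]]))
```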