Challenge - 5 Problems
NumPy ML Master
❓ Predict Output
intermediate
Output of NumPy array shape after sklearn train_test_split
What is the shape of X_train after running the following code?
NumPy
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape)
💡 Hint
Remember that test_size=0.3 reserves 30% of the data for testing; the rest is used for training.
X has 10 samples. With test_size=0.3, 30% of 10 = 3 samples go to the test set, leaving 7 for training. Each sample has 2 features, so X_train.shape is (7, 2).
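The split sizes can be checked by hand. This sketch assumes that, for a float test_size with train_size left unset, train_test_split reserves ceil(test_size * n_samples) rows for testing and gives the remainder to training. It then mimics a shuffled split with a NumPy permutation. The helper manual_split is hypothetical, and its shuffled rows will not match sklearn's random_state=42 output; only the shapes are meant to agree.

```python
import math
import numpy as np

def manual_split(X, y, test_size, seed=0):
    """Hypothetical helper: shuffle row indices and slice off a test set."""
    n_samples = X.shape[0]
    n_test = math.ceil(test_size * n_samples)  # 0.3 * 10 -> 3 test rows
    n_train = n_samples - n_test               # remaining 7 rows for training
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)           # shuffled row order
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = manual_split(X, y, test_size=0.3)
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```

Rows are selected whole, so the feature count (2) is preserved in both X_train and X_test.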
❓ Data Output
intermediate
Result of NumPy array after StandardScaler transform
What is the output array after applying StandardScaler to the data below?
NumPy
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[1, 2], [3, 4], [5, 6]])
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
print(np.round(scaled_data, 2))
💡 Hint
StandardScaler centers data to mean 0 and scales to unit variance.
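StandardScaler subtracts each feature's mean and divides by its population standard deviation (ddof=0). The column means are 3 and 4, and both columns have a standard deviation of sqrt(8/3) ≈ 1.633, so each column becomes [-1.22, 0, 1.22] after rounding. The printed array is therefore [[-1.22, -1.22], [0.0, 0.0], [1.22, 1.22]].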
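As a cross-check, the same numbers can be reproduced with plain NumPy. This sketch assumes StandardScaler's defaults (with_mean=True, with_std=True), under which each column is centered on its mean and divided by its population standard deviation, which is also np.std's default (ddof=0).

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# Column-wise statistics: means [3, 4], both stds sqrt(8/3) ~ 1.633
mean = data.mean(axis=0)
std = data.std(axis=0)  # ddof=0 (population std), matching StandardScaler

scaled = (data - mean) / std
print(np.round(scaled, 2))  # each column becomes roughly [-1.22, 0, 1.22]
```

Using ddof=1 (the sample standard deviation) instead would give ±1.0 here, which is a common source of mismatched answers.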
🔧 Debug
advanced
Identify the error when using NumPy array with sklearn LinearRegression
What error will this code raise when fitting the model?
NumPy
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
model = LinearRegression()
model.fit(X, y)
💡 Hint
Check the shape of X passed to the fit method.
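scikit-learn expects X to be 2D with shape (n_samples, n_features). Passing the 1D array X (shape (5,)) raises a ValueError ("Expected 2D array, got 1D array instead"). For a single feature, reshape it with X.reshape(-1, 1) before calling fit.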
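With scikit-learn, the fix is model.fit(X.reshape(-1, 1), y). To stay NumPy-only, the sketch below shows the shape change and then solves the same ordinary least-squares problem with np.linalg.lstsq on an explicit intercept column. It illustrates the 2D (samples, features) layout; it is not LinearRegression's internal solver.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

X_2d = X.reshape(-1, 1)     # (5,) -> (5, 1): 5 samples, 1 feature
print(X.shape, X_2d.shape)  # (5,) (5, 1)

# Design matrix: a column of ones (intercept) next to the single feature column
A = np.hstack([np.ones_like(X_2d, dtype=float), X_2d])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = coef
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Because y is exactly 2 * X, the fitted slope is 2 and the intercept is 0, up to floating-point error.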
❓ Visualization
advanced
Interpret the plot of PCA components from NumPy data
Given the PCA scatter plot produced by the code below, which statement about the data is true?
NumPy
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

np.random.seed(0)
data = np.dot(np.random.rand(2, 2), np.random.randn(2, 200)).T
pca = PCA(n_components=2)
components = pca.fit_transform(data)
plt.scatter(components[:, 0], components[:, 1])
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('PCA of Data')
plt.show()
💡 Hint
PCA orders components by explained variance, in descending order.
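PCA sorts components by explained variance, so PC1 always explains at least as much variance as PC2. In the scatter plot, the values along PC1 therefore have a larger variance than those along PC2 (compare the axis ranges, since matplotlib scales each axis independently).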
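The ordering can be checked without the plot. This sketch reimplements PCA with NumPy's SVD on the centered data, on the assumption that this matches what PCA computes up to the sign of each component (it is not PCA's exact code path). Because np.linalg.svd returns singular values in descending order, the variance along PC1 is at least as large as along PC2.

```python
import numpy as np

np.random.seed(0)
data = np.dot(np.random.rand(2, 2), np.random.randn(2, 200)).T  # same data as the question

# PCA by hand: center the columns, then take the thin SVD
centered = data - data.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# Variance along each principal axis; s is sorted in descending order
explained_variance = s**2 / (data.shape[0] - 1)
ratio = explained_variance / explained_variance.sum()
print(ratio)  # first entry is the larger share

# Projected coordinates (PC1, PC2), each column defined up to a sign flip
components = centered @ Vt.T
print(components.var(axis=0, ddof=1))  # matches explained_variance
```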
🧠 Conceptual
expert
Why use NumPy arrays with scikit-learn instead of Python lists?
Which is the main reason scikit-learn prefers NumPy arrays over Python lists for input data?
💡 Hint
Think about performance and data structure requirements.
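NumPy arrays store values of a single dtype in one contiguous block of memory, so scikit-learn can run vectorized operations in compiled code and use far less memory than a list of individual Python objects. scikit-learn's input validation converts list input to an array before fitting anyway.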
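A small illustration of the difference, under the assumption that a short list of floats is representative: np.asarray turns the list into one typed, contiguous buffer, roughly the first step scikit-learn's input validation performs on list input. A single vectorized expression then replaces a Python-level loop.

```python
import numpy as np

values = [1.5, 2.0, 3.5, 4.0]  # list: a sequence of references to separate float objects

# Convert to an ndarray: one dtype, one contiguous buffer of 8-byte floats
arr = np.asarray(values)
print(arr.dtype, arr.flags["C_CONTIGUOUS"], arr.nbytes)  # float64 True 32

# Vectorized: the subtraction runs in compiled code over the whole buffer
centered = arr - arr.mean()

# Equivalent pure-Python version, looping over objects one at a time
mean = sum(values) / len(values)
centered_list = [v - mean for v in values]
```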