Recall & Review
beginner
What is a random forest in machine learning?
A random forest is a group of decision trees working together. Each tree makes a prediction, and the forest picks the most common answer. This helps make better and more stable predictions.
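The "most common answer" idea can be sketched in a few lines of plain Python. This is only an illustration: the votes list and the majority_vote helper are hypothetical, and scikit-learn's own classifier averages the trees' predicted class probabilities rather than counting hard votes.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from five trees for a single sample
votes = ["cat", "dog", "cat", "cat", "dog"]
forest_prediction = majority_vote(votes)
print(forest_prediction)
```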
beginner
Why does random forest use many decision trees instead of one?
Using many trees reduces mistakes from any single tree. It lowers errors by averaging many opinions, making the final prediction more accurate and less likely to be wrong.
intermediate
What is 'bagging' in the context of random forests?
Bagging means making many trees from different random samples of the data. Each tree sees a slightly different set of data, which helps the forest learn better and avoid overfitting.
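A minimal sketch of the sampling step, assuming ten toy rows and three trees. The bootstrap_sample helper is made up for illustration and is not a library function. Each sample is drawn with replacement, so some rows repeat and others are left out.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement."""
    return [rng.choice(rows) for _ in rows]


rng = random.Random(0)        # fixed seed so the run is repeatable
data = list(range(10))        # stand-in for ten training rows

# One bootstrap sample per (hypothetical) tree
samples = [bootstrap_sample(data, rng) for _ in range(3)]
for sample in samples:
    print(sorted(sample), "unique rows:", len(set(sample)))
```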
intermediate
How does random forest select features when splitting nodes?
At each split, random forest picks a random small group of features and chooses the best split only from them. This randomness helps trees be different and improves the forest's overall strength.
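The per-split feature sampling might look like the sketch below. It assumes a square-root rule for the subset size (scikit-learn's classifier uses this rule by default), and candidate_features is a hypothetical helper, not part of any library.

```python
import math
import random


def candidate_features(n_features, rng):
    """Pick a random subset of feature indices to consider at one split."""
    k = max(1, int(math.sqrt(n_features)))   # assumed square-root rule
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)
n_features = 9

# Each split gets its own random subset of feature indices
subsets = [candidate_features(n_features, rng) for _ in range(4)]
for i, subset in enumerate(subsets, start=1):
    print(f"split {i}: consider features {subset}")
```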
beginner
What metrics can we use to check a random forest's performance?
We can use accuracy, precision, recall, F1 score for classification tasks, and mean squared error or R-squared for regression tasks. These metrics tell us how well the forest predicts.
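As a worked example, the classification metrics can be computed by hand from a small set of made-up labels. The y_true and y_pred lists are invented for illustration. In practice the same numbers come from accuracy_score, precision_score, recall_score, and f1_score in sklearn.metrics.

```python
# Hypothetical true labels and forest predictions for eight samples
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```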
What does each tree in a random forest use to make splits?
A. A random subset of features
B. All features every time
C. Only the most important feature
D. Features selected by the user
Random forest selects a random subset of features at each split to create diverse trees.
What is the main benefit of using many trees in a random forest?
A. To use more memory
B. To make the model slower
C. To confuse the user
D. To reduce overfitting and improve accuracy
Many trees help reduce overfitting and improve the model's accuracy by averaging predictions.
What is 'bagging' short for in random forests?
A. Basic aggregation
B. Bagging groceries
C. Bootstrap aggregating
D. Binary aggregation
Bagging stands for bootstrap aggregating, which means training trees on random samples with replacement.
Which metric is NOT typically used to evaluate a random forest classifier?
A. Mean squared error
B. Accuracy
C. Precision
D. Recall
Mean squared error is used for regression, not classification.
How does random forest help prevent overfitting?
A. By using only one tree
B. By averaging many trees built on random data and features
C. By ignoring data points
D. By using all features at every split
Random forest averages many trees trained on random samples and features, reducing overfitting.
Explain how random forest builds its model and why it is more reliable than a single decision tree.
Think about how many opinions together make a better decision.
Describe the role of randomness in random forest and how it improves model performance.
Randomness helps trees learn different things.
Practice
1. What is the main advantage of using a random forest over a single decision tree?
easy
A. It reduces overfitting by averaging multiple trees.
B. It always runs faster than a single tree.
C. It requires less data to train.
D. It uses only one feature for splitting.
Solution
Step 1: Understand decision tree limitations
A single decision tree can easily overfit, meaning it learns noise and performs poorly on new data.
Step 2: How random forest improves
Random forest builds many trees on random subsets of data and features, then averages their results to reduce overfitting.
Final Answer:
It reduces overfitting by averaging multiple trees. -> Option A
Quick Check:
Random forest reduces overfitting = A [OK]
Hint: Random forest averages trees to avoid overfitting [OK]
Common Mistakes:
Thinking random forest is always faster than one tree
Believing it needs less data than a single tree
Assuming it splits on only one feature
2. Which of the following is the correct way to create a random forest classifier in Python using scikit-learn?
easy
A. from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100)
B. from sklearn.tree import RandomForest
model = RandomForest(100)
C. import randomforest
model = randomforest.RandomForestClassifier(100)
D. from sklearn.ensemble import RandomForest
model = RandomForest(n_trees=100)
Solution
Step 1: Identify correct import
The random forest classifier is in sklearn.ensemble as RandomForestClassifier.
Step 2: Check constructor usage
We create it by calling RandomForestClassifier with n_estimators=100 to set the number of trees.
Final Answer:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100) -> Option A
Quick Check:
Correct import and parameter = A [OK]
Hint: Use sklearn.ensemble.RandomForestClassifier with n_estimators [OK]
Common Mistakes:
Importing from sklearn.tree instead of sklearn.ensemble
Using wrong class names like RandomForest
Passing wrong parameter names like n_trees
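To see the correct import and constructor in use, here is a short sketch that trains the classifier from Option A. The iris dataset, the 25% test split, and the random_state values are choices made for this illustration, not part of the question.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Small built-in dataset, chosen only for illustration
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# n_estimators sets the number of trees in the forest
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
preds = model.predict(X_test)

print("trees in the forest:", len(model.estimators_))
print("test accuracy:", model.score(X_test, y_test))
```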
3. Consider this Python code using scikit-learn's random forest:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=3, max_depth=2, random_state=42)
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 1, 0]
model.fit(X, y)
preds = model.predict([[0, 0], [1, 1]])
print(list(preds))
What is the output?
medium
A. [0, 0]
B. [0, 1]
C. [1, 0]
D. [1, 1]
Solution
Step 1: Understand training data and labels
Input points [0,0] and [1,1] have labels 0 and 1 respectively.
Step 2: Predict on same points with trained model
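In this data every label equals the second feature, so one split is enough to separate the classes. A forest with 3 trees and max depth 2 will most likely predict the training labels for these points, although with only four rows the bootstrap samples (and so the exact result) depend on random_state.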
Final Answer:
[0, 1] -> Option B
Quick Check:
Predictions match training labels = B [OK]
Hint: Predictions on training points usually match labels [OK]
Common Mistakes:
Confusing input order and labels
Assuming random forest predicts opposite labels
Ignoring max_depth effect
4. You wrote this code but get an error:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators='100')
model.fit(X_train, y_train)
What is the problem?
medium
A. fit method requires extra parameters.
B. RandomForestClassifier does not have n_estimators parameter.
C. n_estimators should be an integer, not a string.
D. You must import RandomForestRegressor instead.
Solution
Step 1: Check parameter type for n_estimators
n_estimators expects an integer number of trees, not a string.
Step 2: Identify error cause
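Passing '100' as a string does not fail when the model is created, because scikit-learn constructors only store their arguments. The error is raised when fit validates the parameters.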
Final Answer:
n_estimators should be an integer, not a string. -> Option C
Quick Check:
Parameter type mismatch = C [OK]
Hint: Use integer for n_estimators, not string [OK]
Common Mistakes:
Passing numbers as strings in parameters
Confusing classifier and regressor classes
Thinking fit needs extra arguments
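The failure and the fix can be reproduced with a few rows of toy data. This is a sketch: X_train and y_train below are made up, and the exact exception class and message vary between scikit-learn versions, so the code catches both TypeError and ValueError.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy training data, invented for this illustration
X_train = [[0, 0], [1, 1], [0, 1], [1, 0]]
y_train = [0, 1, 1, 0]

# The constructor accepts the string without complaint
bad_model = RandomForestClassifier(n_estimators="100")

try:
    bad_model.fit(X_train, y_train)   # parameters are checked here
    error_type = None
except (TypeError, ValueError) as exc:
    error_type = type(exc).__name__
    print(error_type, "raised by fit:", exc)

# Fix: convert the value to an integer before passing it in
good_model = RandomForestClassifier(n_estimators=int("100"), random_state=0)
good_model.fit(X_train, y_train)
print("trees trained:", len(good_model.estimators_))
```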
5. You want to improve your random forest model's accuracy on a complex dataset. Which combination of hyperparameters is best to try first?
hard
A. Set max_depth to 1 and keep n_estimators low
B. Decrease n_estimators and decrease max_depth
C. Increase max_features to total features and decrease n_estimators
D. Increase n_estimators and increase max_depth
Solution
Step 1: Understand effect of n_estimators
More trees (higher n_estimators) usually improve accuracy by reducing variance.
Step 2: Understand effect of max_depth
Increasing max_depth allows trees to learn more complex patterns, which can improve accuracy on complex data when the trees are currently too shallow.
Final Answer:
Increase n_estimators and increase max_depth -> Option D
Quick Check:
More trees + deeper trees = better accuracy [OK]
Hint: More trees and deeper trees usually improve accuracy [OK]
Common Mistakes:
Reducing the number of trees and the depth, which usually lowers accuracy
Setting max_depth too low, which causes underfitting
Increasing max_features too much, which makes the trees more alike and can cause overfitting
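Whether more and deeper trees actually help depends on the data, so it is worth measuring. The sketch below compares two settings with cross-validation. The synthetic dataset, the two configurations, and cv=3 are assumptions made for this illustration, and no particular winner is guaranteed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for a "complex dataset"
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Two hypothetical settings to compare
configs = {
    "few shallow trees": {"n_estimators": 10, "max_depth": 2},
    "more, deeper trees": {"n_estimators": 100, "max_depth": None},
}

results = {}
for name, params in configs.items():
    model = RandomForestClassifier(random_state=0, **params)
    scores = cross_val_score(model, X, y, cv=3)   # accuracy per fold
    results[name] = scores.mean()
    print(f"{name}: mean CV accuracy = {results[name]:.3f}")
```

Comparing cross-validated scores rather than training accuracy keeps the comparison honest, since deeper trees nearly always fit the training data better.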