Random forests build many decision trees and combine their results. Why does this help reduce overfitting compared to using just one tree?
Think about how averaging multiple guesses can make the final guess more stable.
Random forests reduce overfitting by averaging many decorrelated trees, each trained on a bootstrap sample of the data and a random subset of features at each split. Averaging lowers variance, so the noise that any single deep tree fits tends to cancel out across the ensemble.
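A minimal sketch of this variance-reduction effect, assuming scikit-learn is available: a synthetic dataset with label noise (`flip_y`) is scored by cross-validation for one deep tree versus a 100-tree forest. The dataset parameters here are illustrative choices, not from the original question.

```python
# Sketch: compare a single deep tree with a forest on noisy data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset with 10% label noise so a lone tree can overfit.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1,
                           random_state=0)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Mean 5-fold cross-validated accuracy for each model.
tree_cv = cross_val_score(tree, X, y, cv=5).mean()
forest_cv = cross_val_score(forest, X, y, cv=5).mean()
print(f"single tree CV accuracy: {tree_cv:.3f}")
print(f"forest CV accuracy:      {forest_cv:.3f}")
```

On noisy data like this, the forest's averaged vote is typically more accurate and more stable across folds than the single tree.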
What is the output of this code snippet that trains a random forest and prints feature importances?
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X, y)
importances = model.feature_importances_
print([round(i, 2) for i in importances])
Feature importances sum to 1 and reflect how useful each feature is for splitting.
The printed list shows each feature's importance rounded to two decimals; the four values sum to 1. For the iris data, the petal measurements (the last two features) receive the highest importance in this trained model.
When training a random forest, which hyperparameter decides how many features are randomly selected to consider for splitting at each node?
This parameter controls randomness in feature selection per split.
max_features controls how many features are randomly chosen at each split, adding randomness and diversity to trees.
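A short illustration of the accepted values, assuming scikit-learn 1.3+: `'sqrt'` (the classifier default) considers roughly the square root of the feature count at each split, while an int or float fixes the count or fraction directly.

```python
# Sketch: controlling per-split feature sampling with max_features.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# 'sqrt' => ~sqrt(4) = 2 candidate features per split on iris.
model = RandomForestClassifier(n_estimators=50, max_features='sqrt',
                               random_state=0)
model.fit(X, y)
print(model.max_features)  # the value as set: 'sqrt'
```

Smaller values of `max_features` make trees more diverse (lower correlation between them) at the cost of each tree being individually weaker.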
What does the out-of-bag (OOB) error estimate in a random forest represent?
OOB samples are those left out when bootstrapping data for each tree.
OOB error is calculated using samples not included in the bootstrap sample for each tree, giving an unbiased estimate of model performance without needing a separate validation set.
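A minimal sketch of using the OOB estimate, assuming scikit-learn: passing `oob_score=True` makes the fitted model expose `oob_score_`, the accuracy computed only from samples each tree never saw during bootstrapping (roughly 37% of rows per tree).

```python
# Sketch: enabling the out-of-bag performance estimate.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, oob_score=True,
                               random_state=0)
model.fit(X, y)

# Each sample is scored only by trees whose bootstrap excluded it.
print(f"OOB accuracy: {model.oob_score_:.3f}")
```

This gives a validation-like score without setting aside a separate hold-out set.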
Consider this code snippet:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, max_features='auto')
model.fit(X_train, y_train)
Why does this code raise a ValueError?
Check the allowed values for max_features in recent scikit-learn releases (1.3+).
The 'auto' option for max_features was deprecated in scikit-learn 1.1 and removed in 1.3, so passing it now raises an error. Use 'sqrt' (the former default for classifiers) instead.
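A corrected version of the snippet, assuming scikit-learn 1.3+; the iris data stands in for the original `X_train`/`y_train`, which are not defined in the question.

```python
# Corrected: 'sqrt' replaces the removed 'auto' option.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)  # stand-in for X_train, y_train
model = RandomForestClassifier(n_estimators=100, max_features='sqrt')
model.fit(X, y)
print(f"training accuracy: {model.score(X, y):.3f}")
```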