ML Python programming · ~20 mins

Random forest classifier in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
How does a random forest reduce overfitting compared to a single decision tree?

Random forests use many decision trees to make predictions. Which of the following best explains how this helps reduce overfitting?

A. By pruning each tree to a single node, it simplifies the model and prevents overfitting.
B. By using only one tree with the deepest splits, it captures all data patterns perfectly, avoiding overfitting.
C. By training all trees on the same data and features, it ensures consistent predictions and reduces overfitting.
D. By averaging predictions from many trees trained on different random subsets of data and features, it reduces variance and overfitting.
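To build intuition for this question, you can observe the variance-reduction effect directly. The sketch below is illustrative, not part of the problem: the synthetic dataset, `n_estimators=100`, and the 5-fold setup are all assumptions chosen just to contrast a single deep tree with an ensemble of trees.

```python
# Compare a single (unpruned) decision tree to a random forest on
# synthetic data: the forest averages many trees trained on bootstrap
# samples and random feature subsets, which reduces variance.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

tree = DecisionTreeClassifier(random_state=0)               # one deep tree
forest = RandomForestClassifier(n_estimators=100, random_state=0)

tree_scores = cross_val_score(tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)

# The forest's fold-to-fold scores typically vary less and average
# higher than the single tree's, reflecting lower variance.
print("tree  : mean=%.3f std=%.3f" % (tree_scores.mean(), tree_scores.std()))
print("forest: mean=%.3f std=%.3f" % (forest_scores.mean(), forest_scores.std()))
```

On most random seeds the forest's cross-validation scores are both higher on average and less spread out, which is exactly the variance reduction the question is probing.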
Predict Output · intermediate
Output of random forest prediction probabilities

What is the output of the following Python code using scikit-learn's RandomForestClassifier?

ML Python
from sklearn.ensemble import RandomForestClassifier
import numpy as np

X_train = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y_train = np.array([0, 1, 0, 1])
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

X_test = np.array([[2, 3]])
pred_probs = model.predict_proba(X_test)
print(pred_probs)
A. [[0.5 0.5]]
B. [[0. 1.]]
C. [[1. 0.]]
D. [[0.75 0.25]]
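Before committing to an answer, it helps to recall what `predict_proba` returns. This sketch uses made-up data (the arrays and `n_estimators` value are illustrative, not from the problem) just to show the shape and meaning of the output:

```python
# predict_proba returns one row per test sample and one column per
# class; each entry is the fraction of trees voting for that class,
# so every row sums to 1.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [3, 3]])
y = np.array([0, 0, 0, 1, 1, 1])

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
probs = model.predict_proba(np.array([[0, 0], [3, 3]]))

print(probs.shape)        # (n_test_samples, n_classes)
print(probs.sum(axis=1))  # each row sums to 1.0
```

Knowing that the output is a 2-D array of per-class vote fractions narrows down which of the printed formats above are even plausible.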
Hyperparameter · advanced
Effect of increasing n_estimators in RandomForestClassifier

What is the most likely effect of increasing the n_estimators parameter (number of trees) in a RandomForestClassifier?

A. It generally improves model stability and accuracy but increases training time.
B. It has no effect on model performance or training time.
C. It reduces training time by using fewer trees to make predictions.
D. It decreases model accuracy because too many trees cause overfitting.
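You can measure the trade-off yourself. A rough sketch, with an arbitrary synthetic dataset and `n_estimators` values chosen only for illustration:

```python
# Fit forests of increasing size: training time grows with the number
# of trees, while the score plateaus rather than degrading, because
# extra trees only add more terms to an average.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

for n in (10, 100, 300):
    start = time.perf_counter()
    model = RandomForestClassifier(n_estimators=n, random_state=0).fit(X, y)
    elapsed = time.perf_counter() - start
    print(f"n_estimators={n:4d}  fit time={elapsed:.2f}s  "
          f"train score={model.score(X, y):.3f}")
```

The fit time scales roughly linearly with `n_estimators`, while the score stabilizes, which is why adding trees does not cause overfitting on its own.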
Metrics · advanced
Choosing the best metric for imbalanced classification with Random Forest

You train a RandomForestClassifier on a dataset where 95% of samples belong to class 0 and 5% to class 1. Which metric is best to evaluate your model's performance?

A. F1-score, because it balances precision and recall for imbalanced data.
B. Precision, because it measures how many predicted positives are correct.
C. Recall, because it measures how many actual positives are found.
D. Accuracy, because it shows overall correct predictions.
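The pitfall here is easy to demonstrate. A minimal sketch on the same 95/5 class split as the question, using a deliberately useless model that always predicts the majority class:

```python
# On a 95/5 split, always predicting the majority class scores 95%
# accuracy yet finds zero minority-class samples, so its F1 is 0.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)   # always predict class 0

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)  # no predicted positives

print("accuracy:", acc)  # 0.95
print("f1      :", f1)   # 0.0
```

This is why a metric that accounts for the minority class is needed when the classes are this imbalanced: accuracy rewards a model that never detects the rare class at all.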
🔧 Debug · expert
Debugging RandomForestClassifier training error

What error will this code raise when training a RandomForestClassifier, and why?

ML Python
from sklearn.ensemble import RandomForestClassifier

X_train = [[1, 2], [3, 4], [5, 6]]
y_train = [0, 1]
model = RandomForestClassifier()
model.fit(X_train, y_train)
A. AttributeError: 'RandomForestClassifier' object has no attribute 'fit'
B. TypeError: 'list' object is not callable
C. ValueError: Found input variables with inconsistent numbers of samples
D. No error, model trains successfully
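After you have made your choice, you can check it by running the snippet and catching the exception. A sketch, assuming only scikit-learn's usual input validation in `fit`:

```python
# scikit-learn validates that X and y have the same number of samples
# before training; a mismatch is rejected up front with a ValueError.
from sklearn.ensemble import RandomForestClassifier

X_train = [[1, 2], [3, 4], [5, 6]]  # 3 samples
y_train = [0, 1]                    # 2 labels: a mismatch

try:
    RandomForestClassifier().fit(X_train, y_train)
except ValueError as e:
    print("ValueError:", e)
```

Counting the rows of `X_train` against the length of `y_train` is the first thing to check whenever `fit` fails on otherwise well-formed data.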