Challenge - 5 Problems

Label Encoding Master

❓ Predict Output

intermediate

Output of label encoding with unseen category

What is the output of this code snippet using LabelEncoder from sklearn.preprocessing?

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
categories = ['red', 'green', 'blue']
le.fit(categories)
encoded = le.transform(['green', 'blue', 'yellow'])
print(encoded)

A. [1 2 3]

B. Raises a ValueError because 'yellow' was not seen during fit

C. [0 1 2]

D. [1 2 0]
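
When new data may contain categories that were absent at fit time, it can help to compare the incoming values against the fitted encoder's `classes_` attribute before calling `transform`. A minimal sketch, assuming hypothetical size labels and a hypothetical helper named `find_unseen` (neither comes from the challenge itself):

```python
from sklearn.preprocessing import LabelEncoder


def find_unseen(encoder, values):
    """Return the sorted values that the fitted encoder has no class for.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    known = set(encoder.classes_)
    return sorted(set(values) - known)


le = LabelEncoder()
le.fit(['small', 'medium', 'large'])  # hypothetical training categories

incoming = ['medium', 'large', 'huge']  # hypothetical new data
unseen = find_unseen(le, incoming)
print(unseen)
```

A check like this lets the caller decide what to do with unknown categories (drop them, map them to a placeholder, or refit) before any encoding is attempted.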

❓ Data Output

intermediate

Resulting encoded array from label encoding

Given this code, what is the printed output?

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
colors = ['yellow', 'red', 'blue', 'red', 'yellow']
le.fit(colors)
encoded = le.transform(colors)
print(encoded)

A. [0 2 1 2 0]

B. [0 1 2 1 0]

C. [1 2 0 2 1]

D. [2 1 0 1 2]
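
To reason about questions like this one, it may help to inspect the mapping a fitted encoder has learned. The sketch below uses hypothetical size labels rather than the colors above; it relies on the `classes_` attribute, which stores the unique labels in sorted order, so each label's code is its position in that array.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(['small', 'large', 'medium', 'large'])  # hypothetical data

# classes_ holds the unique labels in sorted order; each label's
# integer code is its index in that array.
mapping = {str(label): int(code) for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform maps integer codes back to the original labels.
decoded = le.inverse_transform([2, 0])
print(list(decoded))
```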

❓ Visualization

advanced

Visualizing label encoding effect on categorical data

You have a DataFrame with a column 'Fruit' containing ['Apple', 'Banana', 'Apple', 'Cherry', 'Banana']. You apply label encoding to this column. Which plot best shows the distribution of the encoded values?

Data Analysis Python

import pandas as pd
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt

df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Cherry', 'Banana']})
le = LabelEncoder()
# Replace each fruit name with its integer code
df['Fruit_encoded'] = le.fit_transform(df['Fruit'])

# Count how often each code occurs; sort_index() orders the counts by code,
# which matches the order of le.classes_
plt.bar(le.classes_, df['Fruit_encoded'].value_counts().sort_index())
plt.xlabel('Fruit')
plt.ylabel('Encoded Value Count')
plt.title('Count of Encoded Fruit Labels')
plt.show()

A. Scatter plot of original fruit names vs encoded values

B. Line plot showing encoded values over index

C. Bar chart with fruits on x-axis and counts of encoded labels on y-axis

D. Pie chart of encoded label counts

🧠 Conceptual

advanced

Understanding label encoding limitations

Which of the following is a key limitation of label encoding when used on categorical features for machine learning?

A. It introduces an unintended ordinal relationship between categories

B. It cannot handle numerical data

C. It always increases the dimensionality of the dataset

D. It requires categories to be sorted alphabetically
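
One way to explore these options hands-on is to encode the same column two ways and compare what comes out. The sketch below is only an illustration: the `city` column and its values are hypothetical, and `pandas.get_dummies` is used here as a convenient one-hot encoder rather than as a recommendation.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'city': ['Oslo', 'Lima', 'Cairo', 'Lima']})  # hypothetical data

# Label encoding: a single column of integer codes.
label_codes = LabelEncoder().fit_transform(df['city'])

# One-hot encoding: one indicator column per distinct category.
one_hot = pd.get_dummies(df['city'])

print(label_codes.shape)
print(one_hot.shape)
print(list(one_hot.columns))
```

Comparing the shapes and the values of the two results shows how differently a downstream model would see the same information. Note that the scikit-learn documentation describes `LabelEncoder` as intended for target labels; for input features it points to `OrdinalEncoder` and `OneHotEncoder`.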

🔧 Debug

expert

Debugging label encoding with mixed data types

What happens when this code runs? If it raises an error, which one?

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data = ['cat', 1, 'dog', 2]
le.fit(data)
encoded = le.transform(data)
print(encoded)

A. TypeError: unorderable types: int() < str()

B. ValueError: y contains previously unseen labels

C. No error, prints encoded array

D. AttributeError: 'int' object has no attribute 'lower'
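
Whatever the snippet above does, mixed-type input is easier to reason about when the conversion is made explicit. A minimal sketch, assuming hypothetical data and assuming that treating every value as a string is acceptable for the task at hand:

```python
from sklearn.preprocessing import LabelEncoder

raw = ['bird', 3, 'fish', 4]  # hypothetical mixed-type data

# Convert every value to str explicitly so the encoder sees a single type.
as_strings = [str(value) for value in raw]

le = LabelEncoder()
codes = le.fit_transform(as_strings)

print(list(le.classes_))
print(codes)
```

With the conversion written out, it is clear that the integer 3 and the string '3' would end up as the same category, which may or may not be what the data calls for.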
