What is the output of this code snippet using LabelEncoder from sklearn.preprocessing?
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
categories = ['red', 'green', 'blue']
le.fit(categories)
encoded = le.transform(['green', 'blue', 'yellow'])
print(encoded)
Think about what happens if you try to transform a category that the encoder did not learn.
LabelEncoder only knows the categories it was fitted on. Transforming an unseen category such as 'yellow' raises a ValueError, so the print statement never runs.
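One way to avoid this failure is to decide up front what an unseen category should map to. The sketch below is a plain-Python stand-in, not scikit-learn's implementation: `safe_transform` and the `unknown=-1` sentinel are hypothetical names chosen for illustration, and the dictionary only mimics what a fitted encoder stores.

```python
# Hypothetical stand-in for a fitted encoder: sorted unique categories -> integers.
known = sorted({'red', 'green', 'blue'})  # ['blue', 'green', 'red']
mapping = {label: index for index, label in enumerate(known)}


def safe_transform(labels, mapping, unknown=-1):
    """Encode labels, using `unknown` for categories missing from the mapping."""
    return [mapping.get(label, unknown) for label in labels]


encoded = safe_transform(['green', 'blue', 'yellow'], mapping)
print(encoded)  # [1, 0, -1]
```

Whether a sentinel value, a dedicated "other" category, or a hard error is the right choice depends on the downstream model, so treat the fallback here as an assumption rather than a recommendation.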
Given this code, what is the printed output?
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
colors = ['yellow', 'red', 'blue', 'red', 'yellow']
le.fit(colors)
encoded = le.transform(colors)
print(encoded)
LabelEncoder assigns integer labels to the unique categories in sorted order, which is alphabetical for strings.
The sorted categories are ['blue', 'red', 'yellow'], so 'blue' → 0, 'red' → 1, 'yellow' → 2, and the printed output is [2 1 0 1 2].
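The ordering rule can be reproduced without scikit-learn. This is a minimal sketch that assumes the encoder behaves like "sort the unique values, then number them"; the variable names are illustrative, and the result is a Python list rather than the NumPy array that LabelEncoder returns.

```python
colors = ['yellow', 'red', 'blue', 'red', 'yellow']

# Unique categories in sorted order define the integer codes.
classes = sorted(set(colors))
mapping = {label: code for code, label in enumerate(classes)}

# Encode each value by looking up its code.
encoded = [mapping[color] for color in colors]

# Decoding reverses the lookup by indexing into the sorted classes.
decoded = [classes[code] for code in encoded]

print(classes)  # ['blue', 'red', 'yellow']
print(encoded)  # [2, 1, 0, 1, 2]
```

Note that the codes depend only on the sorted unique values, not on the order in which categories first appear in the data.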
You have a DataFrame with a column 'Fruit' containing ['Apple', 'Banana', 'Apple', 'Cherry', 'Banana']. You apply label encoding to this column. Which plot best shows the distribution of the encoded values?
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt

df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Cherry', 'Banana']})
le = LabelEncoder()
df['Fruit_encoded'] = le.fit_transform(df['Fruit'])

# Count each encoded label, ordered by label value so the counts line up with le.classes_.
plt.bar(le.classes_, df['Fruit_encoded'].value_counts().sort_index())
plt.xlabel('Fruit')
plt.ylabel('Encoded Value Count')
plt.title('Count of Encoded Fruit Labels')
plt.show()
Think about how to show counts of each encoded label clearly.
A bar chart with fruit names on the x-axis and counts of their encoded labels on the y-axis clearly shows the distribution of encoded categories.
Which of the following is a key limitation of label encoding when used on categorical features for machine learning?
Think about what the numbers assigned by label encoding imply to some algorithms.
Label encoding assigns integer values to categories, which can make some algorithms think there is an order or ranking, even if none exists.
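The implied ordering shows up as soon as the codes are treated as numbers. The sketch below compares gaps between label-encoded values with distances between one-hot vectors, one common alternative for unordered categories. The helpers `one_hot` and `hamming` are hypothetical names written for this illustration, not library functions.

```python
classes = ['blue', 'red', 'yellow']
label_codes = {label: code for code, label in enumerate(classes)}

# Label encoding: the integer gaps suggest 'yellow' is farther from 'blue' than 'red' is.
gap_blue_red = abs(label_codes['blue'] - label_codes['red'])
gap_blue_yellow = abs(label_codes['blue'] - label_codes['yellow'])


def one_hot(label, classes):
    """Return a 0/1 vector with a single 1 at the position of `label`."""
    return [1 if label == c else 0 for c in classes]


def hamming(a, b):
    """Count the positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# One-hot encoding: every pair of distinct categories is equally far apart.
dist_blue_red = hamming(one_hot('blue', classes), one_hot('red', classes))
dist_blue_yellow = hamming(one_hot('blue', classes), one_hot('yellow', classes))

print(gap_blue_red, gap_blue_yellow)    # 1 2
print(dist_blue_red, dist_blue_yellow)  # 2 2
```

Distance-based and linear models are the ones most likely to be misled by the unequal gaps; tree-based models are often less sensitive, though that depends on the data.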
What error does this code raise?
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
data = ['cat', 1, 'dog', 2]
le.fit(data)
encoded = le.transform(data)
print(encoded)
Consider how LabelEncoder sorts categories internally.
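LabelEncoder sorts the unique categories, and Python 3 cannot order strings against integers, so mixed values that reach the sort unchanged (for example in an object-dtype array or pandas column) cause a TypeError. Note that NumPy may coerce a plain Python list of mixed values to strings before the sort, in which case no error is raised, so the exact behavior depends on the input type and library versions.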
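The underlying comparison failure can be seen with Python's built-in `sorted`, independent of scikit-learn. This sketch only demonstrates the sorting problem and one possible workaround (converting every value to a string first); whether that conversion is appropriate for a given dataset is an assumption to check.

```python
data = ['cat', 1, 'dog', 2]

# Sorting mixed str/int values fails because '<' is undefined between the two types.
try:
    sorted(data)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)

# Possible workaround: make the values uniformly strings before encoding.
as_text = [str(value) for value in data]
classes = sorted(set(as_text))
mapping = {label: code for code, label in enumerate(classes)}
encoded = [mapping[value] for value in as_text]

print(classes)  # ['1', '2', 'cat', 'dog']
print(encoded)  # [2, 0, 3, 1]
```

After the conversion, the integer 1 and the string '1' become the same category, which may or may not be what the data intends.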