One-hot encoding turns words into simple lists of zeros and ones, letting a machine learning model see which words appear in a text.
One-hot encoding for text in NLP
Introduction
When you want to convert words into numbers for a machine learning model.
When you have a small set of words and want a simple way to represent them.
When you need to prepare text data for basic classification tasks.
When you want to check if certain words appear in a sentence.
When you want to compare texts by their word presence.
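The use cases above all rely on the same mechanism: build a vocabulary of unique words, then represent each word as a vector with a 1 at that word's position. A minimal pure-Python sketch of the idea, before turning to sklearn (the word list and the one_hot helper are illustrative):

```python
# Build a sorted vocabulary of unique words from a small corpus
words = ['cat', 'dog', 'cat', 'bird']
vocab = sorted(set(words))  # ['bird', 'cat', 'dog']

def one_hot(word, vocab):
    # A vector of zeros with a single 1 at the word's vocabulary index
    vector = [0] * len(vocab)
    vector[vocab.index(word)] = 1
    return vector

for word in words:
    print(word, one_hot(word, vocab))
```

Each vector's length equals the vocabulary size, which is why the vectors grow quickly as the vocabulary grows.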
Syntax
from sklearn.preprocessing import OneHotEncoder

# Create encoder
encoder = OneHotEncoder(sparse_output=False)

# Fit and transform text data
encoded = encoder.fit_transform(text_data)
Input text data must be reshaped to a 2D array, such as [['word1'], ['word2'], ...].
The output is a 2D array where each row is the one-hot vector for one word.
Examples
This example encodes a list of words. Each unique word gets a 1 in its own position; all other positions are 0.
from sklearn.preprocessing import OneHotEncoder

words = [['cat'], ['dog'], ['cat'], ['bird']]
encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(words)
print(encoded)
This example prints the unique words (categories) found by the encoder.
from sklearn.preprocessing import OneHotEncoder

words = [['apple'], ['banana'], ['apple'], ['orange']]
encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(words)
print(encoder.categories_)
Sample Model
This program shows how to convert a list of words into one-hot vectors. It prints the unique words and their encoded forms.
from sklearn.preprocessing import OneHotEncoder

# List of words to encode
words = [['hello'], ['world'], ['hello'], ['machine'], ['learning']]

# Create the encoder
encoder = OneHotEncoder(sparse_output=False)

# Fit and transform the words
encoded_words = encoder.fit_transform(words)

# Print the unique words found
print('Unique words:', encoder.categories_)

# Print the one-hot encoded vectors
print('One-hot encoded vectors:')
for word, vector in zip(words, encoded_words):
    print(f'{word[0]}: {vector}')
Important Notes
One-hot encoding creates long, mostly-zero vectors when you have many unique words.
For large text data, other methods like word embeddings are more efficient.
Always reshape your text data to 2D before using OneHotEncoder.
Summary
One-hot encoding turns each word into a vector with a single '1' and '0's everywhere else.
It helps machine learning models understand which words appear.
Best for small sets of words or simple text tasks.