
Why machines need numerical text representation in NLP

Introduction

Machine learning models operate on numbers, not raw text. To train a model on text, we must first convert words into numbers.

This conversion is needed in tasks such as:
Building a chatbot that talks with people.
Sorting emails into spam and not spam.
Translating between languages automatically.
Analyzing customer reviews to detect sentiment.
Searching documents for important information.
Syntax
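# Keras Tokenizer (already fitted with fit_on_texts): each text becomes a list of word indexes
numeric_sequences = tokenizer.texts_to_sequences(texts)
# scikit-learn vectorizer: learns the vocabulary and returns a document-by-word count matrix
numeric_data = vectorizer.fit_transform(texts)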

Text must be converted to numbers before it is fed into a machine learning model.

Common methods first split text into words (tokenization) and then map those words either to sequences of integer indexes or to vectors of word counts.
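The two steps can be sketched in plain Python. This sketch assumes simple lowercase whitespace tokenization, while real libraries apply more careful rules. The helper names `tokenize`, `build_vocab`, `to_sequence`, and `to_counts` are hypothetical and do not come from any library.

```python
def tokenize(text):
    # Hypothetical tokenizer: lowercase, then split on whitespace
    return text.lower().split()

def build_vocab(texts):
    # Give each new word the next integer index, in order of first appearance
    vocab = {}
    for text in texts:
        for word in tokenize(text):
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def to_sequence(text, vocab):
    # Sequence representation: one index per word, word order preserved
    return [vocab[word] for word in tokenize(text) if word in vocab]

def to_counts(text, vocab):
    # Vector representation: one slot per vocabulary word, holding its count
    counts = [0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            counts[vocab[word]] += 1
    return counts

texts = ["I love AI", "AI loves me"]
vocab = build_vocab(texts)
print(vocab)                         # {'i': 0, 'love': 1, 'ai': 2, 'loves': 3, 'me': 4}
print(to_sequence(texts[1], vocab))  # [2, 3, 4]
print(to_counts(texts[1], vocab))    # [0, 0, 1, 1, 1]
```

The sequence keeps the order of the words. The count vector records only how often each vocabulary word occurs.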

Examples
This example turns text into a matrix of word counts.
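from sklearn.feature_extraction.text import CountVectorizer

texts = ['I love AI', 'AI loves me']
vectorizer = CountVectorizer()
# Learn the vocabulary and count words per text; .toarray() turns the sparse result into a dense array
numeric_data = vectorizer.fit_transform(texts).toarray()
# One row per text, one column per vocabulary word
print(numeric_data)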
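The two sentences contain five distinct words, but the matrix has only four columns. By default, `CountVectorizer` lowercases the text and keeps only tokens of two or more word characters (its default `token_pattern` is `r"(?u)\b\w\w+\b"`), so "I" is dropped. It also orders the columns alphabetically. The sketch below imitates that default behavior in plain Python so the result can be checked by hand. It is an approximation for illustration, not the library's implementation, and `count_matrix` is a hypothetical helper.

```python
import re

# Same regex as CountVectorizer's documented default token_pattern
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def count_matrix(texts):
    # Lowercase, then keep only tokens with 2+ word characters ("i" is dropped)
    docs = [TOKEN_PATTERN.findall(t.lower()) for t in texts]
    # Columns in alphabetical order, mirroring CountVectorizer's vocabulary_ indexes
    features = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(features)}
    rows = []
    for doc in docs:
        row = [0] * len(features)
        for tok in doc:
            row[index[tok]] += 1
        rows.append(row)
    return features, rows

features, rows = count_matrix(['I love AI', 'AI loves me'])
print(features)  # ['ai', 'love', 'loves', 'me']
print(rows)      # [[1, 1, 0, 0], [1, 0, 1, 1]]
```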
This example uses the Keras Tokenizer to give each word an integer index and then converts each text into a sequence of those indexes.
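# Legacy Keras API; newer releases recommend tf.keras.layers.TextVectorization instead
from tensorflow.keras.preprocessing.text import Tokenizer

texts = ['Hello world', 'Hello AI']
tokenizer = Tokenizer()
# Build the word index (more frequent words get smaller indexes, starting at 1)
tokenizer.fit_on_texts(texts)
# Replace each word in each text with its index
numeric_sequences = tokenizer.texts_to_sequences(texts)
print(numeric_sequences)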
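The indexing rule can be sketched in plain Python, based on our understanding of the legacy Keras `Tokenizer`. Words are ranked by how often they appear across all texts, the most frequent word gets index 1, and index 0 is never assigned. The helpers `fit_word_index` and `to_sequences` are hypothetical, and the sketch skips the punctuation filtering that `Tokenizer` also performs.

```python
from collections import Counter

def fit_word_index(texts):
    # Count every lowercase word across all texts (Counter keeps first-seen order)
    counts = Counter(word for text in texts for word in text.lower().split())
    # Most frequent word first; sorted() is stable, so ties keep first-seen order
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
    # Indexes start at 1; 0 stays free (commonly used for padding)
    return {word: i for i, word in enumerate(ranked, start=1)}

def to_sequences(texts, word_index):
    # Replace each known word with its index; unknown words are skipped
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]

texts = ['Hello world', 'Hello AI']
word_index = fit_word_index(texts)
print(word_index)                      # {'hello': 1, 'world': 2, 'ai': 3}
print(to_sequences(texts, word_index)) # [[1, 2], [1, 3]]
```

"hello" appears twice, so it gets index 1 and both sequences start with 1.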
Sample Model

This program converts two sentences into word-count vectors. The printed vocabulary maps each word to its column index in the count matrix.

from sklearn.feature_extraction.text import CountVectorizer

texts = ['I love machine learning', 'Machine learning loves me']

vectorizer = CountVectorizer()
# Learn the vocabulary and build a dense document-by-word count matrix
numeric_data = vectorizer.fit_transform(texts).toarray()

# vocabulary_ maps each word to its column in numeric_data
print('Vocabulary:', vectorizer.vocabulary_)
print('Numeric representation:')
print(numeric_data)
Important Notes

Different text-to-number methods capture different information.

Simple counts ignore word order but are easy to use.
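A quick way to see what ignoring word order means: with count vectors, "dog bites man" and "man bites dog" become identical, while index sequences keep them apart. This is a plain-Python sketch. The sentences and the helper names `bag_of_words` and `index_sequence` are illustrative only.

```python
from collections import Counter

sentences = ["dog bites man", "man bites dog"]

# Shared vocabulary, sorted alphabetically: bites=0, dog=1, man=2
vocab = {w: i for i, w in enumerate(sorted({w for s in sentences for w in s.split()}))}

def bag_of_words(sentence):
    # Count vector: one slot per vocabulary word; word positions are lost
    counts = Counter(sentence.split())
    return [counts[w] for w in sorted(vocab, key=vocab.get)]

def index_sequence(sentence):
    # Sequence: one index per word, in the order the words appear
    return [vocab[w] for w in sentence.split()]

print([bag_of_words(s) for s in sentences])    # [[1, 1, 1], [1, 1, 1]]
print([index_sequence(s) for s in sentences])  # [[1, 0, 2], [2, 0, 1]]
```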

More advanced methods, such as index sequences fed to sequence models or learned word embeddings, preserve word order or meaning but require more computation.

Summary

Machines need numbers, not words, to learn from text.

Text can be converted into numbers by counting words or by assigning each word an index.

This conversion is a required step before text can be used in machine learning models.