What is Bag of Words in NLP: Simple Explanation and Example

The Bag of Words model in NLP represents a text by counting how many times each word appears, ignoring grammar and word order. The result is a vector of word counts, one entry per vocabulary word, that machine learning models can take as input for tasks like classification or search.

How It Works

Imagine you have a basket and you throw in all the words from a sentence or document, but you don't care about the order or grammar—just the words themselves. This is what the Bag of Words model does: it treats text like a bag full of words.

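Each unique word across your whole collection of texts becomes a feature (together these words form the vocabulary), and for each text you count how many times each of those words appears. Every text then becomes a list of numbers of the same length, one count per vocabulary word, which records both which words are present and how often they occur.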
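
To make the counting step concrete, here is a minimal from-scratch sketch in plain Python. It assumes a deliberately simple tokenizer (lowercase, then split on whitespace), which does not handle punctuation the way real tokenizers do. The helper names `tokenize`, `build_vocabulary` and `bag_of_words` are hypothetical, chosen only for illustration.

```python
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Hypothetical helper. Simplifying assumption: lowercase and split on whitespace only.
    return text.lower().split()

def build_vocabulary(texts: list[str]) -> list[str]:
    # Hypothetical helper: every unique word across all texts becomes one feature.
    # Sorting gives each word a stable column position.
    return sorted({word for text in texts for word in tokenize(text)})

def bag_of_words(text: str, vocabulary: list[str]) -> list[int]:
    # Hypothetical helper: count the words in this text, then read the counts
    # off in vocabulary order (Counter returns 0 for words that never appear).
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

texts = ["I love machine learning", "Machine learning is fun"]
vocabulary = build_vocabulary(texts)
vectors = [bag_of_words(text, vocabulary) for text in texts]

print(vocabulary)  # ['fun', 'i', 'is', 'learning', 'love', 'machine']
print(vectors)     # [[0, 1, 0, 1, 1, 1], [1, 0, 1, 1, 0, 1]]
```

Unlike scikit-learn's CountVectorizer with its default settings, this whitespace tokenizer keeps the one-letter word "i" as a feature.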

Think of it like counting ingredients in a recipe without worrying about the order you add them. This makes it easy for computers to compare texts based on word counts.
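
One common way to compare two bag-of-words vectors is cosine similarity, the cosine of the angle between them: 1.0 means the word counts are proportional and 0.0 means the texts share no words. The sketch below assumes the texts have already been turned into count vectors over a shared vocabulary; the `cosine_similarity` function is a hand-written illustration (scikit-learn ships its own version in `sklearn.metrics.pairwise`), and the third text is hypothetical.

```python
import math

def cosine_similarity(a: list[int], b: list[int]) -> float:
    # Hand-written helper: dot product of the two count vectors.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean length of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares no words with anything
    return dot / (norm_a * norm_b)

# Count vectors over the vocabulary ['fun', 'is', 'learning', 'love', 'machine'].
text_a = [0, 0, 1, 1, 1]  # "I love machine learning"
text_b = [1, 1, 1, 0, 1]  # "Machine learning is fun"
text_c = [1, 1, 0, 0, 0]  # hypothetical third text: "is fun"

print(round(cosine_similarity(text_a, text_b), 3))  # 0.577 (shares "learning" and "machine")
print(round(cosine_similarity(text_a, text_c), 3))  # 0.0 (no shared words)
```

Because only the counts matter, two texts score as similar whenever they use the same words, regardless of the order those words appear in.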

Example

This example builds a Bag of Words representation of two short sentences with CountVectorizer from scikit-learn (sklearn).

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ["I love machine learning", "Machine learning is fun"]

# Learn the vocabulary from both texts and count each word per text.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)  # sparse matrix: one row per text, one column per word

print("Feature names:", vectorizer.get_feature_names_out())
print("Bag of Words matrix:\n", X.toarray())
```
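Output

```
Feature names: ['fun' 'is' 'learning' 'love' 'machine']
Bag of Words matrix:
 [[0 0 1 1 1]
 [1 1 1 0 1]]
```

Each row is one sentence and each column is one vocabulary word, so the second row says that "Machine learning is fun" contains "fun", "is", "learning" and "machine" once each. The word "I" is missing because CountVectorizer lowercases text by default and its default token pattern keeps only tokens of two or more word characters.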

When to Use

Bag of Words is useful when you want a simple way to turn text into numbers for machine learning models. It works well for tasks like spam detection, sentiment analysis, or topic classification where the order of words is less important than their presence.
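
As a sketch of the spam-detection use case, a bag-of-words vectorizer can be chained with a classifier in a scikit-learn pipeline. The four training messages and their labels are made up for illustration, and MultinomialNB (a naive Bayes model that works directly on word counts) is one reasonable choice of classifier, not the only one.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 1 = spam, 0 = not spam.
messages = [
    "win a free prize now",
    "claim your free money now",
    "meeting moved to friday",
    "lunch with the team on friday",
]
labels = [1, 1, 0, 0]

# The vectorizer turns each message into word counts; the classifier learns from those counts.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["free prize money", "team meeting on friday"]))  # [1 0]
```

With real data you would train on many more messages and evaluate on a held-out set; the point here is only that the classifier sees nothing but word counts.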

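However, it does not capture word order or meaning, so it works best for straightforward problems or as a baseline before moving to richer representations, such as n-gram features, which keep some local word order, or word embeddings, which capture similarity in meaning.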

Key Points

  • Bag of Words counts word frequency ignoring order and grammar.
  • It converts text into a simple numeric format for machine learning.
  • It is easy to implement and understand, and works well for basic text tasks.
  • It does not capture word meaning or context.

Key Takeaways

  • Bag of Words turns text into word count lists, ignoring order and grammar.
  • It is simple and effective for basic text classification tasks.
  • Use it when word presence matters more than word order or meaning.
  • It is a good starting point before trying more complex NLP models.