
Word embeddings concept (Word2Vec) in ML Python

Introduction

Word embeddings turn words into numeric vectors so computers can work with language. Word2Vec is a popular way to learn these vectors, inferring word meanings from large amounts of text.

When you want to find similar words, like 'king' and 'queen'.
When building chatbots that understand user questions.
When analyzing text to find topics or emotions.
When improving search engines to find related words.
When creating recommendation systems based on text data.
Syntax
Python
from gensim.models import Word2Vec

# sentences is a list of tokenized sentences
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)

# Get vector for a word
vector = model.wv['word']

# Find similar words
similar = model.wv.most_similar('word')

sentences must be a list of lists, where each inner list is a sentence split into words (tokens).

vector_size controls the length of the word vectors (usually 50-300).

window sets how many neighboring words on each side count as context.

min_count ignores words that appear fewer than this many times in the corpus.

workers sets the number of threads used for training.
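Word2Vec expects pre-tokenized input, so raw text must first be split into that list-of-lists shape. A minimal sketch, using simple lowercasing and whitespace splitting (real pipelines often use a proper tokenizer such as nltk):

```python
# Raw text: one string per sentence
raw_text = [
    "I love machine learning",
    "Machine learning is fun",
]

# Each sentence becomes a list of lowercase tokens
sentences = [line.lower().split() for line in raw_text]

print(sentences)
# [['i', 'love', 'machine', 'learning'], ['machine', 'learning', 'is', 'fun']]
```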

Examples
This creates a Word2Vec model with smaller vectors and a narrower context window, ignoring words that appear only once (min_count=2).
Python
model = Word2Vec(sentences, vector_size=50, window=3, min_count=2)
Gets the vector (list of numbers) that represents the word 'apple'.
Python
vector = model.wv['apple']
Finds the top 3 words most similar to 'king' based on learned word vectors.
Python
similar_words = model.wv.most_similar('king', topn=3)
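Under the hood, most_similar ranks words by cosine similarity between their vectors. A minimal sketch of that formula in plain Python, using tiny made-up 3-dimensional vectors (illustrative only, not real Word2Vec output):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical, for illustration)
v_king = [0.9, 0.8, 0.1]
v_queen = [0.85, 0.75, 0.2]
v_banana = [0.1, 0.2, 0.9]

print(cosine_similarity(v_king, v_queen))   # high: vectors point the same way
print(cosine_similarity(v_king, v_banana))  # low: vectors point apart
```

Similar words get vectors that point in similar directions, so their cosine similarity is close to 1.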
Sample Model

This code trains a Word2Vec model on a few simple sentences. It then shows the vector for the word 'learning' and finds the top 3 words that are most similar to 'learning' based on the model.

Python
from gensim.models import Word2Vec

# Sample sentences
sentences = [
    ['I', 'love', 'machine', 'learning'],
    ['machine', 'learning', 'is', 'fun'],
    ['I', 'enjoy', 'learning', 'new', 'things'],
    ['deep', 'learning', 'is', 'a', 'part', 'of', 'machine', 'learning']
]

# Train Word2Vec model
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, workers=1, seed=42)

# Get vector for 'learning'
vector = model.wv['learning']

# Find words similar to 'learning'
similar_words = model.wv.most_similar('learning', topn=3)

print(f"Vector for 'learning' (first 5 values): {vector[:5]}")
print("Top 3 words similar to 'learning':")
for word, score in similar_words:
    print(f"{word}: {score:.3f}")
Important Notes

Word2Vec learns word meanings by looking at nearby words in sentences.

Words with similar meanings end up with similar vectors.

More training data usually means better word vectors.
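The "nearby words" idea above can be sketched as the (target, context) pairs Word2Vec actually trains on. A toy illustration (context_pairs is a hypothetical helper, not part of gensim), assuming window=2:

```python
def context_pairs(tokens, window=2):
    # Pair each word with its neighbors up to `window` positions away
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sentence = ['machine', 'learning', 'is', 'fun']
for target, context in context_pairs(sentence):
    print(target, '->', context)
```

Words that keep showing up in the same contexts across many sentences end up with similar vectors.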

Summary

Word2Vec converts words into numbers that capture meaning.

It learns from the context words appear in.

These vectors help computers understand and compare words.