
Word2Vec (CBOW and Skip-gram) in NLP

Introduction

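Word2Vec lets computers work with word meaning by turning each word into a vector of numbers (an embedding). It learns these vectors from which words appear near each other in sentences, so words used in similar contexts end up with similar vectors.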
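
Word2Vec never stores co-occurrence counts explicitly; it learns by prediction. However, "appearing together" is defined by a context window around each word. The following minimal pure-Python sketch shows which word pairs fall inside such a window. The helper `cooccurrence_counts` is hypothetical and is not part of gensim.

```python
from collections import Counter

def cooccurrence_counts(sentences, window=2):
    """Count how often each (word, context_word) pair appears within `window` positions."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look at up to `window` tokens on each side of position i
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts

sentences = [
    ['i', 'love', 'machine', 'learning'],
    ['machine', 'learning', 'is', 'fun'],
]
counts = cooccurrence_counts(sentences, window=2)
print(counts[('machine', 'learning')])  # 2
```

Word pairs that show up together often are the ones whose vectors Word2Vec pulls closer during training.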

Use Word2Vec:
When you want to find similar words, such as 'king' and 'queen'.
When you need to turn words into numbers for machine learning.
When you want to capture the meaning of words in a text.
When building chatbots that understand language better.
When grouping or clustering words by meaning.
Syntax
from gensim.models import Word2Vec

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)

# sg=0 means CBOW, sg=1 means Skip-gram

sentences is an iterable of tokenized sentences, i.e. a list of lists of string tokens such as [['i', 'love', 'nlp'], ...].
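
Raw text has to be split into that list-of-lists format first. This is a minimal sketch that assumes a simple regex tokenizer is good enough. The `tokenize` helper is hypothetical. It lowercases the text so that 'Machine' and 'machine' become the same token.

```python
import re

def tokenize(texts):
    """Turn raw strings into the list-of-token-lists format Word2Vec expects."""
    # Lowercase, then keep runs of letters, digits and apostrophes as tokens
    return [re.findall(r"[a-z0-9']+", text.lower()) for text in texts]

raw = ["I love machine learning.", "Machine learning is fun!"]
sentences = tokenize(raw)
print(sentences)
# [['i', 'love', 'machine', 'learning'], ['machine', 'learning', 'is', 'fun']]
```

gensim also provides `gensim.utils.simple_preprocess`, which lowercases and tokenizes in a similar way. By default it drops tokens shorter than 2 characters, so a word like 'i' is removed.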

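vector_size sets the dimensionality of the word vectors, i.e. how many numbers represent each word.

window is the maximum distance between the current word and a context word within a sentence.

min_count ignores every word whose total frequency is lower than this value.

sg selects the training algorithm: 0 for CBOW (the default), 1 for Skip-gram.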
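
Here is a sketch of the min_count rule. It is an illustration and not gensim's actual code, and `build_vocab` is a hypothetical helper. gensim's default is min_count=5. On a tiny toy corpus that default discards every word, which is why small examples set min_count=1.

```python
from collections import Counter

def build_vocab(sentences, min_count=5):
    """Keep only words whose total frequency is at least min_count (sketch of the rule)."""
    freq = Counter(word for tokens in sentences for word in tokens)
    return {word: n for word, n in freq.items() if n >= min_count}

sentences = [
    ['machine', 'learning', 'is', 'fun'],
    ['i', 'love', 'machine', 'learning'],
]
print(build_vocab(sentences, min_count=2))  # {'machine': 2, 'learning': 2}
print(len(build_vocab(sentences)))          # 0 with min_count=5
```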

Examples
This creates a CBOW model with smaller vectors and a smaller window size.
model = Word2Vec(sentences, vector_size=50, window=3, sg=0)
This creates a Skip-gram model with bigger vectors and a bigger window size.
model = Word2Vec(sentences, vector_size=100, window=5, sg=1)
Sample Model

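This code trains two Word2Vec models on the same four sentences, one using CBOW and one using Skip-gram. It then prints the vector for the word 'machine' and the words most similar to 'machine' in each model. With a corpus this small, the vectors and similarity scores are close to random and vary from run to run. The example demonstrates the API rather than meaningful results.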

from gensim.models import Word2Vec

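# Sample sentences (already tokenized). Tokens are case-sensitive,
# so 'Machine' and 'machine' are two separate vocabulary entries.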
sentences = [
    ['I', 'love', 'machine', 'learning'],
    ['Word2Vec', 'helps', 'understand', 'words'],
    ['Skip', 'gram', 'and', 'CBOW', 'are', 'models'],
    ['Machine', 'learning', 'is', 'fun']
]

# Train CBOW model (sg=0)
model_cbow = Word2Vec(sentences, vector_size=20, window=2, min_count=1, sg=0)

# Train Skip-gram model (sg=1)
model_sg = Word2Vec(sentences, vector_size=20, window=2, min_count=1, sg=1)

# Get vector for word 'machine'
vec_cbow = model_cbow.wv['machine']
vec_sg = model_sg.wv['machine']

# Find the words most similar to 'machine' in each model.
# most_similar returns (word, cosine similarity) pairs, top 10 by default.
similar_cbow = model_cbow.wv.most_similar('machine')
similar_sg = model_sg.wv.most_similar('machine')

print('CBOW vector for machine:', vec_cbow)
print('Skip-gram vector for machine:', vec_sg)
print('CBOW most similar to machine:', similar_cbow)
print('Skip-gram most similar to machine:', similar_sg)
Important Notes

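CBOW (continuous bag of words) predicts the center word from its surrounding context words, combining their vectors into one. It trains faster than Skip-gram and tends to represent frequent words slightly better.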
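
This minimal NumPy sketch shows a single CBOW prediction step under two assumptions: the context vectors are averaged, and a full softmax is applied over a tiny made-up vocabulary. gensim uses negative sampling by default instead of a full softmax. The matrices `W_in`/`W_out` and the function `cbow_predict` are hypothetical names chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ['i', 'love', 'machine', 'learning', 'is', 'fun']
word_to_id = {w: i for i, w in enumerate(vocab)}
V, dim = len(vocab), 8

W_in = rng.normal(scale=0.1, size=(V, dim))   # input (context) embeddings
W_out = rng.normal(scale=0.1, size=(V, dim))  # output (target) embeddings

def cbow_predict(context_words):
    """Return a probability distribution over the vocabulary for the center word."""
    ids = [word_to_id[w] for w in context_words]
    h = W_in[ids].mean(axis=0)           # average the context vectors
    scores = W_out @ h                   # one score per vocabulary word
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()

# Context of 'machine' in "i love machine learning" with window=2
probs = cbow_predict(['i', 'love', 'learning'])
print(probs.shape, float(probs.sum()))
```

Training would nudge `W_in` and `W_out` so that `probs[word_to_id['machine']]` grows for this context. The rows of `W_in` are what end up as the word vectors.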

Skip-gram works the other way round: it predicts each surrounding word from the center word. It is slower to train but tends to work better on small datasets and for rare words.
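
The contrast between the two algorithms is easiest to see in the training examples each one builds from a sentence. CBOW yields one (context, center) example per position. Skip-gram yields one (center, context) pair per neighbour, so it produces more examples and more updates per word. That is one common explanation for why it is slower and handles rare words better. The helpers below are hypothetical. Real implementations such as gensim also randomly shrink the window per position and downsample frequent words.

```python
def cbow_examples(tokens, window=2):
    """CBOW: one (list of context words, center word) example per position."""
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, center))
    return examples

def skipgram_examples(tokens, window=2):
    """Skip-gram: one (center word, context word) pair per neighbour."""
    return [(center, ctx)
            for context, center in cbow_examples(tokens, window)
            for ctx in context]

tokens = ['i', 'love', 'machine', 'learning']
print(cbow_examples(tokens)[2])       # (['i', 'love', 'learning'], 'machine')
print(skipgram_examples(tokens)[:3])  # [('i', 'love'), ('i', 'machine'), ('love', 'i')]
```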

Word vectors are dense lists of numbers learned from context: words that occur in similar contexts end up with similar vectors.
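
"Similar vectors" is usually measured with cosine similarity, and gensim's `most_similar` ranks words by it. This toy sketch ranks words the same way. The 3-dimensional vectors are invented for illustration only and do not come from a trained model, and `most_similar` here is a hypothetical stand-in rather than gensim's method.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(word, vectors, topn=3):
    """Rank the other words by cosine similarity to `word`."""
    target = vectors[word]
    scores = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical vectors, made up for illustration only
vectors = {
    'machine': np.array([0.9, 0.1, 0.0]),
    'learning': np.array([0.8, 0.2, 0.1]),
    'fun': np.array([0.0, 0.9, 0.4]),
}
print(most_similar('machine', vectors))  # 'learning' ranks above 'fun'
```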

Summary

Word2Vec turns words into numbers that show their meaning.

CBOW and Skip-gram are two ways Word2Vec learns word meanings.

Use Word2Vec to find similar words or prepare text for machine learning.