Word2Vec (CBOW and Skip-gram) in NLP

Word2Vec helps computers understand words by turning them into numeric vectors that capture their meaning. It learns these vectors from which words appear together in sentences.
from gensim.models import Word2Vec

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)
# sg=0 means CBOW, sg=1 means Skip-gram
sentences is a list of tokenized sentences (a list of word lists).
vector_size sets the dimensionality of the word vectors (how many numbers represent each word).
window sets how many neighboring words on each side count as context, and min_count drops words that appear fewer times than this threshold.
model = Word2Vec(sentences, vector_size=50, window=3, sg=0)   # CBOW
model = Word2Vec(sentences, vector_size=100, window=5, sg=1)  # Skip-gram
This code trains two Word2Vec models: one using CBOW and one using Skip-gram. It then shows the vector for the word 'machine' and finds words similar to 'machine' in both models.
from gensim.models import Word2Vec

# Sample sentences
sentences = [
    ['I', 'love', 'machine', 'learning'],
    ['Word2Vec', 'helps', 'understand', 'words'],
    ['Skip', 'gram', 'and', 'CBOW', 'are', 'models'],
    ['Machine', 'learning', 'is', 'fun']
]

# Train CBOW model (sg=0)
model_cbow = Word2Vec(sentences, vector_size=20, window=2, min_count=1, sg=0)

# Train Skip-gram model (sg=1)
model_sg = Word2Vec(sentences, vector_size=20, window=2, min_count=1, sg=1)

# Get vector for word 'machine'
vec_cbow = model_cbow.wv['machine']
vec_sg = model_sg.wv['machine']

# Find most similar words to 'machine' in CBOW
similar_cbow = model_cbow.wv.most_similar('machine')

# Find most similar words to 'machine' in Skip-gram
similar_sg = model_sg.wv.most_similar('machine')

print('CBOW vector for machine:', vec_cbow)
print('Skip-gram vector for machine:', vec_sg)
print('CBOW most similar to machine:', similar_cbow)
print('Skip-gram most similar to machine:', similar_sg)
CBOW predicts a word from its surrounding words, so it works well with frequent words.
Skip-gram predicts surrounding words from a given word, so it works better with rare words.
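The difference between the two can be seen in the training pairs each one builds from a sentence. A minimal sketch (this is illustrative plain Python, not gensim's internal API) using a window of 2:

```python
# How training pairs differ between CBOW and Skip-gram
# for one sentence and window=2.
sentence = ['I', 'love', 'machine', 'learning']
window = 2

cbow_pairs = []       # (context words) -> target word
skipgram_pairs = []   # target word -> one context word

for i, target in enumerate(sentence):
    # Collect the words within `window` positions of the target
    context = [sentence[j]
               for j in range(max(0, i - window),
                              min(len(sentence), i + window + 1))
               if j != i]
    cbow_pairs.append((context, target))
    for c in context:
        skipgram_pairs.append((target, c))

print(cbow_pairs[2])   # (['I', 'love', 'learning'], 'machine')
print(skipgram_pairs[:3])
```

CBOW sees each context as one averaged input predicting the target, so frequent words get many smooth updates; Skip-gram makes a separate prediction for every (target, context) pair, which gives rare words more training signal.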
Word vectors are lists of numbers that capture word meaning based on context.
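Similarity between word vectors is usually measured with cosine similarity, which is also what most_similar computes. A minimal numpy sketch with made-up 3-dimensional vectors (real Word2Vec vectors typically have 50-300 dimensions):

```python
import numpy as np

# Toy "word vectors" (values made up for illustration)
vec_machine = np.array([0.9, 0.1, 0.3])
vec_learning = np.array([0.8, 0.2, 0.4])
vec_banana = np.array([0.1, 0.9, 0.2])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(vec_machine, vec_learning))  # high: similar contexts
print(cosine_similarity(vec_machine, vec_banana))    # lower: different contexts
```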
Word2Vec turns words into numbers that show their meaning.
CBOW and Skip-gram are two ways Word2Vec learns word meanings.
Use Word2Vec to find similar words or prepare text for machine learning.
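One common way to prepare text for machine learning is to average the vectors of a sentence's words into a single fixed-length feature vector. A minimal sketch, using a toy lookup table standing in for a trained model's model.wv (the 2-dimensional vectors are made up):

```python
import numpy as np

# Toy lookup table standing in for model.wv
word_vectors = {
    'machine':  np.array([0.9, 0.1]),
    'learning': np.array([0.8, 0.2]),
    'is':       np.array([0.5, 0.5]),
    'fun':      np.array([0.4, 0.6]),
}

def sentence_vector(tokens, wv):
    """Average the vectors of the tokens we have vectors for."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

features = sentence_vector(['machine', 'learning', 'is', 'fun'], word_vectors)
print(features)  # [0.65 0.35]
```

The resulting vector can be fed to any standard classifier; more elaborate schemes (TF-IDF weighting, doc2vec) exist, but plain averaging is a strong baseline.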