Word2Vec helps computers understand words by turning them into numbers that keep their meaning. Training Word2Vec with Gensim lets you create these word numbers from your own text.
0
0
Training Word2Vec with Gensim in NLP
Introduction
You want to find similar words in a collection of documents.
You need to convert words into numbers for a machine learning model.
You want to explore relationships between words, like 'king' and 'queen'.
You have a custom text dataset and want to learn word meanings from it.
Syntax
NLP
from gensim.models import Word2Vec model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4, epochs=10)
sentences is a list of tokenized sentences (list of lists of words).
vector_size sets the size of the word vectors (default 100).
Examples
Train Word2Vec on two simple sentences with smaller vector size and fewer epochs.
NLP
sentences = [['hello', 'world'], ['machine', 'learning', 'is', 'fun']] model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, workers=2, epochs=5)
Ignore words that appear less than twice by setting
min_count=2.NLP
model = Word2Vec(sentences, vector_size=100, window=5, min_count=2, workers=4, epochs=10)
Sample Model
This code trains Word2Vec on a few example sentences, then shows the vector for the word 'learning' and the top 3 similar words.
NLP
from gensim.models import Word2Vec # Sample sentences sentences = [ ['I', 'love', 'machine', 'learning'], ['Word2Vec', 'creates', 'word', 'embeddings'], ['Gensim', 'makes', 'training', 'easy'], ['I', 'enjoy', 'learning', 'new', 'things'] ] # Train Word2Vec model model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, workers=1, epochs=10) # Get vector for word 'learning' vector = model.wv['learning'] # Find most similar words to 'learning' similar_words = model.wv.most_similar('learning', topn=3) print(f"Vector for 'learning' (first 5 values): {vector[:5]}") print('Top 3 words similar to learning:', similar_words)
OutputSuccess
Important Notes
Make sure your sentences are tokenized (split into words) before training.
More epochs usually improve the quality but take longer to train.
Use model.wv.most_similar(word) to find words close in meaning.
Summary
Word2Vec turns words into numbers that keep their meaning.
Gensim makes training Word2Vec easy with simple code.
Try different settings like vector size and window to get better results.