Word embeddings represent words as numeric vectors so that machine-learning models can work with language. Word2Vec is a popular algorithm that learns these vectors from large amounts of text, placing words with similar meanings close together in vector space.
Word embeddings concept (Word2Vec) in ML Python
Introduction
When you want to find similar words, like 'king' and 'queen'.
When building chatbots that understand user questions.
When analyzing text to find topics or emotions.
When improving search engines to find related words.
When creating recommendation systems based on text data.
Syntax
ML Python
from gensim.models import Word2Vec

# sentences is a list of tokenized sentences
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)

# Get the vector for a word
vector = model.wv['word']

# Find the most similar words
similar = model.wv.most_similar('word')
sentences must be a list of lists, where each inner list is a sentence split into words.
vector_size controls the length of the word vectors (usually 50-300).
window sets how many words on each side count as context, and min_count drops words that appear fewer times than the threshold.
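If you are starting from raw text rather than pre-tokenized sentences, you first need to build the list-of-lists format yourself. A minimal sketch using simple lowercasing and whitespace splitting (real pipelines often use a proper tokenizer such as nltk or spaCy):

```python
raw_texts = [
    "I love machine learning",
    "Machine learning is fun",
]

# Split each sentence into a list of lowercase tokens
sentences = [text.lower().split() for text in raw_texts]

print(sentences)
# [['i', 'love', 'machine', 'learning'], ['machine', 'learning', 'is', 'fun']]
```

The resulting sentences list can be passed directly to Word2Vec.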
Examples
This creates a Word2Vec model with smaller vectors and a smaller context window, ignoring words that appear fewer than two times.
ML Python
model = Word2Vec(sentences, vector_size=50, window=3, min_count=2)
Gets the vector (list of numbers) that represents the word 'apple'.
ML Python
vector = model.wv['apple']
Finds the top 3 words most similar to 'king' based on the learned word vectors.
ML Python
similar_words = model.wv.most_similar('king', topn=3)
Sample Model
This code trains a Word2Vec model on a few simple sentences. It then shows the vector for the word 'learning' and finds the top 3 words that are most similar to 'learning' based on the model.
ML Python
from gensim.models import Word2Vec

# Sample sentences
sentences = [
    ['I', 'love', 'machine', 'learning'],
    ['machine', 'learning', 'is', 'fun'],
    ['I', 'enjoy', 'learning', 'new', 'things'],
    ['deep', 'learning', 'is', 'a', 'part', 'of', 'machine', 'learning']
]

# Train the Word2Vec model
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, workers=1, seed=42)

# Get the vector for 'learning'
vector = model.wv['learning']

# Find the words most similar to 'learning'
similar_words = model.wv.most_similar('learning', topn=3)

print(f"Vector for 'learning' (first 5 values): {vector[:5]}")
print("Top 3 words similar to 'learning':")
for word, score in similar_words:
    print(f"{word}: {score:.3f}")
Important Notes
Word2Vec learns word meanings by looking at nearby words in sentences.
Words with similar meanings end up with similar vectors.
More training data usually means better word vectors.
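The "similar vectors" idea above is usually measured with cosine similarity, which is also what most_similar and wv.similarity compute. A minimal NumPy sketch of the formula (the example vectors are made up for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors:
    # dot product divided by the product of their norms
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([2.0, 4.0, 6.0])     # same direction as v1
v3 = np.array([-1.0, -2.0, -3.0])  # opposite direction

print(cosine_similarity(v1, v2))  # close to 1.0 (same direction)
print(cosine_similarity(v1, v3))  # close to -1.0 (opposite direction)
```

Values near 1 mean the words appear in very similar contexts; values near 0 mean the contexts are unrelated.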
Summary
Word2Vec converts words into numbers that capture meaning.
It learns from the context words appear in.
These vectors help computers understand and compare words.