Embeddings turn words into numeric vectors that capture meaning: words with related meanings end up close together in the vector space, so similar ideas cluster.
Why embeddings capture semantic meaning in NLP
```python
embedding = Embedding(input_dim, output_dim)
vector = embedding(word_index)
```
input_dim is the size of your vocabulary (number of unique words).
output_dim is the size of the vector that represents each word.
For example:

```python
# Vocabulary of 10,000 words, each mapped to a 50-dimensional vector
embedding = Embedding(10000, 50)
vector = embedding(42)

# Vocabulary of 5,000 words, each mapped to a 100-dimensional vector
embedding = Embedding(5000, 100)
vector = embedding(7)
```
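The `Embedding` calls above are framework-style pseudocode. As a minimal sketch of what such a layer does under the hood, it is just a lookup into a vocabulary-by-dimension matrix of weights (the random initialization here stands in for weights a real model would learn during training; the class below is an illustration, not any particular library's API):

```python
import numpy as np

class Embedding:
    """Minimal embedding layer: an (input_dim x output_dim) lookup table."""
    def __init__(self, input_dim, output_dim):
        # In a real model these weights are learned during training;
        # here they are just randomly initialized for illustration.
        rng = np.random.default_rng(0)
        self.weights = rng.normal(size=(input_dim, output_dim))

    def __call__(self, word_index):
        # Looking up a word selects one row of the table
        return self.weights[word_index]

embedding = Embedding(10000, 50)
vector = embedding(42)
print(vector.shape)  # (50,)
```

The key point is that each word index selects a row, so "embedding a word" is nothing more than indexing into a trainable matrix.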
The code below represents each word as a small vector and computes the similarity between pairs of words. 'cat' and 'dog' are both animals, so their vectors sit close together; 'apple' and 'orange' are both fruits, so their vectors are also close.
```python
import numpy as np

# Simple example of word embeddings using hand-picked 3D vectors
vocab = ['cat', 'dog', 'apple', 'orange']

embeddings = {
    'cat':    np.array([0.9, 0.1, 0.3]),
    'dog':    np.array([0.8, 0.2, 0.4]),
    'apple':  np.array([0.1, 0.9, 0.7]),
    'orange': np.array([0.2, 0.8, 0.6]),
}

# Cosine similarity: measures the angle between two vectors
def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Compare similarity within each category
sim_cat_dog = cosine_similarity(embeddings['cat'], embeddings['dog'])
sim_apple_orange = cosine_similarity(embeddings['apple'], embeddings['orange'])

print(f"Similarity between 'cat' and 'dog': {sim_cat_dog:.2f}")
print(f"Similarity between 'apple' and 'orange': {sim_apple_orange:.2f}")
```
Embeddings capture meaning because similar words appear in similar contexts, so their vectors become close.
Training embeddings on lots of text helps the model learn these relationships automatically.
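One way to see why shared context produces close vectors is to build word vectors directly from co-occurrence counts. This is a toy illustration only (the tiny corpus and the counting scheme are invented for the example, not a real training procedure):

```python
import numpy as np

corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "i ate an apple today",
    "i ate an orange today",
]

# Build the vocabulary and an index for each word
words = sorted({w for line in corpus for w in line.split()})
idx = {w: i for i, w in enumerate(words)}

# Each word's vector = counts of the words it appears alongside
cooc = np.zeros((len(words), len(words)))
for line in corpus:
    tokens = line.split()
    for i, w in enumerate(tokens):
        for j, c in enumerate(tokens):
            if i != j:
                cooc[idx[w], idx[c]] += 1

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# 'apple' and 'orange' share a context ("i ate an ... today"),
# so their count vectors are far more similar than 'apple' and 'dog'
print(cosine(cooc[idx['apple']], cooc[idx['orange']]))
print(cosine(cooc[idx['apple']], cooc[idx['dog']]))
```

Real methods like word2vec learn dense low-dimensional vectors rather than raw counts, but the driving signal is the same: words that share contexts end up with similar representations.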
Cosine similarity is a common way to measure how close two word vectors are.
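Cosine similarity ranges from -1 to 1 and depends only on the angle between vectors, not their lengths. Reusing the hand-picked vectors from the earlier snippet, a cross-category comparison makes the contrast visible (a sketch with those same assumed values):

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

cat = np.array([0.9, 0.1, 0.3])
dog = np.array([0.8, 0.2, 0.4])
apple = np.array([0.1, 0.9, 0.7])

# Same category (animals): high similarity
print(f"cat vs dog:   {cosine_similarity(cat, dog):.2f}")
# Different categories (animal vs fruit): noticeably lower
print(f"cat vs apple: {cosine_similarity(cat, apple):.2f}")
```

Because the scores are scale-invariant, a long vector and a short vector pointing the same way still score 1.0, which is why cosine similarity is preferred over raw Euclidean distance for comparing embeddings of different magnitudes.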
In short, embeddings map words to vectors whose geometry reflects meaning: words with similar meanings have vectors close together, and that numeric structure is what lets computers work with language.