
Word2Vec (CBOW and Skip-gram) in NLP - Model Pipeline Trace

Model Pipeline - Word2Vec (CBOW and Skip-gram)

This pipeline trains a Word2Vec model that learns word meanings from the words that surround them. It uses two methods: CBOW predicts a word from its neighbors, and Skip-gram predicts the neighbors from a word.

Data Flow - 5 Stages
Stage 1: Raw Text Input
  Input:   1000 sentences x variable length
  Step:    Collect sentences from a text corpus
  Output:  1000 sentences x variable length
  Example: "The cat sat on the mat"

Stage 2: Tokenization
  Input:   1000 sentences x variable length
  Step:    Split each sentence into words (tokens)
  Output:  1000 token lists of variable length
  Example: ["The", "cat", "sat", "on", "the", "mat"]

Stage 3: Context Window Creation
  Input:   1000 token lists
  Step:    Create (target, context) pairs using window size 2
  Output:  approx. 6000 (target, context) word pairs
  Example: Target: "sat", Context: ["The", "cat", "on", "the"]

Stage 4: One-hot Encoding
  Input:   6000 word pairs
  Step:    Convert words to one-hot vectors over a 5000-word vocabulary
  Output:  6000 pairs of vectors (5000-dim each)
  Example: Target vector: [0,0,1,0,...], Context vectors: [[0,1,0,...], ...]

Stage 5: Model Training (CBOW or Skip-gram)
  Input:   6000 pairs of vectors
  Step:    Train a neural network to predict the target from its context (CBOW) or the context from the target (Skip-gram)
  Output:  Trained word embedding matrix (5000 words x 100 dims)
  Example: Embedding vector for "cat": [0.12, -0.05, ..., 0.33]
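Stages 1 through 4 can be sketched in a few lines of Python. The one-sentence corpus, lowercase tokenizer, and tiny vocabulary below are illustrative stand-ins for the 1000-sentence, 5000-word setup described above.

```python
import numpy as np

sentences = ["The cat sat on the mat"]           # Stage 1: raw text (toy corpus)
tokens = [s.lower().split() for s in sentences]  # Stage 2: tokenization

# Stage 3: (target, context) pairs with window size 2
window = 2
pairs = []
for sent in tokens:
    for i, target in enumerate(sent):
        context = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
        pairs.append((target, context))

# Stage 4: one-hot encoding over the toy vocabulary
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v

target, context = pairs[2]
print(target, context)  # sat ['the', 'cat', 'on', 'the']
```

Note that with window size 2 a word in the middle of a sentence gets up to four context words, matching the "sat" example above.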
Training Trace - Epoch by Epoch
Loss curve: falls steadily from ~4.5 at epoch 1 to ~1.1 at epoch 5.
Epoch | Loss ↓ | Accuracy ↑ | Observation
------|--------|------------|------------
1     | 4.5    | 0.15       | Initial loss is high and accuracy low as the model starts learning word relations
2     | 3.2    | 0.35       | Loss decreases; embeddings start capturing context
3     | 2.1    | 0.55       | Model learns better word associations; accuracy rises
4     | 1.5    | 0.70       | Loss continues to drop; embeddings become more meaningful
5     | 1.1    | 0.80       | Training converges with good accuracy
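The CBOW update behind this trace can be sketched as gradient descent on softmax cross-entropy over averaged context embeddings. The toy sizes, learning rate, and single repeated training pair below are assumptions for illustration, not the trace's actual 5000 x 100 configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 5, 8    # toy sizes; the trace above uses 5000 x 100
W_in = rng.normal(0, 0.1, (vocab_size, embed_dim))   # input embedding matrix
W_out = rng.normal(0, 0.1, (embed_dim, vocab_size))  # output weight matrix

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cbow_step(context_ids, target_id, lr=0.1):
    """One CBOW update: average context embeddings, predict the target."""
    global W_out
    h = W_in[context_ids].mean(axis=0)       # hidden vector (averaged context)
    p = softmax(h @ W_out)                   # predicted word distribution
    loss = -np.log(p[target_id])             # cross-entropy loss
    grad = p.copy()
    grad[target_id] -= 1.0                   # dL/dlogits for softmax + CE
    grad_h = W_out @ grad                    # backprop into the hidden vector
    W_out -= lr * np.outer(h, grad)
    W_in[context_ids] -= lr * grad_h / len(context_ids)
    return loss

losses = [cbow_step([0, 1, 2, 4], 3) for _ in range(20)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")  # loss decreases
```

As in the table above, repeated gradient steps drive the loss down as the embeddings adapt to the training pairs.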
Prediction Trace - 5 Layers
Layer 1: Input Context Words (CBOW)
Layer 2: Embedding Layer
Layer 3: Average Embeddings
Layer 4: Output Layer (Softmax)
Layer 5: Prediction
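The five prediction layers map one-to-one onto a short forward pass. The random weights below are stand-ins for a trained model; only the shapes and layer order follow the trace.

```python
import numpy as np

vocab_size, embed_dim = 5, 8     # toy sizes; the trace assumes 5000 x 100
rng = np.random.default_rng(1)
W_in = rng.normal(0, 0.1, (vocab_size, embed_dim))   # embedding matrix
W_out = rng.normal(0, 0.1, (embed_dim, vocab_size))  # output weights

context_ids = [0, 1, 2, 4]           # Layer 1: indices of the context words
embeds = W_in[context_ids]           # Layer 2: embedding lookup
h = embeds.mean(axis=0)              # Layer 3: average the context embeddings
logits = h @ W_out                   # Layer 4: output layer
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the vocabulary
prediction = int(np.argmax(probs))   # Layer 5: predicted target word index
print(prediction, probs.round(3))
```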
Model Quiz - 3 Questions
Test your understanding
In the CBOW model, what is the input to the neural network?
A. The target word itself
B. Context words around the target word
C. Random noise vectors
D. The entire sentence
Key Insight
Word2Vec learns word meanings by predicting a word from its neighbors (CBOW) or the neighbors from a word (Skip-gram). The resulting vectors capture relationships such as synonymy and relatedness in a simple geometric form.
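A hypothetical example of what "relationships in vector form" means: words that appear in similar contexts end up with high cosine similarity. The three embeddings below are made up for illustration, not learned by the pipeline above.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# made-up embeddings: related words point in similar directions
cat    = np.array([0.8, 0.1, 0.3])
kitten = np.array([0.7, 0.2, 0.4])
car    = np.array([-0.5, 0.9, -0.1])

print(f"cat~kitten: {cosine(cat, kitten):.2f}")  # high (related)
print(f"cat~car:    {cosine(cat, car):.2f}")     # low (unrelated)
```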