Model Pipeline - Text feature basics (CountVectorizer, TF-IDF)
This pipeline converts text into numbers using CountVectorizer and TF-IDF. Then, it trains a simple model to classify text based on these features.
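As a rough sketch of the pipeline described above, the snippet below chains the two feature steps with a classifier. The toy sentences, the labels, and the choice of `LogisticRegression` as the "simple model" are assumptions made for illustration; the source does not say which classifier is used.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data (not from the source): 1 = positive, 0 = negative.
texts = [
    "I love this movie",
    "great film, loved it",
    "what a wonderful story",
    "I hate this movie",
    "terrible film, hated it",
    "what a boring story",
]
labels = [1, 1, 1, 0, 0, 0]

pipeline = Pipeline([
    ("counts", CountVectorizer()),   # text -> sparse matrix of token counts
    ("tfidf", TfidfTransformer()),   # counts -> TF-IDF weights
    ("clf", LogisticRegression()),   # assumed classifier, chosen for simplicity
])
pipeline.fit(texts, labels)

# Classify two unseen sentences; each prediction is 0 or 1.
predictions = pipeline.predict(["loved this wonderful film", "boring and terrible"])
print(predictions)
```

`TfidfVectorizer` combines the first two steps into one; they are kept separate here only to mirror the two stages named in the description.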
Figure: training loss (y-axis, 0.3 to 0.7) by epoch (x-axis, 1 to 4).

| Epoch | Loss ↓ | Accuracy ↑ | Observation |
|---|---|---|---|
| 1 | 0.65 | 0.50 | Model starts with random guesses |
| 2 | 0.48 | 0.75 | Model learns basic word patterns |
| 3 | 0.35 | 0.85 | Model improves classification accuracy |
| 4 | 0.30 | 0.90 | Model converges with good accuracy |
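What does CountVectorizer do in text processing?

CountVectorizer learns a vocabulary from the input texts and turns each text into a vector of token counts.

    from sklearn.feature_extraction.text import CountVectorizer

    sentences = ["I love cats", "Cats love me"]
    vectorizer = CountVectorizer()
    # Learn the vocabulary and build the document-term count matrix
    X = vectorizer.fit_transform(sentences)
    print(X.shape)  # (2, 3): the one-letter token "I" is dropped by default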
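    from sklearn.feature_extraction.text import TfidfVectorizer

    texts = ["apple banana apple", "banana fruit"]
    tfidf = TfidfVectorizer()
    # Learn the vocabulary and IDF weights, then return the TF-IDF matrix
    X = tfidf.fit_transform(texts)
    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
    print(tfidf.get_feature_names_out())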
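The weights that `TfidfVectorizer` produces can be reproduced by hand, which makes the formula concrete. The sketch below assumes scikit-learn's default settings (`smooth_idf=True`, `norm="l2"`, `sublinear_tf=False`), under which the inverse document frequency is `ln((1 + n) / (1 + df)) + 1` and each row is scaled to unit length.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

texts = ["apple banana apple", "banana fruit"]

# Raw term counts; columns are apple, banana, fruit.
counts = CountVectorizer().fit_transform(texts).toarray()

n_docs = counts.shape[0]
df = (counts > 0).sum(axis=0)                 # documents containing each term
idf = np.log((1 + n_docs) / (1 + df)) + 1     # smoothed IDF (scikit-learn default)

weighted = counts * idf                       # term frequency times IDF
manual = weighted / np.linalg.norm(weighted, axis=1, keepdims=True)  # L2-normalise rows

reference = TfidfVectorizer().fit_transform(texts).toarray()
print(np.allclose(manual, reference))
```

"banana" appears in both documents, so its IDF is exactly 1, while "apple" and "fruit" each appear in one document and get a higher weight.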