Model Pipeline - Why text classification categorizes documents
Text classification is a process where a computer learns to sort documents into groups based on their content. It helps organize information so we can find or use it easily.
[Figure: training loss (y-axis, roughly 0.3 to 0.9) plotted against epochs 1 to 5 (x-axis).]

| Epoch | Loss ↓ | Accuracy ↑ | Observation |
|---|---|---|---|
| 1 | 0.85 | 0.60 | Model starts learning, accuracy is low |
| 2 | 0.65 | 0.72 | Loss decreases, accuracy improves |
| 3 | 0.50 | 0.80 | Model learns important patterns |
| 4 | 0.40 | 0.85 | Good balance of learning, accuracy rising |
| 5 | 0.35 | 0.87 | Model converges, small improvements |
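
A table like the one above is produced by recording loss and accuracy once per epoch. The sketch below illustrates the idea with a tiny logistic regression trained by full-batch gradient descent on bag-of-words counts. The source does not say which model produced the table, so the model, the toy data, the learning rate, and the names `train` and `predict_proba` are all assumptions made for illustration. The numbers it prints will not match the table.

```python
import math

# Hypothetical toy data: 1 = positive, 0 = negative.
texts = ["i love cats", "i hate rain", "cats are great", "rain is bad"]
labels = [1, 0, 1, 0]

# Bag-of-words features: one count per vocabulary word.
vocab = sorted({word for text in texts for word in text.split()})
features = [[text.split().count(word) for word in vocab] for text in texts]


def predict_proba(weights, bias, x):
    """Sigmoid of the linear score: estimated probability of the positive class."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-score))


def train(features, labels, epochs=5, learning_rate=0.5):
    """Run full-batch gradient descent and return (epoch, loss, accuracy) rows."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    history = []
    for epoch in range(1, epochs + 1):
        # Gradient of the mean log loss with respect to weights and bias.
        probs = [predict_proba(weights, bias, x) for x in features]
        errors = [p - y for p, y in zip(probs, labels)]
        for j in range(len(weights)):
            grad_j = sum(e * x[j] for e, x in zip(errors, features)) / n
            weights[j] -= learning_rate * grad_j
        bias -= learning_rate * sum(errors) / n

        # Evaluate on the training data after this epoch's update.
        probs = [predict_proba(weights, bias, x) for x in features]
        loss = -sum(
            y * math.log(p) + (1 - y) * math.log(1 - p)
            for p, y in zip(probs, labels)
        ) / n
        correct = sum(1 for p, y in zip(probs, labels) if (p >= 0.5) == (y == 1))
        history.append((epoch, loss, correct / n))
    return history


history = train(features, labels)
print("| Epoch | Loss | Accuracy |")
for epoch, loss, accuracy in history:
    print(f"| {epoch} | {loss:.3f} | {accuracy:.2f} |")
```

Because this toy problem is tiny and evaluated on its own training data, accuracy saturates almost immediately. On a real dataset the metrics would be computed on held-out examples and would improve more gradually, as in the table.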
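
Example: train a Naive Bayes classifier on four labeled sentences, then predict the label of a new one.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    texts = ['I love cats', 'I hate rain', 'Cats are great', 'Rain is bad']
    labels = ['positive', 'negative', 'positive', 'negative']

    # Convert each sentence into a vector of word counts.
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)

    # Train the classifier on the count vectors.
    model = MultinomialNB()
    model.fit(X, labels)

    # Vectorize new text with the same fitted vocabulary, then predict.
    new_text = ['I love rain']
    X_new = vectorizer.transform(new_text)
    prediction = model.predict(X_new)
    print(prediction[0])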
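
To see what the classifier does with those word counts, the following pure-Python sketch works through multinomial Naive Bayes by hand on the same four sentences. It assumes add-one (Laplace) smoothing and a tokenizer that lowercases text and keeps only words of two or more characters, which is how I understand the scikit-learn defaults to behave. The helper names `tokenize`, `log_score`, and `classify` are hypothetical and not part of any library.

```python
import math
import re
from collections import Counter

texts = ['I love cats', 'I hate rain', 'Cats are great', 'Rain is bad']
labels = ['positive', 'negative', 'positive', 'negative']


def tokenize(text):
    """Lowercase the text and keep words of two or more characters (assumed default)."""
    return re.findall(r"\b\w\w+\b", text.lower())


# Count how often each word appears in each class.
class_counts = Counter(labels)
word_counts = {label: Counter() for label in class_counts}
for text, label in zip(texts, labels):
    word_counts[label].update(tokenize(text))

vocab = {word for counts in word_counts.values() for word in counts}


def log_score(text, label, alpha=1.0):
    """Log prior plus smoothed log likelihood of each in-vocabulary word."""
    total_words = sum(word_counts[label].values())
    score = math.log(class_counts[label] / len(labels))
    for word in tokenize(text):
        if word in vocab:  # words never seen in training are ignored
            count = word_counts[label][word]
            score += math.log((count + alpha) / (total_words + alpha * len(vocab)))
    return score


def classify(text):
    """Return the label with the highest log score."""
    return max(sorted(class_counts), key=lambda label: log_score(text, label))


for label in sorted(class_counts):
    print(label, round(log_score('I love rain', label), 3))
print(classify('I love rain'))
```

Under these assumptions the sketch labels 'I love rain' as negative: "rain" appears twice in the negative examples, which outweighs the single positive occurrence of "love". With only four training sentences, a single word can easily dominate the decision.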
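
A common mistake is to pass the raw strings to `fit` instead of the vectorized matrix `X`. The model expects numeric features, so the faulty call raises an error. It is kept below as a comment next to the fix.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    texts = ['happy day', 'sad night']
    labels = ['positive', 'negative']

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)

    model = MultinomialNB()
    # model.fit(texts, labels)  # Error here: raw strings are not numeric features
    model.fit(X, labels)        # Fix: train on the count matrix X

    new_text = ['happy night']
    X_new = vectorizer.transform(new_text)
    prediction = model.predict(X_new)
    print(prediction[0])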