Which of the following best describes a hybrid approach in Natural Language Processing (NLP)?
Think about mixing different techniques to get better results.
Hybrid approaches combine rule-based and machine learning methods to leverage the strengths of both: rules add precision and interpretability, while learned models generalize to inputs the rules do not cover.
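As a rough illustration of the idea, the sketch below lets a high-precision rule decide whenever one fires and falls back to a learned model otherwise. This is only one of several ways to combine the two. The rule patterns, the `WEIGHTS` table, and the names `fallback_model` and `hybrid_predict` are hypothetical, chosen for this example; a real system would plug in a trained classifier instead of the stub.

```python
import re

# Hypothetical high-precision rules: (compiled pattern, label)
RULES = [
    (re.compile(r"\bnot (good|great)\b"), 0),
    (re.compile(r"\b(excellent|love)\b"), 1),
]

# Stand-in for learned parameters (an assumption, not real trained weights)
WEIGHTS = {"sunny": 0.8, "rain": -0.7, "gloomy": -0.9}


def fallback_model(text):
    """Stub classifier: sums per-token weights and thresholds at zero."""
    score = sum(WEIGHTS.get(tok, 0.0) for tok in text.lower().split())
    return 1 if score > 0 else 0


def hybrid_predict(text):
    """Return (label, source): rules take priority, the model handles the rest."""
    lowered = text.lower()
    for pattern, label in RULES:
        if pattern.search(lowered):
            return label, "rule"
    return fallback_model(lowered), "model"


for sample in ["This is not good", "I love it", "Sunny morning", "Gloomy rain"]:
    print(sample, "->", hybrid_predict(sample))
```

The second element of the returned tuple records which component made the decision, which can help when auditing how often the rules fire.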
You want to build a sentiment analysis system that uses both a lexicon-based method and a machine learning classifier. Which combination below fits a hybrid approach?
Look for a mix of dictionary and machine learning.
Combining lexicon-based scoring with a machine learning classifier is a classic hybrid approach to sentiment analysis.
What is the output of the following Python code that combines TF-IDF features with a rule-based keyword count for classification?
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
import numpy as np

texts = ["I love sunny days", "I hate rain", "Sunny weather is great", "Rainy days are gloomy"]
labels = [1, 0, 1, 0]

# Rule-based feature: count of positive words
positive_words = {'love', 'sunny', 'great'}
rule_features = np.array([
    [sum(word in positive_words for word in text.lower().split())]
    for text in texts
])

# TF-IDF features
vectorizer = TfidfVectorizer()
tfidf_features = vectorizer.fit_transform(texts).toarray()

# Combine features
X = np.hstack((tfidf_features, rule_features))

model = LogisticRegression().fit(X, labels)
predictions = model.predict(X)
print(predictions.tolist())
Check how the rule-based feature and TF-IDF features help the logistic regression model.
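The model learns from both the TF-IDF features and the positive-word count. Because the count is nonzero only for the positive texts, it separates the two classes, and the model reproduces the original labels on the training data: [1, 0, 1, 0].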
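To check the role of the rule-based feature, it can be computed on its own, without scikit-learn. The snippet below reuses the texts and word set from the question; `rule_counts` is a name introduced here for illustration.

```python
texts = ["I love sunny days", "I hate rain", "Sunny weather is great", "Rainy days are gloomy"]
labels = [1, 0, 1, 0]
positive_words = {'love', 'sunny', 'great'}

# One count per text: how many whitespace-separated tokens are positive words
rule_counts = [
    sum(word in positive_words for word in text.lower().split())
    for text in texts
]

print(rule_counts)  # [2, 0, 2, 0]
```

Both positive texts contain two positive words and both negative texts contain none, so this single column already lines up with the labels.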
In a hybrid NLP model combining a rule-based sentiment score and a neural network, which hyperparameter adjustment is most likely to improve the balance between the two components?
Think about how to control the influence of each part in the combined output.
Adjusting the weight of the rule-based score helps balance its contribution with the neural network's output.
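One simple way to expose such a weight is a convex combination of the two outputs. The sketch below assumes both the rule-based score and the network's probability are already scaled to [0, 1]; the function name `blend`, the parameter `rule_weight`, and the example values are hypothetical. In practice the weight would typically be tuned on held-out data.

```python
def blend(rule_score, nn_prob, rule_weight):
    """Weighted average of a rule-based score and a network probability.

    rule_weight = 0.0 ignores the rules; rule_weight = 1.0 ignores the network.
    """
    if not 0.0 <= rule_weight <= 1.0:
        raise ValueError("rule_weight must be between 0 and 1")
    return rule_weight * rule_score + (1.0 - rule_weight) * nn_prob


# Illustrative values only: the rules are confident, the network is not
rule_score = 0.9
nn_prob = 0.4

for rule_weight in (0.0, 0.25, 0.5, 1.0):
    combined = blend(rule_score, nn_prob, rule_weight)
    print(f"rule_weight={rule_weight:.2f} -> combined={combined:.3f}")
```

Raising `rule_weight` moves the combined score from the network's output toward the rule-based score.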
Consider this hybrid NLP pipeline code snippet that combines a rule-based feature with a machine learning model. It raises a ValueError: shapes (4,5) and (4,1) not aligned. What is the cause?
import numpy as np
from sklearn.linear_model import LogisticRegression

texts = ["happy day", "sad night", "joyful morning", "gloomy evening"]
labels = [1, 0, 1, 0]

# Rule-based feature: count of positive words
positive_words = {'happy', 'joyful'}
rule_features = np.array([
    [sum(word in positive_words for word in text.split())]
    for text in texts
])

# Dummy TF-IDF features with wrong shape
tfidf_features = np.random.rand(4, 5)

# Incorrect feature combination
X = np.dot(tfidf_features, rule_features)

model = LogisticRegression().fit(X, labels)
Check how features are combined and their shapes.
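np.dot performs matrix multiplication, which requires the inner dimensions to match. Here the first array has 5 columns but the second has 4 rows, so shapes (4,5) and (4,1) do not align and a ValueError is raised. Feature blocks should be concatenated column-wise, for example with np.hstack, rather than multiplied.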
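A minimal sketch of one possible fix, reusing the variables from the snippet in the question: the two feature blocks share the same number of rows, so they can be stacked side by side. The random matrix is still only a stand-in for real TF-IDF features.

```python
import numpy as np

texts = ["happy day", "sad night", "joyful morning", "gloomy evening"]

# Rule-based feature: count of positive words, shape (4, 1)
positive_words = {'happy', 'joyful'}
rule_features = np.array([
    [sum(word in positive_words for word in text.split())]
    for text in texts
])

# Stand-in for TF-IDF features, shape (4, 5)
tfidf_features = np.random.rand(4, 5)

# Concatenate along the column axis: (4, 5) + (4, 1) -> (4, 6)
X = np.hstack((tfidf_features, rule_features))

print(X.shape)  # (4, 6)
```

Each row of `X` now holds the five stand-in features followed by the rule-based count, which is the layout `LogisticRegression.fit` expects (one row per sample).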