Imagine you have a sentence and you want to label each word with its role, like noun or verb. What is the main goal of part-of-speech tagging in this context?
Think about labeling words with their roles in a sentence.
Part-of-speech tagging assigns a grammatical category (noun, verb, adjective, etc.) to each word, which helps in understanding sentence structure.
What is the output of this Python code using NLTK for POS tagging?
import nltk
nltk.download('averaged_perceptron_tagger')

sentence = 'The quick brown fox jumps over the lazy dog'.split()
tagged = nltk.pos_tag(sentence)
print(tagged)
Look for common POS tags: DT (determiner), JJ (adjective), NN (noun), VBZ (verb, 3rd person singular present), IN (preposition).
The NLTK POS tagger assigns tags such as DT for 'The', JJ for adjectives like 'quick', NN for nouns like 'fox', VBZ for verbs like 'jumps', and IN for prepositions like 'over'.
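For quick reference, the tags in the answer can be expanded into readable descriptions. The `PENN_TAGS` table and `describe` helper below are hypothetical (not part of NLTK); the descriptions follow the Penn Treebank tagset:

```python
# Hypothetical helper, not part of NLTK: a small lookup table for the
# Penn Treebank tags mentioned in the answer above.
PENN_TAGS = {
    "DT": "determiner",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "VBZ": "verb, 3rd person singular present",
    "IN": "preposition or subordinating conjunction",
}

def describe(tag):
    """Return a human-readable description for a Penn Treebank tag."""
    return PENN_TAGS.get(tag, "unknown tag")

print(describe("VBZ"))  # → verb, 3rd person singular present
```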
You want to build a part-of-speech tagger that works well on new sentences it has never seen before. Which model type is generally best for this task?
Think about models that learn patterns from lots of examples and generalize well.
Neural network models with word embeddings capture complex patterns and context, making them best for accurate POS tagging on new text.
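As an illustration only, here is a tiny pure-Python sketch of the idea: each word maps to an embedding vector, and a linear layer turns that vector into tag scores. Every vocabulary entry, embedding value, and weight below is a toy number picked by hand; a real neural tagger would learn these from data and also condition on context:

```python
# Toy sketch of an embedding-based tagger (illustrative only, no training).
VOCAB = {"the": 0, "dog": 1, "runs": 2, "<unk>": 3}
TAGS = ["DT", "NN", "VBZ"]

# 2-dimensional toy embeddings, one row per vocabulary word
EMBED = [
    [1.0, 0.0],   # the
    [0.0, 1.0],   # dog
    [0.5, -1.0],  # runs
    [0.0, 0.0],   # <unk>
]

# Linear layer: maps a 2-d embedding to 3 tag scores (weights hand-picked)
W = [
    [2.0, -1.0],   # DT score
    [-1.0, 2.0],   # NN score
    [1.0, -2.0],   # VBZ score
]

def tag_word(word):
    """Look up the word's embedding and return the highest-scoring tag."""
    emb = EMBED[VOCAB.get(word.lower(), VOCAB["<unk>"])]
    scores = [sum(w * e for w, e in zip(row, emb)) for row in W]
    return TAGS[scores.index(max(scores))]

print([(w, tag_word(w)) for w in "The dog runs".split()])
# → [('The', 'DT'), ('dog', 'NN'), ('runs', 'VBZ')]
```

The design point: unseen words fall back to an `<unk>` embedding instead of failing outright, which is one reason embedding-based models degrade more gracefully on new text than lookup-table taggers.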
After training a POS tagger, you want to evaluate how well it labels words correctly. Which metric is most appropriate?
Think about how many words got the right tag out of all words.
Overall accuracy measures the percentage of words correctly tagged, which is the standard metric for POS tagging.
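Computing token-level accuracy is straightforward: count the words whose predicted tag matches the gold tag and divide by the total. The gold and predicted sequences below are made-up examples:

```python
# Token-level tagging accuracy: fraction of words with the correct tag.
# The tag sequences here are hypothetical examples.
gold = ["DT", "JJ", "NN", "VBZ", "IN"]
pred = ["DT", "JJ", "NN", "VBD", "IN"]

correct = sum(g == p for g, p in zip(gold, pred))
accuracy = correct / len(gold)
print(f"accuracy = {accuracy:.2f}")  # 4 of 5 tags match → accuracy = 0.80
```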
Consider this code snippet using spaCy for POS tagging. Why does it print integers instead of readable tag strings?
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('I love machine learning')
for token in doc:
    print(token.pos)
Check the spaCy documentation for token.pos and what it returns.
token.pos returns an integer ID from spaCy's internal enum, not a string. To get the string tag (e.g. 'VERB'), use token.pos_ (with a trailing underscore).