Handling out-of-vocabulary words in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to replace unknown words with a special token.

def replace_oov(word, vocab):
    if word not in vocab:
        return [1]
    return word
A. "<UNK>"
B. "<PAD>"
C. "<EOS>"
D. "<SOS>"
Common Mistakes
Using the padding token "<PAD>" instead of the unknown token "<UNK>".
Using the end-of-sequence token "<EOS>", which marks where a sequence ends rather than an unknown word.
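
To make the distinction between these special tokens concrete, here is a minimal sketch of how each one is typically used. The helper names `mark_unknown` and `pad_to_length` and the tiny vocabulary are assumptions made for illustration; they are not part of the exercise.

```python
# Hypothetical special tokens; the exact strings vary between projects.
PAD, UNK, SOS, EOS = "<PAD>", "<UNK>", "<SOS>", "<EOS>"


def mark_unknown(tokens, vocab):
    """Replace every token that is missing from vocab with UNK."""
    return [tok if tok in vocab else UNK for tok in tokens]


def pad_to_length(tokens, length):
    """Append PAD until the sequence has the requested length."""
    return tokens + [PAD] * (length - len(tokens))


vocab = {"the", "cat", "sat"}                          # illustrative vocabulary
tokens = mark_unknown(["the", "zebra", "sat"], vocab)  # "zebra" is unknown
padded = pad_to_length([SOS] + tokens + [EOS], 7)
print(padded)
# ['<SOS>', 'the', '<UNK>', 'sat', '<EOS>', '<PAD>', '<PAD>']
```

Each token has one job here: `<UNK>` stands in for a word the vocabulary does not contain, `<SOS>` and `<EOS>` mark the sequence boundaries, and `<PAD>` only fills empty positions.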
2. Fill in the blank (medium)

Complete the code to convert words to indices, using the unknown token index for out-of-vocabulary words.

def word_to_index(word, word_index, unk_index):
    return word_index.get(word, [1])
A. unk_index
B. 0
C. -1
D. None
Common Mistakes
Returning 0, which may already be reserved as the padding index.
Returning None, which causes errors when the indices are passed to a model.
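
The effect of each fallback value can be seen in a small sketch. The index layout below (0 for padding, 1 for unknown) is an assumption chosen for illustration, and `lookup_all` is a hypothetical helper, not part of the exercise.

```python
# Hypothetical index layout: 0 is padding, 1 is the unknown token.
word_index = {"<PAD>": 0, "<UNK>": 1, "the": 2, "cat": 3}


def lookup_all(words, word_index, default):
    """Look up each word, falling back to `default` when it is missing."""
    return [word_index.get(w, default) for w in words]


words = ["the", "dog"]  # "dog" is out of vocabulary

with_unk = lookup_all(words, word_index, word_index["<UNK>"])
with_zero = lookup_all(words, word_index, 0)
with_none = lookup_all(words, word_index, None)

print(with_unk)   # [2, 1]     unknown word is clearly marked
print(with_zero)  # [2, 0]     indistinguishable from padding
print(with_none)  # [2, None]  not a valid integer index
```

`dict.get(key, default)` returns `default` only when the key is absent, so the choice of default decides how a missing word looks to everything downstream.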
3. Fill in the blank (hard)

Fill in the blank so that the code correctly handles out-of-vocabulary words.

def preprocess_sentence(sentence, vocab, unk_token):
    return [word if word in vocab else [1] for word in sentence]
A. word
B. unk_token
C. "<PAD>"
D. vocab[word]
Common Mistakes
Returning the original word even when it is not in the vocabulary.
Using the padding token instead of the unknown token.
4. Fill in the blanks (hard)

Fill both blanks to create a dictionary mapping words to indices, assigning the unknown token index for out-of-vocabulary words.

def create_index(sentence, vocab, unk_index):
    return {word: vocab.get(word, [1]) for word in sentence if word [2] vocab}
A. unk_index
B. in
C. not in
D. ==
Common Mistakes
Using 'in' instead of 'not in' in the condition.
Using the wrong index for unknown words.
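
How much the filter condition matters can be seen by running the same comprehension with each condition. This is a sketch under assumed data: the vocabulary, sentence, and index values are made up for illustration.

```python
# Illustrative data; none of these values come from the exercise.
vocab = {"the": 2, "cat": 3}
sentence = ["the", "dog", "cat", "bird"]
unk_index = 1

# Keeps only words that ARE in the vocabulary; the default is never used.
known = {w: vocab.get(w, unk_index) for w in sentence if w in vocab}

# Keeps only words that are NOT in the vocabulary; every value is unk_index.
unknown = {w: vocab.get(w, unk_index) for w in sentence if w not in vocab}

# No filter: every word is kept, and unknown words fall back to unk_index.
everything = {w: vocab.get(w, unk_index) for w in sentence}

print(known)       # {'the': 2, 'cat': 3}
print(unknown)     # {'dog': 1, 'bird': 1}
print(everything)  # {'the': 2, 'dog': 1, 'cat': 3, 'bird': 1}
```

The condition selects which words end up as keys, while the default passed to `get` decides what value a missing word receives.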
5. Fill in the blanks (hard)

Fill all three blanks to build a list of indices for a sentence, replacing out-of-vocabulary words with the unknown token index.

def sentence_to_indices(sentence, vocab, unk_index):
    return [vocab.get([1], [2]) for [3] in sentence]
A. word
B. unk_index
D. token
Common Mistakes
Using one variable name for the loop variable and a different one for the lookup key.
Not providing a default index for missing words.
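
As a closing illustration, the sketch below encodes a whole sentence and then measures how many of its tokens fell back to the unknown index. The function names `encode` and `oov_rate`, the vocabulary, and the whitespace tokenization are all assumptions made for this example.

```python
def encode(tokens, vocab, unk_index):
    """Map each token to its index, using unk_index for missing tokens."""
    return [vocab.get(tok, unk_index) for tok in tokens]


def oov_rate(ids, unk_index):
    """Fraction of positions that hold the unknown index (0.0 if empty)."""
    if not ids:
        return 0.0
    return sum(i == unk_index for i in ids) / len(ids)


# Hypothetical vocabulary with reserved padding and unknown indices.
vocab = {"<PAD>": 0, "<UNK>": 1, "the": 2, "cat": 3, "sat": 4}
unk_index = vocab["<UNK>"]

tokens = "the cat sat on the mat".split()  # naive whitespace tokenization
ids = encode(tokens, vocab, unk_index)
rate = oov_rate(ids, unk_index)

print(ids)             # [2, 3, 4, 1, 2, 1]
print(round(rate, 2))  # 0.33
```

Tracking the share of unknown tokens is a quick way to notice when a vocabulary is too small for the text being processed.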