
Unicode handling in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to read a Unicode text file correctly.

with open('text.txt', encoding=[1]) as f:
    content = f.read()
Options:
A. 'ascii'
B. 'utf-8'
C. 'latin-1'
D. 'utf-16'
Common Mistakes
Using 'ascii', which cannot decode any non-ASCII byte and so raises UnicodeDecodeError on most Unicode text.
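To see why the encoding argument matters, here is a minimal, self-contained sketch. The sample string and the temporary file are assumptions made for illustration; the exercise's own text.txt is not used.

```python
import os
import tempfile

# Hypothetical sample text (cafe with an accented e, plus three Japanese characters).
sample = "caf\u00e9 \u65e5\u672c\u8a9e"

# Write the sample to a temporary file as UTF-8 so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)
    path = tmp.name

# Reading with the matching encoding round-trips the text unchanged.
with open(path, encoding="utf-8") as f:
    content = f.read()

# Reading the same bytes as ASCII fails on the first byte >= 0x80.
try:
    with open(path, encoding="ascii") as f:
        f.read()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

os.remove(path)
```

Passing the encoding explicitly avoids depending on the platform default, which can differ between systems.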
2. Fill in the blank (medium)

Complete the code to normalize Unicode text to NFC form.

import unicodedata
normalized_text = unicodedata.normalize([1], text)
Options:
A. 'NFD'
B. 'NFKC'
C. 'NFC'
D. 'NFKD'
Common Mistakes
Using 'NFD', which decomposes characters instead of composing them.
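The difference between the normalization forms is easiest to see on a single accented word. The sketch below uses hypothetical sample strings, written with escapes so the code points are unambiguous.

```python
import unicodedata

# The same visible word in two different code point sequences.
decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT (5 code points)
composed = "caf\u00e9"     # precomposed LATIN SMALL LETTER E WITH ACUTE (4 code points)

# Without normalization the two strings compare unequal.
equal_before = decomposed == composed

# NFC composes base letter + combining mark into one code point.
nfc = unicodedata.normalize("NFC", decomposed)

# NFD does the opposite and splits the precomposed character.
nfd = unicodedata.normalize("NFD", composed)

# NFKC additionally replaces compatibility characters, e.g. the 'fi' ligature.
nfkc = unicodedata.normalize("NFKC", "\ufb01")
```

Normalizing both sides to the same form before comparing or tokenizing keeps visually identical strings from being treated as different tokens.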
3. Fill in the blank (hard)

Complete the code to decode UTF-8 bytes into a Unicode string.

byte_data = b'caf\xc3\xa9'
text = byte_data.[1]('utf-8')
Options:
A. decode
B. transform
C. encode
D. convert
Common Mistakes
Calling encode on a bytes object, which raises AttributeError because bytes has no encode method in Python 3.
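As a rough illustration of the bytes/str direction, the sketch below decodes the byte string from the task, round-trips it, and shows one way to handle invalid input. The malformed byte string is a made-up example.

```python
byte_data = b"caf\xc3\xa9"

# bytes -> str: decode interprets the raw bytes using the given encoding.
text = byte_data.decode("utf-8")

# str -> bytes: encode goes the other way and recovers the original bytes.
round_trip = text.encode("utf-8")

# In Python 3, bytes objects have decode but no encode method.
bytes_has_encode = hasattr(byte_data, "encode")

# Hypothetical malformed input: 0xE9 alone is not valid UTF-8.
# errors="replace" substitutes U+FFFD instead of raising UnicodeDecodeError.
lossy = b"caf\xe9".decode("utf-8", errors="replace")
```

Using errors="replace" trades correctness for robustness, so it suits exploratory work more than pipelines where silent data loss would matter.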
4. Fill in the blank (hard)

Fill both blanks to create a dictionary of word lengths for words longer than 3 characters.

word_lengths = {word: [1] for word in words if len(word) [2] 3}
Options:
A. len(word)
B. >
C. <
D. word
Common Mistakes
Using word instead of len(word) for the dictionary value.
Using '<' instead of '>' in the condition.
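One point worth keeping in mind with Unicode words is that len counts code points, so the result depends on the normalization form. The sketch below assumes a hypothetical word list and normalizes to NFC before measuring, which is an added step not required by the task.

```python
import unicodedata

# Hypothetical input; "cafe\u0301" is the decomposed spelling of cafe with an accent.
raw_words = ["cafe\u0301", "d\u00eda", "na\u00efve", "\u00fcber", "\u65e5\u672c\u8a9e"]

# Normalize first so each accented letter counts as one code point.
words = [unicodedata.normalize("NFC", w) for w in raw_words]

# Map each word to its length, keeping only words longer than 3 characters.
word_lengths = {word: len(word) for word in words if len(word) > 3}

# Without normalization the decomposed spelling is one code point longer.
raw_length = len(raw_words[0])
```

Here the three-character words are filtered out, and the decomposed input would have been counted as 5 characters rather than 4 had it not been normalized.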
5. Fill in the blank (hard)

Fill all three blanks to filter and transform a dictionary with Unicode keys: keep entries whose value is greater than 0 and convert each key to uppercase.

filtered = {[1]: [2] for k, v in data.items() if v [3] 0}
Options:
A. k.upper()
B. v
C. >
D. k.lower()
Common Mistakes
Using k.lower() instead of k.upper().
Using '<' instead of '>' in the condition.
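A small sketch of the same filter-and-transform pattern on Unicode keys follows. The data dictionary is a hypothetical example, with keys written as escapes so the code points are unambiguous.

```python
# Hypothetical counts keyed by Unicode words.
data = {
    "stra\u00dfe": 2,   # German word containing sharp s
    "caf\u00e9": 0,     # dropped: value is not greater than 0
    "na\u00efve": 5,
    "\u00e9mile": -1,   # dropped: negative value
}

# Keep entries with a positive value and uppercase each key.
filtered = {k.upper(): v for k, v in data.items() if v > 0}

# str.upper follows Unicode case mapping, so it can change the key length:
# the single character sharp s uppercases to the two characters "SS".
length_change = len("stra\u00dfe".upper()) - len("stra\u00dfe")
```

Because case mapping can merge distinct keys or change their length, it is worth checking for collisions when uppercasing keys in real data.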