
Why text data requires special handling in Data Analysis Python - Challenge Your Understanding

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why is text data different from numeric data?

Text data often needs special handling in data science. Which reason below best explains why text data is different from numeric data?

A. Text data is stored in binary format, unlike numeric data which is stored as characters.
B. Text data can have many variations and meanings, making it hard to analyze directly like numbers.
C. Text data is always shorter than numeric data, so it needs compression.
D. Text data never contains missing values, unlike numeric data.
💡 Hint

Think about how words can have different forms and meanings compared to simple numbers.
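
To make the hint concrete, here is a minimal sketch that counts word forms in a made-up list. The sample words are an assumption chosen for illustration and do not come from any real dataset.

```python
from collections import Counter

# Hypothetical sample: one word written in several ways, plus a number.
raw_words = ["Run", "run", "RUN", "running", "5", "5"]

# Exact string comparison treats every spelling as a separate value.
raw_counts = Counter(raw_words)

# Lowercasing merges the case variants, but "running" stays distinct.
normalized_counts = Counter(w.lower() for w in raw_words)

print(raw_counts)
print(normalized_counts)
```

The two identical "5" strings are counted together without any cleaning, while the variants of "run" only line up after normalization, and even then only partly.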

Predict Output
intermediate
Output of tokenizing text data

What is the output of this Python code that splits a sentence into words?

sentence = "Data science is fun!"
tokens = sentence.split()
print(tokens)
A. ['Data science is fun!']
B. ['Data', 'science', 'is', 'fun', '!']
C. ['Data', 'science', 'is', 'fun']
D. ['Data', 'science', 'is', 'fun!']
💡 Hint

Remember that split() with no arguments splits on whitespace and leaves punctuation attached to the neighboring word.
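
As a rough comparison on a different, made-up sentence, the sketch below contrasts whitespace splitting with one possible alternative, a regular expression that keeps only runs of word characters. The sample sentence and the choice of `\w+` are assumptions for illustration; other tokenization rules are equally valid.

```python
import re

sample = "Text needs care, really!"  # hypothetical sentence

# str.split() with no arguments splits on runs of whitespace only.
whitespace_tokens = sample.split()

# One alternative: keep only runs of word characters, dropping punctuation.
word_tokens = re.findall(r"\w+", sample)

print(whitespace_tokens)
print(word_tokens)
```

Which tokenizer is appropriate depends on whether punctuation carries meaning for the analysis at hand.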

Data Output
advanced
Result of converting text to lowercase

What is the resulting list after converting all words in this list to lowercase?

words = ['Python', 'Data', 'SCIENCE', 'Fun']
lower_words = [w.lower() for w in words]
print(lower_words)
A. ['python', 'data', 'science', 'fun']
B. ['pYTHON', 'dATA', 'sCIENCE', 'fUN']
C. ['PYTHON', 'DATA', 'SCIENCE', 'FUN']
D. ['Python', 'Data', 'SCIENCE', 'Fun']
💡 Hint

Think about what the lower() method does to each string.

🔧 Debug
advanced
Identify the error in text preprocessing code

What happens when this code runs in an attempt to remove punctuation from a text string?

import string
text = "Hello, world!"
clean_text = text.replace(string.punctuation, '')
print(clean_text)
A. NameError: name 'string' is not defined
B. AttributeError: 'str' object has no attribute 'replace'
C. TypeError: replace() argument 1 must be str, not 'string.punctuation'
D. No error, output: 'Hello, world!'
💡 Hint

Check what type string.punctuation is and what replace() expects.
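
For reference, one common way to strip punctuation character by character is `str.translate` with a table built by `str.maketrans`. This is a sketch of that approach on a made-up input, not the only way to do it; note that `string.punctuation` covers ASCII punctuation only.

```python
import string

sample = "Wait... what?!"  # hypothetical input

# With a third argument, str.maketrans maps each listed character to None,
# so translate() deletes every ASCII punctuation character individually.
table = str.maketrans("", "", string.punctuation)
cleaned = sample.translate(table)

print(type(string.punctuation))
print(cleaned)
```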

🚀 Application
expert
Choosing the right method to handle text data for sentiment analysis

You want to prepare customer reviews for sentiment analysis. Which approach below best handles the text data before feeding it into a machine learning model?

A. Convert all text to lowercase, remove punctuation, tokenize words, and remove common stopwords.
B. Keep the text as is, including uppercase letters and punctuation, to preserve original meaning.
C. Replace all words with their length numbers to simplify the data.
D. Remove all vowels from the text to reduce size.
💡 Hint

Think about common steps in text cleaning for machine learning.
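
A typical cleaning pipeline of this kind might look like the sketch below. The function name `preprocess`, the sample review, and the tiny `STOPWORDS` set are all assumptions made up for illustration; real projects usually take a stopword list from a library rather than hard-coding one.

```python
import string

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "is", "a", "and", "it", "was"}

def preprocess(review):
    """Lowercase, strip ASCII punctuation, tokenize, and drop stopwords."""
    lowered = review.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [t for t in tokens if t not in STOPWORDS]

tokens = preprocess("The delivery was fast, and the product is GREAT!")
print(tokens)
```

For sentiment tasks in particular, it is often worth keeping negations such as "not" out of the stopword list, since removing them can flip the meaning of a review.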