Text data often needs special handling in data science. Which reason below best explains why text data is different from numeric data?
Think about how words can have different forms and meanings compared to simple numbers.
Text data is complex because words can have multiple meanings, spellings, and forms. This makes it harder to analyze directly compared to numeric data, which is straightforward to compute with.
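A minimal sketch of this point: to a computer, surface variants of the same word are simply distinct strings, so naive equality checks treat them as unrelated values.

```python
# Four surface forms of "run" are four distinct string values.
forms = ["Run", "run", "running", "ran"]
print(len(set(forms)))   # all four are distinct
print("Run" == "run")    # case alone breaks equality
```

This is exactly the ambiguity that numeric data does not have: 3 is always equal to 3, but "Run" and "run" are not equal until you normalize them.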
What is the output of this Python code that splits a sentence into words?
sentence = "Data science is fun!"
tokens = sentence.split()
print(tokens)
Remember that split() splits by spaces and keeps punctuation attached to words.
The split() method splits the string at whitespace, producing ['Data', 'science', 'is', 'fun!']. Because split() does not remove punctuation, the exclamation mark stays attached to 'fun!'.
What is the resulting list after converting all words in this list to lowercase?
words = ['Python', 'Data', 'SCIENCE', 'Fun']
lower_words = [w.lower() for w in words]
print(lower_words)
Think about what the lower() method does to each string.
The lower() method converts all uppercase letters in each string to lowercase, so the result is ['python', 'data', 'science', 'fun'].
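A related sketch: for caseless *matching* (rather than display), str.casefold() is a more aggressive form of lowercasing that also normalizes some non-English characters, such as the German sharp s.

```python
words = ['Python', 'Data', 'SCIENCE', 'Fun']
print([w.lower() for w in words])  # ['python', 'data', 'science', 'fun']

# casefold() handles cases lower() misses, e.g. German "ß" -> "ss".
print('Straße'.casefold() == 'strasse'.casefold())  # True
```

For plain ASCII text the two behave identically; casefold() only matters once non-English text enters the pipeline.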
What is wrong with this code, which tries to remove punctuation from a text string?
import string
text = "Hello, world!"
clean_text = text.replace(string.punctuation, '')
print(clean_text)
Check what string.punctuation contains and how replace() matches its first argument.
This code raises no error at all: replace() searches for its first argument as one literal substring, and string.punctuation is the entire string of punctuation characters ('!"#$%&...'), which never appears in the text. The call therefore returns the string unchanged, and the punctuation silently survives. To remove each punctuation character individually, use str.translate() with a deletion table instead.
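A sketch of the failure and the fix side by side: the replace() call is a silent no-op, while str.translate() with a deletion table built by str.maketrans() removes every punctuation character.

```python
import string

text = "Hello, world!"

# replace() looks for the ENTIRE punctuation string as one substring;
# it never occurs in text, so text comes back unchanged.
print(text.replace(string.punctuation, ''))  # Hello, world!

# translate() with a deletion table removes each character individually.
table = str.maketrans('', '', string.punctuation)
print(text.translate(table))  # Hello world
```

The third argument to str.maketrans() lists characters to delete, which makes translate() the idiomatic one-pass way to strip punctuation in Python.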
You want to prepare customer reviews for sentiment analysis. Which approach below best handles the text data before feeding it into a machine learning model?
Think about common steps in text cleaning for machine learning.
Lowercasing, removing punctuation, tokenizing, and removing stopwords are standard steps to clean text and reduce noise before analysis.
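The steps above can be sketched as one small pipeline. The stopword set here is a tiny hand-picked stand-in for illustration; real projects usually pull a stopword list from a library such as NLTK or spaCy.

```python
import string

# Illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {'the', 'is', 'a', 'an', 'and', 'it', 'was'}

def preprocess(review):
    text = review.lower()                                   # 1. lowercase
    text = text.translate(
        str.maketrans('', '', string.punctuation))          # 2. strip punctuation
    tokens = text.split()                                   # 3. tokenize on whitespace
    return [t for t in tokens if t not in STOPWORDS]        # 4. drop stopwords

print(preprocess("The product is great, and it was delivered fast!"))
# ['product', 'great', 'delivered', 'fast']
```

Each step reduces spurious variation (case, punctuation, filler words) so the model sees 'great' and 'Great,' as the same feature.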