beginner

Why is text data different from numerical data in analysis?

Text data is unstructured: it consists of words and characters rather than numeric values. Most analysis methods operate on numbers, so text must first be converted into a numeric representation.

beginner

What does 'tokenization' mean in text data processing?

Tokenization means breaking text into smaller units (tokens), such as words or sentences, so that each unit can be counted and analyzed.
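
As a rough illustration of the idea, the sketch below tokenizes a string into words and sentences using only Python's standard `re` module. The function names `tokenize_words` and `tokenize_sentences` and the regular expressions are assumptions made for this example; real tokenizers handle abbreviations, hyphens, and other languages far more carefully.

```python
import re

def tokenize_words(text: str) -> list[str]:
    """Split text into lowercase word tokens, keeping inner apostrophes (isn't)."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())

def tokenize_sentences(text: str) -> list[str]:
    """Naively split text wherever '.', '!' or '?' is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

sample = "Text isn't numeric. It must be tokenized first!"
print(tokenize_words(sample))
# ['text', "isn't", 'numeric', 'it', 'must', 'be', 'tokenized', 'first']
print(tokenize_sentences(sample))
# ["Text isn't numeric.", 'It must be tokenized first!']
```

The sentence splitter is deliberately naive: it would wrongly split after abbreviations such as "Dr.", which is one reason dedicated tokenizers exist.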

beginner

Why do we remove stop words in text analysis?

Stop words such as 'the' or 'and' occur very frequently but carry little meaning on their own, so removing them lets the analysis focus on more informative words.
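
A minimal sketch of stop word removal, assuming the text has already been split into tokens. The `STOP_WORDS` set here is a tiny hand-picked list for illustration only, and `remove_stop_words` is a hypothetical helper name; practical stop word lists are much longer and depend on the language and the task.

```python
# Tiny illustrative stop word list (an assumption, not a standard list).
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "is", "in"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not in the stop word set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = ["the", "cat", "and", "the", "dog", "sat", "in", "a", "box"]
print(remove_stop_words(tokens))
# ['cat', 'dog', 'sat', 'box']
```

Whether to remove stop words is a design choice: for tasks such as sentiment analysis, words like "not" can matter, so the list should be reviewed rather than applied blindly.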

intermediate

What is 'vectorization' in handling text data?

Vectorization converts text into numeric vectors, for example word counts, so that algorithms can process and compare it mathematically.
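
One simple form of vectorization is a bag-of-words count vector, sketched below with the standard library only. The helpers `build_vocabulary` and `vectorize` are hypothetical names chosen for this example, and the input is assumed to be already tokenized; other schemes, such as TF-IDF weights or learned embeddings, follow the same idea of mapping text to numbers.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct token to a column index (tokens sorted alphabetically)."""
    vocab = sorted({token for doc in documents for token in doc})
    return {token: index for index, token in enumerate(vocab)}

def vectorize(tokens, vocabulary):
    """Return a count vector with one position per vocabulary token."""
    counts = Counter(tokens)
    vector = [0] * len(vocabulary)
    for token, count in counts.items():
        if token in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[token]] = count
    return vector

docs = [["cat", "sat", "mat"], ["dog", "sat", "sat"]]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]
print(vocab)    # {'cat': 0, 'dog': 1, 'mat': 2, 'sat': 3}
print(vectors)  # [[1, 0, 1, 1], [0, 1, 0, 2]]
```

Each document becomes a list of numbers of the same length, which is what allows distance measures and statistical models to be applied to text.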

intermediate

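Why is handling text data more complex than handling numerical data?

Text takes many forms, and its meaning depends on context: the same word can appear in different spellings or inflections and can mean different things in different sentences. It therefore needs to be cleaned and transformed into a consistent representation before analysis.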
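
To make the "many forms" point concrete, the sketch below shows one possible cleaning step that maps several surface variants of a word to a single form. The `normalize` function and its rules (accent stripping, lowercasing, dropping punctuation) are assumptions for illustration; accent stripping in particular loses information and is not appropriate for every language or task.

```python
import re
import unicodedata

def normalize(text: str) -> str:
    """Lowercase text, strip accents and punctuation, and collapse whitespace."""
    # Decompose accented characters, then drop the combining accent marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

variants = ["Café!", "  CAFE ", "cafe..."]
print({normalize(v) for v in variants})
# {'cafe'}
```

Three strings that a computer would treat as different values end up identical after cleaning, which is the kind of consistency later steps rely on.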

What is the first step in preparing text data for analysis?

A. Vectorization

B. Normalization

C. Model training

D. Tokenization

Why do we convert text into vectors?

A. To allow computers to process text mathematically

B. To make text readable by humans

C. To store text in databases

D. To translate text into other languages

Which of these is NOT a reason text data needs special handling?

A. Text is unstructured

B. Text contains many languages

C. Text is always numeric

D. Text has context and meaning

What are stop words?

A. Common words with little meaning

B. Important keywords

C. Misspelled words

D. Numbers in text

Which process helps reduce different forms of a word to a base form?

A. Tokenization

B. Stemming or Lemmatization

C. Vectorization

D. Normalization

Explain why text data requires special handling before analysis.

Describe the main steps involved in preparing text data for analysis.