Recall & Review
beginner
What is a Document-term matrix (DTM)?
A Document-term matrix is a table that shows how often each word appears in each document. Rows are documents, columns are words, and each cell holds the number of times that word appears in that document.
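Here is a minimal sketch in plain Python (standard library only) that builds a small count-based DTM. The corpus and the `build_dtm` helper are hypothetical. Tokenization is a naive lowercase-and-split assumption. Libraries such as scikit-learn's `CountVectorizer` do the same job with more careful tokenization.

```python
from collections import Counter

# A tiny illustrative corpus (made-up example documents).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

def build_dtm(documents):
    """Return (vocabulary, matrix): one row per document, one column per term."""
    # Naive tokenization (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    # Columns: every unique term across all documents, sorted for a stable order.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Each cell = how many times this term occurs in this document
        # (Counter returns 0 for terms the document does not contain).
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix

vocab, dtm = build_dtm(docs)
print(vocab)  # ['and', 'cat', 'cats', 'dog', 'dogs', 'log', 'mat', 'on', 'sat', 'the']
for row in dtm:
    print(row)
# [0, 1, 0, 0, 0, 0, 1, 1, 1, 2]
# [0, 0, 0, 1, 0, 1, 0, 1, 1, 2]
# [1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
```

Because this sketch does no stemming or lemmatization, "cat" and "cats" become separate columns.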
beginner
Why do we use a Document-term matrix in text analysis?
We use a Document-term matrix to turn text into numbers. Each document becomes a numeric vector of term counts, so algorithms can compare documents, find patterns, or use the vectors as features for training machine learning models.
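Once documents are rows of numbers, we can compare them directly. The following is a minimal sketch that measures document similarity as the cosine similarity of two row vectors. It assumes a small hand-made DTM, and the vocabulary and counts are invented for illustration.

```python
import math

# Hypothetical DTM: rows are documents, columns follow `vocabulary`.
vocabulary = ["ball", "game", "goal", "market", "stock"]
dtm = [
    [2, 1, 1, 0, 0],  # doc 0: sports-like
    [1, 2, 2, 0, 0],  # doc 1: sports-like
    [0, 0, 0, 3, 2],  # doc 2: finance-like
]

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0 = no shared terms)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document shares nothing with anything
    return dot / (norm_u * norm_v)

print(round(cosine_similarity(dtm[0], dtm[1]), 3))  # 0.816
print(cosine_similarity(dtm[0], dtm[2]))            # 0.0
```

The two sports-like rows score about 0.816. The finance row shares no terms with document 0, so it scores 0.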
beginner
What does each row and column represent in a Document-term matrix?
Each row represents a document, and each column represents a unique word (term) from all documents.
intermediate
How can the values in a Document-term matrix be weighted besides simple counts?
Values can be weighted using methods like TF-IDF (term frequency-inverse document frequency), which gives more weight to words that are frequent in one document but rare across the rest of the collection.
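The following is a minimal sketch of TF-IDF weighting over a hypothetical count matrix. It uses the unsmoothed textbook variant tf * log(N / df), where N is the number of documents and df is the number of documents containing the term. Libraries differ on this point (scikit-learn, for example, smooths the IDF by default), so treat the exact numbers as illustrative.

```python
import math

# Hypothetical count DTM: rows are documents, columns follow `vocabulary`.
vocabulary = ["the", "cat", "dog", "mat"]
counts = [
    [2, 1, 0, 1],
    [2, 0, 1, 0],
    [1, 1, 0, 0],
]

def tfidf(count_matrix):
    """Weight raw counts as tf * log(N / df) (unsmoothed textbook variant)."""
    n_docs = len(count_matrix)
    n_terms = len(count_matrix[0])
    # Document frequency: in how many documents each term appears at least once.
    df = [sum(1 for row in count_matrix if row[j] > 0) for j in range(n_terms)]
    # Inverse document frequency; a term found in no document gets weight 0.
    idf = [math.log(n_docs / df[j]) if df[j] else 0.0 for j in range(n_terms)]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in count_matrix]

weights = tfidf(counts)
for row in weights:
    print([round(w, 3) for w in row])
# [0.0, 0.405, 0.0, 1.099]
# [0.0, 0.0, 1.099, 0.0]
# [0.0, 0.405, 0.0, 0.0]
```

"The" occurs in every document, so its IDF is log(1) = 0 and its column is zeroed out. Terms found in only one document get the largest boost.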
intermediate
What is a common problem with Document-term matrices and how is it handled?
DTMs can be very large and sparse (mostly zeros). We handle this by removing rare words, applying dimensionality reduction, or storing the matrix in a sparse format that records only the nonzero cells.
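The following is a minimal sketch of sparse storage, assuming a tiny made-up corpus. Each document keeps only its nonzero cells as a {term: count} dictionary, and the code measures what fraction of the full matrix would have been zeros. In practice, formats such as SciPy's `scipy.sparse.csr_matrix` serve the same purpose more efficiently.

```python
from collections import Counter

# Made-up example documents on unrelated topics.
docs = [
    "the cat sat on the mat",
    "a dog barked at the mailman",
    "stocks fell sharply today",
]

# Sparse representation: for each document, keep only the nonzero cells
# as a {term: count} mapping instead of a full row of mostly zeros.
sparse_rows = [dict(Counter(doc.split())) for doc in docs]

# The dense matrix would need one column per unique term in the corpus.
vocabulary = sorted({term for row in sparse_rows for term in row})
total_cells = len(docs) * len(vocabulary)
nonzero_cells = sum(len(row) for row in sparse_rows)
sparsity = 1 - nonzero_cells / total_cells

print(len(vocabulary), nonzero_cells, total_cells, round(sparsity, 3))  # 14 15 42 0.643
```

Even with three short documents, about 64% of the cells in the full matrix would be zeros, and that fraction grows quickly as the vocabulary expands.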
What does a cell value in a Document-term matrix usually represent?
Each cell shows how many times a specific word appears in a specific document.
In a Document-term matrix, what do the rows represent?
Rows correspond to individual documents in the collection.
Which technique can improve the usefulness of a Document-term matrix by weighting words?
TF-IDF weights each word by how often it appears in a document and down-weights words that appear in many documents.
What is a common issue with Document-term matrices?
DTMs often have many zeros because most words do not appear in most documents.
How can we reduce the size of a Document-term matrix?
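Removing rare words shrinks the vocabulary (fewer columns), and dimensionality reduction techniques such as PCA or truncated SVD compress the remaining columns into a smaller number of dimensions.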
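The following is a minimal sketch of removing rare words by document frequency. The `prune_vocabulary` helper and its `min_df` threshold are hypothetical. They illustrate the same idea as the `min_df` option of scikit-learn's `CountVectorizer`. Tokenization here is a plain whitespace split.

```python
from collections import Counter

# Made-up example documents.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

def prune_vocabulary(documents, min_df=2):
    """Keep only terms that appear in at least `min_df` documents."""
    tokenized = [doc.split() for doc in documents]
    # Count each term once per document (document frequency, not raw count).
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return sorted(term for term, freq in df.items() if freq >= min_df)

full_vocab = sorted({t for d in docs for t in d.split()})
kept = prune_vocabulary(docs, min_df=2)
print(len(full_vocab), "->", len(kept), kept)  # 8 -> 5 ['cat', 'dog', 'on', 'sat', 'the']
```

In this example the vocabulary shrinks from 8 to 5 terms, and the dropped words are the ones that occur in only one document.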
Explain what a Document-term matrix is and why it is useful in text analysis.
Think about how text is turned into numbers for computers.
Describe common challenges with Document-term matrices and how to address them.
Consider what happens when many words appear rarely.