Practice - 5 Tasks

Answer the questions below

1. Fill in the blank

easy

Complete the code to tokenize the input text into words.

tokens = text.[1]()

A. split

B. join

C. replace

D. strip

Common Mistakes

Using join() instead of split()

Using replace() which changes characters but does not split

Using strip() which only removes whitespace from ends

Explanation

Called with no arguments, split() breaks the text into words on runs of whitespace. This simple tokenization is a basic first step in information retrieval.
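
The tokenization step above can be sketched as follows. The sample sentence is a made-up input, not part of the exercise.

```python
# Hypothetical sample input; any string works here.
text = "information retrieval finds   relevant documents"

# With no arguments, split() separates on any run of whitespace
# and drops empty strings, so the triple space yields no blank tokens.
tokens = text.split()

print(tokens)
```

Punctuation stays attached to its word with this approach (for example, "documents." would remain a single token), so a fuller pipeline would usually normalize the text first.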

2. Fill in the blank

medium

Complete the code to count the frequency of each word in the list.

from collections import [1]
word_counts = [1](words)

A. deque

B. Counter

C. OrderedDict

D. defaultdict

Common Mistakes

Using defaultdict which needs manual counting

Using OrderedDict which keeps order but doesn't count

Using deque which is for queues, not counting

Explanation

Counter is a dict subclass that counts hashable objects, which makes it a direct fit for word frequencies.
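
As a minimal sketch of the counting step, assuming a small made-up token list in place of the exercise's words variable:

```python
from collections import Counter

# Hypothetical token list standing in for `words` from the exercise.
words = ["data", "science", "data", "retrieval", "data", "science"]

word_counts = Counter(words)

print(word_counts["data"])         # count for one word
print(word_counts.most_common(2))  # the two most frequent words with their counts
print(word_counts["missing"])      # absent keys count as 0 instead of raising KeyError
```

Unlike a plain dict, a Counter returns 0 for a word it has never seen, which simplifies later frequency arithmetic.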

3. Fill in the blank

hard

Fix the error in the code to compute the term frequency (TF) for a word.

tf = word_counts[[1]] / sum(word_counts.values())

A. word_counts

B. word

C. words

D. 'word'

Common Mistakes

Using word without quotes, which raises a NameError unless a variable named word has been defined

Using words or word_counts which are not keys

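Explanation

word_counts is keyed by word strings, so the lookup needs a string key such as 'word'. Without quotes, Python treats word as a variable name, and no such variable is defined in this snippet.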
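
The term-frequency formula above can be sketched like this. The token list and the helper name term_frequency are illustrative assumptions, not part of the exercise.

```python
from collections import Counter

words = ["data", "science", "data", "retrieval"]  # hypothetical tokens
word_counts = Counter(words)

def term_frequency(term, counts):
    """Return the share of all tokens that are `term` (0.0 for an empty count table)."""
    total = sum(counts.values())
    return counts[term] / total if total else 0.0

# A string literal works as the key, and so does any variable holding a string.
tf = term_frequency("data", word_counts)
print(tf)
```

Guarding against a zero total is a design choice for this sketch; the one-line version in the exercise would raise ZeroDivisionError on an empty word list.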

4. Fill in the blank

hard

Fill both blanks to create a dictionary of words with frequency greater than 1.

freq_words = {word: count for word, count in word_counts.items() if count [1] [2]}

C. >=

Common Mistakes

Using >= 0 includes all words

Using > 0 includes words with count 1

Using wrong operators like < or ==

Explanation

The condition keeps only words whose count is greater than 1, so the blanks are > and 1.
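
A minimal sketch of the completed filter, assuming a small made-up Counter in place of the exercise's word_counts:

```python
from collections import Counter

word_counts = Counter(["data", "science", "data", "retrieval"])  # hypothetical counts

# Keep only the words that occur more than once.
freq_words = {word: count for word, count in word_counts.items() if count > 1}

print(freq_words)
```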

5. Fill in the blank

hard

Fill all three blanks to compute inverse document frequency (IDF) for a word.

import math
idf = math.log([1] / (1 + [2][[3]]))

A. total_docs

B. doc_freq

C. 'word'

D. word_counts

Common Mistakes

Using word_counts instead of doc_freq for document frequency

Not quoting the word key

Forgetting to add 1 in denominator

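Explanation

IDF here is the natural log of the total number of documents divided by (1 + the document frequency of the word). The 1 in the denominator guards against division by zero when a word's document frequency is 0. The word key must be a string.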
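
The IDF formula above can be sketched as follows. The corpus statistics and the helper name smoothed_idf are made up for illustration, and falling back to 0 for unseen words via doc_freq.get is an assumption of this sketch rather than something the exercise specifies.

```python
import math

# Hypothetical corpus statistics: 4 documents in total, and the number
# of documents in which each word appears.
total_docs = 4
doc_freq = {"data": 3, "retrieval": 1}

def smoothed_idf(term, total_docs, doc_freq):
    """IDF with 1 added to the denominator, as in the exercise."""
    # Assumption: a word missing from doc_freq is treated as appearing in 0 documents.
    return math.log(total_docs / (1 + doc_freq.get(term, 0)))

idf_common = smoothed_idf("data", total_docs, doc_freq)     # log(4 / 4)
idf_rare = smoothed_idf("retrieval", total_docs, doc_freq)  # log(4 / 2)
print(idf_common, idf_rare)
```

The rarer word gets the higher score, which is the point of IDF: words that appear in few documents are more useful for telling documents apart.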

Practice

1. What is the main goal of information retrieval in natural language processing?

easy

A. To translate text from one language to another

B. To find relevant documents based on a user's query

C. To generate new text automatically

D. To summarize long documents into short ones

Solution

Step 1: Understand the purpose of information retrieval
Information retrieval is about searching and finding documents that match a user's query.
Step 2: Compare with other NLP tasks
Translation, text generation, and summarization are different tasks unrelated to searching documents.
Final Answer:
To find relevant documents based on a user's query -> Option B
Quick Check:
Information retrieval = finding relevant documents [OK]

Hint: Remember: retrieval means finding, not creating [OK]

Common Mistakes:

Confusing retrieval with translation
Thinking retrieval generates new text
Mixing retrieval with summarization

2. Which of the following Python code snippets correctly checks if the word 'apple' is in a document string doc (case-insensitive)?

easy

A. if 'Apple' == doc:

B. if doc.contains('apple'):

C. if 'apple' in doc.lower():

D. if doc.find('apple') == -1:

Solution

Step 1: Understand case-insensitive search
To ignore case, convert the document to lowercase and check if 'apple' is in it.
Step 2: Analyze each option
Option C, if 'apple' in doc.lower():, lowercases the document and then tests membership, which is correct.
Option B calls contains, a method that Python strings do not have.
Option A compares the whole string to 'Apple' instead of testing membership.
Option D tests whether find returns -1, which means "not found", so its logic is reversed.
Final Answer:
if 'apple' in doc.lower(): -> Option C
Quick Check:
Use lower() + in for case-insensitive check [OK]

Hint: Use lower() before checking membership [OK]

Common Mistakes:

Using non-existent string methods
Comparing whole string instead of membership
Misinterpreting find() return values
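
To make the difference between these checks concrete, here is a small sketch. The document string is a made-up example.

```python
doc = "Fresh Apple pie"  # hypothetical document string

# Lowercasing the document first makes the membership test ignore case.
found_insensitive = "apple" in doc.lower()

# Without lower(), the capital "A" prevents a match.
found_sensitive = "apple" in doc

# find() returns the index where the match starts, or -1 when there is no match.
position = doc.lower().find("apple")

print(found_insensitive, found_sensitive, position)
```

Both in and find match substrings, so 'apple' would also be found inside a longer word such as 'pineapple'.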

3. Given the following Python code, what will be the output?

documents = ['Apple pie recipe', 'Banana smoothie', 'apple tart']
query = 'apple'
results = [doc for doc in documents if query.lower() in doc.lower()]
print(results)

medium

A. []

B. ['apple tart']

C. ['Apple pie recipe']

D. ['Apple pie recipe', 'apple tart']

Solution

Step 1: Understand the list comprehension filtering
The code checks each document if the lowercase query 'apple' is in the lowercase document string.
Step 2: Check each document
'Apple pie recipe' contains 'apple' ignoring case, so included. 'Banana smoothie' does not contain 'apple'. 'apple tart' contains 'apple'. So results are the first and third documents.
Final Answer:
['Apple pie recipe', 'apple tart'] -> Option D
Quick Check:
Case-insensitive filter returns matching docs [OK]

Hint: Check each document with lowercase query and doc [OK]

Common Mistakes:

Ignoring case and missing matches
Including documents without the query word
Confusing list comprehension output

4. The following code is intended to find documents containing the word 'data' (case-insensitive), but it returns an empty list. What is the error?

docs = ['Data science', 'Big Data', 'Machine learning']
query = 'data'
results = [d for d in docs if d.find(query) != -1]
print(results)

medium

A. The find method is case-sensitive, so it misses 'Data science'

B. The find method returns -1 when found, so condition is wrong

C. The list comprehension syntax is incorrect

D. The variable query is not defined

Solution

Step 1: Understand find behavior
The find method is case-sensitive, so searching 'data' in 'Data science' returns -1 (not found).
Step 2: Identify why results is empty
'Data science'.find('data') and 'Big Data'.find('data') both return -1 because of the uppercase 'D', and 'Machine learning' does not contain 'data' at all. No document passes the != -1 test, so results is empty.
Final Answer:
The find method is case-sensitive, so it misses 'Data science' -> Option A
Quick Check:
find() is case-sensitive [OK]

Hint: Remember find() is case-sensitive; use lower() [OK]

Common Mistakes:

Assuming find() ignores case
Misunderstanding find() return values
Thinking list comprehension syntax is wrong
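
One possible fix, sketched below with the same docs and query as the question, is to lowercase both sides before searching. The variable names case_sensitive and case_insensitive are introduced here for comparison only.

```python
docs = ['Data science', 'Big Data', 'Machine learning']
query = 'data'

# Original condition: find() is case-sensitive, so nothing matches.
case_sensitive = [d for d in docs if d.find(query) != -1]

# One possible fix: lowercase both the query and the document first.
case_insensitive = [d for d in docs if query.lower() in d.lower()]

print(case_sensitive)
print(case_insensitive)
```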

5. You have a list of documents:

docs = ['Data Science is fun', 'I love machine learning', 'Deep learning and data']

You want to create a dictionary where keys are unique words (case-insensitive) from all documents, and values are lists of document indices where the word appears. Which code snippet correctly does this?

hard

A.
word_docs = {}
for i, doc in enumerate(docs):
    for word in doc.lower().split():
        word_docs.setdefault(word, []).append(i)

B.
word_docs = {}
for i, doc in enumerate(docs):
    for word in doc.split():
        word_docs[word].append(i)

C. word_docs = {word: i for i, doc in enumerate(docs) for word in doc.lower().split()}

D.
word_docs = {}
for doc in docs:
    for word in doc.lower().split():
        word_docs[word] = doc

Solution

Step 1: Understand the goal
Create a dictionary mapping each unique lowercase word to a list of document indices where it appears.
Step 2: Analyze each option
Option A uses setdefault to create an empty list the first time a word is seen, then appends the document index, and it lowercases each document before splitting.
Option B never initializes the lists, so word_docs[word].append(i) raises a KeyError on the first word, and it also ignores case.
Option C builds a dictionary that maps each word to a single index (the last document it appears in), not to a list.
Option D overwrites each value with the document string instead of collecting indices.
Final Answer:
The nested loop that calls word_docs.setdefault(word, []).append(i) on doc.lower().split() -> Option A
Quick Check:
Use setdefault and lowercase words for correct mapping [OK]

Hint: Use setdefault to build lists for each word [OK]

Common Mistakes:

Not initializing lists before appending
Ignoring case normalization
Overwriting dictionary values instead of appending
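
The dictionary built in Option A is a simple inverted index. The sketch below builds it from the question's docs and adds a lookup helper; the helper name lookup is a hypothetical addition for illustration.

```python
docs = ['Data Science is fun', 'I love machine learning', 'Deep learning and data']

# Build the index as in Option A: lowercase word -> list of document indices.
word_docs = {}
for i, doc in enumerate(docs):
    for word in doc.lower().split():
        word_docs.setdefault(word, []).append(i)

def lookup(term, index):
    """Hypothetical helper: return the document indices for `term`, ignoring case."""
    return index.get(term.lower(), [])

print(lookup('data', word_docs))
print(lookup('Learning', word_docs))
print(lookup('python', word_docs))
```

Answering a query then becomes a dictionary lookup instead of a scan over every document, which is why search systems build an index like this ahead of time.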

Information retrieval basics in NLP - Interactive Code Practice
