
LDA with Gensim in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to create a dictionary from tokenized documents.

from gensim.corpora import Dictionary

docs = [['apple', 'banana', 'apple'], ['banana', 'orange']]
dictionary = Dictionary([1])
Options:
A. ['apple', 'orange']
B. ['apple', 'banana']
C. ['banana', 'orange']
D. docs
Common Mistakes
Passing a single list of words instead of a list of tokenized documents.
Passing a string instead of a list.
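
To see why the input shape matters, here is a minimal plain-Python sketch of what building a token-to-id mapping involves. The helper `build_token_ids` is hypothetical and is not Gensim's implementation. It assigns ids in first-seen order, which may differ from the ids Gensim's `Dictionary` assigns.

```python
def build_token_ids(tokenized_docs):
    """Map each distinct token to an integer id, in first-seen order.

    Hypothetical helper for illustration; not Gensim's implementation.
    """
    token_to_id = {}
    for doc in tokenized_docs:
        for token in doc:
            if token not in token_to_id:
                token_to_id[token] = len(token_to_id)
    return token_to_id


# Correct shape: a list of documents, each a list of tokens.
docs = [['apple', 'banana', 'apple'], ['banana', 'orange']]
token_to_id = build_token_ids(docs)
print(token_to_id)

# Wrong shape: a flat list of words. Each "document" is now a string,
# so iterating over it yields single characters instead of tokens.
flat = build_token_ids(['apple', 'banana'])
print(sorted(flat))
```

With the flat list, the "vocabulary" ends up being individual letters, which is the kind of silent shape error the first common mistake describes.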
2. Fill in the blank (medium)

Complete the code to convert documents into a bag-of-words corpus using the dictionary.

corpus = [[1] for doc in docs]
Options:
A. dictionary.doc2bow(docs)
B. dictionary.doc2bow('doc')
C. dictionary.doc2bow(doc)
D. dictionary.doc2bow(['doc'])
Common Mistakes
Passing the whole list of documents instead of a single document.
Passing a string instead of a list of tokens.
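
A bag-of-words vector is a list of (token id, count) pairs for one document. The sketch below reproduces that idea in plain Python so the per-document shape is visible. `to_bow` and the `token_to_id` mapping are assumptions made for illustration, not Gensim's code or the ids Gensim would necessarily assign.

```python
from collections import Counter


def to_bow(doc, token_to_id):
    """Convert one tokenized document into sorted (token_id, count) pairs.

    Hypothetical helper for illustration; tokens missing from the mapping
    are skipped.
    """
    counts = Counter(token_to_id[token] for token in doc if token in token_to_id)
    return sorted(counts.items())


# Assumed mapping for this example only.
token_to_id = {'apple': 0, 'banana': 1, 'orange': 2}
docs = [['apple', 'banana', 'apple'], ['banana', 'orange']]

# One bag-of-words vector per document, built by looping over the documents.
corpus = [to_bow(doc, token_to_id) for doc in docs]
print(corpus)
```

The conversion runs once per document inside the list comprehension, so the function receives a single token list each time, never the whole `docs` list.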
3. Fill in the blank (hard)

Complete the code to create an LDA model with 2 topics.

from gensim.models import LdaModel

lda = LdaModel(corpus=corpus, id2word=[1], num_topics=2, random_state=42)
Options:
A. dictionary
B. corpus
C. docs
D. LdaModel
Common Mistakes
Passing the corpus instead of the dictionary.
Passing the list of documents instead of the dictionary.
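
The role of `id2word` is easier to see in isolation: the corpus only contains integer ids, so something has to map them back to words before topics become readable. The sketch below shows that lookup in plain Python. The mapping and the topic weights are invented for illustration and do not come from a trained model.

```python
# Assumed id-to-word mapping, standing in for the dictionary.
id2word = {0: 'apple', 1: 'banana', 2: 'orange'}

# A topic expressed as (token_id, weight) pairs.
# These weights are made up for this example.
topic_by_id = [(1, 0.5), (0, 0.3), (2, 0.2)]

# Translate ids back to words so the topic is human-readable.
topic_by_word = [(id2word[token_id], weight) for token_id, weight in topic_by_id]
print(topic_by_word)
```

Neither the bag-of-words corpus nor the raw document list can serve this purpose, because neither is a mapping from id to word.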
4. Fill in the blank (hard)

Fill both blanks to print the top 3 words for each topic in the LDA model.

for i in range([1]):
    print(f"Topic {i}:", lda.show_topic(i, topn=[2]))
Options:
A. 2
B. 3
C. 5
D. 10
Common Mistakes
Using the wrong number of topics in the range.
Requesting more or fewer top words than needed.
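
The loop has two independent numbers: how many topics to iterate over, and how many words to show per topic. The plain-Python sketch below keeps them separate. `top_words` is a hypothetical helper and the word weights are invented for illustration. It mirrors the (word, weight) pairs that `show_topic` returns but does not call Gensim.

```python
def top_words(word_weights, topn):
    """Return the topn (word, weight) pairs with the highest weights.

    Hypothetical helper for illustration only.
    """
    ranked = sorted(word_weights.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:topn]


# Made-up word weights for two topics.
topics = [
    {'apple': 0.5, 'banana': 0.3, 'orange': 0.15, 'kiwi': 0.05},
    {'orange': 0.6, 'banana': 0.25, 'apple': 0.1, 'kiwi': 0.05},
]

num_topics = len(topics)  # controls the range of the loop
for i in range(num_topics):
    print(f"Topic {i}:", top_words(topics[i], topn=3))  # topn controls words per topic
```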
5. Fill in the blank (hard)

Fill all three blanks to get the topic distribution for the first document and print the dominant topic index.

doc_bow = corpus[0]
topic_dist = lda.get_document_topics([1])
dominant_topic = max(topic_dist, key=lambda x: x[[2]])[[3]]
print(f"Dominant topic index: {dominant_topic}")
Options:
A. doc_bow
B. 1
C. 0
D. corpus
Common Mistakes
Passing the whole corpus instead of a single document.
Mixing up the tuple indices for topic id and probability.
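
A document's topic distribution is a list of (topic id, probability) pairs, so picking the dominant topic means comparing on one position and reading the other. The sketch below uses a hand-written distribution with made-up probabilities, not model output, to show what happens when the two positions are confused.

```python
# Made-up (topic_id, probability) pairs for one document.
topic_dist = [(0, 0.73), (1, 0.27)]

# Compare on the probability (position 1), then read the topic id (position 0).
dominant_topic, dominant_prob = max(topic_dist, key=lambda pair: pair[1])
print(f"Dominant topic index: {dominant_topic}")

# Mistake: comparing on the topic id just returns the largest id,
# regardless of the probabilities.
largest_id = max(topic_dist, key=lambda pair: pair[0])[0]
print(f"Largest topic id (not the dominant topic): {largest_id}")
```

Here the two results differ, which makes the index mix-up easy to spot. When the most probable topic also has the largest id, the bug goes unnoticed.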