Complete the code to create a dictionary from tokenized documents.
from gensim.corpora import Dictionary

docs = [['apple', 'banana', 'apple'], ['banana', 'orange']]
dictionary = Dictionary([1])
The Dictionary class expects a list of tokenized documents, from which it builds the mapping between words and integer IDs.
Complete the code to convert documents into a bag-of-words corpus using the dictionary.
corpus = [[1] for doc in docs]
The doc2bow method converts a tokenized document into bag-of-words format, a list of (word_id, count) tuples, using the dictionary's word IDs.
Fix the error in the code to create an LDA model with 2 topics.
from gensim.models import LdaModel

lda = LdaModel(corpus=corpus, id2word=[1], num_topics=2, random_state=42)
The id2word parameter expects the dictionary object that maps word IDs to words.
Fill both blanks to print the top 3 words for each topic in the LDA model.
for i in range([1]):
    print(f"Topic {i}:", lda.show_topic(i, topn=[2]))
We created 2 topics, so the range should be 2. We want the top 3 words, so topn=3.
Fill all three blanks to get the topic distribution for the first document and print the dominant topic index.
doc_bow = corpus[0]
topic_dist = lda.get_document_topics([1])
dominant_topic = max(topic_dist, key=lambda x: x[[2]])[[3]]
print(f"Dominant topic index: {dominant_topic}")
We pass the bag-of-words of the first document to get_document_topics. The topic distribution is a list of (topic_id, probability) tuples. To find the dominant topic, we use max with a key on the probability (index 1) and then get the topic id (index 0).