Complete the code to load text data for a multimodal model.
text_data = load_[1]('data/text_samples.txt')
The function should load text data, so the blank is text, giving load_text(...).
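A minimal runnable sketch of the completed answer. The `load_text` helper is hypothetical (the exercise does not define it); here it is assumed to read one text sample per non-empty line from a UTF-8 file.

```python
from pathlib import Path

def load_text(path):
    """Hypothetical helper: return a list of non-empty lines from a text file."""
    return [line for line in Path(path).read_text(encoding="utf-8").splitlines() if line]

# With the blank filled in as 'text':
# text_data = load_text('data/text_samples.txt')
```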
Complete the code to combine image and audio features for multimodal input.
combined_features = concatenate([image_features, [1]_features])
We combine image features with audio features, so the blank should be audio.
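A sketch of the completed line, assuming NumPy arrays as feature vectors and `numpy.concatenate` as the `concatenate` function (the exercise leaves both unspecified):

```python
import numpy as np

# Hypothetical per-modality feature vectors
image_features = np.array([0.1, 0.2, 0.3])
audio_features = np.array([0.4, 0.5])

# With the blank filled in as 'audio':
combined_features = np.concatenate([image_features, audio_features])
```

The result is a single flat vector whose length is the sum of the two input lengths.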
Complete the code that processes multimodal inputs by filling in the correct modality.
if modality == '[1]':
    process_text(data)
elif modality == 'image':
    process_image(data)
else:
    process_audio(data)
The code processes text data when modality is 'text', so the blank must be text.
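The completed branch, wrapped in a function so it can be exercised. The `process_*` functions are hypothetical stand-ins for the exercise's handlers:

```python
def process_text(data):
    return f"text:{data}"

def process_image(data):
    return f"image:{data}"

def process_audio(data):
    return f"audio:{data}"

def process(modality, data):
    if modality == 'text':    # the filled-in blank
        return process_text(data)
    elif modality == 'image':
        return process_image(data)
    else:
        return process_audio(data)
```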
Fill both blanks to create a dictionary mapping modalities to their processing functions.
processors = {
'text': [1],
'image': [2]
}
The 'text' key maps to process_text and 'image' maps to process_image.
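A sketch of the completed dictionary. The processing functions are hypothetical stand-ins; the point is that the values are function objects, so a handler can be looked up and called by key instead of using an if/elif chain:

```python
def process_text(data):
    return f"processed text: {data}"

def process_image(data):
    return f"processed image: {data}"

processors = {
    'text': process_text,    # blank [1]
    'image': process_image,  # blank [2]
}

# Dispatch by key:
result = processors['text']("hello")
```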
Fill all three blanks to create a multimodal input pipeline combining text, image, and audio.
inputs = {
'text': [1],
'image': [2],
'audio': [3]
}
combined = combine_features([inputs['text'], inputs['image'], inputs['audio']])
Text input uses text_features, image uses image_features, and audio uses audio_features.
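A runnable sketch of the full pipeline with all three blanks filled in. The feature vectors are made-up examples, and `combine_features` is assumed to be simple concatenation (the exercise does not define it):

```python
import numpy as np

# Hypothetical per-modality feature vectors
text_features = np.array([1.0, 2.0])
image_features = np.array([3.0])
audio_features = np.array([4.0, 5.0])

def combine_features(feature_list):
    # Assumed combiner: concatenate modality features into one vector
    return np.concatenate(feature_list)

inputs = {
    'text': text_features,    # blank [1]
    'image': image_features,  # blank [2]
    'audio': audio_features,  # blank [3]
}
combined = combine_features([inputs['text'], inputs['image'], inputs['audio']])
```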