Complete the code to load an image using PIL.
from PIL import Image
img = Image.[1]('example.jpg')
The Image.open function loads an image file into memory for processing.
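A minimal self-contained sketch of the call in question: since 'example.jpg' from the exercise may not exist on disk, this builds a small JPEG in memory first and opens it the same way Image.open would open a file path.

```python
from io import BytesIO
from PIL import Image

# Build a tiny in-memory JPEG so the sketch runs without 'example.jpg'.
buf = BytesIO()
Image.new('RGB', (64, 48), color=(255, 0, 0)).save(buf, format='JPEG')
buf.seek(0)

# Same call as Image.open('example.jpg'), just with a file-like object.
img = Image.open(buf)
print(img.size, img.mode)  # (64, 48) RGB
```

Image.open is lazy: it reads the header immediately but decodes pixel data only when it is first needed.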
Complete the code to convert an image to a tensor for model input.
import torchvision.transforms as transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
tensor_img = transform([1])
The variable img holds the loaded image, which we convert to a tensor.
Fix the error in the code to generate image captions using a pretrained model.
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
model = VisionEncoderDecoderModel.from_pretrained('nlpconnect/vit-gpt2-image-captioning')
processor = ViTImageProcessor.from_pretrained('nlpconnect/vit-gpt2-image-captioning')
tokenizer = AutoTokenizer.from_pretrained('nlpconnect/vit-gpt2-image-captioning')
pixel_values = processor(images=img, return_tensors='pt').[1]
output_ids = model.generate(pixel_values)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
The processor returns a BatchFeature, not a tensor; accessing its pixel_values attribute extracts the tensor, which already has a batch dimension because return_tensors='pt' was used.
Fill both blanks to create a dictionary of image features and their lengths.
features = {img_id: [1] for img_id, img in images.items() if len([2]) > 0}
We extract pixel values from each image and keep only entries whose pixel data is non-empty.
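A sketch of one plausible completion, using a hypothetical images dict whose values mimic loaded images as lists of pixel tuples (the real exercise presumably uses PIL image objects): the feature kept is the image data itself, and empty images are filtered out.

```python
# Hypothetical stand-in for the exercise's `images` dict: each value
# is a list of pixel tuples, standing in for a loaded image's data.
images = {
    'img1': [(255, 0, 0), (0, 255, 0)],
    'img2': [],                          # empty image, should be dropped
    'img3': [(10, 20, 30)],
}

# Keep the pixel data as the feature; skip entries with no pixels.
features = {img_id: img for img_id, img in images.items() if len(img) > 0}
print(sorted(features))  # ['img1', 'img3']
```

The len() check in the comprehension's condition is what drops 'img2' before it ever enters the result dict.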
Fill all three blanks to filter captions longer than 5 words and create a summary dictionary.
summary = {img_id: caption for img_id, caption in captions.items() if len(caption.[1]()) > [2] and caption.[3](' ') > 0}
We split captions into words, check whether the word count is greater than 5, and count spaces to ensure the caption has multiple words.
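A sketch of one plausible completion over a hypothetical captions dict: split() tokenizes on whitespace, 5 is the word-count threshold, and count(' ') confirms the caption contains more than one word.

```python
# Hypothetical stand-in for the exercise's `captions` dict.
captions = {
    'img1': 'a red car parked on a quiet street at night',
    'img2': 'a cat',  # too short, should be filtered out
    'img3': 'two dogs playing fetch in a sunny park today',
}

summary = {img_id: caption for img_id, caption in captions.items()
           if len(caption.split()) > 5 and caption.count(' ') > 0}
print(sorted(summary))  # ['img1', 'img3']
```

Strictly speaking the count(' ') check is redundant here, since any caption with more than 5 words necessarily contains spaces, but it matches the structure the exercise asks for.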