When creating embeddings for text data, why might you choose a higher dimensionality?
Think about what more dimensions allow the model to represent.
Higher embedding dimensions allow the model to capture more detailed and subtle features of the data, improving representation quality. However, higher dimensionality also increases memory and compute costs and raises the risk of overfitting, especially on small datasets.
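The cost side of this trade-off is easy to quantify: an embedding table has `num_embeddings × embedding_dim` parameters, so parameter count grows linearly with the dimension. A small sketch (the vocabulary size of 10,000 is an arbitrary illustrative choice):

```python
import torch.nn as nn

vocab_size = 10_000  # hypothetical vocabulary, for illustration only

for dim in (50, 128, 300):
    emb = nn.Embedding(num_embeddings=vocab_size, embedding_dim=dim)
    n_params = emb.weight.numel()  # vocab_size * dim
    print(f"dim={dim:>3}: {n_params:,} embedding parameters")
```

Going from 50 to 300 dimensions here multiplies the embedding parameters sixfold, which is where the extra compute cost and overfitting risk come from.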
You have two sets of embeddings: one with 50 dimensions and one with 300 dimensions. Both sets are normalized. Which statement about cosine similarity between embeddings is true?
Consider how dimensionality affects vector distribution and sparsity.
In higher-dimensional spaces, randomly oriented vectors tend to be nearly orthogonal, so cosine similarity values concentrate closer to zero and distinctions between pairs become less pronounced.
Consider the following PyTorch code snippet creating an embedding layer and passing input indices:
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=1000, embedding_dim=128)
input_indices = torch.tensor([[1, 2, 3], [4, 5, 6]])
output = embedding(input_indices)
print(output.shape)  # torch.Size([2, 3, 128])
Think about how embedding layers map indices to vectors.
The input tensor has shape (2, 3). The embedding layer replaces each index with its 128-dimensional vector, so the output shape is (2, 3, 128).
You have a small dataset with only 500 unique words. Which embedding dimension is most appropriate to avoid overfitting while maintaining useful representation?
Think about the trade-off between model complexity and data size.
For a vocabulary of only 500 words, a moderate embedding size such as 50 keeps the parameter count small (500 × 50 = 25,000) and helps avoid overfitting while still capturing meaningful features.
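A quick sketch of the parameter counts involved makes the trade-off concrete (the candidate dimensions beyond 50 are illustrative):

```python
import torch.nn as nn

vocab_size = 500  # unique words in the small dataset

for dim in (50, 300, 1000):
    emb = nn.Embedding(num_embeddings=vocab_size, embedding_dim=dim)
    print(f"dim={dim:>4}: {emb.weight.numel():,} embedding parameters")
```

With 500 words, a 50-dimensional table has 25,000 parameters; at 1000 dimensions it would have 500,000, far more than such a small dataset can reliably constrain.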
After increasing embedding dimension from 100 to 10000 in a neural network, training loss becomes NaN immediately. What is the most likely cause?
Consider hardware limits and how large parameters affect training.
Very large embeddings greatly increase the parameter count and memory use, and the larger activations and gradients can overflow or explode, producing NaN losses. Lowering the learning rate, clipping gradients, or reducing the embedding dimension are the usual remedies.
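A minimal sketch of two common stabilizers, a lower learning rate and gradient clipping, on a tiny hypothetical model (the model, data, and hyperparameters here are all illustrative assumptions, not a prescribed fix):

```python
import torch
import torch.nn as nn

# Tiny illustrative model: embedding -> flatten -> linear classifier.
model = nn.Sequential(
    nn.Embedding(num_embeddings=1000, embedding_dim=128),
    nn.Flatten(),              # (batch, 3, 128) -> (batch, 384)
    nn.Linear(3 * 128, 2),
)
# A conservative learning rate reduces the chance of diverging updates.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randint(0, 1000, (8, 3))  # batch of 8 sequences of length 3
y = torch.randint(0, 2, (8,))

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
# Clip the global gradient norm to bound update size and guard against blow-ups.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
print(loss.item())
```

If NaNs persist after these changes, the embedding dimension itself is usually the parameter to revisit.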