Performance: Why embeddings capture semantic meaning
MEDIUM IMPACT
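How embeddings are computed and reused affects the speed and efficiency of semantic search and similarity calculations in applications: recomputing them on every query costs CPU time and latency, while caching them trades some memory for faster responses.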
# Cache document embeddings once and reuse them for every query
cached_embeddings = precompute_embeddings(documents)
query_embedding = model.encode(query)
similarity = cosine_similarity(query_embedding, cached_embeddings)
# Recompute embeddings for every query without caching
embedding = model.encode(text)
similarity = cosine_similarity(embedding, all_embeddings)

| Pattern | CPU Usage | Latency | Memory Usage | Verdict |
|---|---|---|---|---|
| Recompute embeddings every query | High CPU spikes | High latency (~100ms+) | High memory for temporary data | [X] Bad |
| Cache embeddings and reuse | Low CPU per query | Low latency (~20ms) | Moderate memory for cache | [OK] Good |