Challenge - 5 Problems
Caching Master for LLMs
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
intermediate · 1:30 remaining
Understanding Cache Hit in LLMs
In the context of caching strategies for large language models (LLMs), what does a cache hit mean?
Attempts: 2 left
💡 Hint
Think about what happens when the model can reuse previous results quickly.
✗ Incorrect
A cache hit means the model finds the requested output already stored in the cache, so it can return it immediately without extra computation.
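The idea above can be sketched in a few lines. This is a minimal illustration (the names `generate`, `answer`, and `response_cache` are hypothetical, not part of the challenge): the prompt is the cache key, and a hit returns the stored output without recomputing.

```python
# Minimal sketch of a cache hit vs. miss for LLM responses.
response_cache = {}

def generate(prompt):
    # Hypothetical stand-in for an expensive LLM call.
    return f"model output for: {prompt}"

def answer(prompt):
    if prompt in response_cache:       # cache hit: reuse stored output
        return response_cache[prompt]
    result = generate(prompt)          # cache miss: compute and store
    response_cache[prompt] = result
    return result

answer("What is caching?")   # miss: computes and stores the output
answer("What is caching?")   # hit: returned directly from the cache
```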
❓ Model Choice
intermediate · 2:00 remaining
Choosing a Cache Type for LLMs
Which caching strategy is best suited for storing frequently requested LLM outputs to reduce latency?
Attempts: 2 left
💡 Hint
Think about which method keeps the most useful outputs available.
✗ Incorrect
LRU cache keeps the most recently used outputs, which are more likely to be requested again, reducing latency effectively.
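In Python, LRU caching of model outputs can be sketched with the standard-library `functools.lru_cache` decorator (the function name and prompts here are illustrative, not from the challenge):

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def cached_generate(prompt):
    # Hypothetical stand-in for an expensive LLM call.
    return f"output for {prompt}"

cached_generate("p1")   # miss: computed and cached
cached_generate("p2")   # miss: computed and cached
cached_generate("p1")   # hit: served from cache, "p1" becomes most recent
cached_generate("p3")   # miss: evicts "p2", the least recently used entry
info = cached_generate.cache_info()   # hits=1, misses=3, currsize=2
```

`cache_info()` is handy for confirming that the eviction order matches the LRU policy.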
❓ Metrics
advanced · 1:30 remaining
Evaluating Cache Effectiveness
Given an LLM caching system, which metric best measures how often the cache successfully provides outputs without recomputation?
Attempts: 2 left
💡 Hint
This metric shows how often the cache helps avoid extra work.
✗ Incorrect
Cache hit rate measures the fraction of requests served directly from cache, indicating cache effectiveness.
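The metric itself is simple arithmetic: hits divided by total requests. A small sketch (function name is illustrative):

```python
def cache_hit_rate(hits, misses):
    """Fraction of requests served directly from the cache."""
    total = hits + misses
    return hits / total if total else 0.0

# e.g. 80 hits out of 100 requests gives a hit rate of 0.8
rate = cache_hit_rate(80, 20)
```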
🔧 Debug
advanced · 2:00 remaining
Identifying Cache Invalidation Issue
An LLM caching system returns outdated responses after the underlying model is updated. What is the most likely cause?
Attempts: 2 left
💡 Hint
Think about what happens when cached data does not reflect model changes.
✗ Incorrect
Without proper cache invalidation, the system keeps serving outdated outputs even after model updates.
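One common fix is to scope cache keys by model version, so that updating the model automatically invalidates stale entries. A minimal sketch, assuming hypothetical names (`MODEL_VERSION`, `generate`, `answer`):

```python
# Sketch: tag cache entries with the model version so a model update
# changes every key and old outputs are never served again.
MODEL_VERSION = "v2"
cache = {}

def generate(prompt):
    # Hypothetical stand-in for an expensive LLM call.
    return f"{MODEL_VERSION} output for {prompt}"

def answer(prompt):
    key = (MODEL_VERSION, prompt)   # version-scoped cache key
    if key not in cache:
        cache[key] = generate(prompt)
    return cache[key]
```

Entries from the previous version can then be garbage-collected lazily or evicted in bulk.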
❓ Predict Output
expert · 2:30 remaining
Output of LRU Cache Simulation Code
What is the output of this Python code simulating an LRU cache for LLM outputs?
Prompt Engineering / GenAI
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

cache = LRUCache(2)
cache.put('a', 'output1')
cache.put('b', 'output2')
print(cache.get('a'))
cache.put('c', 'output3')
print(cache.get('b'))
print(cache.get('c'))
Attempts: 2 left
💡 Hint
Remember that the cache capacity is 2 and least recently used items get removed.
✗ Incorrect
Initially, 'a' and 'b' are cached. get('a') returns 'output1' and marks 'a' as most recently used. Adding 'c' exceeds the capacity of 2, so 'b', the least recently used entry, is evicted. The program therefore prints output1, then -1, then output3.