Introduction
When using large language models (LLMs), responses can take time and computing power. Caching helps by saving answers to repeated questions, so the model doesn't have to work from scratch every time.
Jump into concepts and practice - no test required
Imagine a busy coffee shop where customers often order the same drinks. Instead of making each drink from scratch every time, the barista keeps some popular drinks ready or remembers how to quickly prepare them. This saves time and keeps customers happy.
┌─────────────────────────────┐
│ User Input │
└─────────────┬───────────────┘
│
┌────────▼────────┐
│ Check Response │
│ Cache │
└───────┬─────────┘
│ Yes
▼
┌───────────────────┐
│ Return Cached │
│ Response │
└───────────────────┘
│ No
▼
┌───────────────────┐
│ Process Input │
│ (Context, Embeds) │
└───────┬───────────┘
│
┌───────▼───────────┐
│ Generate Response │
└───────┬───────────┘
│
┌───────▼───────────┐
│ Cache Results │
└───────┬───────────┘
│
▼
┌───────────────────┐
│ Return Response │
└───────────────────┘cache = {}
def get_response(input_text):
if input_text in cache:
return cache[input_text]
response = f"Answer for {input_text}"
cache[input_text] = response
return response
print(get_response('hello'))
print(get_response('hello'))cache = {}
def get_response(input_text):
if input_text in cache:
return cache[input_text]
response = f"Answer for {input_text}"
cache = {input_text: response}
return response
print(get_response('test'))
print(get_response('test'))