Practice

1. What is the main purpose of caching in large language models (LLMs)?

easy

A. To save previous answers and avoid repeating work

B. To increase the size of the model

C. To change the model's training data

D. To make the model forget old information

Solution

Step 1: Understand caching concept
Caching stores previous results so the system can reuse them instead of recalculating.
Step 2: Apply to LLMs context
In LLM applications, a cache returns the stored response when the same input arrives again, which saves inference time and compute cost.
Final Answer:
To save previous answers and avoid repeating work -> Option A
Quick Check:
Caching = Save and reuse answers [OK]

Hint: Caching means saving past answers to reuse them [OK]

Common Mistakes:

Thinking caching changes model size
Confusing caching with training data updates
Believing caching deletes old info
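
To make the idea in question 1 concrete, here is a minimal sketch of a cache in front of an expensive call. The function `fake_llm` and the `calls` counter are hypothetical stand-ins for a real model call; they are not part of any library.

```python
# Hypothetical stand-in for a slow, expensive LLM call (assumption).
calls = 0

def fake_llm(prompt):
    global calls
    calls += 1  # count how often the "model" actually runs
    return f"Answer for {prompt}"

cache = {}

def cached_llm(prompt):
    # Only call the model when this exact prompt has not been seen before.
    if prompt not in cache:
        cache[prompt] = fake_llm(prompt)
    return cache[prompt]

first = cached_llm("What is caching?")
second = cached_llm("What is caching?")
print(first == second)  # True
print(calls)            # 1
```

The second request is served from the dictionary, so the underlying function runs only once. The model itself is unchanged: no weights, training data, or stored knowledge are touched.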

2. Which Python tool is commonly used for simple caching in LLM applications?

easy

A. os.listdir

B. functools.lru_cache

C. math.sqrt

D. random.shuffle

Solution

Step 1: Identify caching tools in Python
functools.lru_cache is a standard-library decorator that caches function results and evicts the least recently used entries once the cache is full.
Step 2: Check other options
random.shuffle shuffles lists, math.sqrt calculates square roots, os.listdir lists files; none cache results.
Final Answer:
functools.lru_cache -> Option B
Quick Check:
Python caching tool = lru_cache [OK]

Hint: lru_cache is Python's simple caching decorator [OK]

Common Mistakes:

Mistaking random.shuffle for a caching tool
Confusing math functions with caching
Picking file system functions such as os.listdir
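
As a rough illustration of question 2, the sketch below wraps a placeholder response function with `functools.lru_cache`. The function body is an assumption standing in for a real model call; `cache_info()` is part of the decorator's standard interface.

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def get_response(prompt: str) -> str:
    # Placeholder for a real model call (assumption for illustration).
    return f"Answer for {prompt}"

get_response("hello")  # first call: cache miss, the function body runs
get_response("hello")  # second call: cache hit, the stored result is returned

info = get_response.cache_info()
print(info.hits, info.misses)  # 1 1
```

Note that `lru_cache` requires hashable arguments, so a prompt string works as a key, but a list of message dictionaries would need to be converted (for example to a tuple or a JSON string) first.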

3. Given this Python code using a dictionary cache for LLM responses, what will be printed?

cache = {}
def get_response(input_text):
    if input_text in cache:
        return cache[input_text]
    response = f"Answer for {input_text}"
    cache[input_text] = response
    return response

print(get_response('hello'))
print(get_response('hello'))

medium

A. None\nAnswer for hello

B. Answer for hello\nNone

C. Error: KeyError

D. Answer for hello\nAnswer for hello

Solution

Step 1: Analyze first call get_response('hello')
Cache is empty, so it creates 'Answer for hello', stores it, and returns it.
Step 2: Analyze second call get_response('hello')
Input is in cache, so it returns cached 'Answer for hello' without recomputing.
Final Answer:
Answer for hello\nAnswer for hello -> Option D
Quick Check:
Cache hit returns saved answer [OK]

Hint: Cache returns saved answer on repeated input [OK]

Common Mistakes:

Assuming second call returns None
Expecting error on repeated key
Thinking cache clears automatically
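
The last mistake above is worth a quick check. In this sketch, which reuses the function from question 3, entries stay in the dictionary until they are removed explicitly; the variable names `size_before` and `size_after` are introduced here only for illustration.

```python
cache = {}

def get_response(input_text):
    if input_text in cache:
        return cache[input_text]
    response = f"Answer for {input_text}"
    cache[input_text] = response
    return response

get_response("hello")
get_response("world")
size_before = len(cache)  # both entries are still stored

cache.clear()             # explicit invalidation; nothing happens automatically
size_after = len(cache)

print(size_before, size_after)  # 2 0
```

A plain dictionary never expires or evicts entries on its own, so a long-running application typically needs a size limit or an explicit invalidation step.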

4. This code tries to cache LLM outputs but has a bug. What is the error?

cache = {}
def get_response(input_text):
    if input_text in cache:
        return cache[input_text]
    response = f"Answer for {input_text}"
    cache = {input_text: response}
    return response

print(get_response('test'))
print(get_response('test'))

medium

A. Cache is reset each call, losing previous entries

B. generate_answer function is undefined

C. Syntax error in dictionary assignment

D. Infinite recursion in get_response

Solution

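Step 1: Check cache update line
cache = {input_text: response} builds a brand-new dict instead of adding a key to the existing one, so any entries already stored are lost.
Step 2: Understand effect on repeated calls
Because cache is assigned inside get_response without a global declaration, Python treats it as a local name, and the membership check at the top of the function raises UnboundLocalError when the code is run as written. Even with global cache declared, every cache miss would replace the whole dict and keep only the latest entry. Updating in place with cache[input_text] = response avoids both problems.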
Final Answer:
Cache is reset each call, losing previous entries -> Option A
Quick Check:
Cache replaced, not updated [OK]

Hint: Use cache[key] = value to update, not assign new dict [OK]

Common Mistakes:

Thinking generate_answer is missing
Assuming syntax error in dict
Believing recursion happens
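
The following sketch contrasts the snippet from question 4 with an in-place update. The names `buggy_get_response`, `fixed_get_response`, and `outcome` are introduced here for illustration. One detail worth knowing: because the buggy version assigns to `cache` inside the function, Python treats `cache` as a local name, so running it as written raises `UnboundLocalError` at the membership check.

```python
cache = {}

def buggy_get_response(input_text):
    if input_text in cache:          # fails: `cache` is local in this function
        return cache[input_text]
    response = f"Answer for {input_text}"
    cache = {input_text: response}   # rebinding makes `cache` a local name
    return response

try:
    buggy_get_response("test")
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"

def fixed_get_response(input_text):
    if input_text in cache:
        return cache[input_text]
    response = f"Answer for {input_text}"
    cache[input_text] = response     # mutate the shared dict in place
    return response

fixed_get_response("test")
fixed_get_response("other")

print(outcome)        # UnboundLocalError
print(sorted(cache))  # ['other', 'test']
```

Mutating the existing dictionary needs no `global` declaration, and earlier entries survive each new insertion.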

5. You want to cache partial results of LLM calls to speed up responses when inputs share common prefixes. Which caching strategy best fits this need?

hard

A. Use random sampling to cache some inputs

B. Cache only full input strings as dictionary keys

C. Use a trie (prefix tree) to store cached outputs by input prefixes

D. Clear cache after every call to save memory

Solution

Step 1: Understand prefix sharing in inputs
Inputs sharing prefixes can reuse partial results if cached by prefix.
Step 2: Identify suitable data structure
A trie (prefix tree) efficiently stores and retrieves data by prefixes, ideal for this case.
Final Answer:
Use a trie (prefix tree) to store cached outputs by input prefixes -> Option C
Quick Check:
Prefix caching = trie structure [OK]

Hint: Trie caches shared prefixes efficiently [OK]

Common Mistakes:

Caching only full inputs misses prefix reuse
Random caching is inefficient
Clearing cache wastes saved data
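
A minimal sketch of the trie idea from question 5 follows. It assumes inputs are split into tokens by whitespace and that the cached value is an opaque placeholder (here a string such as "state-A"); the class and method names are hypothetical, and a production system would store whatever partial result it can actually reuse.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # token -> TrieNode
        self.value = None       # cached partial result, if any
        self.has_value = False  # distinguishes "no entry" from a None value

class PrefixCache:
    def __init__(self):
        self.root = TrieNode()

    def put(self, tokens, value):
        """Store a value for the exact token sequence."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
        node.value = value
        node.has_value = True

    def longest_prefix(self, tokens):
        """Return (matched_length, value) for the longest cached prefix."""
        node = self.root
        best = (0, None)
        for i, tok in enumerate(tokens, start=1):
            node = node.children.get(tok)
            if node is None:
                break
            if node.has_value:
                best = (i, node.value)
        return best

pc = PrefixCache()
pc.put("You are a helpful assistant .".split(), "state-A")

hit = pc.longest_prefix("You are a helpful assistant . Summarize this".split())
miss = pc.longest_prefix("Translate this".split())
print(hit)   # (6, 'state-A')
print(miss)  # (0, None)
```

A lookup walks the trie one token at a time, so its cost grows with the length of the input rather than with the number of cached entries. A dictionary keyed on full input strings would report a miss for the first query, even though six of its eight tokens were already processed.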

Why Caching strategies for LLMs in Prompt Engineering / GenAI? - Purpose & Use Cases
