
Caching strategies for cost reduction in LangChain

Introduction

Caching stores previous results so you don't pay for or wait on the same work twice, which cuts API costs and speeds up your app.

When to Use
When you call an expensive API multiple times with the same input
When you want faster responses for repeated questions or tasks
When you want to limit usage of paid services to save money
When you build a chatbot that often repeats answers
When you want to avoid unnecessary processing for unchanged data
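The pattern behind all of these cases is plain memoization: compute once, store the result, and reuse it. Before reaching for LangChain, the idea can be sketched with Python's built-in functools.lru_cache (the call counter is only there to make the cache hits visible):

```python
from functools import lru_cache

calls = {"count": 0}

@lru_cache(maxsize=128)
def expensive_square(x):
    # Stands in for a slow or paid operation (an API call, a model request)
    calls["count"] += 1
    return x * x

expensive_square(4)    # computed: the function body runs
expensive_square(4)    # cached: the stored result is returned
print(calls["count"])  # the body ran only once
```

The second call never executes the function body, which is exactly the cost saving caching provides for repeated LLM prompts.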
Syntax
LangChain
from langchain.globals import set_llm_cache
from langchain_community.cache import InMemoryCache

# Register a global LLM cache (import paths vary slightly between LangChain versions)
set_llm_cache(InMemoryCache())

Once a cache is set, LangChain checks it automatically before each LLM call: a repeated prompt with identical model settings returns the stored result instead of making a new request.

LangChain supports different cache backends, such as in-memory, SQLite, Redis, or custom stores.
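A custom store only needs the same lookup/update shape that LangChain's cache backends expose; as a sketch, a minimal dict-backed version (illustrative only, not the production class):

```python
class DictCache:
    """Minimal cache with LangChain's lookup/update shape, backed by a dict."""

    def __init__(self):
        self._store = {}

    def lookup(self, prompt, llm_string):
        # Return the cached value, or None on a miss
        return self._store.get((prompt, llm_string))

    def update(self, prompt, llm_string, return_val):
        # Key by (prompt, llm_string) so different model settings don't collide
        self._store[(prompt, llm_string)] = return_val

    def clear(self):
        self._store.clear()

cache = DictCache()
cache.update('What is AI?', 'demo-model', 'AI is...')
print(cache.lookup('What is AI?', 'demo-model'))
```

Keying by both the prompt and the model settings matters: the same prompt sent to a different model, or with a different temperature, should not return a stale answer.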

Examples
This caches the answer to 'What is AI?' so a repeated identical prompt returns instantly without calling the model again (assumes `llm` is an already-configured LangChain model).
LangChain
from langchain.globals import set_llm_cache
from langchain_community.cache import InMemoryCache

set_llm_cache(InMemoryCache())

answer = llm.invoke('What is AI?')        # calls the model and stores the result
answer_again = llm.invoke('What is AI?')  # served from the cache, no API call
This uses Redis to cache results, which works well when multiple app instances need to share one cache.
LangChain
from redis import Redis
from langchain.globals import set_llm_cache
from langchain_community.cache import RedisCache

# Every instance pointing at this Redis server shares the same cache
set_llm_cache(RedisCache(Redis.from_url('redis://localhost:6379')))

result = llm.invoke('Explain caching briefly.')  # cached across instances
Sample Program

This example caches an expensive calculation using the cache's low-level lookup/update interface (the same interface every LangChain cache backend implements). The first call misses the cache, runs the function, and stores the result. The second call hits the cache and returns instantly without running the function again.

LangChain
from langchain_community.cache import InMemoryCache

cache = InMemoryCache()

# Simulate an expensive function
def expensive_function(x):
    print('Running expensive function...')
    return x * x

# LangChain caches key entries by (prompt, llm_string) pairs
prompt, llm_string = 'square_4', 'demo-model'

# First call: cache miss, so run the function and store the result
result1 = cache.lookup(prompt, llm_string)
if result1 is None:
    result1 = expensive_function(4)
    cache.update(prompt, llm_string, result1)
print('First call result:', result1)

# Second call: cache hit, the function does not run again
result2 = cache.lookup(prompt, llm_string)
print('Second call result:', result2)
Output
Running expensive function...
First call result: 16
Second call result: 16
Important Notes

LangChain derives cache keys from the prompt and the model settings, so any change to either one results in a cache miss rather than a collision.

Cached data may become stale; consider cache expiration (for example, RedisCache's ttl parameter) if needed.

Choose cache backend based on your app scale and persistence needs.

Summary

Caching stores results to avoid repeated work and reduce costs.

Set a global cache with set_llm_cache so repeated LLM calls are answered from the cache instead of rerunning expensive requests.

Pick the right cache type for your app, like in-memory or Redis.