LLM Prompt Caching with Git to Cut API Costs
Reduce LLM API costs in CI/CD with a simple prompt cache. Learn how to use a Git repository as a database-free key-value store for LLM prompts and responses.
If your CI/CD pipelines call LLM APIs like OpenAI’s GPT-4, you’ve probably noticed the token costs. Automated systems that generate documentation or review code often run the same prompts repeatedly, leading to high bills. You can reduce these costs significantly by implementing a simple prompt cache using a tool you already have: Git.
This article explains how to use a dedicated Git repository as a database-free key-value cache for LLM prompts and responses. Before calling an expensive API, your script checks a local Git clone for a cached answer. If found, it uses the saved response, avoiding the API call entirely. The savings scale with the cache hit rate: in CI/CD environments where more than half of all prompts are repeats, this method can cut API costs by more than half.
How Git-Based Caching Works
The approach treats a Git repository as a key-value store. You simply create a new, dedicated repository to act as the cache.
- Key: A SHA256 hash of the prompt’s content. Hashing ensures that even a one-character difference creates a unique key.
- Value: The LLM’s response, stored as a plain text file. The filename is the key, for example, 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8.
When your script needs an LLM response, it first calculates the prompt’s hash. It then checks if a file with that name exists in its local clone of the cache repository. If it does, that’s a cache hit. If not, it’s a miss.
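As a minimal sketch of the key and lookup described above, the following shows how a prompt maps to a filename and how a hit differs from a miss. The function names cache_key and lookup are illustrative and not part of any library.

```python
import hashlib
from pathlib import Path
from typing import Optional


def cache_key(prompt: str) -> str:
    """Return the SHA256 hex digest of the prompt (64 hex characters)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def lookup(cache_dir: str, prompt: str) -> Optional[str]:
    """Return the cached response on a hit, or None on a miss."""
    path = Path(cache_dir) / cache_key(prompt)
    if path.is_file():
        return path.read_text(encoding="utf-8")
    return None
```

Because the key is derived from the exact prompt bytes, cache_key("Summarize this diff.") and cache_key("Summarize this diff") produce unrelated filenames, so any prompt templating needs to be deterministic for hits to occur.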
The Caching Workflow in Action
The logic for your application or CI script follows a “check-miss-write” pattern.
- Clone/Pull: Before running, ensure your script has an up-to-date local clone of the cache repository. A quick git pull is all you need.
- Generate Hash: Take the full prompt string and generate its SHA256 hash. This becomes your cache key.
- Check for Key: Look for a file named after the hash in the local cache repository.
- Handle the Result:
  - Cache Hit: If the file exists, read its contents. This is your LLM response. No API call is made.
  - Cache Miss: If the file does not exist, call the actual LLM API to get a new response.
- Write to Cache: On a cache miss, save the new response to a file named after the prompt’s hash. Then, commit and push this new file to the remote cache repository.
# Example of a cache repository's structure
$ ls -1 llm-cache/
0a3b...
1c5d...
5e88...
This workflow ensures that the next time the same prompt is encountered by any user or pipeline with access to the repo, it will be a cache hit. This is particularly effective in CI pipelines that build AI agents for Kubernetes deployments, where environment setup prompts are often identical across runs.
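The Clone/Pull step is easy to get wrong on a fresh CI runner, where no local clone exists yet. The sketch below clones the cache repository when it is missing and pulls otherwise. The function name sync_cache_repo and the run parameter (which defaults to subprocess.run and exists only so the Git calls can be stubbed in tests) are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def sync_cache_repo(remote_url: str, cache_dir: str, run=subprocess.run) -> str:
    """Make sure an up-to-date clone of the cache repo exists at cache_dir.

    Returns "pulled" if an existing clone was updated, "cloned" otherwise.
    """
    cache_path = Path(cache_dir)
    if (cache_path / ".git").is_dir():
        # Existing clone: fast-forward only, so unexpected local state fails loudly.
        run(["git", "pull", "--ff-only"], cwd=cache_path, check=True)
        return "pulled"
    # Fresh runner: create the parent directory and clone from the remote.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    run(["git", "clone", remote_url, str(cache_path)], check=True)
    return "cloned"
```

Calling this once at the start of a job, rather than before every prompt, keeps the number of network round trips low.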
A Python Implementation Example
Here is a simple Python function that implements this caching logic. It uses the standard hashlib, os, and subprocess libraries. You can consult the official Python hashlib documentation for more details on hashing.
import hashlib
import os
import subprocess

# --- Configuration ---
# IMPORTANT: Update this to the absolute path of your cache repository clone.
CACHE_DIR = "/path/to/your/local/llm-cache-repo"


def get_llm_response_with_cache(prompt: str, llm_api_call_func) -> str:
    """
    Gets an LLM response, using a Git-based file cache to avoid duplicate API calls.

    Args:
        prompt: The full prompt string to send to the LLM.
        llm_api_call_func: A function that takes a prompt string and returns the API response.

    Returns:
        The LLM's response, either from the cache or a new API call.
    """
    # 1. Ensure the cache is up-to-date
    # A production implementation should include robust error handling for Git commands.
    subprocess.run(["git", "pull"], cwd=CACHE_DIR, check=True)

    # 2. Generate the cache key
    prompt_hash = hashlib.sha256(prompt.encode('utf-8')).hexdigest()
    cache_file_path = os.path.join(CACHE_DIR, prompt_hash)

    # 3. Check for a cache hit
    if os.path.exists(cache_file_path):
        print(f"CACHE HIT: Found response for hash {prompt_hash}")
        with open(cache_file_path, 'r', encoding='utf-8') as f:
            return f.read()

    # 4. Handle a cache miss
    print(f"CACHE MISS: Calling API for hash {prompt_hash}")
    response = llm_api_call_func(prompt)

    # 5. Write the new response to the cache and push
    # Note: The response is stored in plain text (the prompt is stored only as a hash).
    # Do not use this method for sensitive data.
    with open(cache_file_path, 'w', encoding='utf-8') as f:
        f.write(response)

    print("Adding new response to cache...")
    subprocess.run(["git", "add", cache_file_path], cwd=CACHE_DIR, check=True)
    subprocess.run(["git", "commit", "-m", f"Add cache for {prompt_hash}"], cwd=CACHE_DIR, check=True)
    subprocess.run(["git", "push"], cwd=CACHE_DIR, check=True)
    return response


# --- Example Usage ---
def fake_openai_call(prompt: str) -> str:
    # Replace this with your actual client.chat.completions.create() call
    print("--- Faking expensive API call ---")
    return f"This is the LLM's answer to the prompt starting with: '{prompt[:30]}...'"


if __name__ == "__main__":
    my_prompt = "Generate a Kubernetes Deployment YAML for a Python Flask app named 'my-app' listening on port 5000."

    # First call (will be a miss)
    response1 = get_llm_response_with_cache(my_prompt, fake_openai_call)
    print("\nResponse 1:\n", response1)

    # Second call (will be a hit)
    response2 = get_llm_response_with_cache(my_prompt, fake_openai_call)
    print("\nResponse 2:\n", response2)
Benefits of This Approach
- Cost Reduction: Avoids expensive API calls for repeated prompts. With GPT-4 Turbo input prices around $10 per million tokens, caching just a few hundred complex prompts in CI can lead to substantial savings.
- No New Infrastructure: It uses your existing Git provider, so there is no need to set up or maintain a separate caching service like Redis or Memcached.
- Audit Trail: The Git history provides a version-controlled log of every cached LLM response, keyed by the hash of its prompt. Because only the hash is stored, the log does not contain the prompt text itself.
- Faster Execution on Hits: Reading a local file takes milliseconds, while a network API call can take several seconds. This speeds up CI/CD jobs that get a cache hit.
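To put a rough number on the cost benefit, the sketch below estimates the input-token spend avoided by cache hits. Only the $10 per million tokens price comes from the figure quoted above. The call volume, prompt size, and hit rate are hypothetical placeholders, and the estimate ignores output tokens, which are billed separately.

```python
def estimate_input_savings(
    calls: int,
    input_tokens_per_call: int,
    price_per_million_tokens: float,
    hit_rate: float,
) -> float:
    """Estimate dollars saved on input tokens for a given cache hit rate."""
    # Cost if every call went to the API.
    baseline_cost = calls * input_tokens_per_call / 1_000_000 * price_per_million_tokens
    # Each hit avoids one API call, so savings are proportional to the hit rate.
    return baseline_cost * hit_rate


if __name__ == "__main__":
    # Hypothetical workload: 1,000 calls of 2,000 input tokens each, 60% hits.
    saved = estimate_input_savings(1000, 2000, 10.0, 0.6)
    print(f"Estimated input-token savings: ${saved:.2f}")
```

Plugging in your own pipeline's call counts and measured hit rate gives a quick sanity check on whether the cache is worth maintaining.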
Limitations and Considerations
This method is pragmatic but has trade-offs compared to a dedicated caching system.
- Manual Cache Invalidation: To get a fresh response for a cached prompt, you must manually delete the file from the repository (git rm <hash>, commit and push). There is no built-in time-to-live (TTL) mechanism.
- Repository Size: The cache repository will grow indefinitely. While text-based responses are small, this method is unsuitable for caching large files like images or audio. Regular maintenance may be needed to prune old entries.
- Concurrency and Race Conditions: If two CI jobs miss on the same prompt simultaneously, they will both call the LLM API. They will then race to commit and push the new file, and one git push will be rejected. The losing script needs retry logic (for example, git pull and check the cache again); otherwise the job fails even though its API call has already been paid for.
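One way to handle the push race is sketched below, under stated assumptions: the clone tracks an upstream branch (so the Git revision @{u} resolves), and discarding the local commit after a rejected push is acceptable. The function name push_cache_entry and the run parameter are hypothetical. run defaults to subprocess.run and exists so the Git calls can be stubbed.

```python
import subprocess
from pathlib import Path


def push_cache_entry(cache_dir, key, response, max_attempts=3, run=subprocess.run):
    """Commit and push a cache entry, retrying if the push is rejected.

    Returns the response that ends up in the shared cache, which may be
    another job's response if that job won the race.
    """
    path = Path(cache_dir) / key
    for _ in range(max_attempts):
        path.write_text(response, encoding="utf-8")
        run(["git", "add", key], cwd=cache_dir, check=True)
        run(["git", "commit", "-m", f"Add cache for {key}"], cwd=cache_dir, check=True)
        pushed = run(["git", "push"], cwd=cache_dir, check=False)
        if pushed.returncode == 0:
            return response
        # Push rejected: drop the local commit and sync with the remote branch.
        run(["git", "fetch"], cwd=cache_dir, check=True)
        run(["git", "reset", "--hard", "@{u}"], cwd=cache_dir, check=True)
        if path.exists():
            # Another job already cached this prompt, so use its response.
            return path.read_text(encoding="utf-8")
    raise RuntimeError(f"Could not push cache entry {key} after {max_attempts} attempts")
```

Resetting to the upstream branch, rather than merging or rebasing, sidesteps the add/add conflict that would occur when two jobs commit different responses under the same filename.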
This Git-based caching technique is most effective in environments with high prompt repetition, such as CI/CD pipelines for code analysis, documentation generation, or testing. For applications requiring high-throughput, atomic operations, or automatic cache eviction, a dedicated solution like Redis is more appropriate. For many teams, however, this simple approach provides a significant benefit for minimal effort.