Overview
LemonData provides an intelligent caching system that can significantly reduce your API costs and response latency. Our caching goes beyond simple request matching: it understands the semantic meaning of your prompts.
Cost Savings
Cache hits are billed at 80% off the normal price.
Faster Responses
Cached responses are returned immediately, with no model inference needed.
Context-Aware
Semantic matching finds similar requests even with different wording.
Privacy Controls
Full control over what gets cached and shared.
How It Works
LemonData uses a two-layer caching system:
Layer 1: Response Cache (Exact Match)
For deterministic requests (temperature=0), we cache the exact response:
- Match: Identical model, messages, and parameters
- Speed: Instant (microseconds)
- Best for: Repeated identical queries
Layer 2: Semantic Cache (Similarity Match)
For all requests, we also check semantic similarity:
- Match: Similar meaning, even with different wording
- Threshold: 92% similarity (configurable)
- Best for: FAQ-style queries, common questions
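The two layers described above can be sketched roughly as follows. This is an illustrative model only, not LemonData's implementation: the `TwoLayerCache` class, the `exact_key` helper, and the toy bag-of-words `embed` function are all hypothetical stand-ins (a real system would use a learned embedding model and a vector index), and whether the semantic layer is scoped per model is not stated in this page, so the sketch ignores it.

```python
import hashlib
import json
import math
from collections import Counter

SIMILARITY_THRESHOLD = 0.92  # the documented default (configurable)


def exact_key(model, messages, params):
    """Layer 1 key: a hash over the identical model, messages, and parameters."""
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def embed(text):
    """Toy stand-in for an embedding model: a lowercase bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TwoLayerCache:
    """Hypothetical in-memory model of the exact-match and semantic layers."""

    def __init__(self, threshold=SIMILARITY_THRESHOLD):
        self.threshold = threshold
        self.exact = {}      # exact key -> response
        self.semantic = []   # list of (embedding, response)

    def store(self, model, messages, params, response):
        # Only deterministic requests go into the exact-match layer.
        if params.get("temperature") == 0:
            self.exact[exact_key(model, messages, params)] = response
        self.semantic.append((embed(messages[-1]["content"]), response))

    def lookup(self, model, messages, params):
        # Layer 1: exact match, only for deterministic requests.
        if params.get("temperature") == 0:
            hit = self.exact.get(exact_key(model, messages, params))
            if hit is not None:
                return "exact", hit
        # Layer 2: best semantic match at or above the threshold.
        query = embed(messages[-1]["content"])
        best_score, best_response = 0.0, None
        for vector, response in self.semantic:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            return "semantic", best_response
        return "miss", None
```

Checking the exact layer first keeps the cheap hash lookup on the fast path; the similarity scan only runs when that lookup fails or the request is not deterministic.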
Cache Headers
Request Headers
Control caching behavior per request:

| Header | Value | Effect |
|---|---|---|
| Cache-Control: no-cache | - | Skip cache, fresh response |
| Cache-Control: no-store | - | Don’t cache this response |
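As a minimal sketch, the helper below builds request headers with one of the two `Cache-Control` values from the table. The `build_headers` function is hypothetical, and the `Authorization: Bearer` scheme is an assumption not confirmed by this page; check your API reference for the actual authentication format.

```python
# Only the two Cache-Control values documented above are accepted.
ALLOWED_CACHE_CONTROL = ("no-cache", "no-store")


def build_headers(api_key, cache_control=None):
    """Return request headers, optionally overriding cache behavior.

    cache_control=None leaves caching at the API key's configured default.
    """
    headers = {
        # Assumption: bearer-token auth; not confirmed by this page.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if cache_control is not None:
        if cache_control not in ALLOWED_CACHE_CONTROL:
            raise ValueError(f"unsupported Cache-Control value: {cache_control!r}")
        headers["Cache-Control"] = cache_control
    return headers
```

The returned dictionary can be passed to whichever HTTP client you already use, for example `build_headers(key, "no-cache")` for a request that must bypass the cache.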
Response Headers
Every response includes cache status headers.
Checking Cache Status
Cache Billing
Cache hits are significantly cheaper than fresh requests:

| Type | Cost |
|---|---|
| Cache HIT | 80% off |
| Cache MISS | Full price |
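To see what the discount means for a bill, the sketch below estimates blended spend for a given cache hit rate. It assumes every request would cost the same at full price, which is a simplification; `blended_cost` is a hypothetical helper, not part of any LemonData SDK.

```python
CACHE_HIT_DISCOUNT = 0.80  # cache hits are billed at 80% off


def blended_cost(full_price_total, hit_rate):
    """Estimate spend given the all-miss cost and the fraction of cache hits.

    Assumes uniform per-request cost, so hit_rate applies directly to spend.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    hit_cost = full_price_total * hit_rate * (1.0 - CACHE_HIT_DISCOUNT)
    miss_cost = full_price_total * (1.0 - hit_rate)
    return hit_cost + miss_cost
```

For example, with a workload that would cost 100 at full price and a 50% hit rate, the hits cost about 10 and the misses 50, for a total of roughly 60.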
Privacy Controls
API Key Level
Configure caching behavior for each API key in your dashboard:

| Mode | Description |
|---|---|
| Default | Cache enabled, may share with similar requests |
| No Share | Cache enabled, but responses are private to your account |
| Disabled | No caching at all |
Request Level
Override the key-level setting per request using the Cache-Control request headers.
Cache Feedback
If you receive an incorrect cached response, you can report it:
- wrong_answer: Factually incorrect
- outdated: Information is stale
- irrelevant: Doesn’t match the question
- other: Other issues
Best Practices
Use temperature=0 for cacheable queries
Deterministic settings maximize cache hit rates.
Standardize prompt formats
Consistent formatting improves semantic matching.
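One way to standardize prompts on the client side is to normalize them before sending. The sketch below is an assumption about what helps, not a documented LemonData requirement: `normalize_prompt` is a hypothetical helper that applies Unicode NFKC normalization and collapses whitespace, leaving wording and letter case untouched so the meaning of the prompt does not change.

```python
import re
import unicodedata


def normalize_prompt(text):
    """Return the prompt with consistent Unicode form and whitespace."""
    # NFKC folds compatibility characters (e.g. full-width letters).
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of spaces, tabs, and newlines into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```

Because collapsing whitespace also flattens line breaks, this is only suitable for short, single-paragraph prompts such as FAQ-style questions; prompts containing code or structured text should be left as they are.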
Use no-cache for time-sensitive queries
Queries about current events or real-time data should skip the cache.
Monitor cache hit rates
Check your dashboard for cache statistics and savings.
When NOT to Cache
Disable caching for:
- Real-time information: Stock prices, weather, news
- Personalized content: User-specific recommendations
- Creative tasks: When variety is desired
- Sensitive data: Confidential information