Overview
LemonData provides an intelligent caching system that can significantly reduce your API costs and response latency. Our caching goes beyond simple request matching: it understands the semantic meaning of your prompts.
Cost Savings
Cache hits are billed at 10% of the normal cost.
Faster Responses
Cached responses are returned instantly, no model inference needed.
Context-Aware
Semantic matching finds similar requests even with different wording.
Privacy Controls
Full control over what gets cached and shared.
How It Works
LemonData uses a two-layer caching system:
Layer 1: Response Cache (Exact Match)
For deterministic requests (temperature=0), we cache the exact response:
- Match: Identical model, messages, and parameters
- Speed: Instant (microseconds)
- Best for: Repeated identical queries
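Conceptually, an exact-match cache keys each entry on the full request. The sketch below shows one way such a key could be derived; the hashing scheme and field names are illustrative assumptions, not LemonData's actual implementation.

```python
import hashlib
import json

def response_cache_key(model, messages, params):
    """Derive a deterministic cache key from the full request.

    Illustrative sketch only: serializes the request with sorted keys so
    that dict ordering never changes the hash, then hashes the result.
    """
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Identical requests map to the same key; any difference is a miss.
msgs = [{"role": "user", "content": "What is a monad?"}]
k1 = response_cache_key("example-model", msgs, {"temperature": 0})
k2 = response_cache_key("example-model", msgs, {"temperature": 0})
assert k1 == k2
```

Because the key covers every parameter, even changing `temperature` from 0 to 0.1 produces a different key, which is why deterministic settings are recommended for cacheable queries.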
Layer 2: Semantic Cache (Similarity Match)
For all requests, we also check semantic similarity using a two-stage matching algorithm:
- Stage 1 (Query-only): ≥95% similarity on the user query
- Stage 2 (Full context): ≥85% similarity including conversation context
- Best for: FAQ-style queries, common questions
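The two-stage match above can be sketched with cosine similarity over embeddings. The embedding source and cached-entry shape here are assumptions for illustration; only the two thresholds come from the documentation.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def semantic_match(query_vec, context_vec, cache_entries):
    """Two-stage lookup: Stage 1 compares user-query embeddings (>= 0.95),
    Stage 2 compares full-context embeddings (>= 0.85). Returns the cached
    response on a hit, or None on a miss. Entry shape is hypothetical."""
    for entry in cache_entries:
        if cosine(query_vec, entry["query_vec"]) >= 0.95:       # Stage 1
            if cosine(context_vec, entry["ctx_vec"]) >= 0.85:   # Stage 2
                return entry["response"]
    return None
```

The cheap query-only stage filters candidates before the more expensive full-context comparison, which is a common pattern for two-stage retrieval.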
Cache Control
Request-Level Control
Control caching behavior per request using the `cache_control` parameter in the request body:
| Type | Effect |
|---|---|
| `no_cache` | Skip cache lookup; always get a fresh response |
| `no_store` | Don't store this response in the cache |
| `response_only` | Only use the exact-match cache (skip semantic) |
| `semantic_only` | Only use the semantic cache (skip exact match) |
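A request body using one of these values might look like the following. The endpoint path, model name, and overall payload shape are assumptions for illustration; only `cache_control` and its values come from the table above.

```python
import json

# Hypothetical chat-completion payload; `cache_control` is the
# documented parameter, everything else is an illustrative assumption.
body = {
    "model": "example-model",
    "messages": [
        {"role": "user", "content": "What is the current BTC price?"}
    ],
    "cache_control": "no_cache",  # skip lookup: always get a fresh response
}
print(json.dumps(body, indent=2))
```

Here `no_cache` is appropriate because the answer is time-sensitive; a stale cached price would be wrong even if semantically similar.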
Response Headers
Every response includes its cache status in the response headers.
Checking Cache Status
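A client can branch on the cache status after each call. The header name `X-Cache-Status` and its `HIT`/`MISS` values are assumptions for illustration; check the actual response headers your requests return.

```python
def describe_cache_status(headers):
    """Interpret a cache-status response header.

    The header name and values here are hypothetical; substitute the
    ones your responses actually carry.
    """
    status = headers.get("X-Cache-Status", "MISS")
    if status == "HIT":
        return "served from cache (billed at 10% of normal cost)"
    return "fresh model inference (billed at full price)"
```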
Cache Billing
Cache hits are significantly cheaper than fresh requests:
| Type | Cost |
|---|---|
| Cache HIT | 90% off |
| Cache MISS | Full price |
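With hits at 90% off and misses at full price, the blended per-request cost is a simple weighted average of the two rates:

```python
def effective_cost(full_price, hit_rate):
    """Blended per-request cost: hits cost 10% of full price, misses 100%.

    hit_rate is the fraction of requests served from cache (0.0 to 1.0).
    """
    return full_price * (0.1 * hit_rate + 1.0 * (1 - hit_rate))

# A 60% hit rate cuts the average request cost to 46% of full price.
assert abs(effective_cost(1.0, 0.6) - 0.46) < 1e-9
```

This is why the hit-rate statistics in the dashboard translate directly into savings: every ten points of hit rate shaves roughly nine percent off your average request cost.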
Privacy Controls
Organization / User Level
Configure caching behavior in your dashboard settings:
| Mode | Description |
|---|---|
| Shared | Cache enabled, responses may be shared across users (default for personal accounts) |
| Isolated | Cache enabled, but responses are private to your organization (default for organizations) |
| Disabled | No caching at all |
- Similarity Threshold: Adjust semantic matching sensitivity (default: 92%)
- Custom TTL: Override cache expiration time
- Excluded Models: Disable caching for specific models
Request Level
Override per request using the `cache_control` parameter:
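For example, a request carrying confidential content can opt out of storage regardless of the organization-level mode. As before, the payload shape is an illustrative assumption; only `cache_control` and the `no_store` value come from the documentation.

```python
# Hypothetical payload: read from the cache if possible, but never
# store this response, since the content is confidential.
body = {
    "model": "example-model",
    "messages": [
        {"role": "user", "content": "Summarize this confidential memo: ..."}
    ],
    "cache_control": "no_store",
}
```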
Cache Feedback
If you receive an incorrect cached response, you can report it with one of the following reason codes:
- `wrong_answer` - Factually incorrect
- `outdated` - Information is stale
- `irrelevant` - Doesn't match the question
- `other` - Other issues
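A feedback report might be structured like this. The field names and request ID are hypothetical; only the reason codes come from the list above.

```python
# Hypothetical feedback payload; `reason` must be one of the
# documented reason codes.
VALID_REASONS = {"wrong_answer", "outdated", "irrelevant", "other"}

report = {
    "request_id": "req_example",   # hypothetical ID of the cached response
    "reason": "outdated",          # documented reason code
    "comment": "Prices changed since this was cached.",
}
assert report["reason"] in VALID_REASONS
```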
Best Practices
Use temperature=0 for cacheable queries
Deterministic settings maximize cache hit rates.
Standardize prompt formats
Consistent formatting improves semantic matching.
Use `no_cache` for time-sensitive queries
Queries about current events or real-time data should skip the cache.
Monitor cache hit rates
Check your dashboard for cache statistics and savings.
When NOT to Cache
Disable caching for:
- Real-time information: Stock prices, weather, news
- Personalized content: User-specific recommendations
- Creative tasks: When variety is desired
- Sensitive data: Confidential information