Documentation Index
Fetch the complete documentation index at: https://docs.lemondata.cc/llms.txt
Use this file to discover all available pages before exploring further.
Overview
LemonData provides an intelligent caching system that can significantly reduce your API costs and response latency. Our caching goes beyond simple request matching: it understands the semantic meaning of your prompts.

Cost Savings
Cache hits are billed at a fraction of the normal cost.
Faster Responses
Cached responses are returned instantly, no model inference needed.
Context-Aware
Semantic matching finds similar requests even with different wording.
Privacy Controls
Full control over what gets cached and shared.
How It Works
LemonData uses a two-layer caching system:

Layer 1: Response Cache (Exact Match)
For deterministic requests (temperature=0), we cache the exact response:
- Match: Identical model, messages, and parameters
- Speed: Instant (microseconds)
- Best for: Repeated identical queries
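LemonData's internals are not published, but a common way to implement an exact-match layer is to hash a canonicalized form of the full request, so that any change to the model, messages, or parameters produces a different key. A minimal sketch:

```python
import hashlib
import json

def exact_cache_key(model: str, messages: list, params: dict) -> str:
    """Derive a deterministic cache key from the full request.

    Illustrative only -- LemonData's actual key derivation is internal.
    """
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,  # canonical key ordering so equal requests hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

msgs = [{"role": "user", "content": "What is DNS?"}]
key_a = exact_cache_key("gpt-4o", msgs, {"temperature": 0})
key_b = exact_cache_key("gpt-4o", msgs, {"temperature": 0})
key_c = exact_cache_key("gpt-4o", msgs, {"temperature": 0.7})  # different params, different key
```

Because the key covers every parameter, only byte-identical requests can hit this layer, which is why it is best suited to repeated identical queries.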
Layer 2: Semantic Cache (Similarity Match)
For all requests, we also check semantic similarity using a two-stage matching algorithm:
- Stage 1 (Query-only): ≥95% similarity on the user query
- Stage 2 (Full context): ≥95% similarity including conversation context
- Best for: FAQ-style queries, common questions
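The two-stage check above can be sketched as follows. The similarity metric and staging are assumptions for illustration (cosine similarity over embeddings is a typical choice); the embedding model itself is not shown:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semantic_hit(query_similarity, context_similarity, threshold=0.95):
    """Two-stage match: the query alone must clear the threshold first,
    then the full conversation context must clear it as well."""
    if query_similarity < threshold:
        return False  # Stage 1 failed; skip the more expensive Stage 2
    return context_similarity >= threshold

# A paraphrased FAQ query with near-identical embeddings passes both stages:
hit = semantic_hit(query_similarity=0.97, context_similarity=0.96)
```

Requiring both stages keeps a lookalike query from hitting the cache when the surrounding conversation context differs.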
Cache Control
Request-Level Control
Control caching behavior per request using the cache_control parameter in the request body:
| Type | Effect |
|---|---|
| no_cache | Skip cache lookup, always get a fresh response |
| no_store | Don’t store this response in the cache |
| response_only | Only use the exact-match cache (skip semantic) |
| semantic_only | Only use the semantic cache (skip exact match) |
Response Headers
Every response includes cache status headers.

Checking Cache Status
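A small sketch of reading the cache status off a response. The header name x-cache-status is an assumption for illustration; check the API reference for the exact header names LemonData returns:

```python
def cache_status(headers: dict) -> str:
    """Return the cache status reported on a response.

    NOTE: the header name "x-cache-status" is illustrative, not confirmed.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-cache-status", "MISS")

status = cache_status({"X-Cache-Status": "HIT", "Content-Type": "application/json"})
```

Normalizing header names first matters because HTTP clients differ in the casing they expose.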
Cache Billing
Cache hits are significantly cheaper than fresh requests:

| Type | Cost |
|---|---|
| Cache HIT | 90% off |
| Cache MISS | Full price |
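The billing table translates directly into a blended-cost estimate: hits are billed at 10% of full price, misses at full price. The request count, price, and hit rate below are placeholder numbers:

```python
def effective_cost(requests: int, hit_rate: float, full_price: float) -> float:
    """Blended cost under the billing table above.

    Cache hits are billed at 10% of full price (90% off);
    cache misses are billed at full price.
    """
    hits = requests * hit_rate
    misses = requests - hits
    return hits * full_price * 0.10 + misses * full_price

# 1,000 requests at $0.01 full price with a 60% cache hit rate:
# 600 hits * $0.001 + 400 misses * $0.01 = $4.60 instead of $10.00
blended = effective_cost(1000, 0.60, 0.01)
```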
Privacy Controls
Organization / User Level
Configure caching behavior in your dashboard settings:

| Mode | Description |
|---|---|
| Shared | Cache enabled, responses may be shared across users (default for personal accounts) |
| Isolated | Cache enabled, but responses are private to your organization (default for organizations) |
| Disabled | No caching at all |
- Similarity Threshold: Adjust semantic matching sensitivity (default: 92%)
- Custom TTL: Override cache expiration time
- Excluded Models: Disable caching for specific models
Request Level
Override these settings per request using the cache_control parameter:
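For example, a request that skips the cache entirely. This is a sketch: whether cache_control is a bare string or an object with a type field is not shown here, so the object form is an assumption, and the model name and message are placeholders:

```python
import json

body = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "What is the weather in Tokyo right now?"}
    ],
    # Assumed shape: an object carrying one of the documented types.
    "cache_control": {"type": "no_cache"},  # always fetch a fresh response
}

payload = json.dumps(body)  # serialized request body
```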
Cache Feedback
If you receive an incorrect cached response, you can report it with one of the following reason codes:
- wrong_answer: Factually incorrect
- outdated: Information is stale
- irrelevant: Doesn’t match the question
- other: Other issues
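A hypothetical feedback report using these reason codes. The field names and request ID are illustrative; consult the API reference for the real payload shape:

```python
# Assumed payload shape -- only the reason codes are documented above.
VALID_REASONS = {"wrong_answer", "outdated", "irrelevant", "other"}

feedback = {
    "request_id": "req_abc123",              # the cached response being reported
    "reason": "outdated",                    # must be one of the documented codes
    "comment": "Quoted prices are months old.",
}

assert feedback["reason"] in VALID_REASONS
```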
Best Practices
Use temperature=0 for cacheable queries
Deterministic settings maximize cache hit rates.
Standardize prompt formats
Consistent formatting improves semantic matching.
Use no_cache for time-sensitive queries
Current events and real-time data should skip the cache.
Monitor cache hit rates
Check your dashboard for cache statistics and savings.
When NOT to Cache
Disable caching for:
- Real-time information: Stock prices, weather, news
- Personalized content: User-specific recommendations
- Creative tasks: When variety is desired
- Sensitive data: Confidential information