Overview
In addition to LemonData’s platform semantic cache, many AI providers offer their own prompt caching feature. This is a separate caching mechanism that operates at the provider level (Anthropic, OpenAI, DeepSeek, etc.).
Two Types of Caching
The two are mutually exclusive: if the platform cache hits, no upstream call is made, so the provider cache never applies to that request.
| Type | Where | How it Works | Cost |
|---|---|---|---|
| Platform Cache | LemonData | Semantic similarity matching | 10% of normal price |
| Provider Cache | Upstream (Anthropic/OpenAI/etc) | Exact prefix matching | Discounted token rates |
How Provider Prompt Cache Works
Provider prompt caching stores the processed representation of your prompt prefix on the provider’s servers. When you send a request with the same prefix, the provider can skip reprocessing those tokens.
Key Characteristics
- Prefix-based: Only the beginning of your prompt can be cached
- Exact match: Requires identical tokens (not semantic similarity)
- Time-limited: Cache entries expire (typically 5-60 minutes)
- Automatic: No special configuration needed
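Because matching is exact and prefix-based, a single differing token at the start of the prompt forfeits the entire cached span. A toy sketch of the idea (this is an illustration, not the provider’s real tokenizer):

```python
# Illustrative sketch of exact-prefix matching: the provider can only
# reuse the identical leading span of tokens between two prompts.

def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Length of the identical leading token span of two prompts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

cached      = ["You", "are", "a", "helpful", "assistant", ".", "Question", "A"]
same_prefix = ["You", "are", "a", "helpful", "assistant", ".", "Question", "B"]
different   = ["Hi", "are", "a", "helpful", "assistant", ".", "Question", "A"]

print(shared_prefix_len(cached, same_prefix))  # 7 -- long shared prefix can be reused
print(shared_prefix_len(cached, different))    # 0 -- one token off at the start: no reuse
```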
Supported Providers
| Provider | Cache Read Discount | Cache Write Cost | Min Tokens |
|---|---|---|---|
| Anthropic | 90% off | 25% premium | 1024 |
| OpenAI | 50% off | Same as input | 1024 |
| DeepSeek | 90% off | Same as input | 64 |
| | 75% off | 25% premium | 32768 |
Discounts are applied automatically. LemonData passes through the provider’s cache pricing to you.
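The Anthropic rates in the table imply that caching pays for itself after a single reuse. A minimal sketch of that break-even arithmetic, with prices normalized so the base input rate is 1.0:

```python
# Break-even sketch using the Anthropic rates above: cache reads are 90%
# off and cache writes carry a 25% premium, relative to a base of 1.0.
BASE, WRITE, READ = 1.00, 1.25, 0.10

def total_cost(reuses: int, cached: bool) -> float:
    """Relative cost of sending the same prefix 1 + `reuses` times."""
    if not cached:
        return BASE * (1 + reuses)   # every request pays full price
    return WRITE + READ * reuses     # one cache write, then cheap reads

print(total_cost(1, cached=False))   # 2.0
print(total_cost(1, cached=True))    # 1.35 -- cheaper after just one reuse
```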
Identifying Cache Usage
In Usage Logs
Your usage logs show a detailed cache token breakdown:
| Field | Description |
|---|---|
| cacheReadTokens | Tokens served from provider cache (discounted) |
| cacheWriteTokens | Tokens written to cache (for future requests) |
| nonCachedPromptTokens | Tokens processed without cache |
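Assuming Anthropic-style multipliers and an illustrative base rate of $3.00 per 1M input tokens (an assumption, not a published LemonData price), these fields are enough to reconstruct a request’s effective input cost. A hedged sketch:

```python
# Sketch: reconstruct a request's effective input cost from the usage-log
# fields above. The $3.00-per-1M-token base rate is assumed for
# illustration; the 0.10/1.25 multipliers are the Anthropic rates.
PRICE_PER_TOKEN = 3.00 / 1_000_000

def input_cost(log: dict) -> float:
    return PRICE_PER_TOKEN * (
        log["cacheReadTokens"] * 0.10            # 90% off
        + log["cacheWriteTokens"] * 1.25         # 25% premium
        + log["nonCachedPromptTokens"] * 1.00    # full price
    )

log = {"cacheReadTokens": 8000, "cacheWriteTokens": 0, "nonCachedPromptTokens": 2000}
print(f"${input_cost(log):.6f}")  # $0.008400
```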
In Transactions
Transactions show a Provider Cache label when upstream caching was used:
- Cache (sky blue): Platform semantic cache hit - 90% discount
- Provider Cache (teal): Upstream prompt cache hit - discounted rates
Cost Calculation Example
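The example below (10,000 input tokens to Claude) can be worked through numerically. The $3.00-per-1M-token base rate is an assumed illustrative price; the 1.25× and 0.10× multipliers come from the Anthropic row above.

```python
# Worked cost sketch for 10,000 input tokens to Claude. Base rate of
# $3.00 per 1M input tokens is assumed for illustration; the cache
# multipliers (25% write premium, 90% read discount) are Anthropic's.
tokens = 10_000
price_per_token = 3.00 / 1_000_000

no_cache    = tokens * price_per_token         # all tokens at full price
cache_write = tokens * price_per_token * 1.25  # first request: writes the prefix
cache_read  = tokens * price_per_token * 0.10  # later requests: prefix reused

print(f"no cache:    ${no_cache:.4f}")     # $0.0300
print(f"cache write: ${cache_write:.4f}")  # $0.0375
print(f"cache read:  ${cache_read:.4f}")   # $0.0030
```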
For a request with 10,000 input tokens to Claude (Anthropic): without the cache, all tokens are billed at the full input rate; a cache write costs 25% more than that rate, and a cache read costs 90% less.
Best Practices
Use consistent system prompts
Place your system prompt and static context at the beginning of your messages. This maximizes cache hit potential.
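A sketch of a cache-friendly request layout. The model name, system prompt, and chat format here are illustrative assumptions, not LemonData-specific values:

```python
# Sketch of a cache-friendly request body (generic chat format): the
# long, static system prompt leads, so every request shares the same
# prefix; only the final user message varies.
STATIC_SYSTEM = "You are a support agent for Acme Corp. Policies: ..."  # hypothetical

def build_request(user_question: str) -> dict:
    return {
        "model": "claude-sonnet-4",  # placeholder model name
        "messages": [
            {"role": "system", "content": STATIC_SYSTEM},  # cacheable prefix
            {"role": "user", "content": user_question},    # varying suffix
        ],
    }

a = build_request("How do I reset my password?")
b = build_request("What is your refund policy?")
assert a["messages"][0] == b["messages"][0]  # identical prefix across requests
```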
Batch similar requests
Send requests with the same prefix close together in time to benefit from cache before it expires.
Meet minimum token requirements
Ensure your cacheable prefix meets the provider’s minimum (e.g., 1024 tokens for Anthropic/OpenAI).
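A rough pre-flight check might estimate whether a prefix clears the minimum. The ~4-characters-per-token heuristic below is a crude assumption; use the provider’s tokenizer for exact counts:

```python
# Rough pre-flight check that a prefix clears a provider's cache minimum.
# The ~4-chars-per-token heuristic is a crude assumption, good enough
# only for a sanity check; real counts need the provider's tokenizer.
MIN_TOKENS = {"anthropic": 1024, "openai": 1024, "deepseek": 64}

def likely_cacheable(prefix: str, provider: str) -> bool:
    approx_tokens = len(prefix) / 4
    return approx_tokens >= MIN_TOKENS[provider]

print(likely_cacheable("short system prompt", "anthropic"))  # False
```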
Monitor cache metrics
Check your dashboard usage statistics for cache hit rates and savings.
Platform Cache vs Provider Cache
| Aspect | Platform Cache | Provider Cache |
|---|---|---|
| Matching | Semantic similarity | Exact prefix match |
| Cost | 10% of normal price | Discounted rates |
| Latency | Instant (~1ms) | Reduced (skip processing) |
| Control | Dashboard settings | Automatic |
| Scope | Cross-user (optional) | Per-API-key |
When Each Applies
The platform cache is checked first: on a platform cache hit, no upstream request is made at all. The provider cache therefore only applies when the platform cache misses and the request is forwarded upstream.
Checking Cache Status
Response Headers
Usage API
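Once log entries are fetched (the fetch itself is omitted here, since the endpoint shape isn’t documented in this section), the fields from the usage-log table can be aggregated into a cache hit ratio. A hedged sketch:

```python
# Hypothetical sketch: summarize provider-cache usage from usage-log
# entries. Field names come from the usage-log table; the list of dicts
# stands in for whatever the usage API actually returns.
def cache_read_ratio(logs: list[dict]) -> float:
    """Fraction of prompt tokens served from the provider cache."""
    read = sum(l["cacheReadTokens"] for l in logs)
    total = read + sum(
        l["cacheWriteTokens"] + l["nonCachedPromptTokens"] for l in logs
    )
    return read / total if total else 0.0

logs = [
    {"cacheReadTokens": 9000, "cacheWriteTokens": 0,    "nonCachedPromptTokens": 1000},
    {"cacheReadTokens": 0,    "cacheWriteTokens": 9000, "nonCachedPromptTokens": 1000},
]
print(f"{cache_read_ratio(logs):.0%}")  # 45%
```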
Query your usage logs to see the per-request cache breakdown.
FAQ
Can I disable provider caching?
Provider caching is automatic and cannot be disabled. However, it only benefits you (lower costs), so there’s no reason to disable it.
Why didn’t my request hit provider cache?
Common reasons:
- Prefix changed (even one token difference)
- Cache expired (typically 5-60 minutes)
- Prefix too short (below minimum tokens)
- Different API key used
Does BYOK support provider caching?
Yes! When using your own API keys (BYOK), provider caching works the same way. The cache is tied to your upstream API key.
How do I maximize cache savings?
- Use platform semantic cache for repeated similar queries
- Structure prompts with static content first
- Keep system prompts consistent across requests
- Send related requests in quick succession