Overview

In addition to LemonData’s platform semantic cache, many AI providers offer their own prompt caching feature. This is a separate caching mechanism that operates at the provider level (Anthropic, OpenAI, DeepSeek, etc.).
Two Types of Caching
| Type | Where | How it Works | Cost |
|------|-------|--------------|------|
| Platform Cache | LemonData | Semantic similarity matching | 10% of normal price |
| Provider Cache | Upstream (Anthropic/OpenAI/etc.) | Exact prefix matching | Discounted token rates |
These are mutually exclusive: if the platform cache hits, no upstream call is made, so the provider cache never applies.

How Provider Prompt Cache Works

Provider prompt caching stores the processed representation of your prompt prefix on the provider’s servers. When you send a request with the same prefix, the provider can skip reprocessing those tokens.

Key Characteristics

  • Prefix-based: Only the beginning of your prompt can be cached
  • Exact match: Requires identical tokens (not semantic similarity)
  • Time-limited: Cache entries expire (typically 5-60 minutes)
  • Automatic: No special configuration needed
Request 1: [System prompt + Context A + Question 1]
           ^^^^^^^^^^^^^^^^^^^^^^^^
           This prefix gets cached

Request 2: [System prompt + Context A + Question 2]
           ^^^^^^^^^^^^^^^^^^^^^^^^
           Cache hit! Only Question 2 is processed
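The pattern above can be sketched in Python: build every request from the same static prefix so the leading tokens are byte-for-byte identical between requests (the message contents here are illustrative, not LemonData API specifics):

```python
# Sketch: structuring two requests so they share a cacheable prefix.
# The shared system prompt + context form the prefix; only the final
# user message differs between requests.

STATIC_PREFIX = [
    {"role": "system", "content": "You are a support assistant."},
    {"role": "user", "content": "Context A: ... (large static document)"},
]

def build_request(question: str) -> list:
    # Append the variable part after the static prefix so the provider
    # can reuse its cached processing of the prefix.
    return STATIC_PREFIX + [{"role": "user", "content": question}]

req1 = build_request("Question 1")
req2 = build_request("Question 2")

# Both requests share an identical leading message sequence, so the
# provider's exact-prefix matching can reuse the cached prefix.
assert req1[:-1] == req2[:-1]
```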

Supported Providers

| Provider | Cache Read Discount | Cache Write Cost | Min Tokens |
|----------|---------------------|------------------|------------|
| Anthropic | 90% off | 25% premium | 1024 |
| OpenAI | 50% off | Same as input | 1024 |
| DeepSeek | 90% off | Same as input | 64 |
| Google | 75% off | 25% premium | 32768 |
Discounts are applied automatically. LemonData passes through the provider’s cache pricing to you.

Identifying Cache Usage

In Usage Logs

Your usage logs show a detailed cache token breakdown:

| Field | Description |
|-------|-------------|
| cacheReadTokens | Tokens served from provider cache (discounted) |
| cacheWriteTokens | Tokens written to cache (for future requests) |
| nonCachedPromptTokens | Tokens processed without cache |

In Transactions

Transactions show a Provider Cache label when upstream caching was used:
  • Cache (sky blue): Platform semantic cache hit - 90% discount
  • Provider Cache (teal): Upstream prompt cache hit - discounted rates

Cost Calculation Example

For a request with 10,000 input tokens to Claude (Anthropic):

Without cache:
10,000 tokens × $3.00/1M = $0.030
With provider cache (8,000 cached + 2,000 new):
Cache read:  8,000 tokens × $0.30/1M = $0.0024  (90% off)
Cache write: 2,000 tokens × $3.75/1M = $0.0075  (25% premium)
Total: $0.0099 (67% savings)
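The arithmetic above can be reproduced with a short script. The rates mirror the Anthropic row of the pricing table (reads at 90% off, writes at a 25% premium); the $3.00/1M base input rate is the figure assumed in this example:

```python
# Worked example: 10,000 input tokens, 8,000 served from cache and
# 2,000 newly written to it.
BASE = 3.00 / 1_000_000        # $ per input token
CACHE_READ = BASE * 0.10       # 90% discount on cache reads
CACHE_WRITE = BASE * 1.25      # 25% premium on cache writes

def cost(cache_read_tokens, cache_write_tokens, uncached_tokens=0):
    return (cache_read_tokens * CACHE_READ
            + cache_write_tokens * CACHE_WRITE
            + uncached_tokens * BASE)

without_cache = cost(0, 0, 10_000)        # ≈ $0.030
with_cache = cost(8_000, 2_000)           # ≈ $0.0024 + $0.0075 = $0.0099
savings = 1 - with_cache / without_cache  # ≈ 0.67, i.e. 67% savings
```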

Best Practices

  • Place your system prompt and static context at the beginning of your messages to maximize cache hit potential.
  • Send requests with the same prefix close together in time to benefit from the cache before it expires.
  • Ensure your cacheable prefix meets the provider’s minimum (e.g., 1024 tokens for Anthropic/OpenAI).
  • Check your dashboard usage statistics for cache hit rates and savings.

Platform Cache vs Provider Cache

| Aspect | Platform Cache | Provider Cache |
|--------|----------------|----------------|
| Matching | Semantic similarity | Exact prefix match |
| Cost | 10% of normal price | Discounted rates |
| Latency | Instant (~1ms) | Reduced (skips prompt processing) |
| Control | Dashboard settings | Automatic |
| Scope | Cross-user (optional) | Per API key |

When Each Applies

Request arrives


┌─────────────────────┐
│ Platform Cache Hit? │
└─────────────────────┘
    │ Yes              │ No
    ▼                  ▼
┌─────────┐    ┌─────────────────────┐
│ Return  │    │ Call Upstream API   │
│ Cached  │    └─────────────────────┘
│ (10%)   │            │
└─────────┘            ▼
               ┌─────────────────────┐
               │ Provider Cache Hit? │
               └─────────────────────┘
                   │ Yes        │ No
                   ▼            ▼
               Discounted    Full Price
               Token Rate    Token Rate
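The same decision flow can be expressed as a small pricing helper (a sketch: the function name is hypothetical, and the rates are the illustrative figures from this page):

```python
def effective_input_rate(platform_hit: bool, provider_hit: bool,
                         base_rate: float,
                         provider_read_multiplier: float) -> float:
    """Effective per-unit input rate for a request.

    base_rate: the model's normal input price (e.g. $3.00 per 1M tokens).
    provider_read_multiplier: provider cache-read rate as a fraction of
    base (e.g. 0.10 for Anthropic's 90% discount).
    """
    if platform_hit:
        return base_rate * 0.10  # platform cache: 10% of normal price
    if provider_hit:
        return base_rate * provider_read_multiplier
    return base_rate             # no cache hit: full price

# Anthropic example, provider cache hit: $3.00/1M becomes $0.30/1M.
rate = effective_input_rate(False, True, 3.00, 0.10)
```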

Checking Cache Status

Response Headers

X-Cache-Status: HIT           # Platform cache hit
X-Cache-Status: MISS          # No platform cache
X-Upstream-Cache-Read: 8000   # Provider cache read tokens
X-Upstream-Cache-Write: 2000  # Provider cache write tokens
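A small helper can turn these headers into a summary. This is a sketch that assumes a plain mapping of header names to string values (such as `response.headers` from the `requests` library; note that a plain dict, unlike `requests` headers, is case-sensitive):

```python
# Summarize the cache-related response headers documented above.
def cache_summary(headers: dict) -> dict:
    return {
        "platform_hit": headers.get("X-Cache-Status") == "HIT",
        "upstream_read": int(headers.get("X-Upstream-Cache-Read", 0)),
        "upstream_write": int(headers.get("X-Upstream-Cache-Write", 0)),
    }

summary = cache_summary({
    "X-Cache-Status": "MISS",
    "X-Upstream-Cache-Read": "8000",
    "X-Upstream-Cache-Write": "2000",
})
# Here the platform cache missed, but the provider cache served 8,000
# tokens and wrote 2,000 for future requests.
```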

Usage API

Query your usage logs to see cache breakdown:
curl https://api.lemondata.cc/v1/usage/logs \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json"
Response includes:
{
  "promptTokens": 10000,
  "cacheReadTokens": 8000,
  "cacheWriteTokens": 2000,
  "nonCachedPromptTokens": 0,
  "completionTokens": 500,
  "cost": 0.0099
}
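The entry above can be sanity-checked and summarized in a few lines of Python (field names follow the usage-log table earlier on this page):

```python
entry = {
    "promptTokens": 10000,
    "cacheReadTokens": 8000,
    "cacheWriteTokens": 2000,
    "nonCachedPromptTokens": 0,
    "completionTokens": 500,
    "cost": 0.0099,
}

# Cache reads + cache writes + uncached tokens should account for the
# whole prompt.
accounted = (entry["cacheReadTokens"] + entry["cacheWriteTokens"]
             + entry["nonCachedPromptTokens"])
assert accounted == entry["promptTokens"]

# Fraction of the prompt served from the provider cache:
cache_hit_ratio = entry["cacheReadTokens"] / entry["promptTokens"]  # 0.8
```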

FAQ

Can I disable provider prompt caching?
Provider caching is automatic and cannot be disabled. However, it only benefits you (lower costs), so there’s no reason to turn it off.

Why did my request miss the provider cache?
Common reasons:
  • Prefix changed (even a one-token difference)
  • Cache expired (typically 5-60 minutes)
  • Prefix too short (below the provider’s minimum)
  • Different API key used

Does provider caching work with my own API keys (BYOK)?
Yes. When using your own API keys (BYOK), provider caching works the same way; the cache is tied to your upstream API key.

How do I maximize cache savings?
  1. Use the platform semantic cache for repeated similar queries
  2. Structure prompts with static content first
  3. Keep system prompts consistent across requests
  4. Send related requests in quick succession