Overview
In addition to LemonData’s platform semantic cache, many AI providers offer their own prompt caching feature. This is a separate caching mechanism that operates at the provider level (Anthropic, OpenAI, DeepSeek, etc.).
Two Types of Caching
The two are mutually exclusive: if the platform cache hits, no upstream call is made, so the provider cache never applies to that request.
| Type | Where | How it Works | Cost |
|---|---|---|---|
| Platform Cache | LemonData | Semantic similarity matching | 10% of normal price |
| Provider Cache | Upstream (Anthropic/OpenAI/etc) | Exact prefix matching | Discounted token rates |
How Provider Prompt Cache Works
Provider prompt caching stores the processed representation of your prompt prefix on the provider’s servers. When you send a request with the same prefix, the provider can skip reprocessing those tokens.
Key Characteristics
- Prefix-based: Only the beginning of your prompt can be cached
- Exact match: Requires identical tokens (not semantic similarity)
- Time-limited: Cache entries expire (typically 5-60 minutes)
- Automatic: No special configuration needed
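Because matching is exact and prefix-based, a single differing token at the start of the prompt forfeits the entire cached span. A toy sketch of the idea (this is an illustration, not the provider’s real tokenizer):

```python
# Illustrative sketch of exact-prefix matching: the provider can only
# reuse the identical leading span of tokens between two prompts.

def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Length of the identical leading token span of two prompts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

cached      = ["You", "are", "a", "helpful", "assistant", ".", "Question", "A"]
same_prefix = ["You", "are", "a", "helpful", "assistant", ".", "Question", "B"]
different   = ["Hi", "are", "a", "helpful", "assistant", ".", "Question", "A"]

print(shared_prefix_len(cached, same_prefix))  # 7 -- long shared prefix can be reused
print(shared_prefix_len(cached, different))    # 0 -- one token off at the start: no reuse
```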
Supported Providers
| Provider | Cache Read Discount | Cache Write Cost | Min Tokens |
|---|---|---|---|
| Anthropic | 90% off | 25% premium | 1024 |
| OpenAI | 50% off | Same as input | 1024 |
| DeepSeek | 90% off | Same as input | 64 |
| | 75% off | 25% premium | 32768 |
Discounts are applied automatically. LemonData passes through the provider’s cache pricing to you.
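The Anthropic rates in the table imply that caching pays for itself after a single reuse. A minimal sketch of that break-even arithmetic, with prices normalized so the base input rate is 1.0:

```python
# Break-even sketch using the Anthropic rates above: cache reads are 90%
# off and cache writes carry a 25% premium, relative to a base of 1.0.
BASE, WRITE, READ = 1.00, 1.25, 0.10

def total_cost(reuses: int, cached: bool) -> float:
    """Relative cost of sending the same prefix 1 + `reuses` times."""
    if not cached:
        return BASE * (1 + reuses)   # every request pays full price
    return WRITE + READ * reuses     # one cache write, then cheap reads

print(total_cost(1, cached=False))   # 2.0
print(total_cost(1, cached=True))    # 1.35 -- cheaper after just one reuse
```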
Identifying Cache Usage
In Usage Logs
Your usage logs show a detailed cache token breakdown:
| Field | Description |
|---|---|
| cacheReadTokens | Tokens served from provider cache (discounted) |
| cacheWriteTokens | Tokens written to cache (for future requests) |
| nonCachedPromptTokens | Tokens processed without cache |
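Assuming Anthropic-style multipliers and an illustrative base rate of $3.00 per 1M input tokens (an assumption, not a published LemonData price), these fields are enough to reconstruct a request’s effective input cost. A hedged sketch:

```python
# Sketch: reconstruct a request's effective input cost from the usage-log
# fields above. The $3.00-per-1M-token base rate is assumed for
# illustration; the 0.10/1.25 multipliers are the Anthropic rates.
PRICE_PER_TOKEN = 3.00 / 1_000_000

def input_cost(log: dict) -> float:
    return PRICE_PER_TOKEN * (
        log["cacheReadTokens"] * 0.10            # 90% off
        + log["cacheWriteTokens"] * 1.25         # 25% premium
        + log["nonCachedPromptTokens"] * 1.00    # full price
    )

log = {"cacheReadTokens": 8000, "cacheWriteTokens": 0, "nonCachedPromptTokens": 2000}
print(f"${input_cost(log):.6f}")  # $0.008400
```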
In Transactions
Transactions show a Provider Cache label when upstream caching was used:
- Cache (sky blue): Platform semantic cache hit - 90% discount
- Provider Cache (teal): Upstream prompt cache hit - discounted rates
Cost Calculation Example
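The example below (10,000 input tokens to Claude) can be worked through numerically. The $3.00-per-1M-token base rate is an assumed illustrative price; the 1.25× and 0.10× multipliers come from the Anthropic row above.

```python
# Worked cost sketch for 10,000 input tokens to Claude. Base rate of
# $3.00 per 1M input tokens is assumed for illustration; the cache
# multipliers (25% write premium, 90% read discount) are Anthropic's.
tokens = 10_000
price_per_token = 3.00 / 1_000_000

no_cache    = tokens * price_per_token         # all tokens at full price
cache_write = tokens * price_per_token * 1.25  # first request: writes the prefix
cache_read  = tokens * price_per_token * 0.10  # later requests: prefix reused

print(f"no cache:    ${no_cache:.4f}")     # $0.0300
print(f"cache write: ${cache_write:.4f}")  # $0.0375
print(f"cache read:  ${cache_read:.4f}")   # $0.0030
```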
For a request with 10,000 input tokens to Claude (Anthropic): without the cache, all tokens are billed at the full input rate; a cache write costs 25% more than that rate, and a cache read costs 90% less.
Best Practices
Use consistent system prompts
Place your system prompt and static context at the beginning of your messages. This maximizes cache hit potential.
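A sketch of a cache-friendly request layout. The model name, system prompt, and chat format here are illustrative assumptions, not LemonData-specific values:

```python
# Sketch of a cache-friendly request body (generic chat format): the
# long, static system prompt leads, so every request shares the same
# prefix; only the final user message varies.
STATIC_SYSTEM = "You are a support agent for Acme Corp. Policies: ..."  # hypothetical

def build_request(user_question: str) -> dict:
    return {
        "model": "claude-sonnet-4",  # placeholder model name
        "messages": [
            {"role": "system", "content": STATIC_SYSTEM},  # cacheable prefix
            {"role": "user", "content": user_question},    # varying suffix
        ],
    }

a = build_request("How do I reset my password?")
b = build_request("What is your refund policy?")
assert a["messages"][0] == b["messages"][0]  # identical prefix across requests
```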
Batch similar requests
Send requests with the same prefix close together in time to benefit from cache before it expires.
Meet minimum token requirements
Ensure your cacheable prefix meets the provider’s minimum (e.g., 1024 tokens for Anthropic/OpenAI).
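A rough pre-flight check might estimate whether a prefix clears the minimum. The ~4-characters-per-token heuristic below is a crude assumption; use the provider’s tokenizer for exact counts:

```python
# Rough pre-flight check that a prefix clears a provider's cache minimum.
# The ~4-chars-per-token heuristic is a crude assumption, good enough
# only for a sanity check; real counts need the provider's tokenizer.
MIN_TOKENS = {"anthropic": 1024, "openai": 1024, "deepseek": 64}

def likely_cacheable(prefix: str, provider: str) -> bool:
    approx_tokens = len(prefix) / 4
    return approx_tokens >= MIN_TOKENS[provider]

print(likely_cacheable("short system prompt", "anthropic"))  # False
```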
Monitor cache metrics
Check your dashboard usage statistics for cache hit rates and savings.
Platform Cache vs Provider Cache
| Aspect | Platform Cache | Provider Cache |
|---|---|---|
| Matching | Semantic similarity | Exact prefix match |
| Cost | 10% of normal price | Discounted rates |
| Latency | Instant (~1ms) | Reduced (skip processing) |
| Control | Dashboard settings | Automatic |
| Scope | Cross-user (optional) | Per-API-key |
When Each Applies
The platform cache is checked first: on a platform cache hit, no upstream request is made at all. The provider cache therefore only applies when the platform cache misses and the request is forwarded upstream.
Checking Cache Status
Response Headers
Usage API
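Once log entries are fetched (the fetch itself is omitted here, since the endpoint shape isn’t documented in this section), the fields from the usage-log table can be aggregated into a cache hit ratio. A hedged sketch:

```python
# Hypothetical sketch: summarize provider-cache usage from usage-log
# entries. Field names come from the usage-log table; the list of dicts
# stands in for whatever the usage API actually returns.
def cache_read_ratio(logs: list[dict]) -> float:
    """Fraction of prompt tokens served from the provider cache."""
    read = sum(l["cacheReadTokens"] for l in logs)
    total = read + sum(
        l["cacheWriteTokens"] + l["nonCachedPromptTokens"] for l in logs
    )
    return read / total if total else 0.0

logs = [
    {"cacheReadTokens": 9000, "cacheWriteTokens": 0,    "nonCachedPromptTokens": 1000},
    {"cacheReadTokens": 0,    "cacheWriteTokens": 9000, "nonCachedPromptTokens": 1000},
]
print(f"{cache_read_ratio(logs):.0%}")  # 45%
```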
Query your usage logs to see the per-request cache breakdown.
FAQ
Can I disable provider caching?
Provider caching is automatic and cannot be disabled. However, it only benefits you (lower costs), so there’s no reason to disable it.
Why didn’t my request hit provider cache?
Common reasons:
- Prefix changed (even one token difference)
- Cache expired (typically 5-60 minutes)
- Prefix too short (below minimum tokens)
- Different API key used
Does BYOK support provider caching?
Yes! When using your own API keys (BYOK), provider caching works the same way. The cache is tied to your upstream API key.
How do I maximize cache savings?
- Use platform semantic cache for repeated similar queries
- Structure prompts with static content first
- Keep system prompts consistent across requests
- Send related requests in quick succession