Overview

LemonData’s Agent-First API enriches error responses with structured hints that AI agents can parse and act on immediately — no web searches, no doc lookups, no guesswork. Every error response includes optional fields like did_you_mean, suggestions, hint, retryable, and retry_after inside the standard error object. These fields are backward-compatible — clients that don’t use them see no difference.

Error Hint Fields

All hint fields are optional extensions inside the error object:
| Field | Type | Description |
|---|---|---|
| did_you_mean | string | Closest matching model name |
| suggestions | array | Recommended models with metadata |
| alternatives | array | Currently available alternative models |
| hint | string | Human/agent-readable next-step guidance |
| retryable | boolean | Whether retrying the same request may succeed |
| retry_after | number | Seconds to wait before retrying |
| balance_usd | number | Current account balance in USD |
| estimated_cost_usd | number | Estimated cost of the failed request |

Error Code Examples

model_not_found (400)

When a model name doesn’t match any active model:
{
  "error": {
    "message": "Model 'gpt5' not found",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found",
    "did_you_mean": "gpt-4o",
    "suggestions": [
      {"id": "gpt-4o"},
      {"id": "gpt-4o-mini"},
      {"id": "claude-sonnet-4-5"}
    ],
    "hint": "Did you mean 'gpt-4o'? Use GET /v1/models to list all available models."
  }
}
The did_you_mean resolution uses:
  1. Static alias mapping (from production error data)
  2. Normalized string matching (strips hyphens, case-insensitive)
  3. Edit distance matching (threshold ≤ 3)

insufficient_balance (402)

When account balance is too low for the estimated cost:
{
  "error": {
    "message": "Insufficient balance: need ~$0.3500 for claude-sonnet-4-5, but balance is $0.1200.",
    "type": "insufficient_balance",
    "code": "insufficient_balance",
    "balance_usd": 0.12,
    "estimated_cost_usd": 0.35,
    "suggestions": [
      {"id": "gpt-4o-mini"},
      {"id": "deepseek-chat"}
    ],
    "hint": "Insufficient balance: need ~$0.3500 for claude-sonnet-4-5, but balance is $0.1200. Try a cheaper model, or top up at https://lemondata.cc/dashboard/billing."
  }
}
suggestions contains models cheaper than the estimated cost that the agent can switch to.

all_channels_failed (503)

When all upstream channels for a model are unavailable:
{
  "error": {
    "message": "Model claude-opus-4-6 temporarily unavailable",
    "code": "all_channels_failed",
    "retryable": true,
    "retry_after": 30,
    "alternatives": [
      {"id": "claude-sonnet-4-5", "status": "available", "tags": []},
      {"id": "gpt-4o", "status": "available", "tags": []}
    ],
    "hint": "All channels for 'claude-opus-4-6' are temporarily unavailable. Retry in 30s or try an alternative model."
  }
}
retryable is false when the reason is no_channels (no channels are configured for the model at all); it is true only for transient failures such as circuit-breaker trips or quota exhaustion.
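The decision an agent faces on a 503 can be written as a small pure function over the error object. This is a sketch of one reasonable policy, not a prescribed client behavior:

```python
def next_action(error: dict, current_model: str) -> tuple[str, str, int]:
    """Map a 503 error object to ("retry"|"switch"|"fail", model, wait_seconds)."""
    if error.get("retryable"):
        # Transient failure: wait out the server-suggested window.
        return ("retry", current_model, error.get("retry_after", 30))
    for alt in error.get("alternatives") or []:
        if alt.get("status") == "available":
            # Permanent for now: switch to a server-vetted alternative.
            return ("switch", alt["id"], 0)
    return ("fail", current_model, 0)
```

Applied to the payload above, this yields a 30-second retry of the same model; with retryable false it would fall through to the first available alternative.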

rate_limit_exceeded (429)

{
  "error": {
    "message": "Rate limit: 60 rpm exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "retryable": true,
    "retry_after": 8,
    "hint": "Rate limited. Retry after 8s. Current limit: 60/min for user role."
  }
}
The retry_after value is calculated from the actual rate limit window reset time.
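A retry wrapper that honors the server-computed retry_after might look like the sketch below. The RateLimited exception class is an illustration standing in for however your client surfaces the parsed error object; it is not part of any SDK.

```python
import time


class RateLimited(Exception):
    """Illustrative exception carrying the parsed error object from a 429."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "rate limited"))
        self.error = error


def with_retry(call, max_retries: int = 3, sleep=time.sleep):
    """Call `call()` and, on a retryable 429, wait retry_after seconds and retry."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited as e:
            if attempt == max_retries or not e.error.get("retryable"):
                raise
            # Honor the server-computed window reset instead of guessing a backoff.
            sleep(e.error.get("retry_after", 1))
```

Injecting `sleep` keeps the wrapper testable and lets you cap or jitter waits if you prefer.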

context_length_exceeded (400)

When input exceeds the model’s context window (upstream error, enriched with hints):
{
  "error": {
    "message": "This model's maximum context length is 128000 tokens...",
    "type": "invalid_request_error",
    "code": "context_length_exceeded",
    "retryable": false,
    "suggestions": [
      {"id": "gemini-2.5-pro"},
      {"id": "claude-sonnet-4-5"}
    ],
    "hint": "Reduce your input or switch to a model with a larger context window."
  }
}
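Besides switching models, an agent can shrink its input and retry. The sketch below drops the oldest non-system turns until a rough size estimate fits; the 4-characters-per-token heuristic is an assumption for illustration, not how the API counts tokens.

```python
def trim_messages(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop oldest non-system turns until an approximate token estimate fits."""

    def est(ms: list[dict]) -> int:
        # Crude heuristic: ~4 characters per token.
        return sum(len(m.get("content", "")) for m in ms) // 4

    msgs = list(messages)
    while est(msgs) > max_tokens and len(msgs) > 2:
        # Preserve the system prompt; drop the oldest turn after it.
        drop = 1 if msgs[0].get("role") == "system" else 0
        msgs.pop(drop)
    return msgs
```

If trimming to two messages still does not fit, switching to one of the suggested larger-context models is the remaining option.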

Native Endpoint Headers

When you call /v1/chat/completions with a model that has a native endpoint (Anthropic or Gemini), the success response includes optimization headers:
X-LemonData-Hint: This model supports native Anthropic format. Use POST /v1/messages for better performance (no format conversion).
X-LemonData-Native-Endpoint: /v1/messages
| Model Provider | Suggested Endpoint | Benefit |
|---|---|---|
| Anthropic (Claude) | /v1/messages | No format conversion, extended thinking, prompt caching |
| Google (Gemini) | /v1beta/gemini | No format conversion, grounding, safety settings |
| OpenAI | n/a | Chat completions is already the native format |
These headers appear on both streaming and non-streaming responses.

/v1/models Enhancements

Three new fields in the lemondata extension of each model object:
{
  "id": "gpt-4o",
  "lemondata": {
    "category": "chat",
    "pricing_unit": "per_token",
    "cache_pricing": {
      "cache_read_per_1m": "1.25",
      "cache_write_per_1m": "2.50",
      "platform_cache_discount": 0.9
    }
  }
}
| Field | Values | Description |
|---|---|---|
| category | chat, image, video, audio, tts, stt, 3d, embedding, rerank | Model type |
| pricing_unit | per_token, per_image, per_second, per_request | How the model is billed |
| cache_pricing | object or null | Upstream prompt cache prices + platform semantic cache discount |

Category Filtering

GET /v1/models?category=chat          # Chat models only
GET /v1/models?category=image         # Image generation models
GET /v1/models?tag=coding&category=chat  # Coding-optimized chat models

llms.txt

A machine-readable API overview is available at:
GET https://api.lemondata.cc/llms.txt
It includes:
  • First-call template with a working example
  • Common model names (dynamically generated from usage data)
  • All 12 API endpoints
  • Filter parameters for model discovery
  • Error handling guidance
AI agents that read llms.txt before their first API call can typically succeed on the first attempt.

Usage in Agent Code

Python (OpenAI SDK)

from openai import OpenAI, BadRequestError

client = OpenAI(
    api_key="sk-your-key",
    base_url="https://api.lemondata.cc/v1"
)

def smart_chat(messages, model="gpt-4o"):
    try:
        return client.chat.completions.create(
            model=model, messages=messages
        )
    except BadRequestError as e:
        # Depending on the SDK version, e.body may be the full response body
        # or the already-unwrapped inner error object; handle both.
        body = e.body if isinstance(e.body, dict) else {}
        error = body.get("error", body)
        # Use did_you_mean for auto-correction
        if error.get("code") == "model_not_found" and error.get("did_you_mean"):
            return client.chat.completions.create(
                model=error["did_you_mean"], messages=messages
            )
        raise

JavaScript (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-key',
  baseURL: 'https://api.lemondata.cc/v1'
});

async function smartChat(messages, model = 'gpt-4o') {
  try {
    return await client.chat.completions.create({ model, messages });
  } catch (error) {
    const err = error?.error;
    if (err?.code === 'model_not_found' && err?.did_you_mean) {
      return client.chat.completions.create({
        model: err.did_you_mean, messages
      });
    }
    throw error;
  }
}

Design Principles

Fail fast, fail informatively

Errors return immediately with all the data an agent needs to self-correct.

No auto-routing

The API never silently substitutes a different model. The agent decides.

Data-driven suggestions

All recommendations come from production data, not hardcoded lists.

Backward compatible

All hint fields are optional. Existing clients see no difference.