Overview

LemonData’s Agent-First API enriches error responses with structured hints that AI agents can parse and act on immediately — no web searches, no doc lookups, no guesswork. Every error response includes optional fields like did_you_mean, suggestions, hint, retryable, and retry_after inside the standard error object. These fields are backward-compatible — clients that don’t use them see no difference.

Error Hint Fields

All hint fields are optional extensions inside the error object:
| Field | Type | Description |
|---|---|---|
| did_you_mean | string | Closest matching model name |
| suggestions | array | Recommended models with metadata |
| alternatives | array | Currently available alternative models |
| hint | string | Human/agent-readable next-step guidance |
| retryable | boolean | Whether retrying the same request may succeed |
| retry_after | number | Seconds to wait before retrying |
| balance_usd | number | Current account balance in USD |
| estimated_cost_usd | number | Estimated cost of the failed request |

Error Code Examples

model_not_found (400)

When a model name doesn’t match any active model:
{
  "error": {
    "message": "Model 'gpt5' not found",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found",
    "did_you_mean": "gpt-4o",
    "suggestions": [
      {"id": "gpt-4o"},
      {"id": "gpt-4o-mini"},
      {"id": "claude-sonnet-4-5"}
    ],
    "hint": "Did you mean 'gpt-4o'? Use GET /v1/models to list all available models."
  }
}
The did_you_mean resolution uses:
  1. Static alias mapping (from production error data)
  2. Normalized string matching (strips hyphens, case-insensitive)
  3. Edit distance matching (threshold ≤ 3)

insufficient_balance (402)

When account balance is too low for the estimated cost:
{
  "error": {
    "message": "Insufficient balance: need ~$0.3500 for claude-sonnet-4-5, but balance is $0.1200.",
    "type": "insufficient_balance",
    "code": "insufficient_balance",
    "balance_usd": 0.12,
    "estimated_cost_usd": 0.35,
    "suggestions": [
      {"id": "gpt-4o-mini"},
      {"id": "deepseek-chat"}
    ],
    "hint": "Insufficient balance: need ~$0.3500 for claude-sonnet-4-5, but balance is $0.1200. Try a cheaper model, or top up at https://lemondata.cc/dashboard/billing."
  }
}
suggestions contains models cheaper than the estimated cost that the agent can switch to.

all_channels_failed (503)

When all upstream channels for a model are unavailable:
{
  "error": {
    "message": "Model claude-opus-4-6 temporarily unavailable",
    "code": "all_channels_failed",
    "retryable": true,
    "retry_after": 30,
    "alternatives": [
      {"id": "claude-sonnet-4-5", "status": "available", "tags": []},
      {"id": "gpt-4o", "status": "available", "tags": []}
    ],
    "hint": "All channels for 'claude-opus-4-6' are temporarily unavailable. Retry in 30s or try an alternative model."
  }
}
retryable is false when the reason is no_channels (no channels are configured for the model at all); it is true only for transient failures such as circuit-breaker trips or quota exhaustion.
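The decision an agent faces on a 503 can be written as a small pure function over the error object. This is a sketch of one reasonable policy, not a prescribed client behavior:

```python
def next_action(error: dict, current_model: str) -> tuple[str, str, int]:
    """Map a 503 error object to ("retry"|"switch"|"fail", model, wait_seconds)."""
    if error.get("retryable"):
        # Transient failure: wait out the server-suggested window.
        return ("retry", current_model, error.get("retry_after", 30))
    for alt in error.get("alternatives") or []:
        if alt.get("status") == "available":
            # Permanent for now: switch to a server-vetted alternative.
            return ("switch", alt["id"], 0)
    return ("fail", current_model, 0)
```

Applied to the payload above, this yields a 30-second retry of the same model; with retryable false it would fall through to the first available alternative.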

rate_limit_exceeded (429)

{
  "error": {
    "message": "Rate limit: 60 rpm exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "retryable": true,
    "retry_after": 8,
    "hint": "Rate limited. Retry after 8s. Current limit: 60/min for user role."
  }
}
The retry_after value is calculated from the actual rate limit window reset time.
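A retry wrapper that honors the server-computed retry_after might look like the sketch below. The RateLimited exception class is an illustration standing in for however your client surfaces the parsed error object; it is not part of any SDK.

```python
import time


class RateLimited(Exception):
    """Illustrative exception carrying the parsed error object from a 429."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "rate limited"))
        self.error = error


def with_retry(call, max_retries: int = 3, sleep=time.sleep):
    """Call `call()` and, on a retryable 429, wait retry_after seconds and retry."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited as e:
            if attempt == max_retries or not e.error.get("retryable"):
                raise
            # Honor the server-computed window reset instead of guessing a backoff.
            sleep(e.error.get("retry_after", 1))
```

Injecting `sleep` keeps the wrapper testable and lets you cap or jitter waits if you prefer.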

context_length_exceeded (400)

When input exceeds the model’s context window (upstream error, enriched with hints):
{
  "error": {
    "message": "This model's maximum context length is 128000 tokens...",
    "type": "invalid_request_error",
    "code": "context_length_exceeded",
    "retryable": false,
    "suggestions": [
      {"id": "gemini-2.5-pro"},
      {"id": "claude-sonnet-4-5"}
    ],
    "hint": "Reduce your input or switch to a model with a larger context window."
  }
}
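Besides switching models, an agent can shrink its input and retry. The sketch below drops the oldest non-system turns until a rough size estimate fits; the 4-characters-per-token heuristic is an assumption for illustration, not how the API counts tokens.

```python
def trim_messages(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop oldest non-system turns until an approximate token estimate fits."""

    def est(ms: list[dict]) -> int:
        # Crude heuristic: ~4 characters per token.
        return sum(len(m.get("content", "")) for m in ms) // 4

    msgs = list(messages)
    while est(msgs) > max_tokens and len(msgs) > 2:
        # Preserve the system prompt; drop the oldest turn after it.
        drop = 1 if msgs[0].get("role") == "system" else 0
        msgs.pop(drop)
    return msgs
```

If trimming to two messages still does not fit, switching to one of the suggested larger-context models is the remaining option.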

Native Endpoint Headers

When you call /v1/chat/completions with a model that has a native endpoint (Anthropic or Gemini), the success response includes optimization headers:
X-LemonData-Hint: This model supports native Anthropic format. Use POST /v1/messages for better performance (no format conversion).
X-LemonData-Native-Endpoint: /v1/messages
| Model Provider | Suggested Endpoint | Benefit |
|---|---|---|
| Anthropic (Claude) | /v1/messages | No format conversion, extended thinking, prompt caching |
| Google (Gemini) | /v1beta/gemini | No format conversion, grounding, safety settings |
| OpenAI | n/a | Chat completions is already the native format |
These headers appear on both streaming and non-streaming responses.

/v1/models Enhancements

Three new fields in the lemondata extension of each model object:
{
  "id": "gpt-4o",
  "lemondata": {
    "category": "chat",
    "pricing_unit": "per_token",
    "cache_pricing": {
      "cache_read_per_1m": "1.25",
      "cache_write_per_1m": "2.50",
      "platform_cache_discount": 0.9
    }
  }
}
| Field | Values | Description |
|---|---|---|
| category | chat, image, video, audio, tts, stt, 3d, embedding, rerank | Model type |
| pricing_unit | per_token, per_image, per_second, per_request | How the model is billed |
| cache_pricing | object or null | Upstream prompt cache prices + platform semantic cache discount |

Category Filtering

GET /v1/models?category=chat          # Chat models only
GET /v1/models?category=image         # Image generation models
GET /v1/models?tag=coding&category=chat  # Coding-optimized chat models

llms.txt

A machine-readable API overview is available at:
GET https://api.lemondata.cc/llms.txt
It includes:
  • First-call template with a working example
  • Common model names (dynamically generated from usage data)
  • All 12 API endpoints
  • Filter parameters for model discovery
  • Error handling guidance
AI agents that read llms.txt before their first API call can typically succeed on the first attempt.

Usage in Agent Code

Python (OpenAI SDK)

from openai import OpenAI, BadRequestError

client = OpenAI(
    api_key="sk-your-key",
    base_url="https://api.lemondata.cc/v1"
)

def smart_chat(messages, model="gpt-4o"):
    try:
        return client.chat.completions.create(
            model=model, messages=messages
        )
    except BadRequestError as e:
        # Depending on the SDK version, e.body may be the full response body
        # or the already-unwrapped inner error object; handle both.
        body = e.body if isinstance(e.body, dict) else {}
        error = body.get("error", body)
        # Use did_you_mean for auto-correction
        if error.get("code") == "model_not_found" and error.get("did_you_mean"):
            return client.chat.completions.create(
                model=error["did_you_mean"], messages=messages
            )
        raise

JavaScript (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-key',
  baseURL: 'https://api.lemondata.cc/v1'
});

async function smartChat(messages, model = 'gpt-4o') {
  try {
    return await client.chat.completions.create({ model, messages });
  } catch (error) {
    const err = error?.error;
    if (err?.code === 'model_not_found' && err?.did_you_mean) {
      return client.chat.completions.create({
        model: err.did_you_mean, messages
      });
    }
    throw error;
  }
}

Design Principles

Fail fast, fail informatively

Errors return immediately with all the data an agent needs to self-correct.

No auto-routing

The API never silently substitutes a different model. The agent decides.

Data-driven suggestions

All recommendations come from production data, not hardcoded lists.

Backward compatible

All hint fields are optional. Existing clients see no difference.