Overview

LemonData implements rate limits to ensure fair usage and platform stability. Limits vary by account tier.

Rate Limit Tiers

Tier      Requests/min   Description
User      1,000          Default tier for all accounts
Partner   3,000          For integration partners
VIP       10,000         High-volume users
Rate limits are subject to change. Contact [email protected] for custom limits.

Rate Limit Response

When you exceed your tier's limit, the API returns a 429 status code with the following error body:
{
  "error": {
    "message": "Rate limit exceeded. Please retry later.",
    "type": "rate_limit_exceeded",
    "code": "rate_limit_exceeded"
  }
}
The response also includes a Retry-After header indicating how long to wait before retrying:
Retry-After: 60  # Seconds to wait before retrying
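
If you want to honor this header directly, here is a minimal sketch. It assumes the openai Python SDK v1.x, where RateLimitError exposes the underlying HTTP response via e.response; the function name is illustrative:
import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.lemondata.cc/v1"
)

def request_honoring_retry_after(messages):
    try:
        return client.chat.completions.create(model="gpt-4o", messages=messages)
    except RateLimitError as e:
        # Assumes a numeric Retry-After value in seconds, as shown above;
        # fall back to 60 seconds if the header is absent
        wait = float(e.response.headers.get("retry-after", 60))
        time.sleep(wait)
        return client.chat.completions.create(model="gpt-4o", messages=messages)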

Handling Rate Limits

Exponential Backoff

Implement exponential backoff for automatic retries:
import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.lemondata.cc/v1"
)

def make_request_with_backoff(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise

            wait_time = 2 ** attempt  # 1, 2, 4, 8 seconds between retries
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
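
Calling it is straightforward:
response = make_request_with_backoff(
    [{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)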

Request Queuing

For high-volume applications, funnel all requests through a single rate-limited client that spaces them out:
import asyncio
from openai import AsyncOpenAI

class RateLimitedClient:
    def __init__(self, client: AsyncOpenAI, requests_per_minute=60):
        self.client = client
        self.rpm = requests_per_minute
        self.interval = 60 / requests_per_minute
        self.last_request = 0.0

    async def request(self, messages):
        # Wait if needed to respect the rate limit
        now = asyncio.get_running_loop().time()
        wait_time = max(0, self.last_request + self.interval - now)
        if wait_time > 0:
            await asyncio.sleep(wait_time)

        self.last_request = asyncio.get_running_loop().time()
        return await self.client.chat.completions.create(
            model="gpt-4o",
            messages=messages
        )
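
To drive it, wrap an async client (AsyncOpenAI is the async counterpart of the OpenAI client used in the earlier examples):
async def main():
    limited = RateLimitedClient(
        AsyncOpenAI(
            api_key="sk-your-api-key",
            base_url="https://api.lemondata.cc/v1"
        ),
        requests_per_minute=60
    )
    response = await limited.request([{"role": "user", "content": "Hello!"}])
    print(response.choices[0].message.content)

asyncio.run(main())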

Batch Processing

For bulk operations, process in batches with delays:
def process_batch(items, batch_size=50, delay=1):
    results = []
    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        for item in batch:
            result = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": item}]
            )
            results.append(result)
        time.sleep(delay)  # Pause between batches (seconds)
    return results
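
Pick batch_size and delay so the sustained rate stays below your tier's requests-per-minute limit; per-request latency will also space out calls within a batch. For example:
prompts = ["Summarize document A", "Summarize document B", "Summarize document C"]
results = process_batch(prompts, batch_size=2, delay=1)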

Best Practices

  • Track rate limit headers to stay under limits proactively.
  • Cache responses for identical requests to reduce API calls (a sketch follows this list).
  • Prefer faster models (like gpt-5-mini) when you need more throughput.
  • If you need higher limits, contact [email protected].
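
A minimal in-memory caching sketch, reusing the client configured in the examples above. The key scheme here is an assumption; adapt it to whatever uniquely identifies a request in your application:
import hashlib
import json

_response_cache = {}

def cached_completion(messages, model="gpt-4o"):
    # Key on the model plus the exact message payload (assumed cache key scheme)
    key = hashlib.sha256(
        json.dumps([model, messages], sort_keys=True).encode()
    ).hexdigest()
    if key not in _response_cache:
        _response_cache[key] = client.chat.completions.create(
            model=model,
            messages=messages
        )
    return _response_cache[key]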

Upgrading Your Tier

To request a tier upgrade:
  1. Log in to your Dashboard
  2. Go to Settings → Account
  3. Contact support with your use case
Or email [email protected] with:
  • Your account email
  • Expected request volume
  • Use case description