Overview

LemonData implements rate limits to ensure fair usage and platform stability. Limits vary by account tier.

Rate Limit Tiers

Tier      Requests/min   Description
User      1,000          Default tier for all accounts
Partner   3,000          For integration partners
VIP       10,000         High-volume users
Rate limits are subject to change. Contact [email protected] for custom limits.

Rate Limit Response

When you exceed your tier's limit, the API returns a 429 status code with the following error body:
{
  "error": {
    "message": "Rate limit exceeded. Please retry later.",
    "type": "rate_limit_exceeded",
    "code": "rate_limit_exceeded"
  }
}
The response also includes a Retry-After header indicating how long to wait before retrying:
Retry-After: 60  # Seconds to wait before retrying
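
If you want to honor this header directly, here is a minimal sketch. It assumes the openai Python SDK v1.x, where RateLimitError exposes the underlying HTTP response via e.response; the function name is illustrative:
import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.lemondata.cc/v1"
)

def request_honoring_retry_after(messages):
    try:
        return client.chat.completions.create(model="gpt-4o", messages=messages)
    except RateLimitError as e:
        # Assumes a numeric Retry-After value in seconds, as shown above;
        # fall back to 60 seconds if the header is absent
        wait = float(e.response.headers.get("retry-after", 60))
        time.sleep(wait)
        return client.chat.completions.create(model="gpt-4o", messages=messages)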

Handling Rate Limits

Exponential Backoff

Implement exponential backoff for automatic retries:
import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.lemondata.cc/v1"
)

def make_request_with_backoff(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise

            wait_time = 2 ** attempt  # 1, 2, 4, 8 seconds between retries
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
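
Calling it is straightforward:
response = make_request_with_backoff(
    [{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)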

Request Queuing

For high-volume applications, funnel all requests through a single rate-limited client that spaces them out:
import asyncio
from openai import AsyncOpenAI

class RateLimitedClient:
    def __init__(self, client: AsyncOpenAI, requests_per_minute=60):
        self.client = client
        self.rpm = requests_per_minute
        self.interval = 60 / requests_per_minute
        self.last_request = 0.0

    async def request(self, messages):
        # Wait if needed to respect the rate limit
        now = asyncio.get_running_loop().time()
        wait_time = max(0, self.last_request + self.interval - now)
        if wait_time > 0:
            await asyncio.sleep(wait_time)

        self.last_request = asyncio.get_running_loop().time()
        return await self.client.chat.completions.create(
            model="gpt-4o",
            messages=messages
        )
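
To drive it, wrap an async client (AsyncOpenAI is the async counterpart of the OpenAI client used in the earlier examples):
async def main():
    limited = RateLimitedClient(
        AsyncOpenAI(
            api_key="sk-your-api-key",
            base_url="https://api.lemondata.cc/v1"
        ),
        requests_per_minute=60
    )
    response = await limited.request([{"role": "user", "content": "Hello!"}])
    print(response.choices[0].message.content)

asyncio.run(main())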

Batch Processing

For bulk operations, process in batches with delays:
def process_batch(items, batch_size=50, delay=1):
    results = []
    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        for item in batch:
            result = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": item}]
            )
            results.append(result)
        time.sleep(delay)  # Pause between batches (seconds)
    return results
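
Pick batch_size and delay so the sustained rate stays below your tier's requests-per-minute limit; per-request latency will also space out calls within a batch. For example:
prompts = ["Summarize document A", "Summarize document B", "Summarize document C"]
results = process_batch(prompts, batch_size=2, delay=1)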

Best Practices

  • Track rate limit headers to stay under limits proactively.
  • Cache responses for identical requests to reduce API calls (a sketch follows this list).
  • Prefer faster models (like gpt-5-mini) when you need more throughput.
  • If you need higher limits, contact [email protected].
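
A minimal in-memory caching sketch, reusing the client configured in the examples above. The key scheme here is an assumption; adapt it to whatever uniquely identifies a request in your application:
import hashlib
import json

_response_cache = {}

def cached_completion(messages, model="gpt-4o"):
    # Key on the model plus the exact message payload (assumed cache key scheme)
    key = hashlib.sha256(
        json.dumps([model, messages], sort_keys=True).encode()
    ).hexdigest()
    if key not in _response_cache:
        _response_cache[key] = client.chat.completions.create(
            model=model,
            messages=messages
        )
    return _response_cache[key]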

Upgrading Your Tier

To request a tier upgrade:
  1. Log in to your Dashboard
  2. Go to Settings → Account
  3. Contact support with your use case
Or email [email protected] with:
  • Your account email
  • Expected request volume
  • Use case description