
Model Selection

Choosing the right model can significantly affect both cost and quality.

Recommendations by Task

| Task | Recommended models | Why |
|---|---|---|
| Simple Q&A | gpt-5-mini, gemini-2.5-flash | Fast, cheap, good enough |
| Complex reasoning | gpt-5.4, claude-opus-4-6, deepseek-r1 | Better logic and planning |
| Coding | claude-sonnet-4-6, gpt-4o, deepseek-v3.2 | Optimized for code |
| Creative writing | claude-sonnet-4-6, gpt-4o | Better prose quality |
| Vision/images | gpt-4o, claude-sonnet-4-6, gemini-2.5-flash | Native vision support |
| Long context | gemini-2.5-pro, claude-sonnet-4-6 | 1M+ token context window |
| Cost-sensitive | gpt-5-mini, gemini-2.5-flash, deepseek-v3.2 | Best value |
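One way to put the table above into code is a small lookup. This is a minimal sketch: the task keys and the `pick_model` helper are hypothetical names, the model IDs are copied from the table, and taking the first listed model is only one possible policy.

```python
# Hypothetical task keys; model IDs copied from the recommendations table.
RECOMMENDED_MODELS = {
    "simple_qa": ["gpt-5-mini", "gemini-2.5-flash"],
    "complex_reasoning": ["gpt-5.4", "claude-opus-4-6", "deepseek-r1"],
    "coding": ["claude-sonnet-4-6", "gpt-4o", "deepseek-v3.2"],
    "creative_writing": ["claude-sonnet-4-6", "gpt-4o"],
    "vision": ["gpt-4o", "claude-sonnet-4-6", "gemini-2.5-flash"],
    "long_context": ["gemini-2.5-pro", "claude-sonnet-4-6"],
    "cost_sensitive": ["gpt-5-mini", "gemini-2.5-flash", "deepseek-v3.2"],
}


def pick_model(task: str) -> str:
    """Return the first recommended model for a task."""
    candidates = RECOMMENDED_MODELS.get(task)
    if not candidates:
        # Unknown task: fall back to an inexpensive general-purpose model
        return "gpt-5-mini"
    return candidates[0]
```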

Cost Tiers

$$$$ Premium:  gpt-5.4, claude-opus-4-6
$$$  Standard: claude-sonnet-4-6, gpt-4o
$$   Budget:   gpt-5-mini, gemini-2.5-flash
$    Economy:  deepseek-v3.2, deepseek-r1
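Your code may already have a shortlist of acceptable models. In that case, the tiers can break ties in favor of the cheaper option. The sketch below is illustrative only. `COST_TIER` and `cheapest` are hypothetical names, and the ranks simply transcribe the four tiers listed above (1 = cheapest).

```python
# Ranks transcribed from the cost tiers above (1 = cheapest, 4 = premium).
COST_TIER = {
    "deepseek-v3.2": 1, "deepseek-r1": 1,
    "gpt-5-mini": 2, "gemini-2.5-flash": 2,
    "claude-sonnet-4-6": 3, "gpt-4o": 3,
    "gpt-5.4": 4, "claude-opus-4-6": 4,
}


def cheapest(candidates: list[str]) -> str:
    """Pick the candidate in the lowest tier; models not listed rank last."""
    # min() returns the first of equal minima, so ties keep the caller's order
    return min(candidates, key=lambda m: COST_TIER.get(m, 99))
```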

Cost Optimization

1. Use Smaller Models First

# Assumes `client` is an OpenAI client configured with base_url="https://api.lemondata.cc/v1"
def smart_query(question: str, complexity: str = "auto"):
    """Use cheaper models for simple tasks."""

    if complexity == "simple":
        model = "gpt-5-mini"
    elif complexity == "complex":
        model = "gpt-4o"
    else:
        # "auto": start with the cheap model; escalating to a larger
        # model is left to the caller if the answer falls short
        model = "gpt-5-mini"

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}]
    )
    return response
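The `auto` branch above always stays on the cheap model, so the caller has to handle any escalation. Below is a minimal sketch of that escalation. It assumes you supply two functions yourself: `call_model(model, question)`, which returns the reply text, and `is_good_enough(answer)`, which decides whether the reply is acceptable. Both are hypothetical, as is `query_with_escalation`.

```python
from typing import Callable


def query_with_escalation(
    question: str,
    call_model: Callable[[str, str], str],
    is_good_enough: Callable[[str], bool],
    models: tuple[str, ...] = ("gpt-5-mini", "gpt-4o"),
) -> tuple[str, str]:
    """Try models from cheapest to most capable; stop at the first acceptable answer.

    Returns (model_used, answer). If no answer passes the check, the last
    model's answer is returned so the caller still gets something.
    """
    answer = ""
    for model in models:
        answer = call_model(model, question)
        if is_good_enough(answer):
            return model, answer
    return models[-1], answer
```

In a real app, `call_model` could wrap `client.chat.completions.create(...)` and return `response.choices[0].message.content`. Each escalation costs an extra request, so this pays off only when the first model handles most questions.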

2. Set max_tokens

Always set a reasonable max_tokens limit:
# ❌ Bad: No limit, could generate thousands of tokens
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this article"}]
)

# ✅ Good: Limit response length
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this article"}],
    max_tokens=500  # Reasonable limit for a summary
)
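A tight limit can cut an answer off mid-sentence. In OpenAI-compatible chat completion responses, `finish_reason` is `"length"` when generation stopped at the token limit. You can use this to detect truncation and decide whether to retry with a higher limit. This is a minimal sketch and assumes LemonData returns the standard `finish_reason` field. `was_truncated` is a hypothetical helper, and the demo uses a stand-in object rather than a real response.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """True if generation stopped because it hit the max_tokens limit."""
    return response.choices[0].finish_reason == "length"


# Offline demo with a stand-in object shaped like a chat completion
fake = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
print(was_truncated(fake))  # True
```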

3. Optimize Prompts

# ❌ Verbose prompt (more input tokens)
prompt = """
I would like you to please help me by analyzing the following text
and providing a comprehensive summary of the main points. Please be
thorough but also concise in your response. The text is as follows:
{text}
"""

# ✅ Concise prompt (fewer tokens)
prompt = "Summarize the key points:\n{text}"
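You can get a feel for the savings without calling the API by comparing rough token estimates. This sketch uses the common rule of thumb of about 4 characters per English token. It is a heuristic, not a tokenizer, and `rough_token_count` is a hypothetical helper. Both templates above still need their `{text}` placeholder filled, for example with `prompt.format(text=...)`.

```python
import math


def rough_token_count(text: str) -> int:
    """Rough estimate at ~4 characters per token; real counts depend on the model's tokenizer."""
    return math.ceil(len(text) / 4)


verbose = (
    "I would like you to please help me by analyzing the following text "
    "and providing a comprehensive summary of the main points. Please be "
    "thorough but also concise in your response. The text is as follows:\n"
)
concise = "Summarize the key points:\n"
print(rough_token_count(verbose), rough_token_count(concise))
```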

4. Enable Caching

Take advantage of semantic caching:
# For repeated similar queries, caching provides major savings
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    temperature=0  # Deterministic = better cache hits
)
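The semantic cache runs on the platform side. A different and much simpler technique shows why deterministic requests cache well: a client-side, exact-match memoization layer that caches only `temperature=0` requests. This is a sketch, not LemonData's caching behavior. `ExactMatchCache` is a hypothetical class, and `call` stands for any function that sends the request and returns the reply text.

```python
import json
from typing import Callable


class ExactMatchCache:
    """Client-side memoization keyed on the exact request (illustration only).

    Only temperature == 0 requests are cached; others always go to `call`.
    """

    def __init__(self, call: Callable[..., str]):
        self._call = call
        self._store: dict[str, str] = {}

    def create(self, model: str, messages: list[dict], temperature: float = 0) -> str:
        if temperature != 0:
            return self._call(model=model, messages=messages, temperature=temperature)
        # sort_keys makes equal requests serialize to identical cache keys
        key = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        if key not in self._store:
            self._store[key] = self._call(model=model, messages=messages, temperature=0)
        return self._store[key]
```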

5. Batch Similar Requests

# ❌ Many small requests
for question in questions:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}]
    )

# ✅ Fewer larger requests
combined_prompt = "\n".join([f"{i+1}. {q}" for i, q in enumerate(questions)])
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Answer each question:\n{combined_prompt}"}]
)
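Batching only helps if you can split the combined reply back into individual answers. This minimal sketch assumes you ask the model to answer on one numbered line per question. The helper names are hypothetical. A model may not follow the format exactly, so missing answers come back as empty strings, and you can re-ask those questions individually.

```python
import re


def build_batch_prompt(questions: list[str]) -> str:
    numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(questions))
    return ("Answer each question. Reply with one line per answer, "
            "numbered to match:\n" + numbered)


def split_numbered_answers(reply: str, expected: int) -> list[str]:
    """Map lines like '1. ...' or '2) ...' back to a list; missing answers become ''."""
    answers = [""] * expected
    for line in reply.splitlines():
        match = re.match(r"\s*(\d+)[.)]\s*(.*)", line)
        if match:
            idx = int(match.group(1)) - 1
            if 0 <= idx < expected:
                answers[idx] = match.group(2).strip()
    return answers
```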

Performance Optimization

1. Use Streaming for Better UX

Streaming improves perceived performance because users see the first words while the rest of the reply is still being generated:
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a long essay"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

2. Choose Fast Models for Interactive Use

| Use case | Recommended | Latency |
|---|---|---|
| Chat UI | gpt-5-mini, gemini-2.5-flash | ~200 ms to first token |
| Tab completion | claude-haiku-4-5 | ~150 ms to first token |
| Background processing | gpt-4o, claude-sonnet-4-6 | ~500 ms to first token |
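The latency figures above are approximate, and your own numbers depend on region, prompt size and load. The minimal sketch below measures time to first token yourself. `make_stream` is any zero-argument function that starts a streaming request and yields text pieces (a hypothetical wrapper). The timer starts before that function is called, so the request itself is included.

```python
import time
from typing import Callable, Iterable, Optional


def time_to_first_token(make_stream: Callable[[], Iterable[str]]) -> Optional[float]:
    """Seconds from starting the request until the first non-empty text piece."""
    start = time.perf_counter()
    for piece in make_stream():
        if piece:
            return time.perf_counter() - start
    return None  # the stream produced no text
```

With the client, `make_stream` could be `lambda: (c.choices[0].delta.content for c in client.chat.completions.create(model="gpt-5-mini", messages=msgs, stream=True) if c.choices)`.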

3. Set a Timeout

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["LEMONDATA_API_KEY"],
    base_url="https://api.lemondata.cc/v1",
    timeout=60.0  # 60-second timeout per request
)

Reliability

1. Implement Retries

import time
from openai import APIError, RateLimitError

def chat_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages
            )
        except RateLimitError:  # must come before APIError, which it subclasses
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s, ...
            print(f"Rate limited, waiting {wait}s...")
            time.sleep(wait)
        except APIError:
            if attempt == max_retries - 1:
                raise
            time.sleep(1)
    raise RuntimeError("Max retries exceeded")
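The retry above uses fixed exponential delays. A common refinement is "full jitter": each retry waits a random time between 0 and the exponential cap, so many clients do not retry in lockstep. This is a generic sketch, not part of the openai package. `retry`, `backoff_delays` and the `is_retryable` predicate are hypothetical. `sleep` is injectable so the code can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Full-jitter delays: retry n waits a random time in [0, min(cap, base * 2**n)]."""
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(max_retries)]


def retry(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying retryable failures up to max_retries extra times."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors that retrying cannot fix
            if attempt == max_retries or not is_retryable(exc):
                raise
            sleep(delays[attempt])
    raise AssertionError("unreachable")  # the loop always returns or raises
```

With the openai package, `is_retryable` could be `lambda e: isinstance(e, (RateLimitError, APIConnectionError))`. The official OpenAI Python client also retries some failures on its own (controlled by its `max_retries` option), so stacking both layers multiplies the number of attempts.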

2. Handle Errors Gracefully

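from openai import APIStatusError, AuthenticationError, RateLimitError

# notify_admin, add_to_queue, schedule_retry and `request` are placeholders for your app's own code
try:
    response = client.chat.completions.create(...)
except AuthenticationError:
    # Check API key
    notify_admin("Invalid API key")
except RateLimitError:
    # Queue for later or use backup
    add_to_queue(request)
except APIStatusError as e:
    # status_code exists on APIStatusError, not on every APIError
    if e.status_code == 402:
        notify_admin("Balance low")
    elif e.status_code >= 500:
        # Server error, retry later
        schedule_retry(request)
    else:
        raise  # other 4xx errors mean the request itself needs fixing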
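Keeping the status-code decision in a pure function makes it easy to unit-test without triggering real errors. This is a minimal sketch, and the action names are hypothetical labels for whatever your app does. The 402 case follows the low-balance handling above. 401 and 429 are the status codes behind `AuthenticationError` and `RateLimitError`.

```python
def action_for_status(status_code: int) -> str:
    """Map an HTTP status code to a handling strategy (hypothetical action names)."""
    if status_code == 401:
        return "check_api_key"      # AuthenticationError
    if status_code == 402:
        return "top_up_balance"     # low balance, as handled above
    if status_code == 429:
        return "queue_or_backoff"   # RateLimitError
    if status_code >= 500:
        return "retry_later"        # server-side error
    return "fail"                   # other 4xx: fix the request instead of retrying
```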

3. Use Fallback Models

from openai import APIError

FALLBACK_CHAIN = ["gpt-4o", "claude-sonnet-4-6", "gemini-2.5-flash"]

def chat_with_fallback(messages):
    for model in FALLBACK_CHAIN:
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages
            )
        except APIError:
            continue
    raise Exception("All models failed")
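When every model in the chain fails, knowing why each one failed makes debugging much easier. This sketch reports which model answered and keeps a record of each failure. `call_with_fallback` and `AllModelsFailed` are hypothetical names, and `call(model)` stands for your own function that sends the request. The sketch catches the broad `Exception` to stay dependency-free; in production, catch `openai.APIError` instead.

```python
from typing import Callable


class AllModelsFailed(Exception):
    def __init__(self, errors: dict[str, Exception]):
        super().__init__(f"All models failed: {list(errors)}")
        self.errors = errors  # model name -> the exception it raised


def call_with_fallback(models: list[str], call: Callable[[str], str]) -> tuple[str, str]:
    """Try each model in order; return (model, result) from the first success."""
    errors: dict[str, Exception] = {}
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # narrow to openai.APIError in real code
            errors[model] = exc
    raise AllModelsFailed(errors)
```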

Security

1. Protect Your API Key

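import os
from openai import OpenAI

# ❌ Never hardcode keys
client = OpenAI(api_key="sk-abc123...")

# ✅ Use environment variables
client = OpenAI(
    api_key=os.environ["LEMONDATA_API_KEY"],
    base_url="https://api.lemondata.cc/v1",  # send the key to LemonData, not the default endpoint
)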
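Two small additions help in practice. First, fail with a clear message when the variable is missing. Second, never write the full key to logs. This is a minimal sketch: `load_api_key` and `mask_key` are hypothetical helpers, and showing the last 4 characters is an arbitrary choice.

```python
import os


def load_api_key(var: str = "LEMONDATA_API_KEY") -> str:
    """Read the API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var} environment variable before starting the app")
    return key


def mask_key(key: str) -> str:
    """Safe-to-log form of a key: short keys are fully hidden, others show the last 4 characters."""
    if len(key) <= 8:
        return "*" * len(key)
    return "*" * (len(key) - 4) + key[-4:]
```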

2. Validate User Input

def validate_message(content: str) -> bool:
    """Validate user input before sending to API."""
    if len(content) > 100000:
        raise ValueError("Message too long")
    # Add other validation as needed
    return True
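`validate_message` checks a single string. If you accept whole conversations from clients, you might also check roles and empty content. This sketch makes some assumptions. The allowed roles and the rule that content must be a plain string are choices made for this example; multimodal requests use other content shapes. It also reuses the 100,000-character limit from above and applies it to the whole conversation.

```python
MAX_TOTAL_CHARS = 100_000  # the validate_message limit, applied to the whole conversation
ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption for this sketch


def validate_messages(messages: list[dict]) -> None:
    """Raise ValueError on the first problem found; return None if all checks pass."""
    if not messages:
        raise ValueError("conversation is empty")
    total = 0
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unexpected role {role!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            raise ValueError(f"message {i}: content must be a non-empty string")
        total += len(content)
    if total > MAX_TOTAL_CHARS:
        raise ValueError("conversation too long")
```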

3. Set API Key Limits

Create separate API keys, each with its own spending limit, for:
  • Development/testing
  • Production
  • Different applications
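One way to keep these keys apart in code is to read a different environment variable for each deployment. The variable names and the `key_for_environment` helper below are hypothetical. You still set each key's spending limit on the platform.

```python
import os

# Hypothetical variable names: one key (with its own spending limit) per environment
KEY_VARS = {
    "development": "LEMONDATA_API_KEY_DEV",
    "production": "LEMONDATA_API_KEY_PROD",
}


def key_for_environment(env: str) -> str:
    """Return the API key for the given environment name."""
    try:
        var = KEY_VARS[env]
    except KeyError:
        raise ValueError(f"unknown environment {env!r}") from None
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key
```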

Monitoring

1. Track Usage

Check your dashboard regularly for:
  • Token usage per model
  • Cost breakdown
  • Cache hit rate
  • Error rate
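The dashboard is the primary source for these numbers. If you also keep your own request logs, you can compute the same figures per model. This minimal sketch assumes each log record is a dict with a `model` field and optional `total_tokens`, `error` and `cache_hit` fields. That schema is hypothetical, and how you would learn that a request was a cache hit depends on what the platform reports.

```python
from collections import defaultdict


def summarize_usage(records: list[dict]) -> dict[str, dict[str, float]]:
    """Per-model request, token, error and cache-hit totals plus rates."""
    stats = defaultdict(lambda: {"requests": 0, "tokens": 0, "errors": 0, "cache_hits": 0})
    for r in records:
        s = stats[r["model"]]
        s["requests"] += 1
        s["tokens"] += r.get("total_tokens", 0)
        s["errors"] += 1 if r.get("error") else 0
        s["cache_hits"] += 1 if r.get("cache_hit") else 0
    for s in stats.values():
        s["error_rate"] = s["errors"] / s["requests"]
        s["cache_hit_rate"] = s["cache_hits"] / s["requests"]
    return dict(stats)
```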

2. Log Key Metrics

import logging

logging.basicConfig(level=logging.INFO)  # without this, INFO records are not shown

response = client.chat.completions.create(...)

logging.info({
    "model": response.model,
    "prompt_tokens": response.usage.prompt_tokens,
    "completion_tokens": response.usage.completion_tokens,
    "total_tokens": response.usage.total_tokens,
})
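Latency is worth logging alongside token counts. The sketch below wraps any non-streaming create function, times the call, and logs the same usage fields as above plus `latency_ms`. `log_call` and the `lemondata` logger name are hypothetical.

```python
import logging
import time

logger = logging.getLogger("lemondata")


def log_call(create, **kwargs):
    """Call create(**kwargs), then log model, token usage and wall-clock latency."""
    start = time.perf_counter()
    response = create(**kwargs)
    latency_ms = (time.perf_counter() - start) * 1000
    usage = response.usage
    record = {
        "model": response.model,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
        "latency_ms": round(latency_ms, 1),
    }
    logger.info("chat completion: %s", record)
    return response, record
```

For example, `response, record = log_call(client.chat.completions.create, model="gpt-4o", messages=messages)`.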

3. Set Up Alerts

Configure low-balance alerts in your dashboard to avoid service interruptions.

Checklist

  • Appropriate model used for each task
  • max_tokens limits set
  • Concise prompts
  • Caching enabled where appropriate
  • Similar requests batched
  • Streaming for interactive UX
  • Fast models for real-time use
  • Timeouts configured
  • Retry logic implemented
  • Error handling in place
  • Fallback models configured
  • API keys stored in environment variables
  • Input validated
  • Separate keys for dev/prod
  • Spending limits set