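Best Practices

Model Selection

Choosing the right model has a major effect on both cost and quality.

Recommendations by Task Type

Task	Recommended models	Why
Simple Q&A	`gpt-5-mini`, `gemini-2.5-flash`	Fast, cheap, and sufficient for the task
Complex reasoning	`gpt-5.4`, `claude-opus-4-6`, `deepseek-r1`	Stronger logic and planning
Coding	`claude-sonnet-4-6`, `gpt-4o`, `deepseek-v3.2`	Optimized for code
Creative writing	`claude-sonnet-4-6`, `gpt-4o`	Higher-quality prose
Vision/images	`gpt-4o`, `claude-sonnet-4-6`, `gemini-2.5-flash`	Native vision support
Long context	`gemini-2.5-pro`, `claude-sonnet-4-6`	1M+ token context window
Cost-sensitive	`gpt-5-mini`, `gemini-2.5-flash`, `deepseek-v3.2`	Best price-performance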

Cost Tiers

$$$$ Premium: gpt-5.4, claude-opus-4-6
$$$  Standard: claude-sonnet-4-6, gpt-4o
$$   Budget:   gpt-5-mini, gemini-2.5-flash
$    Economy:  deepseek-v3.2, deepseek-r1

Cost Optimization

1. Prefer Smaller Models

def smart_query(question: str, complexity: str = "auto"):
    """Use cheaper models for simple tasks."""

    if complexity == "simple":
        model = "gpt-5-mini"
    elif complexity == "complex":
        model = "gpt-4o"
    else:
        # "auto": start on the cheap model; the caller can retry with
        # complexity="complex" if the answer is not good enough
        model = "gpt-5-mini"

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}]
    )
    return response
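
The "start cheap, escalate if needed" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `ask` is a hypothetical callable that wraps the chat completion call and returns the answer text, and `looks_insufficient` is a placeholder heuristic that you would replace with your own quality check.

```python
from typing import Callable

CHEAP_MODEL = "gpt-5-mini"
STRONG_MODEL = "gpt-4o"

def looks_insufficient(answer: str) -> bool:
    """Hypothetical heuristic: treat empty or very short answers as failures."""
    return len(answer.strip()) < 20

def query_with_escalation(
    question: str, ask: Callable[[str, str], str]
) -> tuple[str, str]:
    """Try the cheap model first, then retry once on the stronger model.

    `ask(model, question)` is an assumed wrapper around the API call.
    Returns (model_used, answer).
    """
    answer = ask(CHEAP_MODEL, question)
    if not looks_insufficient(answer):
        return CHEAP_MODEL, answer
    return STRONG_MODEL, ask(STRONG_MODEL, question)
```

Escalation costs one extra request when the cheap answer is rejected, so it tends to pay off only when most questions are answered well by the cheaper model.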

2. Set max_tokens

Always set a reasonable max_tokens limit:

# ❌ Bad: No limit, could generate thousands of tokens
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this article"}]
)

# ✅ Good: Limit response length
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this article"}],
    max_tokens=500  # Reasonable limit for a summary
)
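
A limit that is too low cuts the answer off mid-sentence. In the OpenAI-compatible chat completion format, a truncated choice reports `finish_reason == "length"`; assuming this API follows that convention, a small check like the one below can flag truncated output. The `fake_response` object is a stand-in used only so the sketch runs without a network call.

```python
from types import SimpleNamespace

def was_truncated(response) -> bool:
    """Return True if the first choice stopped because it hit the token limit."""
    return response.choices[0].finish_reason == "length"

# Stand-in object shaped like a chat completion response (illustration only).
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(finish_reason="length")]
)

if was_truncated(fake_response):
    print("Response hit max_tokens; consider raising the limit.")
```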

3. Optimize Prompts

# ❌ Verbose prompt (more input tokens)
prompt = """
I would like you to please help me by analyzing the following text
and providing a comprehensive summary of the main points. Please be
thorough but also concise in your response. The text is as follows:
{text}
"""

# ✅ Concise prompt (fewer tokens)
prompt = "Summarize the key points:\n{text}"

4. Enable Caching

Take advantage of semantic caching:

# For repeated similar queries, caching provides major savings
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    temperature=0  # Deterministic = better cache hits
)
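
Separately from any server-side semantic cache, identical requests can also be deduplicated in your own process. The sketch below is a plain exact-match cache, not semantic caching: it only hits when the model and messages are byte-for-byte the same. `fetch` is a hypothetical callable that performs the real API call and returns the answer text.

```python
import hashlib
import json
from typing import Callable

_cache: dict[str, str] = {}

def cache_key(model: str, messages: list[dict]) -> str:
    """Hash the model name and messages into a stable key."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_chat(
    model: str,
    messages: list[dict],
    fetch: Callable[[str, list[dict]], str],
) -> str:
    """Return a cached answer when available, otherwise call `fetch` once."""
    key = cache_key(model, messages)
    if key not in _cache:
        _cache[key] = fetch(model, messages)
    return _cache[key]
```

An in-memory dict is lost on restart and grows without bound, so a real deployment would likely add eviction or an external store.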

5. Batch Similar Requests

# ❌ Many small requests
for question in questions:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}]
    )

# ✅ Fewer larger requests
combined_prompt = "\n".join([f"{i+1}. {q}" for i, q in enumerate(questions)])
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Answer each question:\n{combined_prompt}"}]
)
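
A combined request returns one block of text, so the answers have to be split apart again. The sketch below assumes the model replies with one numbered item per question ("1. ...", "2. ..."), which is not guaranteed; it raises an error when the count does not match so the caller can fall back to individual requests. The function name is hypothetical.

```python
import re

def split_numbered_answers(text: str, expected: int) -> list[str]:
    """Split a numbered reply into a list of answers.

    Raises ValueError if the number of items differs from `expected`.
    """
    # Split at every line that starts with "<number>. "
    parts = re.split(r"^\s*\d+\.\s+", text, flags=re.MULTILINE)
    answers = [p.strip() for p in parts[1:]]
    if len(answers) != expected:
        raise ValueError(f"Expected {expected} answers, got {len(answers)}")
    return answers
```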

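Performance Optimization

1. Use Streaming for Better UX

Streaming improves perceived performance because users see output as soon as the first tokens arrive: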

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a long essay"}],
    stream=True
)

for chunk in stream:
    # Some chunks carry no choices or an empty delta, so guard before reading
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
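
If the full text is also needed after streaming (for logging or storage, for example), the pieces can be accumulated while they are displayed. A minimal sketch, assuming chunks have the same shape as in the loop above; `_fake_chunk` is a stand-in used only so the example runs without a network call.

```python
from types import SimpleNamespace

def collect_stream(stream) -> str:
    """Print streamed text as it arrives and return the complete string."""
    parts = []
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            piece = chunk.choices[0].delta.content
            print(piece, end="", flush=True)
            parts.append(piece)
    return "".join(parts)

def _fake_chunk(text):
    """Build a stand-in object shaped like a streaming chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

full_text = collect_stream([_fake_chunk("Hello, "), _fake_chunk(None), _fake_chunk("world")])
```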

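2. Choose Fast Models for Interactive Use Cases

Use case	Recommendation	Latency
Chat UI	`gpt-5-mini`, `gemini-2.5-flash`	~200 ms to first token
Tab completion	`claude-haiku-4-5`	~150 ms to first token
Background processing	`gpt-4o`, `claude-sonnet-4-6`	~500 ms to first token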

3. Set Timeouts

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-key",
    base_url="https://api.lemondata.cc/v1",
    timeout=60.0  # 60 second timeout
)

Reliability

1. Implement Retries

import time
from openai import RateLimitError, APIError

def chat_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages
            )
        except RateLimitError:
            # Re-raise on the last attempt instead of sleeping for nothing
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s, ...
            print(f"Rate limited, waiting {wait}s...")
            time.sleep(wait)
        except APIError:
            if attempt == max_retries - 1:
                raise
            time.sleep(1)
    raise RuntimeError("Max retries exceeded")
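
When many clients are rate limited at once, fixed exponential delays can make them all retry at the same moment. A common variation, not used in the example above, is to randomize each delay ("full jitter"). The sketch below only computes the delays; the function name and the `base` and `cap` defaults are illustrative assumptions.

```python
import random

def backoff_delays(max_retries, base=1.0, cap=30.0, rng=None):
    """Return one randomized delay per retry attempt.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)].
    """
    if rng is None:
        rng = random.Random()
    return [
        rng.uniform(0, min(cap, base * 2 ** attempt))
        for attempt in range(max_retries)
    ]

# Example: delays for three attempts, bounded by 1s, 2s, and 4s respectively
delays = backoff_delays(3)
```

Each value can be passed to `time.sleep` in place of the fixed `2 ** attempt` wait.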

2. Handle Errors Gracefully

from openai import APIStatusError, AuthenticationError, RateLimitError

try:
    response = client.chat.completions.create(...)
except AuthenticationError:
    # Check API key
    notify_admin("Invalid API key")
except RateLimitError:
    # Queue for later or use backup
    add_to_queue(request)
except APIStatusError as e:
    # APIStatusError carries status_code; connection errors do not
    if e.status_code == 402:
        notify_admin("Balance low")
    elif e.status_code >= 500:
        # Server error, retry later
        schedule_retry(request)
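
The branching above can also be kept in one small, testable function that maps a status code to an action. This is a sketch: the action names are hypothetical labels, and the meaning of 402 (low balance) is taken from the example above rather than from a specification.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a coarse handling action."""
    if status_code == 401:
        return "check_api_key"    # authentication failed
    if status_code == 402:
        return "top_up_balance"   # low balance, per the example above
    if status_code == 429:
        return "retry_later"      # rate limited
    if status_code >= 500:
        return "retry_later"      # server-side error
    return "do_not_retry"         # other client errors will not fix themselves

print(classify_status(503))
```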

3. Use Fallback Models

from openai import APIError

# Models are tried in order until one succeeds
FALLBACK_CHAIN = ["gpt-4o", "claude-sonnet-4-6", "gemini-2.5-flash"]

def chat_with_fallback(messages):
    for model in FALLBACK_CHAIN:
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages
            )
        except APIError:
            continue
    raise Exception("All models failed")

Security

1. Protect API Keys

# ❌ Never hardcode keys
client = OpenAI(api_key="sk-abc123...")

# ✅ Use environment variables
import os
client = OpenAI(api_key=os.environ["LEMONDATA_API_KEY"])
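
`os.environ[...]` raises a bare `KeyError` when the variable is missing, which can be confusing at startup. A small helper can fail with a clearer message instead. This is a sketch; the function name is hypothetical and the variable name is the one used above.

```python
import os

def load_api_key(var_name: str = "LEMONDATA_API_KEY") -> str:
    """Read the API key from the environment or fail with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key
```

The result can then be passed as `api_key=load_api_key()` when constructing the client.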

2. Validate User Input

def validate_message(content: str) -> bool:
    """Validate user input before sending to API."""
    if len(content) > 100000:
        raise ValueError("Message too long")
    # Add other validation as needed
    return True
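
As one example of the "other validation" mentioned in the comment, the sketch below also rejects empty input and strips non-printable control characters before the length check. Which checks are appropriate depends on your application; the function name and the choice to keep newlines and tabs are assumptions.

```python
MAX_CHARS = 100_000  # same limit as the example above

def clean_message(content: str) -> str:
    """Remove control characters, trim whitespace, and enforce length limits."""
    cleaned = "".join(
        ch for ch in content if ch.isprintable() or ch in "\n\t"
    ).strip()
    if not cleaned:
        raise ValueError("Message is empty")
    if len(cleaned) > MAX_CHARS:
        raise ValueError("Message too long")
    return cleaned
```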

3. Set API Key Limits

Create separate API keys, each with its own spending limit, for:

Development/testing
Production
Different applications

Monitoring

1. Track Usage

Check your dashboard regularly to keep track of:

Token usage per model
Cost breakdown
Cache hit rate
Error rate

2. Log Important Metrics

import logging

# The root logger defaults to WARNING, so enable INFO to see these records
logging.basicConfig(level=logging.INFO)

response = client.chat.completions.create(...)

logging.info({
    "model": response.model,
    "prompt_tokens": response.usage.prompt_tokens,
    "completion_tokens": response.usage.completion_tokens,
    "total_tokens": response.usage.total_tokens,
})
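
The logged token counts can be turned into a rough cost figure. The sketch below uses made-up prices for a made-up model name purely for illustration; real per-token prices should be taken from the pricing page, and the function name is hypothetical.

```python
# Hypothetical prices in USD per 1M tokens (illustration only, not real pricing)
PRICES = {
    "example-model": {"input": 1.00, "output": 4.00},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one request in USD from its token usage."""
    price = PRICES[model]
    total = prompt_tokens * price["input"] + completion_tokens * price["output"]
    return total / 1_000_000

# 1M input tokens and 500K output tokens at the hypothetical prices above
cost = estimate_cost("example-model", 1_000_000, 500_000)
```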

3. Set Up Alerts

Configure low-balance alerts in your dashboard to avoid service interruptions.

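Checklist

Cost Optimization

Performance

Streaming enabled for interactive UX
Fast models used for real-time use cases
Timeouts configured

Reliability

Retry logic implemented
Error handling in place
Fallback models configured

Security

API keys stored in environment variables
Input validation in place
Separate keys for dev/prod
Spending limits set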
