
Overview

Streaming lets you receive partial responses as they are generated, which makes for a better user experience in chat applications.

Enabling Streaming

Set stream: true in your request:
curl https://api.lemondata.cc/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a short poem"}],
    "stream": true
  }'
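The same request can also be issued from Python without any SDK. A minimal sketch using only the standard library; the helper name build_stream_request is ours, not part of any API:

```python
import json
import urllib.request

API_KEY = "sk-your-api-key"  # placeholder, replace with your key
URL = "https://api.lemondata.cc/v1/chat/completions"

def build_stream_request(message: str) -> urllib.request.Request:
    """Build a chat completion request with stream: true set in the body."""
    body = json.dumps({
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": message}],
        "stream": True,
    }).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__":
    # Each line of the streamed response is an SSE event ("data: {...}").
    with urllib.request.urlopen(build_stream_request("Write a short poem")) as resp:
        for raw in resp:
            line = raw.decode("utf-8").strip()
            if line:
                print(line)
```

In practice you would parse each `data:` line as JSON rather than printing it raw, as shown in the sections that follow.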

Streaming Response Format

Each chunk in the stream follows this format:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
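Parsing this format amounts to stripping the `data: ` prefix, stopping at `[DONE]`, and concatenating the `delta.content` fields. A minimal sketch; `parse_sse_stream` is a hypothetical helper, not part of any SDK:

```python
import json

def parse_sse_stream(lines):
    """Assemble the full message text from 'data:' chunk lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and other SSE fields
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # sentinel marking the end of the stream
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

Applied to the example chunks above, this yields "Hello world".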

Handling the End of the Stream

The stream ends with:
  • finish_reason: "stop" - normal completion
  • finish_reason: "length" - max_tokens limit reached
  • finish_reason: "tool_calls" - the model wants to call a tool
  • data: [DONE] - final message
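One way to act on these values is a small dispatch table. A sketch under our own naming; handle_finish and the action strings are illustrative, not part of any API:

```python
def handle_finish(finish_reason):
    """Suggest a follow-up action for a stream's finish_reason (illustrative)."""
    actions = {
        "stop": "done",                       # normal completion, nothing to do
        "length": "retry_with_higher_limit",  # output was cut off at max_tokens
        "tool_calls": "run_tool",             # execute the tool, send the result back
    }
    return actions.get(finish_reason, "unknown")
```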

Collecting the Full Response

To accumulate the full response while streaming:
from openai import OpenAI

client = OpenAI(api_key="sk-your-api-key", base_url="https://api.lemondata.cc/v1")

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True
)

full_response = ""

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        content = chunk.choices[0].delta.content
        full_response += content
        print(content, end="", flush=True)

print(f"\n\nFull response: {full_response}")

Asynchronous Streaming

For asynchronous applications:
import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(
        api_key="sk-your-api-key",
        base_url="https://api.lemondata.cc/v1"
    )

    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True
    )

    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

asyncio.run(main())

Web Application Example

For a web chat interface:
async function streamChat(message) {
  const response = await fetch('https://api.lemondata.cc/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer sk-your-api-key',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: message }],
      stream: true
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // A network chunk can split an SSE line in half, so buffer partial lines.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep any trailing partial line for the next read

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice(6);
      if (data === '[DONE]') return;

      const parsed = JSON.parse(data);
      const content = parsed.choices[0]?.delta?.content;
      if (content) {
        // Append to your UI
        document.getElementById('output').textContent += content;
      }
    }
  }
}