Overview

Streaming lets you receive partial output as it is generated, which improves perceived latency and user experience. For new OpenAI-style integrations, prefer Responses API streaming; if your framework still uses Chat Completions streaming, LemonData supports that compatibility path as well. For example, with curl:
curl https://api.lemondata.cc/v1/responses \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "input": "Write a short poem.",
    "stream": true
  }'
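If you use the OpenAI Python SDK, the same Responses stream can be consumed as typed events instead of raw SSE. The sketch below assumes the SDK's standard `response.output_text.delta` events and that the LemonData base URL accepts SDK traffic; the `collect_text` and `stream_poem` helpers are illustrative names, not part of any SDK.

```python
def collect_text(events):
    """Concatenate text deltas from a Responses event stream."""
    parts = []
    for event in events:
        # Only output-text delta events carry incremental text.
        if getattr(event, "type", None) == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


def stream_poem():
    # Assumption: the LemonData endpoint accepts OpenAI SDK traffic.
    from openai import OpenAI  # requires the openai package

    client = OpenAI(
        api_key="sk-your-api-key",
        base_url="https://api.lemondata.cc/v1",
    )
    stream = client.responses.create(
        model="gpt-5.4",
        input="Write a short poem.",
        stream=True,
    )
    return collect_text(stream)
```

Watching for a final `response.completed` event (rather than just stream exhaustion) is the reliable way to confirm the response finished normally.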

Chat Completions Streaming

If your framework still expects SSE chunks from /v1/chat/completions, that also works:
from openai import OpenAI

# Point the OpenAI SDK at the LemonData-compatible endpoint.
client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.lemondata.cc/v1",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a small delta of the reply; content can be None.
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

Stream End Conditions

Typical completion conditions:
  • response.completed for Responses API streams
  • finish_reason: "stop" for Chat Completions streams
  • finish_reason: "length" when a token limit is hit
  • tool/function call events when the model wants to use tools
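To make those conditions concrete, here is a minimal sketch that classifies individual Chat Completions SSE lines. The function name `classify_sse_line` is illustrative, not part of any SDK; it assumes the standard `data: {...}` / `data: [DONE]` wire format shown above.

```python
import json


def classify_sse_line(line):
    """Classify one SSE line from a Chat Completions stream.

    Returns "done", "stop", "length", "tool_calls",
    "delta" (ordinary content chunk), or None (not a data line).
    """
    if not line.startswith("data: "):
        return None  # comments, blank keep-alives, event names, etc.
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return "done"
    chunk = json.loads(payload)
    reason = chunk["choices"][0].get("finish_reason")
    if reason in ("stop", "length", "tool_calls"):
        return reason
    return "delta"
```

A consumer can treat "stop" as normal completion, surface a truncation warning on "length", and hand "tool_calls" chunks to its tool-execution path.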

Web App Pattern

async function streamChat(message) {
  const response = await fetch('https://api.lemondata.cc/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer sk-your-api-key',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: message }],
      stream: true
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value, { stream: true });
    const lines = chunk.split('\n').filter(line => line.startsWith('data: '));

    for (const line of lines) {
      const data = line.slice(6);
      if (data === '[DONE]') return;
      const parsed = JSON.parse(data);
      const content = parsed.choices?.[0]?.delta?.content;
      if (content) {
        document.getElementById('output').textContent += content;
      }
    }
  }
}

Best Practices

  • Use /v1/responses if your SDK or app already supports it; keep /v1/chat/completions streaming for compatibility-driven integrations.
  • Append delta chunks to the UI or terminal as they arrive rather than waiting for the full response.
  • Treat network drops and upstream disconnects as normal failure modes, and reconnect carefully for long-running sessions.
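For the last point, one common pattern is exponential backoff around the stream setup. This is a generic sketch, not a LemonData requirement: `open_stream`, the retry limits, and the injectable `sleep` are all illustrative assumptions.

```python
import time


def stream_with_retries(open_stream, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call open_stream(), retrying on ConnectionError with exponential backoff.

    open_stream should return an iterable of delta strings (e.g. a wrapper
    around an SSE reader). Re-raises the last error if every attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            return list(open_stream())
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Wait 0.5s, 1s, 2s, ... before reconnecting.
            sleep(base_delay * (2 ** attempt))
```

Note that a reconnected stream starts a fresh generation, so a real application should resend its conversation context (and, if needed, deduplicate any text it already rendered) after reconnecting.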