Overview
Streaming lets you receive partial output as it is generated, which improves perceived latency and user experience. For new OpenAI-style integrations, prefer Responses streaming first. If your framework still uses Chat Completions streaming, LemonData supports that compatibility path too.

Recommended: Responses Streaming
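As a minimal sketch of consuming a Responses stream: the event types follow the OpenAI Responses streaming format, and with a real client the events would come from a streaming `/v1/responses` call (the base URL shown in the comment is a placeholder, not an official value).

```python
# Sketch: assemble streamed text from Responses API events.
def collect_output_text(events):
    """Concatenate text deltas from a Responses stream, stopping at completion."""
    parts = []
    for event in events:
        etype = event.get("type")
        if etype == "response.output_text.delta":
            parts.append(event.get("delta", ""))
        elif etype == "response.completed":
            break
    return "".join(parts)

# With a real SDK client, the events would come from something like:
#   client = OpenAI(base_url="https://api.lemondata.example/v1", api_key="...")
#   stream = client.responses.create(model="...", input="Hello", stream=True)
sample_events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Hel"},
    {"type": "response.output_text.delta", "delta": "lo!"},
    {"type": "response.completed"},
]
print(collect_output_text(sample_events))  # -> Hello!
```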
Chat Completions Streaming
If your framework still expects SSE chunks from /v1/chat/completions, that also works:
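A sketch of parsing that compatibility format: Chat Completions streams send `data:`-prefixed SSE lines whose JSON chunks carry `choices[0].delta.content` and, on the final chunk, a `finish_reason`, ending with a `[DONE]` sentinel.

```python
import json

def parse_chat_sse(lines):
    """Extract content deltas and the finish reason from Chat Completions SSE lines."""
    text, finish_reason = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue  # SSE comments/keep-alives are ignored
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # sentinel that ends the stream
        chunk = json.loads(payload)
        choice = chunk["choices"][0]
        delta = choice.get("delta", {})
        if delta.get("content"):
            text.append(delta["content"])
        if choice.get("finish_reason"):
            finish_reason = choice["finish_reason"]
    return "".join(text), finish_reason

sample = [
    'data: {"choices": [{"delta": {"content": "Hi"}, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {"content": " there"}, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
print(parse_chat_sse(sample))  # -> ('Hi there', 'stop')
```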
Stream End Conditions
Typical completion conditions:

- response.completed for Responses API streams
- finish_reason: "stop" for Chat Completions streams
- finish_reason: "length" when a token limit is hit
- tool/function call events when the model wants to use tools
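The conditions above can be mapped to coarse outcomes your app acts on; this is a sketch, and the label strings are illustrative, not part of the API.

```python
def classify_stream_end(finish_reason):
    """Map a Chat Completions finish_reason to a coarse outcome label."""
    if finish_reason == "stop":
        return "complete"
    if finish_reason == "length":
        return "truncated"        # raise the token limit or continue the turn
    if finish_reason in ("tool_calls", "function_call"):
        return "needs_tool_call"  # run the tool, then send the result back
    return "unknown"

print(classify_stream_end("length"))  # -> truncated
```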
Web App Pattern
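A common web-app pattern is to relay deltas from your server to the browser as server-sent events. As a framework-agnostic sketch (the frame layout and `[DONE]` sentinel here are assumptions, mirroring the Chat Completions convention), each delta is wrapped in an SSE frame:

```python
import json

def sse_event(data):
    """Format one payload as a server-sent event frame for the browser."""
    return f"data: {json.dumps(data)}\n\n"

def relay(deltas):
    """Generator a web framework (e.g. a streaming response object) can serve."""
    for delta in deltas:
        yield sse_event({"delta": delta})
    yield "data: [DONE]\n\n"  # tell the client the stream is finished

frames = list(relay(["Hel", "lo"]))
print(frames[0])  # -> data: {"delta": "Hel"}
```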
Best Practices
Prefer Responses streaming for new builds
Use /v1/responses if your SDK or app already supports it. Keep /v1/chat/completions streaming for compatibility-driven integrations.
Flush output incrementally
Append delta chunks to the UI or terminal as they arrive rather than waiting for the full response.
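A minimal sketch of incremental flushing to a terminal or file-like sink; the explicit `flush()` matters because output buffering can otherwise hold text until the stream ends.

```python
import io
import sys

def write_stream(deltas, out=sys.stdout):
    """Write each delta as it arrives and flush so the user sees it immediately."""
    for delta in deltas:
        out.write(delta)
        out.flush()  # without this, buffering can delay text until the end

# Demonstrated against an in-memory buffer instead of a live stream:
buf = io.StringIO()
write_stream(["Hel", "lo!"], out=buf)
print(buf.getvalue())  # -> Hello!
```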
Handle disconnects and retries
Treat network drops and upstream disconnects as normal failure modes and reconnect carefully for long-running sessions.
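One way to reconnect carefully is exponential backoff with jitter between attempts; this sketch only computes the delay schedule (the parameter values are illustrative), and a real reconnect loop would sleep for each delay, retry the request, and resume from the last delta it already rendered.

```python
import random

def backoff_delays(max_retries=5, base=0.5, cap=30.0):
    """Yield exponential-backoff delays with full jitter for stream reconnects."""
    for attempt in range(max_retries):
        # Delay grows as base * 2^attempt, capped, with uniform jitter.
        yield random.uniform(0, min(cap, base * 2 ** attempt))

delays = list(backoff_delays(max_retries=3, base=1.0))
print(len(delays))  # -> 3
```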