How Does ChatGPT Streaming Work?

Real-time Token-by-Token Response Delivery via SSE

When you set stream: true on OpenAI's Chat Completions API, the response is not returned all at once. Instead, it is sent in SSE (Server-Sent Events) format as each token is generated. Because the request is a POST with a JSON body and an Authorization header, the browser's EventSource API (which only issues GET requests without custom headers) cannot open it directly, so the client reads the stream with fetch + ReadableStream. Each chunk has the form data: {"choices":[{"delta":{"content":"Hi"}}]}, and the stream ends when data: [DONE] is received. Users therefore see the answer appear while the LLM is still generating it.
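A minimal TypeScript sketch of interpreting one line of this stream follows. It assumes the caller has already split the response text into lines; the names parseSseLine, StreamEvent, and ChatChunk are hypothetical, and only the choices[0].delta.content field of each chunk is read.

```typescript
// What one line of a Chat Completions SSE stream means to the client.
type StreamEvent =
  | { kind: "token"; text: string } // a content delta to append to the UI
  | { kind: "done" } // the "data: [DONE]" sentinel
  | { kind: "skip" }; // blank separator, comment, or chunk with no content

// Only the fields this sketch reads; real chunks carry more (id, model, ...).
interface ChatChunk {
  choices?: { delta?: { content?: string | null } }[];
}

function parseSseLine(line: string): StreamEvent {
  // Events are separated by blank lines; lines starting with ":" are comments.
  if (!line.startsWith("data:")) return { kind: "skip" };
  // trim() drops the optional space after "data:" and any trailing "\r".
  const payload = line.slice("data:".length).trim();
  if (payload === "[DONE]") return { kind: "done" };
  const chunk = JSON.parse(payload) as ChatChunk;
  const text = chunk.choices?.[0]?.delta?.content;
  // The first chunk typically carries only the role (with empty content),
  // and the last one an empty delta plus finish_reason, so skip those.
  return typeof text === "string" && text.length > 0
    ? { kind: "token", text }
    : { kind: "skip" };
}
```

For example, parseSseLine("data: [DONE]") returns { kind: "done" }, and a line with an empty content delta is skipped rather than appended.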

Architecture Diagram

ChatGPT UI
fetch + ReadableStream
① POST stream: true
text/event-stream
data: {"delta":{"content":"Hello"}}
data: {"delta":{"content":","}}
data: {"delta":{"content":" world"}}
data: {"delta":{"content":"!"}}
data: [DONE]
OpenAI API
LLM token generation
Generates tokens one at a time → sends each immediately
What appears on screen:
Hello, world!
Key point: Instead of waiting for the full answer, tokens are sent via SSE as soon as they are generated
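On the server side, the pattern in the diagram amounts to formatting each token as an SSE event and flushing it right away. The sketch below is hypothetical: tokenFrame and streamTokens are illustrative names, write stands in for whatever your HTTP framework uses to flush bytes on a text/event-stream response, and the chunk is trimmed to the choices/delta/content fields (real API chunks also carry fields such as id and model).

```typescript
// Format one generated token as an SSE event in the (trimmed) chunk shape.
function tokenFrame(token: string): string {
  const chunk = { choices: [{ delta: { content: token } }] };
  // An SSE event is a "data:" line followed by a blank line.
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Hypothetical server loop: forward each token the moment the model yields it,
// then send the [DONE] sentinel. `write` stands in for the framework call that
// flushes bytes on a response with Content-Type: text/event-stream.
async function streamTokens(
  tokens: AsyncIterable<string>,
  write: (frame: string) => void,
): Promise<void> {
  for await (const token of tokens) {
    write(tokenFrame(token)); // no buffering of the full answer
  }
  write("data: [DONE]\n\n");
}
```

Because each frame is written inside the loop, the client's time to first token depends only on when the model emits its first token, not on the length of the answer.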
Why SSE? (Not WebSocket)
  • A single POST request whose response alone is streamed → one-way (server-to-client) communication is sufficient
  • Runs over plain HTTP, so it works well with CDNs and proxies
  • On disconnection, the client simply retries with a new request (there is no connection state to restore)
  • Simpler to implement on the server than WebSocket

How It Works

1. The client sends a POST request to /v1/chat/completions with stream: true
2. The server starts the response with Content-Type: text/event-stream
3. The LLM generates a token, and the server immediately sends it as a chunk: data: {"choices":[{"delta":{"content":"Hi"}}]}
4. The client parses each chunk and appends its content to the UI
5. Once all tokens have been generated, the server sends data: [DONE]
6. The client detects [DONE] and handles the end of the stream
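The six steps above can be sketched in TypeScript as follows. This is a minimal illustration, not an official client: readChatStream and streamChatCompletion are hypothetical names, the model name gpt-4o-mini is an assumption, and error handling is reduced to a status check. In a browser you would normally call your own backend rather than embed an API key. readChatStream accepts any ReadableStream, so it can be exercised without a network connection.

```typescript
// Steps 3-6: read the SSE body, call onToken per content delta, and resolve
// true on "data: [DONE]" or false if the connection closed before it.
async function readChatStream(
  body: ReadableStream<Uint8Array>,
  onToken: (text: string) => void,
): Promise<boolean> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = ""; // holds an incomplete line until the rest arrives
  while (true) {
    const result = await reader.read();
    if (result.done) return false; // closed without [DONE]: answer may be truncated
    // { stream: true } keeps multi-byte characters split across chunks intact.
    buffer += decoder.decode(result.value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // last piece may be a partial line
    for (const raw of lines) {
      const line = raw.trim();
      if (!line.startsWith("data:")) continue; // blank separators, ": comments"
      const payload = line.slice("data:".length).trim();
      if (payload === "[DONE]") return true; // step 5 received
      const text = JSON.parse(payload).choices?.[0]?.delta?.content;
      if (typeof text === "string" && text.length > 0) onToken(text); // step 4
    }
  }
}

// Steps 1-2: POST with stream: true; the server answers with text/event-stream.
async function streamChatCompletion(
  apiKey: string,
  prompt: string,
  onToken: (text: string) => void,
): Promise<boolean> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // assumed model name; use whichever model you have
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });
  if (!res.ok || res.body === null) throw new Error(`HTTP ${res.status}`);
  return readChatStream(res.body, onToken);
}
```

Returning false when the body closes without [DONE] gives the caller a simple signal that the answer was cut off mid-stream, which it can flag in the UI or retry.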

Pros

  • Dramatically better perceived speed: the first words appear at the time to first token (TTFT)
  • No need to wait for full LLM generation
  • Simple to implement, since it runs over plain HTTP
  • The request can be cancelled mid-stream (AbortController)
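Mid-stream cancellation can be sketched with a hypothetical helper, readUntilAborted, that wraps a response body. Calling controller.abort() cancels the reader; if the same signal was also passed to fetch(), the pending read instead rejects with an AbortError, which the helper treats as a normal stop. The helper name and the stopButton element in the usage comment are illustrative assumptions.

```typescript
// Hypothetical helper: read a body until it ends or `signal` is aborted.
async function readUntilAborted(
  body: ReadableStream<Uint8Array>,
  signal: AbortSignal,
  onChunk: (bytes: Uint8Array) => void,
): Promise<"completed" | "aborted"> {
  const reader = body.getReader();
  // Cancelling the reader tells the source (for a fetch body, the connection)
  // that no more data is wanted. If the stream already errored, cancel()
  // rejects, so swallow that rejection.
  const onAbort = (): void => {
    reader.cancel().catch(() => {});
  };
  signal.addEventListener("abort", onAbort, { once: true });
  if (signal.aborted) onAbort(); // aborted before we started listening
  try {
    while (!signal.aborted) {
      const result = await reader.read();
      if (result.done) break;
      onChunk(result.value);
    }
  } catch (err) {
    // When the same signal was passed to fetch(), abort() rejects the pending
    // read with an AbortError instead; treat that as a normal stop.
    if ((err as { name?: unknown } | null)?.name !== "AbortError") throw err;
  } finally {
    signal.removeEventListener("abort", onAbort);
  }
  return signal.aborted ? "aborted" : "completed";
}

// Usage sketch (stopButton is a hypothetical UI element):
// const controller = new AbortController();
// const res = await fetch(url, { method: "POST", body, signal: controller.signal });
// stopButton.onclick = () => controller.abort();
// await readUntilAborted(res.body!, controller.signal, handleBytes);
```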

Cons

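  • Requires token-by-token parsing logic on the client
  • More complex error handling (the connection can drop mid-stream, leaving a partial answer)
  • The total token count is not known in advance
  • The client must buffer incoming data, because network chunks do not align with SSE line boundaries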
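On the point that the total token count is unknown in advance: OpenAI's Chat Completions API accepts a stream_options: { include_usage: true } request field, which (per its API reference) streams one extra chunk just before data: [DONE] whose choices array is empty and whose usage object holds the token counts for the whole request. The sketch below assumes that behavior; streamingRequestBody, usageFromPayload, and the model name are hypothetical.

```typescript
// Token counts carried by the final usage chunk.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Request body asking for the extra usage chunk (model name is an assumption).
function streamingRequestBody(prompt: string) {
  return {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
    stream: true,
    stream_options: { include_usage: true },
  };
}

// Inspect the payload of one "data:" line. Content chunks carry usage: null,
// so only the final chunk (empty choices array) yields token counts.
function usageFromPayload(payload: string): Usage | null {
  if (payload === "[DONE]") return null;
  const chunk = JSON.parse(payload) as { usage?: Usage | null };
  return chunk.usage ?? null;
}
```

The count still arrives only at the end, so a UI that needs a running total must estimate it from the tokens received so far.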

Use Cases

  • ChatGPT / Claude web UI
  • AI coding assistants
  • AI chatbot interfaces
  • Real-time document summarization/translation display