Streaming Responses with Claude API in Python (2026)

Streaming sends Claude’s response in small chunks as it’s generated, instead of making the client wait for the full completion before showing anything. For a chat UI this is the difference between a user staring at a spinner for several seconds and seeing the first words appear within a few hundred milliseconds. The Claude API Tutorial introduces the basic stream.text_stream helper; this guide covers the full picture: the raw event stream, async streaming, error handling, and a complete FastAPI endpoint that streams Claude’s output to a browser.


Prerequisites

pip install anthropic
# for the API endpoint example later:
pip install fastapi uvicorn

The Simple Way: text_stream

For the common case — print or display text as it arrives — use the messages.stream() context manager and iterate stream.text_stream. It yields plain text chunks, already de-noised from the underlying event protocol:

from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about debugging."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    final_message = stream.get_final_message()

print(f"\n\nstop_reason: {final_message.stop_reason}")
print(f"output tokens: {final_message.usage.output_tokens}")

Call stream.get_final_message() inside the with block, while the stream is still open. It waits for the stream to be read to completion if it hasn’t been already, then returns the same Message object you’d get from a non-streaming call (complete content, stop_reason, and usage), so you don’t have to reassemble it from chunks.


The Raw Event Stream

text_stream is built on top of a lower-level stream of typed events. Iterate the stream object directly to see all of them:

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about debugging."}],
) as stream:
    for event in stream:
        print(event.type)

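The core event types, in the order they arrive (the content_block_* events repeat for each content block, and ping events may appear in between; the SDK’s stream helper may also interleave its own convenience events, such as text, alongside these):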

  • message_start — the initial Message shell: empty content, role, model, and usage.input_tokens
  • content_block_start — a new content block begins; event.index and event.content_block show its type (text, tool_use, etc.)
  • content_block_delta — incremental content; event.delta.type is text_delta (has .text), input_json_delta (has .partial_json, for tool inputs), or thinking_delta (extended thinking)
  • content_block_stop — the block at event.index is complete
  • message_delta — top-level changes: event.delta.stop_reason and updated event.usage.output_tokens
  • message_stop — the stream is finished
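
To make the ordering concrete, here is a minimal sketch of a consumer that rebuilds each text block from the events listed above. It is duck-typed and fed hand-built stand-in events (SimpleNamespace objects, not real SDK types) so it runs without an API key; the function name collect_text_blocks is hypothetical, not part of the SDK.

```python
from types import SimpleNamespace as NS


def collect_text_blocks(events):
    """Rebuild text content blocks, keyed by block index, from stream events."""
    blocks = {}         # block index -> list of text fragments
    stop_reason = None
    for event in events:
        if event.type == "content_block_start":
            blocks.setdefault(event.index, [])
        elif event.type == "content_block_delta":
            if event.delta.type == "text_delta":
                blocks.setdefault(event.index, []).append(event.delta.text)
        elif event.type == "message_delta":
            stop_reason = event.delta.stop_reason
        # All other event types are ignored in this sketch.
    return {i: "".join(parts) for i, parts in blocks.items()}, stop_reason


# Stand-in events that only mimic the shape of the real ones (assumption).
fake_events = [
    NS(type="message_start"),
    NS(type="content_block_start", index=0, content_block=NS(type="text")),
    NS(type="content_block_delta", index=0, delta=NS(type="text_delta", text="Hello, ")),
    NS(type="content_block_delta", index=0, delta=NS(type="text_delta", text="world")),
    NS(type="content_block_stop", index=0),
    NS(type="message_delta", delta=NS(stop_reason="end_turn"), usage=NS(output_tokens=4)),
    NS(type="message_stop"),
]

texts, stop_reason = collect_text_blocks(fake_events)
print(texts, stop_reason)
```

In real code you would rarely need this, since stream.get_final_message() does the accumulation for you; the sketch is only meant to show how the event types fit together.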

Handling text manually looks like this. It is useful when you need to react to other events, such as message_delta, while still streaming text. Note that message_delta usually arrives near the end of the response, and event.usage.output_tokens is a cumulative total rather than a per-chunk counter:

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about debugging."}],
) as stream:
    for event in stream:
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)
        elif event.type == "message_delta":
            # Cumulative output token count reported by the API
            print(f"\n[output tokens: {event.usage.output_tokens}]", end="")

Async Streaming

For web backends, use AsyncAnthropic so the stream doesn’t block the event loop. The interface is identical, just with async with / async for:

import asyncio
from anthropic import AsyncAnthropic

client = AsyncAnthropic()


async def main():
    async with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Write a haiku about debugging."}],
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)


asyncio.run(main())

Building a Streaming API Endpoint (FastAPI + SSE)

To stream Claude’s output to a browser, forward each text chunk as a Server-Sent Event. FastAPI’s StreamingResponse accepts an async generator — wrap the Claude stream directly:

import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from anthropic import AsyncAnthropic

app = FastAPI()
client = AsyncAnthropic()


@app.get("/chat")
async def chat(message: str):
    async def event_stream():
        async with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": message}],
        ) as stream:
            async for text in stream.text_stream:
                # JSON-encode each chunk: a raw newline inside the text
                # would otherwise break the SSE frame.
                payload = json.dumps({"text": text})
                yield f"data: {payload}\n\n"

        # Named event that tells the client the stream is complete.
        yield "event: done\ndata: {}\n\n"

    return StreamingResponse(
        event_stream(),
        media_type="text/event-stream",
        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
    )

X-Accel-Buffering: no tells nginx not to buffer the response; without it, the “streaming” output can arrive in one burst at the end. On the frontend, read the stream with fetch and a ReadableStream reader, or use the browser’s EventSource, which only supports GET requests. Each data: line carries one JSON-encoded text chunk; decode it and append the text to the UI as it arrives. The final done event is a named event, so with EventSource listen for it with addEventListener("done", ...) rather than onmessage, and close the connection there so the browser does not reconnect automatically.
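
One detail of SSE framing is easy to miss: a blank line ends an event, so a text chunk that itself contains a newline cannot be written into a data: line as-is. The sketch below shows one way to handle that, by JSON-encoding the payload on the server and decoding it on the client. The helper names sse_event and parse_sse are hypothetical, and the parser is deliberately simplified (one data: line per event), not a full implementation of the SSE format.

```python
import json


def sse_event(data, event=None):
    """Format one Server-Sent Event; JSON encoding keeps newlines out of the frame."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def parse_sse(raw):
    """Parse a complete SSE body back into (event_name, decoded_data) pairs."""
    events = []
    for frame in raw.split("\n\n"):
        if not frame.strip():
            continue
        name, payload = "message", None  # "message" is the default event name
        for line in frame.split("\n"):
            if line.startswith("event: "):
                name = line[len("event: "):]
            elif line.startswith("data: "):
                payload = json.loads(line[len("data: "):])
        events.append((name, payload))
    return events


body = sse_event({"text": "line one\nline two"}) + sse_event({}, event="done")
print(parse_sse(body))
```

A browser client would do the equivalent of parse_sse with JSON.parse on each event’s data field.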


Handling Errors and Interruptions

Streaming requests can fail mid-stream — rate limits, network drops, or overload errors. Wrap the stream in a try/except and decide how to surface a partial response:

import anthropic

client = anthropic.Anthropic()

try:
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Write a haiku about debugging."}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
except anthropic.APIConnectionError:
    print("\n[connection lost — showing partial response]")
except anthropic.RateLimitError:
    print("\n[rate limited — retry shortly]")
except anthropic.APIStatusError as e:
    print(f"\n[API error {e.status_code}]")
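
If you want to keep whatever text arrived before the failure, one option is to accumulate chunks as you display them and return the partial text together with the error. This is a minimal sketch under that assumption: stream_with_partial is a hypothetical helper, and flaky_chunks is a stand-in generator that simulates a dropped connection so the example runs without an API key.

```python
def stream_with_partial(chunks, on_text, errors=(Exception,)):
    """Consume text chunks; return (text_so_far, error) instead of losing partial output."""
    parts = []
    try:
        for chunk in chunks:
            parts.append(chunk)
            on_text(chunk)  # e.g. print it or push it to the UI
    except errors as exc:
        return "".join(parts), exc
    return "".join(parts), None


def flaky_chunks():
    """Stand-in for a stream that drops mid-response (not a real SDK stream)."""
    yield "Bugs hide in plain sight, "
    yield "logs whisper"
    raise ConnectionError("simulated network drop")


shown = []
text, error = stream_with_partial(flaky_chunks(), shown.append)
print(repr(text), "->", type(error).__name__)
```

With the SDK you would pass stream.text_stream as chunks and narrow errors to something like (anthropic.APIError,), the base class of the exceptions caught above.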

If the client disconnects (e.g. the user closes the browser tab mid-response), make sure the FastAPI generator exits early so the async with block closes the stream; this stops billing for output tokens you’d otherwise generate into the void. The server typically cancels the response generator when the underlying connection drops, which unwinds the async with block for you. For long generations, consider also checking await request.is_disconnected() periodically inside the loop (this requires adding a request: Request parameter to the handler) and breaking if it returns true.
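
The periodic disconnect check can be factored into a small wrapper around any async iterator. The sketch below is an assumption-laden illustration: until_disconnected is a hypothetical helper, and the demo uses a fake stream and a fake disconnect check so it runs on its own.

```python
import asyncio


async def until_disconnected(chunks, is_disconnected, check_every=10):
    """Yield from an async iterator, stopping once is_disconnected() returns True."""
    count = 0
    async for chunk in chunks:
        # Only poll every check_every chunks to keep the overhead low.
        if count % check_every == 0 and await is_disconnected():
            break
        count += 1
        yield chunk


async def demo():
    async def fake_stream():
        for i in range(100):
            yield f"chunk{i} "

    seen = []

    async def client_gone():
        # Pretend the browser tab closed once five chunks were delivered.
        return len(seen) >= 5

    async for chunk in until_disconnected(fake_stream(), client_gone, check_every=1):
        seen.append(chunk)
    return seen


result = asyncio.run(demo())
print(len(result))
```

In the endpoint, the wrapper would go around stream.text_stream inside the async with block, with request.is_disconnected passed as the check, so that breaking out of the loop also exits the block and closes the stream.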


Streaming with Tool Use

Streaming works the same way when tools are provided — text still arrives via text_delta events, and tool arguments arrive incrementally via input_json_delta events on the relevant content block. stream.get_final_message() gives you fully-parsed tool_use blocks once the stream ends, exactly like a non-streaming response. See Claude API Function Calling for the complete tool-use loop, which works unchanged whether the underlying calls are streamed or not.
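
If you do want to handle tool arguments yourself, the key point is that individual partial_json fragments are not valid JSON on their own; they only parse once the block is complete. A minimal sketch, using hand-built stand-in events and a hypothetical get_weather tool (neither comes from the SDK):

```python
import json
from types import SimpleNamespace as NS


def collect_tool_inputs(events):
    """Join input_json_delta fragments per content block and parse them at block stop."""
    names, buffers, calls = {}, {}, []
    for event in events:
        if event.type == "content_block_start" and event.content_block.type == "tool_use":
            names[event.index] = event.content_block.name
            buffers[event.index] = []
        elif event.type == "content_block_delta" and event.delta.type == "input_json_delta":
            buffers.setdefault(event.index, []).append(event.delta.partial_json)
        elif event.type == "content_block_stop" and event.index in buffers:
            raw = "".join(buffers.pop(event.index))
            # An empty buffer means the tool was called with no arguments.
            calls.append((names.pop(event.index, None), json.loads(raw) if raw else {}))
    return calls


# Stand-in events mimicking a streamed tool call (shapes are an assumption).
fake_events = [
    NS(type="content_block_start", index=0,
       content_block=NS(type="tool_use", name="get_weather")),
    NS(type="content_block_delta", index=0,
       delta=NS(type="input_json_delta", partial_json='{"city": "Par')),
    NS(type="content_block_delta", index=0,
       delta=NS(type="input_json_delta", partial_json='is", "unit": "c"}')),
    NS(type="content_block_stop", index=0),
]

tool_calls = collect_tool_inputs(fake_events)
print(tool_calls)
```

For most applications stream.get_final_message() is still the simpler route; manual assembly is mainly useful if you want to show tool arguments in the UI while they are being generated.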


Best Practices

  • Use stream.get_final_message() for stop_reason and usage instead of manually accumulating message_delta events
  • For web backends, use AsyncAnthropic — a synchronous stream blocks the event loop for the entire generation
  • Set Cache-Control: no-cache and X-Accel-Buffering: no so proxies don’t buffer SSE responses
  • Detect client disconnects and stop the generation early — half-finished output you discard still consumes output tokens
  • Streaming doesn’t change pricing — total input/output tokens are billed the same whether streamed or not
  • Wrap streams in try/except for APIConnectionError, RateLimitError, and APIStatusError; decide whether to show a partial response or retry
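
The “partial response or retry” decision from the last point can be sketched as a rule: retry only if the stream failed before any text was shown, since retrying later would duplicate output. The SDK has its own max_retries client option for failed requests; the example below is about the separate, application-level choice. stream_with_retry and flaky_stream are hypothetical names, and the stand-in stream simulates a failure so the code runs without an API key.

```python
import time


def stream_with_retry(open_stream, retryable, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Yield chunks from open_stream(), retrying only if it failed before any output."""
    for attempt in range(1, max_attempts + 1):
        emitted = False
        try:
            for chunk in open_stream():
                emitted = True
                yield chunk
            return  # stream finished cleanly
        except retryable:
            # Once text has been shown, a retry would duplicate it, so re-raise.
            if emitted or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


calls = {"count": 0}
delays = []


def flaky_stream():
    """Stand-in stream: fails on the first attempt, succeeds on the second."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("simulated overload")
    yield "Null pointer dawn, "
    yield "stack trace at dusk"


chunks = list(stream_with_retry(flaky_stream, (TimeoutError,), sleep=delays.append))
print(chunks, delays)
```

With the SDK, open_stream would be a function that opens client.messages.stream(...) and yields from stream.text_stream, and retryable would be a tuple such as (anthropic.RateLimitError, anthropic.APIConnectionError).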

Summary

  • stream.text_stream yields plain text chunks — the simplest way to display output as it’s generated
  • The raw event stream exposes message_start, content_block_start, content_block_delta (text_delta / input_json_delta / thinking_delta), content_block_stop, message_delta, and message_stop
  • stream.get_final_message() returns the complete Message with stop_reason and usage after streaming finishes
  • Use AsyncAnthropic with async with / async for for non-blocking streaming in web backends
  • FastAPI’s StreamingResponse + an async generator turns Claude’s stream into Server-Sent Events for the browser
  • Tool use streams the same way — text via text_delta, tool arguments via input_json_delta
  • Handle disconnects and API errors explicitly — streaming adds new failure modes mid-response that a single blocking call doesn’t have

Further reading: Claude API Tutorial for the full Messages API, and Claude API Function Calling for multi-step tool-use loops.

