A multi-agent system is a group of AI agents that each handle a specific task and pass results to one another. Instead of prompting one model to do everything, you split the work: a researcher finds facts, a writer drafts content, an editor reviews it. Each agent does less, and because each one can focus on a narrow job, the combined output is usually better than what a single overloaded prompt produces.
Why Multi-Agent?
Single-agent LLM calls hit limits quickly:
- Context window overflow — one agent can’t hold a 200-page report and write a summary at the same time
- Quality degradation — asking one model to research AND write AND fact-check produces mediocre results at all three
- No parallelism — sequential prompts are slow; agents can run in parallel
- Hard to debug — when one big prompt fails, you don’t know which step went wrong
Multi-agent fixes all of these by giving each agent a narrow, well-defined job.
Core Architecture Patterns
1. Orchestrator → Workers
A central agent (orchestrator) breaks down the task and delegates to specialized sub-agents. Workers report back; the orchestrator assembles the final result.
Orchestrator
├── Research Agent → gathers data
├── Analysis Agent → interprets data
└── Writer Agent → produces output

2. Pipeline (Sequential)
Output of agent N becomes input to agent N+1. Simple to reason about, easy to debug.
Input → Researcher → Drafter → Editor → Output

3. Parallel Fan-Out
Orchestrator dispatches multiple agents simultaneously, then merges their results. Great for research tasks where sources are independent.
         ┌─ Agent A ─┐
Input ───┤─ Agent B ─├─── Merger ── Output
         └─ Agent C ─┘

Building a Simple Two-Agent Pipeline
A Researcher agent fetches raw information; a Writer agent turns it into a polished article section. Both use Claude:
import anthropic

client = anthropic.Anthropic()

def researcher(topic: str) -> str:
    """Finds key facts about a topic."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=(
            "You are a technical researcher. Given a topic, return 5-7 "
            "concise bullet points with the most important facts, stats, "
            "and concepts. Be specific — no filler."
        ),
        messages=[{"role": "user", "content": f"Research topic: {topic}"}],
    )
    return response.content[0].text

def writer(topic: str, research: str) -> str:
    """Writes a clear explanation based on the research."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=(
            "You are a technical writer for developers. "
            "Write clear, practical prose — no fluff, no marketing speak."
        ),
        messages=[
            {
                "role": "user",
                "content": (
                    f"Topic: {topic}\n\n"
                    f"Research notes:\n{research}\n\n"
                    "Write a 3-paragraph explanation for a developer audience."
                ),
            }
        ],
    )
    return response.content[0].text

# Run the pipeline
topic = "how transformer attention mechanisms work"

print("Step 1: Research...")
facts = researcher(topic)
print(facts)

print("\nStep 2: Write...")
article = writer(topic, facts)
print(article)

Orchestrator Pattern
The orchestrator decides which agents to call and in what order. Here’s a clean implementation that handles the full research-write-review cycle:
import anthropic
from dataclasses import dataclass

@dataclass
class AgentResult:
    agent: str
    output: str

client = anthropic.Anthropic()

def run_agent(name: str, system: str, user: str, max_tokens: int = 1024) -> AgentResult:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=max_tokens,
        system=system,
        messages=[{"role": "user", "content": user}],
    )
    return AgentResult(agent=name, output=response.content[0].text)

def orchestrator(task: str) -> str:
    print(f"Orchestrator: starting task — {task[:60]}...")

    # Step 1: Research
    research = run_agent(
        name="Researcher",
        system="You are a technical researcher. Return bullet-point facts only.",
        user=f"Research: {task}",
    )
    print(f"  [{research.agent}] done ({len(research.output)} chars)")

    # Step 2: Draft
    draft = run_agent(
        name="Writer",
        system="You are a technical writer. Write clearly for developers.",
        user=f"Task: {task}\n\nResearch:\n{research.output}\n\nWrite a draft.",
        max_tokens=2048,
    )
    print(f"  [{draft.agent}] done ({len(draft.output)} chars)")

    # Step 3: Review
    final = run_agent(
        name="Editor",
        system=(
            "You are a senior technical editor. "
            "Fix clarity, remove redundancy, ensure accuracy. "
            "Return only the improved text."
        ),
        user=f"Edit this draft:\n\n{draft.output}",
        max_tokens=2048,
    )
    print(f"  [{final.agent}] done ({len(final.output)} chars)")

    return final.output

result = orchestrator("Explain how vector databases work and when to use them")
print("\n=== Final Output ===")
print(result)

Parallel Fan-Out with ThreadPoolExecutor
When agents are independent, run them simultaneously to cut latency:
import anthropic
from concurrent.futures import ThreadPoolExecutor, as_completed

client = anthropic.Anthropic()

def research_source(source_name: str, topic: str) -> tuple[str, str]:
    response = client.messages.create(
        model="claude-haiku-4-5",  # cheap model for parallel workers
        max_tokens=512,
        messages=[
            {
                "role": "user",
                "content": (
                    f"You are a {source_name} expert. "
                    f"Give 3 bullet points about: {topic}"
                ),
            }
        ],
    )
    return source_name, response.content[0].text

def parallel_research(topic: str, sources: list[str]) -> dict[str, str]:
    results = {}
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {
            pool.submit(research_source, source, topic): source
            for source in sources
        }
        for future in as_completed(futures):
            name, output = future.result()
            results[name] = output
            print(f"  [{name}] completed")
    return results

def synthesize(topic: str, research: dict[str, str]) -> str:
    combined = "\n\n".join(f"## {k}\n{v}" for k, v in research.items())
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": (
                    f"Topic: {topic}\n\n"
                    f"Research from multiple sources:\n{combined}\n\n"
                    "Synthesize into one coherent summary."
                ),
            }
        ],
    )
    return response.content[0].text

# Run parallel research + synthesis
topic = "best practices for deploying LLMs in production"
sources = ["DevOps", "ML Engineering", "Security"]

print("Running parallel research...")
research = parallel_research(topic, sources)

print("Synthesizing...")
summary = synthesize(topic, research)
print(summary)

Agent Handoffs with State
Real pipelines need to pass structured state between agents, not just raw strings. Use a dataclass or dict to carry context:
from dataclasses import dataclass, field
import anthropic

client = anthropic.Anthropic()

@dataclass
class PipelineState:
    topic: str
    research: str = ""
    outline: str = ""
    draft: str = ""
    final: str = ""
    errors: list[str] = field(default_factory=list)

def research_agent(state: PipelineState) -> PipelineState:
    try:
        resp = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=512,
            messages=[{"role": "user", "content": f"Research key facts about: {state.topic}"}],
        )
        state.research = resp.content[0].text
    except Exception as e:
        state.errors.append(f"research: {e}")
    return state

def outline_agent(state: PipelineState) -> PipelineState:
    if not state.research:
        state.errors.append("outline skipped: no research")
        return state
    resp = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=[
            {
                "role": "user",
                "content": f"Create a 5-point article outline based on:\n{state.research}",
            }
        ],
    )
    state.outline = resp.content[0].text
    return state

def writer_agent(state: PipelineState) -> PipelineState:
    if not state.outline:
        state.errors.append("writer skipped: no outline")
        return state
    resp = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[
            {
                "role": "user",
                "content": (
                    f"Topic: {state.topic}\n"
                    f"Outline:\n{state.outline}\n"
                    f"Research:\n{state.research}\n\n"
                    "Write the full article."
                ),
            }
        ],
    )
    state.draft = resp.content[0].text
    return state

# Run the pipeline
state = PipelineState(topic="Getting started with LangChain")
for agent in [research_agent, outline_agent, writer_agent]:
    state = agent(state)
    print(f"{agent.__name__}: {'OK' if not state.errors else state.errors[-1]}")

print("\nFinal draft length:", len(state.draft), "chars")
if state.errors:
    print("Errors:", state.errors)

When to Use Multi-Agent vs Single Agent
- Use single agent when the task fits in one prompt, needs one skill, and latency matters more than quality
- Use pipeline when the task has clear sequential steps and each step has different requirements
- Use orchestrator when the task is complex, the number of steps is dynamic, or you need error recovery per step
- Use parallel fan-out when sub-tasks are independent and you need to reduce wall-clock time
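These heuristics can be condensed into a small dispatcher. The sketch below is purely illustrative: `TaskProfile` and `choose_pattern` are hypothetical names, not part of any framework, and the three boolean flags are an assumption about how you might profile a task.

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    sequential_steps: bool      # clear ordered stages with different requirements?
    independent_subtasks: bool  # sub-tasks that don't need each other's output?
    dynamic_step_count: bool    # number of steps unknown until runtime?

def choose_pattern(p: TaskProfile) -> str:
    # Most specific need first: dynamic planning requires an orchestrator.
    if p.dynamic_step_count:
        return "orchestrator"      # runtime planning, per-step error recovery
    if p.independent_subtasks:
        return "parallel fan-out"  # cut wall-clock time
    if p.sequential_steps:
        return "pipeline"          # simple, debuggable hand-offs
    return "single agent"          # one prompt, one skill, lowest latency

print(choose_pattern(TaskProfile(True, False, False)))  # pipeline
```

The ordering matters: a task with a dynamic step count may also contain independent sub-tasks, so the checks go from most to least demanding.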
Cost and Latency Considerations
- Use cheaper models for worker agents: claude-haiku-4-5 for research/extraction; claude-sonnet-4-6 for final synthesis
- Parallelize where possible: 5 parallel Haiku calls often beat 1 Sonnet call in both cost and speed
- Cache repeated prompts: if multiple agents use the same system prompt or context, enable prompt caching to save tokens
- Set per-agent token limits: researchers don’t need 4096 tokens; cap them at 512–1024
- Fail fast: check state for errors after each agent; skip downstream agents if a critical step failed
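The prompt-caching bullet can be made concrete. The helper below builds keyword arguments for `client.messages.create()` with the shared system prompt marked as a cache breakpoint. It assumes the Anthropic Messages API prompt-caching block format (a list of system blocks carrying a `cache_control` marker); verify the exact shape against the current API reference before relying on it.

```python
# Shared context that every worker agent reuses. With caching, repeated
# calls re-bill only the short user message, not this common prefix.
SHARED_CONTEXT = (
    "You are one worker in a research pipeline. "
    "Keep answers factual, concise, and bullet-pointed."
)

def cached_call_kwargs(user: str, model: str = "claude-haiku-4-5") -> dict:
    """Build kwargs for client.messages.create() with a cacheable system prompt."""
    return {
        "model": model,
        "max_tokens": 512,
        "system": [
            {
                "type": "text",
                "text": SHARED_CONTEXT,
                # cache breakpoint: later calls within the cache lifetime
                # read this prefix from the cache at a reduced token rate
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user}],
    }

# every worker call shares the same cached system prefix
kwargs = cached_call_kwargs("Summarize findings from source A")
```

Because all workers pass an identical `system` block, only the first call pays full price for that prefix; the rest hit the cache.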
Full Example: Content Pipeline
A complete pipeline that turns a topic into a publishable article:
import anthropic
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

client = anthropic.Anthropic()

@dataclass
class Article:
    topic: str
    facts: list[str] = field(default_factory=list)
    outline: str = ""
    draft: str = ""
    title: str = ""
    seo_description: str = ""

def call(system: str, user: str, model: str = "claude-haiku-4-5", max_tokens: int = 512) -> str:
    return client.messages.create(
        model=model,
        max_tokens=max_tokens,
        system=system,
        messages=[{"role": "user", "content": user}],
    ).content[0].text

def run_pipeline(topic: str) -> Article:
    art = Article(topic=topic)

    # Step 1: Parallel research (3 angles simultaneously)
    print("1. Parallel research...")
    angles = ["technical fundamentals", "practical use cases", "common pitfalls"]
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(call, "Return 3 concise bullet points.", f"{angle} of: {topic}")
            for angle in angles
        ]
        art.facts = [f.result() for f in futures]

    # Step 2: Outline
    print("2. Outlining...")
    combined = "\n\n".join(art.facts)
    art.outline = call(
        "Create a 5-section article outline.",
        f"Topic: {topic}\n\nResearch:\n{combined}",
    )

    # Step 3: Draft (Sonnet for quality)
    print("3. Writing draft...")
    art.draft = call(
        "Write a complete developer-focused article. Be practical, use code examples.",
        f"Topic: {topic}\nOutline:\n{art.outline}\nResearch:\n{combined}",
        model="claude-sonnet-4-6",
        max_tokens=3000,
    )

    # Step 4: Parallel meta generation (title + SEO simultaneously)
    print("4. Generating metadata...")
    with ThreadPoolExecutor(max_workers=2) as pool:
        title_f = pool.submit(
            call, "Return only a compelling article title, no quotes.",
            f"Generate SEO title for:\n{art.draft[:500]}",
        )
        seo_f = pool.submit(
            call, "Return only a 155-character meta description, no quotes.",
            f"Meta description for:\n{art.draft[:500]}",
        )
        art.title = title_f.result()
        art.seo_description = seo_f.result()

    return art

article = run_pipeline("How to build a RAG system with Python")
print(f"\nTitle: {article.title}")
print(f"SEO: {article.seo_description}")
print(f"Draft: {len(article.draft)} chars")

Summary
Multi-agent systems let you build AI pipelines that are smarter, faster, and easier to debug than single-prompt approaches:
- Split work by skill: researcher, writer, editor, reviewer
- Run independent agents in parallel with ThreadPoolExecutor
- Pass structured state between agents to avoid lost context
- Use cheap models for workers, powerful models for synthesis
- Fail fast and log errors per agent — don’t let one bad step corrupt the whole pipeline
Related reading: How to Build an AI Agent with Python for single-agent fundamentals, or LangChain for Beginners for a framework-based approach to agent orchestration.