Prompt engineering is the practice of designing inputs that get the best output from an LLM. For developers building on the Claude API, it’s less about “magic phrases” and more about clear communication with a reasoning system. This guide covers the core techniques — with working Python examples — so you can write prompts that are reliable, testable, and production-ready.
1. Use a System Prompt to Set Context
The system prompt is the most powerful tool you have. It defines Claude’s role, constraints, and output style for the entire conversation.
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    system="You are a senior Python developer. Answer concisely. Use code examples. Avoid unnecessary explanations.",
    messages=[
        {"role": "user", "content": "How do I read a JSON file in Python?"}
    ],
)
print(response.content[0].text)
```

A well-crafted system prompt replaces dozens of instructions repeated in every user message. Define who Claude is, what it should focus on, and how it should respond.
2. Be Specific — Avoid Vague Instructions
Vague prompts produce vague responses. The more precisely you describe the task, the more reliable the output.
```python
# ❌ Vague
"Summarize this article."

# ✅ Specific
"Summarize this article in 3 bullet points. Each point must be under 20 words. Focus on technical details relevant to Python developers."
```

Specify: format, length, tone, audience, constraints. Everything you leave out is filled in by Claude's defaults.
3. Few-Shot Prompting: Show Examples
If you need a specific format, show it. Few-shot examples are more reliable than instructions alone.
```python
response = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=256,
    system="You classify support tickets into categories.",
    messages=[
        {"role": "user", "content": "Ticket: 'My payment failed'\nCategory:"},
        {"role": "assistant", "content": "billing"},
        {"role": "user", "content": "Ticket: 'App crashes on login'\nCategory:"},
        {"role": "assistant", "content": "bug"},
        {"role": "user", "content": "Ticket: 'How do I export my data?'\nCategory:"},
    ],
)
print(response.content[0].text)  # → "feature-request" or "support"
```

Two or three examples are usually enough to lock in the pattern. More than five rarely adds value.
4. Chain of Thought: Ask Claude to Think Step by Step
For complex reasoning, multi-step math, or code generation, asking Claude to think before answering significantly improves accuracy.
```python
response = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": """A server processes 1,200 requests/minute.
Each request takes on average 80ms. How many concurrent workers are needed?
Think through this step by step before giving the final answer.""",
    }],
)
print(response.content[0].text)
```

This works because it forces Claude to "show its work" instead of jumping to an answer. The reasoning is visible, debuggable, and typically more accurate than a direct answer.
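If a pipeline only needs the conclusion, a common pattern is to ask for the final answer inside a tag and parse it out. A minimal sketch, assuming an `<answer>` tag convention of our own choosing (not an API feature):

```python
import re

# Reuses the client from section 1. The <answer> tag is our own convention.
prompt = """A server processes 1,200 requests/minute. Each request takes 80ms on average.
How many concurrent workers are needed?
Think step by step, then put only the final number inside <answer></answer> tags."""

response = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)

text = response.content[0].text
match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
final_answer = match.group(1).strip() if match else text  # fall back to the full reply
print(final_answer)
```

You keep the full reasoning in `text` for logging and debugging, while downstream code consumes only `final_answer`.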
5. Request Structured Output (JSON)
When integrating Claude into a pipeline, you often need structured data, not prose. Ask for JSON and validate it.
```python
import json

response = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=512,
    system="You extract information from text. Always respond with valid JSON only. No explanations.",
    messages=[{
        "role": "user",
        "content": """Extract from this job posting:
'Senior Python Developer at Acme Corp. 5+ years required. Remote. $120k-$150k.'
Respond as JSON with keys: title, company, experience_years, remote, salary_min, salary_max.""",
    }],
)

data = json.loads(response.content[0].text)
print(data["salary_min"])  # → 120000
```

Tips for reliable JSON output:
- Say “valid JSON only” and “no explanations” in the system prompt
- Provide the exact key names you need
- Use `json.loads()` and wrap it in a try/except for production (see the sketch below)
- For complex schemas, paste in a JSON Schema or example object
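Here is a minimal sketch of that try/except guard. The fallback that grabs the outermost `{...}` span is one pragmatic heuristic for replies wrapped in extra prose, not an official recommendation:

```python
import json
import re

def parse_json_response(text: str) -> dict | None:
    """Try to parse Claude's reply as JSON; return None if nothing parses."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # The model occasionally wraps JSON in prose or a code fence;
        # fall back to the outermost {...} span before giving up.
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                pass
        return None

data = parse_json_response(response.content[0].text)
if data is None:
    raise ValueError("Claude did not return valid JSON")
```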
6. Constrain the Output Format
Tell Claude exactly how to format the response — length, structure, and style.
```python
# Request a specific markdown structure
response = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=800,
    system="""You write technical documentation.
Format rules:
- Start with a one-sentence summary
- Use ## headings for sections
- Use code blocks with language tags
- Maximum 400 words total""",
    messages=[{"role": "user", "content": "Document the Python `requests.get()` function."}],
)
```

7. Use XML Tags to Structure Complex Prompts
When your prompt contains multiple components (context, instructions, data), use XML-style tags to separate them clearly. Claude handles structured prompts better than long walls of text.
```python
user_prompt = """
<context>
You are reviewing a pull request for a Python web service.
</context>

<code>
def get_user(user_id):
    query = f"SELECT * FROM users WHERE id = {user_id}"
    return db.execute(query)
</code>

<task>
Review this code. Identify security issues. Suggest fixes with corrected code.
</task>
"""

response = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    messages=[{"role": "user", "content": user_prompt}],
)
print(response.content[0].text)
```

8. Control Temperature and Parameters
```python
# Near-deterministic output (code, data extraction): use low temperature
response = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    temperature=0.0,  # Most deterministic setting (outputs may still vary slightly)
    messages=[{"role": "user", "content": "Write a Python function to validate an email address."}],
)

# Creative output (brainstorming, writing): higher temperature
response = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    temperature=1.0,  # More varied
    messages=[{"role": "user", "content": "Give me 10 creative names for a developer tool."}],
)
```

- temperature=0 — near-deterministic, best for code and structured data
- temperature=0.5–0.7 — balanced, good for most tasks
- temperature=1.0 — creative, more varied responses
9. Prompt Caching for Repeated Context
If you send the same large context (docs, codebase, rules) with every request, use prompt caching to cut costs by up to 90% and latency by up to 85%.
```python
# Mark stable content as cacheable
response = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a code reviewer. Follow these rules: ...",
        },
        {
            "type": "text",
            "text": open("codebase_context.txt").read(),  # Large, stable content
            "cache_control": {"type": "ephemeral"},  # Cache this block
        },
    ],
    messages=[{"role": "user", "content": "Review the authentication module."}],
)
```

Cached content is reused for up to 5 minutes, refreshed each time it is read. Ideal for: system prompt docs, few-shot examples, or any context that doesn't change between requests.
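To confirm the cache is actually being hit, inspect the usage accounting on the response. The two counters below are reported by the Messages API when caching applies; verify against the current docs if your SDK version differs:

```python
# Cache accounting returned alongside the response
usage = response.usage
print(usage.cache_creation_input_tokens)  # tokens written to the cache on this call
print(usage.cache_read_input_tokens)      # tokens served from the cache on this call
```

On the first request you should see creation tokens; on subsequent requests within the TTL, read tokens instead.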
10. Common Mistakes to Avoid
- Over-constraining — too many rules can make Claude overly cautious. Give it room to reason.
- Ambiguous pronouns — “fix it” is vague. Say “fix the SQL injection vulnerability in the function above.”
- Missing context — Claude doesn’t know your codebase, conventions, or audience unless you tell it.
- No output format — without a format spec, responses vary. Always define the expected structure.
- Ignoring the system prompt — putting everything in the user message loses the most powerful lever.
- One-shot testing — test prompts with at least 10–20 different inputs before using them in production; a minimal harness is sketched below.
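A minimal sketch of such a test loop, using the ticket classifier from section 3. The `cases` data and the exact-match scoring are illustrative placeholders, not a full evaluation framework:

```python
# Hypothetical test cases: (input ticket, expected category)
cases = [
    ("My payment failed", "billing"),
    ("App crashes on login", "bug"),
    ("How do I export my data?", "support"),
    # ... extend to 10-20 cases covering edge cases
]

failures = []
for text, expected in cases:
    response = client.messages.create(
        model="claude-opus-4-1",
        max_tokens=16,
        temperature=0.0,  # keep runs comparable
        system="You classify support tickets into categories. Respond with the category only.",
        messages=[{"role": "user", "content": f"Ticket: '{text}'\nCategory:"}],
    )
    got = response.content[0].text.strip()
    if got != expected:
        failures.append((text, expected, got))

print(f"{len(cases) - len(failures)}/{len(cases)} passed")
for text, expected, got in failures:
    print(f"FAIL: {text!r} -> expected {expected!r}, got {got!r}")
```

Run this after every prompt change; a prompt tweak that fixes one case often breaks another.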
Prompt Engineering Checklist
- Define a clear system prompt with role, constraints, and format rules
- Specify the output format (JSON, markdown, plain text, bullets)
- Include 2–3 examples if a specific pattern is needed
- Add “think step by step” for complex reasoning tasks
- Use XML tags to separate context, data, and instructions
- Set temperature=0 for deterministic tasks
- Use prompt caching for large, reused contexts
- Test across edge cases before shipping
What’s Next?
Prompt engineering is the foundation — combine it with the right tools to build reliable systems:
- Build an AI agent that uses prompts + tool calling to complete complex tasks
- Claude API Python tutorial — start here if you haven’t set up the SDK yet
- Model Context Protocol (MCP) — connect Claude to external tools with structured prompts
- Read the Anthropic prompt engineering docs for advanced techniques like extended thinking