Claude API Node.js & TypeScript Tutorial: @anthropic-ai/sdk, Streaming & Prompt Caching (2026)

The Anthropic Node.js SDK lets you integrate Claude into any JavaScript or TypeScript project in minutes. This guide walks you through installation, your first API call, streaming responses, multi-turn conversations, TypeScript usage, error handling, and prompt caching, all with working code examples.

Prerequisites

Node.js 18+ installed
An Anthropic API key — get one at console.anthropic.com
Basic JavaScript / TypeScript knowledge

Step 1: Install the SDK

Create a new project folder and install the official Anthropic SDK:

mkdir claude-node-app && cd claude-node-app
npm init -y
npm install @anthropic-ai/sdk

The SDK ships with full TypeScript types out of the box — no separate @types package needed.

Step 2: Set Your API Key

Never hardcode your API key. Store it as an environment variable:

# Linux / macOS
export ANTHROPIC_API_KEY="sk-ant-..."

# Windows (PowerShell)
$env:ANTHROPIC_API_KEY = "sk-ant-..."

For local development, use a .env file with the dotenv package, and load it by adding import "dotenv/config" as the first line of your entry file:

npm install dotenv

# .env
ANTHROPIC_API_KEY=sk-ant-...
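
A missing key otherwise surfaces only as an authentication error on the first request, so it can help to fail fast at startup. The helper below is a minimal sketch; requireEnv is a hypothetical name, not part of the SDK or dotenv, and it takes the environment object as a parameter so it is easy to test.

```typescript
// Hypothetical helper (not part of the SDK): return a required variable
// or throw a clear error if it is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Typical call site in a Node.js entry file:
//   const apiKey = requireEnv("ANTHROPIC_API_KEY", process.env);
```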

Step 3: Your First API Call

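Create index.js and make your first request to Claude. The examples use ES module import syntax, so add "type": "module" to package.json (or name the file index.mjs) before running them: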

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
// API key is read automatically from ANTHROPIC_API_KEY env var

async function main() {
  const message = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: "Explain what an API is in one paragraph.",
      },
    ],
  });

  console.log(message.content[0].text);
}

main();

Run it:

node index.js

You should see Claude’s response printed to the console. The response’s content field is an array of content blocks; content[0].text reads the text of the first block, which is a text block for a simple prompt like this one.
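
Because content is an array, a response can hold more than one block, and not every block is text. The sketch below joins the text of all text blocks instead of assuming the first one. ContentBlockLike is a simplified stand-in type defined here for illustration, and extractText is a hypothetical helper name; neither comes from the SDK.

```typescript
// Simplified stand-in for a response content block (assumption: only the
// fields this helper reads are modeled).
interface ContentBlockLike {
  type: string;
  text?: string;
}

// Concatenate the text of every text block, skipping other block types.
function extractText(content: ContentBlockLike[]): string {
  let out = "";
  for (const block of content) {
    if (block.type === "text" && typeof block.text === "string") {
      out += block.text;
    }
  }
  return out;
}

// Typical call site: console.log(extractText(message.content));
```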

Step 4: Streaming Responses

For long outputs, streaming shows text as it’s generated — much better UX than waiting for the full response:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function streamExample() {
  const stream = client.messages.stream({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: "Write a short story about a robot learning to cook.",
      },
    ],
  });

  // Print each text delta as it arrives (a delta may contain several tokens)
  for await (const chunk of stream) {
    if (
      chunk.type === "content_block_delta" &&
      chunk.delta.type === "text_delta"
    ) {
      process.stdout.write(chunk.delta.text);
    }
  }

  const finalMessage = await stream.getFinalMessage();
  console.log("\nTotal tokens used:", finalMessage.usage.input_tokens + finalMessage.usage.output_tokens);
}

streamExample();

For a deeper look at streaming — including raw SSE events, async iteration, and building a streaming API endpoint — see our guide to streaming responses with the Claude API in Python.
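
The filtering loop from the streaming example can be factored into a reusable function that works on any async iterable of events. This is a sketch under assumptions: StreamEventLike models only the two fields the loop reads, and collectText and fakeStream are hypothetical names introduced here. The fake stream makes it possible to exercise the logic without calling the API.

```typescript
// Minimal event shape (assumption): only the fields this helper reads.
interface StreamEventLike {
  type: string;
  delta?: { type: string; text?: string };
}

// Forward each text delta to onText and return the full text at the end.
async function collectText(
  events: AsyncIterable<StreamEventLike>,
  onText: (text: string) => void = () => {},
): Promise<string> {
  let full = "";
  for await (const event of events) {
    if (
      event.type === "content_block_delta" &&
      event.delta?.type === "text_delta" &&
      typeof event.delta.text === "string"
    ) {
      onText(event.delta.text);
      full += event.delta.text;
    }
  }
  return full;
}

// Hypothetical stand-in for a real stream, useful in tests.
async function* fakeStream(): AsyncGenerator<StreamEventLike> {
  yield { type: "message_start" };
  yield { type: "content_block_delta", delta: { type: "text_delta", text: "Hel" } };
  yield { type: "content_block_delta", delta: { type: "text_delta", text: "lo" } };
  yield { type: "message_stop" };
}
```

In the streaming example, the stream object could be passed as the first argument with (text) => process.stdout.write(text) as the callback.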

Step 5: Multi-Turn Conversations

Claude keeps no state between calls — you send the full conversation history each time. Here’s a simple interactive chat loop:

import Anthropic from "@anthropic-ai/sdk";
import readline from "readline";

const client = new Anthropic();
const conversationHistory = [];

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
});

async function chat(userMessage) {
  conversationHistory.push({ role: "user", content: userMessage });

  const response = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    system: "You are a helpful coding assistant.",
    messages: conversationHistory,
  });

  const assistantMessage = response.content[0].text;
  conversationHistory.push({ role: "assistant", content: assistantMessage });

  return assistantMessage;
}

function askQuestion() {
  rl.question("You: ", async (input) => {
    if (input.toLowerCase() === "quit") {
      rl.close();
      return;
    }
    const reply = await chat(input);
    console.log("Claude:", reply);
    askQuestion();
  });
}

console.log('Chat started. Type "quit" to exit.');
askQuestion();
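
Since the whole history is resent on every call, a long chat keeps growing the input token count. One simple way to bound it is to keep only the most recent messages. The sketch below is an illustration, not something the SDK provides: trimHistory and ChatMessage are hypothetical names, and the cutoff is counted in messages rather than tokens for simplicity.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Keep at most maxMessages of the most recent messages, then drop any
// leading assistant messages so the history still opens with a user turn.
function trimHistory(
  history: ChatMessage[],
  maxMessages: number,
): ChatMessage[] {
  if (maxMessages <= 0) return [];
  const recent = history.slice(-maxMessages);
  const firstUser = recent.findIndex((m) => m.role === "user");
  return firstUser === -1 ? [] : recent.slice(firstUser);
}
```

In the chat loop, this could be applied as messages: trimHistory(conversationHistory, 20), where 20 is an arbitrary illustrative limit.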

Step 6: Using TypeScript

The SDK has full TypeScript support. Create index.ts:

import Anthropic from "@anthropic-ai/sdk";
import type { Message } from "@anthropic-ai/sdk/resources/messages";

const client = new Anthropic();

async function askClaude(prompt: string): Promise<string> {
  const message: Message = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 512,
    messages: [{ role: "user", content: prompt }],
  });

  const block = message.content[0];
  if (block.type !== "text") throw new Error("Unexpected content type");
  return block.text;
}

const answer = await askClaude("What is the difference between null and undefined in JavaScript?");
console.log(answer);

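Run TypeScript directly with tsx. Because the example uses top-level await, the file must run as an ES module: set "type": "module" in package.json or name the file index.mts: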

npm install -D tsx
npx tsx index.ts

Step 7: Error Handling

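Wrap API calls in try/catch and handle Anthropic-specific errors. The SDK already retries some failures (such as rate limits and server errors) twice by default, so an error that reaches your catch block has usually survived those retries: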

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function safeCall(prompt) {
  try {
    const response = await client.messages.create({
      model: "claude-sonnet-4-5",
      max_tokens: 512,
      messages: [{ role: "user", content: prompt }],
    });
    return response.content[0].text;
  } catch (error) {
    if (error instanceof Anthropic.APIError) {
      console.error(`API Error ${error.status}: ${error.message}`);
      if (error.status === 429) {
        console.error("Rate limit hit — back off and retry");
      }
      if (error.status === 401) {
        console.error("Invalid API key");
      }
    } else {
      throw error; // Re-throw unexpected errors
    }
  }
}
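
The "back off and retry" advice for 429 responses can be sketched as a generic wrapper with exponential backoff. This is an illustration under assumptions: withRetry and sleep are hypothetical helpers, the set of retryable status codes is a choice made for this sketch, and the wrapper only inspects a numeric status property on the thrown error. If the SDK's built-in retries already cover your needs, a wrapper like this may be unnecessary.

```typescript
// Status codes treated as retryable in this sketch (an assumption).
const RETRYABLE_STATUS = new Set([429, 500, 502, 503, 529]);

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Call fn, retrying with exponential backoff when it throws an error whose
// `status` is retryable. Other errors are rethrown immediately, and the last
// error is rethrown once maxAttempts is reached.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const status = (error as { status?: unknown } | null)?.status;
      const retryable = typeof status === "number" && RETRYABLE_STATUS.has(status);
      if (!retryable || attempt >= maxAttempts) throw error;
      // Delay doubles each attempt: base, 2x base, 4x base, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

A call site might look like withRetry(() => client.messages.create({ ... })), keeping the try/catch from the example above around it for errors that are not retried.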

Choosing the Right Model

Anthropic offers several Claude models for different use cases:

// Fast, cheap — great for simple tasks
model: "claude-haiku-4-5"

// Balanced — best for most applications
model: "claude-sonnet-4-5"

// Most capable — complex reasoning, long documents
model: "claude-opus-4-5"

For production apps, start with claude-sonnet-4-5. Switch to Haiku for high-throughput workloads or Opus for tasks requiring deep reasoning.
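
One way to keep that choice in a single place is a small lookup keyed by workload type. This is only a sketch: the Workload categories and the pickModel name are made up for illustration, and the model IDs are the ones listed above.

```typescript
// Hypothetical workload categories for this sketch.
type Workload = "simple" | "general" | "complex";

const MODEL_FOR_WORKLOAD: Record<Workload, string> = {
  simple: "claude-haiku-4-5", // fast, cheap
  general: "claude-sonnet-4-5", // balanced default
  complex: "claude-opus-4-5", // most capable
};

// Default to the balanced model when no workload is given.
function pickModel(workload: Workload = "general"): string {
  return MODEL_FOR_WORKLOAD[workload];
}
```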

Full Working Example

Here’s a complete Node.js app that summarizes any text you pass it:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function summarize(text) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 256,
    system: "You are a summarization assistant. Return a concise 2-3 sentence summary.",
    messages: [{ role: "user", content: text }],
  });
  // Return the summary together with the token counts reported by the API
  return { summary: response.content[0].text, usage: response.usage };
}

const sampleText = `
  Node.js is an open-source, cross-platform JavaScript runtime environment
  that executes JavaScript code outside of a web browser. It uses an event-driven,
  non-blocking I/O model that makes it lightweight and efficient.
  Node.js was created by Ryan Dahl in 2009.
`;

const { summary, usage } = await summarize(sampleText);
console.log("Summary:", summary);

// Token usage
console.log("Using model: claude-sonnet-4-5");
console.log("Total tokens used:", usage.input_tokens + usage.output_tokens);

Prompt Caching with @anthropic-ai/sdk

Prompt caching reduces costs and latency by reusing expensive prompt processing. Setting cache_control: { type: "ephemeral" } on a content block marks a cache breakpoint: the prompt prefix up to and including that block is cached.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const systemPrompt = `You are a senior TypeScript engineer.
You follow best practices and write clean, well-typed code.
[... long system prompt with guidelines, context, examples ...]`;

// Cache the system prompt — reused across many calls
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: systemPrompt,
      cache_control: { type: "ephemeral" },  // cached here
    },
  ],
  messages: [{ role: "user", content: "Review this function for bugs." }],
});

console.log(response.content[0].text);
console.log("Cache status:", response.usage);
// { cache_creation_input_tokens: 1200, cache_read_input_tokens: 0, ... }

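How Caching Works

First call: cache_creation_input_tokens > 0. Anthropic writes the block to the cache; cache writes are billed at a premium over regular input tokens
Subsequent calls: cache_read_input_tokens > 0. The cached block is reused (faster and cheaper)
Cache TTL: 5 minutes by default; each cache hit refreshes the timer, so an entry expires only after 5 minutes without use
Minimum size: the cached prefix must reach a model-dependent minimum length (1,024 tokens on many models, more on some); shorter prompts are processed without caching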
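
The two usage counters above are enough to tell whether a given call wrote to the cache, read from it, or did neither. A minimal sketch, assuming a usage object with the field names shown earlier; UsageLike and cacheStatus are hypothetical names defined here for illustration.

```typescript
// Subset of the usage object this sketch reads (assumption: the cache
// fields may be absent or null when caching is not in play).
interface UsageLike {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

type CacheStatus = "hit" | "write" | "none";

// "hit" takes precedence: a call can read one cached prefix and also
// write a new, longer one.
function cacheStatus(usage: UsageLike): CacheStatus {
  if ((usage.cache_read_input_tokens ?? 0) > 0) return "hit";
  if ((usage.cache_creation_input_tokens ?? 0) > 0) return "write";
  return "none";
}
```

Logging cacheStatus(response.usage) on each call is a quick way to confirm that a prompt is long enough to be cached and that later calls actually reuse it.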

Caching Large Context (e.g. a codebase)

import fs from "fs";
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Load a large file once and cache it across all questions
const codebase = fs.readFileSync("./src/index.ts", "utf-8");

async function askAboutCode(question: string): Promise<string> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: `Here is the full codebase:\n\n${codebase}`,
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [{ role: "user", content: question }],
  });
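  // Narrow the block type so TypeScript knows the text field exists
  const block = response.content[0];
  if (block.type !== "text") throw new Error("Unexpected content type");
  return block.text;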
}

// First call — creates cache
console.log(await askAboutCode("What does this module export?"));
// Second call — reads from cache (saves ~90% of input token cost)
console.log(await askAboutCode("Are there any async functions?"));

Prompt caching can cut input token costs by up to 90% when reusing large system prompts across multiple calls. See the Anthropic prompt caching docs for full details.
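
To see where a figure like 90% comes from, the arithmetic can be sketched for a cached prefix that is reused across several calls. Everything numeric here is an assumption for illustration: the write and read multipliers, and the price passed in by the caller. The function name estimateInputCost is hypothetical, and the model assumes every call lands within the cache TTL and counts only the cached prefix, not the per-call question or the output.

```typescript
// Assumed multipliers relative to the base input price (illustrative only).
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

interface CacheCostEstimate {
  withoutCache: number; // cost of resending the prefix uncached on every call
  withCache: number; // one cache write, then cache reads
  savings: number; // fraction saved, e.g. 0.5 means 50%
}

function estimateInputCost(
  cachedTokens: number,
  calls: number,
  pricePerMTok: number, // base input price per million tokens (caller-supplied)
): CacheCostEstimate {
  if (cachedTokens <= 0 || calls <= 0 || pricePerMTok <= 0) {
    return { withoutCache: 0, withCache: 0, savings: 0 };
  }
  const perCall = (cachedTokens / 1e6) * pricePerMTok;
  const withoutCache = perCall * calls;
  const withCache =
    perCall * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1));
  return { withoutCache, withCache, savings: 1 - withCache / withoutCache };
}
```

Under these assumptions, a single call costs more with caching than without (the write premium is never recovered), 10 calls save about 78.5% on the cached prefix, and the savings approach 90% only as the number of calls grows.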

For a deeper dive into cache breakpoints, TTL options, and real cost-savings math, check out our complete guide to Claude prompt caching.

Summary

You’ve learned how to integrate Claude into a Node.js project:

Install @anthropic-ai/sdk and set your API key
Make basic messages.create() calls
Stream responses for better UX
Build multi-turn conversations by maintaining message history
Use TypeScript with full type safety
Handle API errors gracefully
Cut input costs on repeated prompts with prompt caching

Next step: explore Model Context Protocol (MCP) to connect Claude to external tools, or check out the AI Agent tutorial to build autonomous agents.