Claude Messages API: Every Parameter Explained (with Code Examples)

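The Claude Messages API is Anthropic's official endpoint for conversational calls (the older Text Completions API is legacy). The endpoint is POST https://api.anthropic.com/v1/messages. Requests authenticate with the x-api-key header, and every request must also send an anthropic-version header along with a JSON body (content-type: application/json). All capabilities are switched on through parameters on this one endpoint rather than through separate endpoints: plain chat, tool use, streaming, multimodal image input, and extended thinking. This article walks through each parameter in the request body and gives Python and Node.js examples you can run directly.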
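To make the headers concrete, here is a minimal standard-library sketch that builds (but does not send) the raw HTTP request the SDKs produce for you. It assumes anthropic-version 2023-06-01, the stable value in Anthropic's docs; build_request is a hypothetical helper name, and nothing goes over the network unless you uncomment the last lines.

```python
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"
API_VERSION = "2023-06-01"  # stable value of the anthropic-version header


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a raw POST /v1/messages request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "x-api-key": api_key,              # authentication
            "anthropic-version": API_VERSION,  # required on every request
            "content-type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_request(
        os.environ.get("ANTHROPIC_API_KEY", ""),
        {
            "model": "claude-opus-4-8",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": "Introduce Claude in one sentence"}],
        },
    )
    print(req.get_method(), req.full_url)
    # To actually send it (needs a valid key and network access):
    # with urllib.request.urlopen(req) as r:
    #     print(json.loads(r.read())["content"][0]["text"])
```

The SDK examples below do exactly this plus retries, timeouts, and typed responses, so in practice you rarely need the raw form.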

Three required parameters

A minimal working request needs only three fields:

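model: the model ID string. As of 2026 the main models are claude-opus-4-8 (most capable), claude-sonnet-4-6 (balanced), and claude-haiku-4-5 (fast and cheap). For how to choose, see How to choose a Claude model? An Opus / Sonnet / Haiku selection guide.
max_tokens: a hard cap on how many tokens this reply may generate. For non-streaming calls, keep it around 16000 so you do not trip the SDK's timeout safeguards; with streaming you can go up to 64000, as long as the model's output limit allows it. Set it too low and the answer is cut off (stop_reason comes back as max_tokens).
messages: the array of conversation messages, each element shaped like {"role": "user"|"assistant", "content": ...}. The first message must be user, and user and assistant must alternate; otherwise the request fails with a 400.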
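The rules above are easy to check before a request leaves your machine. This is a hypothetical pre-flight helper, not part of any SDK; it only mirrors the three-field and role-order rules just listed, and the API's own validation remains authoritative.

```python
from typing import Any, Dict, List


def check_minimal_request(params: Dict[str, Any]) -> List[str]:
    """Return problems found in a request body; an empty list means it passed these checks.

    Hypothetical helper mirroring the rules above; the API validates far more.
    """
    problems: List[str] = []
    for field in ("model", "max_tokens", "messages"):
        if field not in params:
            problems.append(f"missing required field: {field}")

    max_tokens = params.get("max_tokens")
    if "max_tokens" in params and (not isinstance(max_tokens, int) or max_tokens < 1):
        problems.append("max_tokens must be a positive integer")

    messages = params.get("messages") or []
    if "messages" in params and not messages:
        problems.append("messages must not be empty")
    if messages and messages[0].get("role") != "user":
        problems.append("first message must have role 'user'")
    for prev, cur in zip(messages, messages[1:]):
        if prev.get("role") == cur.get("role"):
            problems.append(f"consecutive '{cur.get('role')}' messages")
    return problems


print(check_minimal_request({"model": "claude-opus-4-8", "messages": [{"role": "assistant", "content": "hi"}]}))
```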

Minimal Python example

from anthropic import Anthropic

client = Anthropic()  # reads the ANTHROPIC_API_KEY environment variable automatically

resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Introduce Claude in one sentence"}],
)
print(resp.content[0].text)

Minimal Node.js example

// Top-level await needs an ES module: a .mjs file, or "type": "module" in package.json.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY

const resp = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Introduce Claude in one sentence" }],
});
console.log(resp.content[0].text);

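For the full flow from installing the SDK to sending your first request, see Claude API Python sample code: your first program in 10 minutes, and Claude API Node.js example: from installation to streaming output.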

Common optional parameters at a glance

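Parameter	Type	Purpose
system	string / array of text blocks	System prompt that defines the role and behavioral rules; it is not counted as a message
stream	boolean	Set to true for SSE streaming, returning output as it is generated
stop_sequences	array of strings	Stop generating as soon as any of these strings appears
tools	array	Definitions of the tools (functions) the model may call
tool_choice	object	Whether a tool may or must be used: auto, any, tool, none
thinking	object	Enables extended thinking; newer 4.x models use {"type": "adaptive"}
output_config	object	Controls thinking depth (effort) and structured output format (format)
metadata	object	Carries metadata such as user_id to help with abuse detection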
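Optional fields are simply extra keys in the same request body. A minimal sketch, assuming you build keyword arguments for client.messages.create() in one place; build_params and the sample user ID are hypothetical, and stop_sequences and metadata are the two parameters from the table not otherwise shown in this article.

```python
from typing import Any, Dict, List, Optional


def build_params(
    prompt: str,
    *,
    model: str = "claude-opus-4-8",
    max_tokens: int = 1024,
    system: Optional[str] = None,
    stop_sequences: Optional[List[str]] = None,
    user_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble kwargs for client.messages.create(), adding optional fields only when set."""
    params: Dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if system is not None:
        params["system"] = system
    if stop_sequences:
        params["stop_sequences"] = stop_sequences
    if user_id is not None:
        # Use an opaque ID such as a UUID or hash, never a name, email, or phone number.
        params["metadata"] = {"user_id": user_id}
    return params


params = build_params("List three colors, then write END.", stop_sequences=["END"], user_id="user-7f3a")
# resp = client.messages.create(**params)
# If generation reaches "END", resp.stop_reason is "stop_sequence" and resp.stop_sequence is "END";
# the matched sequence itself is not included in the returned text.
```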

system: the system prompt

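system is a top-level parameter; it does not go inside the messages array. Use it to set Claude's role, tone, and constraints. Keep volatile content (the current time, random IDs) out of system: prompt caching matches on an exact prefix, so any change there invalidates the cache.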

resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system="You are a meticulous Chinese-language technical editor. Answer directly, with no filler.",
    messages=[{"role": "user", "content": "What is SSE?"}],
)
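system also accepts an array of text blocks, which is where a cache_control marker goes. The sketch below keeps the system text byte-identical across calls and moves the changing timestamp into the user turn. build_cached_request is a hypothetical helper; note that caching only applies above a model-specific minimum prompt length, so a one-line system prompt like this one is illustrative rather than actually cacheable.

```python
from datetime import datetime, timezone
from typing import Any, Dict

STATIC_SYSTEM = "You are a meticulous technical editor. Answer directly, with no filler."


def build_cached_request(question: str, now: datetime) -> Dict[str, Any]:
    """Keep system byte-identical across calls; put values that change into the user turn."""
    return {
        "model": "claude-opus-4-8",
        "max_tokens": 1024,
        # system as an array of text blocks; cache_control marks the prefix up to here as cacheable
        "system": [
            {"type": "text", "text": STATIC_SYSTEM, "cache_control": {"type": "ephemeral"}}
        ],
        # the volatile timestamp lives in the message, so the cached prefix stays stable
        "messages": [
            {"role": "user", "content": f"Current time: {now.isoformat()}\n\n{question}"}
        ],
    }


req = build_cached_request("What is SSE?", datetime.now(timezone.utc))
# resp = client.messages.create(**req)
```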

For the full way to write system prompts, see How do I set a Claude system prompt? An illustrated step-by-step configuration guide.

messages and multi-turn conversations

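The API itself stores no state, so multi-turn conversation works by you resending the history in the messages array. On each turn, append the previous assistant reply and then send again: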

messages = [
    {"role": "user", "content": "Remember the number 42"},
    {"role": "assistant", "content": "OK, I've remembered 42."},
    {"role": "user", "content": "What number did I ask you to remember?"},
]
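Because the server keeps no history, a small client-side wrapper can own the list and append both sides of each turn. This is a minimal sketch: Conversation and its send hook are hypothetical names, and send can be any function that takes the full message list and returns the reply text, such as the commented SDK wrapper at the end.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the history client-side, because the API itself is stateless.

    `send` is any function that takes the full message list and returns the
    assistant's reply text (hypothetical hook; see the SDK wrapper below).
    """

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        self.send = send
        self.messages: List[Message] = []

    def ask(self, text: str) -> str:
        self.messages.append({"role": "user", "content": text})
        try:
            reply = self.send(list(self.messages))  # resend the whole history each turn
        except Exception:
            self.messages.pop()  # drop the unanswered user turn so roles keep alternating
            raise
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Wiring it to the SDK (assumes `client = Anthropic()` as in the examples above):
# def send_via_sdk(messages):
#     resp = client.messages.create(model="claude-opus-4-8", max_tokens=1024, messages=messages)
#     return resp.content[0].text
# chat = Conversation(send_via_sdk)
# chat.ask("Remember the number 42")
# chat.ask("What number did I ask you to remember?")
```

Rolling back the user turn on failure keeps the stored history valid for the next attempt.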

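content can be either a string or an array of content blocks (mixing text and images). For details on managing multi-turn context, see How do I implement multi-turn conversations with the Claude API? Context management explained; for how to pass image blocks, see How do I pass image input to the Claude API? A multimodal tutorial.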
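As an illustration of the content-block form, this sketch builds one user message that carries a base64-encoded image followed by a text question. image_message and the file name in the comment are hypothetical; the block layout (type "image" with a base64 source, then type "text") follows the Messages API's image input format.

```python
import base64
from typing import Any, Dict


def image_message(image_bytes: bytes, media_type: str, question: str) -> Dict[str, Any]:
    """Build a user message whose content mixes an image block and a text block."""
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": media_type,  # e.g. "image/png" or "image/jpeg"
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                },
            },
            {"type": "text", "text": question},  # text after the image
        ],
    }


# msg = image_message(open("chart.png", "rb").read(), "image/png", "What does this chart show?")
# client.messages.create(model="claude-opus-4-8", max_tokens=1024, messages=[msg])
```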

stream: streaming output

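With stream=True, the server pushes the reply in chunks as Server-Sent Events. The official SDK wraps this for you (client.messages.stream in Python), so you do not have to parse SSE by hand: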

with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Write a short poem about autumn"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()  # when you need the complete Message object

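Once max_tokens gets large (above roughly 16000), use streaming: a long non-streaming request can hit an HTTP timeout, and the SDK may refuse to send it at all. For SSE event types and the low-level details, see Claude API streaming output in Python (SSE explained).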
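If you call the HTTP endpoint directly with "stream": true instead of using the SDK, you read the SSE lines yourself. A minimal sketch, assuming the documented event shapes: text arrives in content_block_delta events whose delta has type text_delta. iter_text_deltas and the SAMPLE lines are illustrative, not captured output.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from raw Messages API SSE lines (minimal sketch).

    Each `data:` line holds a JSON object with its own "type", so the preceding
    `event:` line is not needed here. Multi-line `data:` fields, allowed by the
    SSE spec, are not handled.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip `event:` lines and blank separators
        event = json.loads(line[len("data:"):])
        if event.get("type") == "error":
            raise RuntimeError(event.get("error", {}).get("message", "stream error"))
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":  # thinking / tool-input deltas are skipped
                yield delta["text"]


SAMPLE = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"id": "msg_demo", "content": []}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Autumn "}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "leaves"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]
print("".join(iter_text_deltas(SAMPLE)))  # prints: Autumn leaves
```

With urllib, the response object yields bytes lines, so pass (line.decode("utf-8") for line in resp) to the parser.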

tools and tool_choice: tool use

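Each tool needs a name, a description, and an input_schema written as JSON Schema. When Claude responds with stop_reason: "tool_use", run the function yourself and send the result back as a tool_result block in a new user message, with tool_use_id set to the id of the matching tool_use block: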

tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}]

resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather like in Beijing?"}],
)

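tool_choice accepts {"type": "auto"} (the default: the model decides), {"type": "any"} (must call some tool), {"type": "tool", "name": "..."} (must call the named tool), and {"type": "none"} (tools disabled). For complete end-to-end code, see How to use Claude Tool Use? A complete hands-on code walkthrough, and Claude function calling example: let the model call your API.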
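The "run it and send a tool_result back" step can be sketched as a small dispatcher. Assumptions: get_weather is a hypothetical stub, run_tool_calls is a hypothetical helper, and response blocks are handled as plain dicts (for SDK objects, converting with model_dump() is assumed to work, since the SDK's types are pydantic models). Failures are reported to the model with is_error rather than crashing the loop.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(city: str) -> Dict[str, Any]:
    # Hypothetical stub; replace with a real weather lookup.
    return {"city": city, "condition": "sunny", "temp_c": 21}


TOOL_IMPLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(content: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Execute every tool_use block and return the user message carrying tool_result blocks."""
    results = []
    for block in content:
        if block.get("type") != "tool_use":
            continue  # skip text blocks that accompany the tool call
        try:
            output = TOOL_IMPLS[block["name"]](**block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],  # must match the tool_use block's id
                "content": json.dumps(output, ensure_ascii=False),
            })
        except Exception as exc:  # unknown tool or bad input: tell the model
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"Error: {exc}",
                "is_error": True,
            })
    return {"role": "user", "content": results}


# Closing the loop (assumes `client`, `tools`, and a `messages` list variable):
# while resp.stop_reason == "tool_use":
#     messages.append({"role": "assistant", "content": resp.content})
#     messages.append(run_tool_calls([b.model_dump() for b in resp.content]))
#     resp = client.messages.create(model="claude-opus-4-8", max_tokens=1024, tools=tools, messages=messages)
```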

thinking and output_config: thinking and output control

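On newer models such as Opus 4.8 and Sonnet 4.6, extended thinking is enabled with thinking={"type": "adaptive"}, which lets the model decide how deeply to think. The old budget_tokens form has been removed (it returns a 400 on 4.7/4.8). Thinking depth is tuned with effort inside output_config, which takes low, medium, high, xhigh, or max: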

resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=8000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational"}],
)

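For structured JSON output, use output_config={"format": {...}} (the old top-level output_format is deprecated). Also note that newer models no longer accept temperature, top_p, top_k, or assistant-message prefill; passing any of them returns a 400, so steer behavior with the prompt and structured output instead.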
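With thinking on, resp.content can begin with a thinking block instead of text, so the resp.content[0].text pattern from the minimal examples may fail. A minimal sketch that joins only the text blocks; final_text is a hypothetical helper that accepts SDK block objects or plain dicts.

```python
from typing import Any, Iterable


def _field(block: Any, name: str) -> Any:
    """Read a field from an SDK content block object or a plain dict."""
    if isinstance(block, dict):
        return block.get(name)
    return getattr(block, name, None)


def final_text(content: Iterable[Any]) -> str:
    """Join all text blocks, skipping thinking, redacted_thinking, and tool_use blocks."""
    return "".join(_field(b, "text") or "" for b in content if _field(b, "type") == "text")


# print(final_text(resp.content))  # resp from the thinking example above
```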

Reading the response: stop_reason and usage

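Two response fields are worth checking on almost every call. stop_reason tells you why generation stopped: end_turn (finished normally), tool_use (wants to call a tool), max_tokens (truncated), stop_sequence (hit one of your stop_sequences), or refusal (declined). usage reports token counts, which you need for billing and for analyzing cache hits. For how tokens are counted, see How are Claude API tokens calculated? With an online estimation method; for cost-saving tips, see How to save money on the Claude API? 5 ways to cut token costs.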
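A sketch of acting on both fields: a lookup for the stop_reason values discussed here (including stop_sequence, returned when one of your stop_sequences matches), and a usage breakdown. It assumes the documented usage fields, where input_tokens excludes cached tokens, so the full prompt size is input_tokens plus cache_creation_input_tokens plus cache_read_input_tokens. The suggested actions are illustrative, not official guidance.

```python
from typing import Any, Dict

# Suggested handling per stop_reason; unknown values fall through to a default.
NEXT_STEP = {
    "end_turn": "done",
    "stop_sequence": "done (one of your stop_sequences matched)",
    "tool_use": "run the tools and send back tool_result blocks",
    "max_tokens": "truncated: raise max_tokens or ask the model to continue",
    "refusal": "declined: show a fallback message",
}


def next_step(stop_reason: str) -> str:
    return NEXT_STEP.get(stop_reason, f"unhandled stop_reason: {stop_reason}")


def input_token_breakdown(usage: Dict[str, Any]) -> Dict[str, float]:
    """Split prompt tokens into uncached, cache-write, and cache-read parts.

    input_tokens excludes cached tokens, so the full prompt is the sum of all three.
    Missing or None fields count as 0.
    """
    uncached = usage.get("input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    total = uncached + written + read
    return {
        "total_input": total,
        "cache_read": read,
        "cache_write": written,
        "cache_hit_ratio": read / total if total else 0.0,
        "output": usage.get("output_tokens") or 0,
    }


# With the Python SDK, assuming its pydantic models: input_token_breakdown(resp.usage.model_dump())
```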

FAQ

Are max_tokens and the context window the same thing?

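No. max_tokens caps how many tokens this reply may generate; the context window is the total capacity for input plus output (1M for Opus 4.8). They are separate limits that interact: max_tokens must not exceed the model's maximum output, and the input tokens plus max_tokens must fit within the context window.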
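A conservative pre-flight check covering both limits, assuming the rule that input plus max_tokens has to fit in the context window (newer models reject oversized requests with a validation error). check_token_budget is hypothetical, and the 128k output cap in the demo call is a made-up number; look up real per-model limits in the docs.

```python
def check_token_budget(input_tokens: int, max_tokens: int, context_window: int, max_output: int) -> None:
    """Raise ValueError if a request would not fit the model's limits.

    context_window and max_output are per-model numbers you look up in the docs;
    nothing here is hard-coded for a specific model.
    """
    if max_tokens > max_output:
        raise ValueError(f"max_tokens={max_tokens} exceeds the output cap {max_output}")
    needed = input_tokens + max_tokens
    if needed > context_window:
        raise ValueError(
            f"input {input_tokens} + max_tokens {max_tokens} = {needed} > context window {context_window}"
        )


# Hypothetical numbers: a 1M-token window and a 128k output cap.
try:
    check_token_budget(input_tokens=950_000, max_tokens=64_000, context_window=1_000_000, max_output=128_000)
except ValueError as err:
    print(err)  # prints: input 950000 + max_tokens 64000 = 1014000 > context window 1000000
```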

Why does my request fail with 400 roles must alternate?

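Because the messages array contains two consecutive messages with the same role, or the first message is not user. Fix it by starting with user and strictly alternating user and assistant. To troubleshoot 401 authentication failures, see How to fix Claude API error 401? An authentication troubleshooting guide.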
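If your app can queue two user inputs before a reply arrives, a small normalizer can merge same-role runs before sending. This is a minimal sketch with a hypothetical helper name: strings are joined with a blank line, other contents are concatenated as block lists, and a leading assistant message still has to be fixed by hand.

```python
from typing import Any, Dict, List, Union

Content = Union[str, List[Dict[str, Any]]]


def _as_blocks(content: Content) -> List[Dict[str, Any]]:
    return [{"type": "text", "text": content}] if isinstance(content, str) else list(content)


def merge_consecutive_roles(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge runs of same-role messages so user/assistant strictly alternate.

    A leading assistant message is left as-is and must be fixed by hand.
    The input messages are not mutated.
    """
    merged: List[Dict[str, Any]] = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            prev, cur = merged[-1]["content"], msg["content"]
            if isinstance(prev, str) and isinstance(cur, str):
                content = prev + "\n\n" + cur
            else:
                content = _as_blocks(prev) + _as_blocks(cur)
            merged[-1] = {"role": msg["role"], "content": content}
        else:
            merged.append(dict(msg))  # shallow copy
    return merged
```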

What are the prices and rate limits?

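Each model bills input and output tokens separately; check Anthropic's website for current prices and rate limits. For a comparison of billing, see Claude API pricing explained: per-model billing and cost comparison; for retry strategies when you hit 429 rate limiting, see What to do about Claude API 429 rate-limit errors? 3 retry strategies.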
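The official SDKs already retry 429s and some other errors a couple of times by default (configurable, e.g. Anthropic(max_retries=...) in Python). If you need your own policy on top, here is a generic sketch of exponential backoff with full jitter; retry_with_backoff is a hypothetical helper, and honoring the response's retry-after header when present would be a natural refinement.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying retryable errors with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise  # out of attempts, or an error that retrying will not fix
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # jitter keeps many clients from retrying in lockstep
    raise AssertionError("unreachable")


# With the Python SDK (assumes `client` and `messages` as in the examples above):
# import anthropic
# resp = retry_with_backoff(
#     lambda: client.messages.create(model="claude-opus-4-8", max_tokens=1024, messages=messages),
#     is_retryable=lambda e: isinstance(e, anthropic.RateLimitError),
# )
```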