LLM 0.32a0 is a major backwards-compatible refactor

Source: Simon Willison’s Weblog | Author: Simon Willison | Date: 2026-04-29 | Original link: https://simonwillison.net/2026/Apr/29/llm/#atom-everything

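One-sentence summary

LLM 0.32a0 makes two core architectural changes: inputs are modeled as a sequence of messages and outputs as a stream of typed parts. Together they move past the old "text in, text out" abstraction so the library can cover the varied input and output capabilities of today's frontier models.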

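At a glance

The original abstraction hit its ceiling: since its creation in 2023, LLM has used a "text prompt → text response" model, but with attachments, schemas, and tools layered on top, that abstraction can no longer cover what frontier models can do.
Inputs are now a sequence of messages: the new llm.user() and llm.assistant() builder functions let you inject a complete conversation history directly through a messages=[] array, which removes the old pain point of not being able to pre-populate a conversation from outside.
Outputs are now a stream of typed parts: the new stream_events() / astream_events() APIs yield events that each carry a type (text, tool_call_name, tool_call_args, and so on), so different kinds of output can be handled separately.
Backwards compatible: the existing prompt= interface still works, and LLM upgrades it to a single-element messages array behind the scenes.
The CLI can display reasoning tokens: thinking text is shown in a different color and written to stderr, so it does not affect piped output; the new -R/--no-reasoning flag suppresses it.
New response serialization mechanism: response.to_dict() / Response.from_dict() let you store responses in any backend, without being tied to SQLite.
Conversations can be continued with reply(): response.reply() is a lighter-weight way to continue a conversation than the traditional conversation object.
Next: a redesign of the SQLite logging system, with a plan to model conversation storage as a graph so that conversations which keep being extended are not stored repeatedly.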

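Core content

Why the original abstraction was no longer enough

When LLM was first created in April 2023, it reduced the world to "send a text prompt, get a text response". That was reasonable at the time. Over the following three years, LLM added attachments (image, audio, and video input), schemas (structured JSON output), and tools (tool calling). Meanwhile the frontier models themselves kept evolving, adding reasoning, image generation, and mixed-type output.

The old "text in, text out" abstraction can no longer express these capabilities. LLM provides a unified interface to thousands of models through its plugin system, so a limitation in the abstraction layer constrains the whole plugin ecosystem.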

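Input-side refactor: message sequences

The core problem: the old conversation API could only build a conversation from scratch, with no way to inject an existing conversation history from outside. That made tasks such as building an emulation of the OpenAI chat completions API harder than they should be.

The CLI tool worked around this by persisting conversations to SQLite, but that mechanism was never part of the stable API, and in many situations users do not want to be tied to SQLite.

The new approach: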

import llm
from llm import user, assistant

model = llm.get_model("gpt-5.5")
response = model.prompt(messages=[
    user("Capital of France?"),
    assistant("Paris"),
    user("Germany?"),
])

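llm.user() and llm.assistant() are new builder functions, used directly inside the messages=[] array. The old prompt= parameter still works: LLM upgrades it to a single-element messages array behind the scenes, which is what keeps the change backwards compatible.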
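To make the "upgrade a prompt to a single-element messages array" idea concrete, here is a minimal sketch in plain Python. It does not use LLM's own classes: Message and normalize_input are hypothetical names invented for illustration, and rejecting the case where both arguments are passed is this sketch's choice, not documented library behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for one conversation turn (not LLM's own class)."""
    role: str
    content: str


def normalize_input(
    prompt: Optional[str] = None,
    messages: Optional[List[Message]] = None,
) -> List[Message]:
    """Return a message list, wrapping a bare prompt as a single user turn."""
    if prompt is not None and messages is not None:
        # Assumption made by this sketch only: treat the two inputs as exclusive.
        raise ValueError("pass either prompt or messages, not both")
    if messages is not None:
        return list(messages)
    if prompt is None:
        raise ValueError("a prompt or a messages list is required")
    return [Message(role="user", content=prompt)]


# A bare prompt becomes a one-element list holding a user message.
print(normalize_input(prompt="Capital of France?"))
```

With a normalization step like this, code downstream only ever has to deal with one input shape, which is one plausible reason the old prompt= style can keep working unchanged.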

The new response.reply() method offers a lightweight alternative for continuing a conversation:

response2 = response.reply("How about Hungary?")

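Output-side refactor: a stream of typed parts

The core problem: models now return an increasingly varied mix of content. Reasoning text, regular text, tool call requests, tool outputs, images, and audio fragments can all be interleaved in a single streaming response. The old streaming API could only emit text strings chunk by chunk, with no way to tell these content types apart.

The new approach: stream_events() and astream_events() return a stream of events, each tagged with a type: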

for event in response.stream_events():
    if event.type == "text":
        print(event.chunk, end="", flush=True)
    elif event.type == "tool_call_name":
        print(f"\nTool call: {event.chunk}(", end="", flush=True)
    elif event.type == "tool_call_args":
        print(event.chunk, end="", flush=True)

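For tool calls, once the response has finished you can call response.execute_tool_calls() to run the functions directly, or use response.reply() to have the tools called and their return values sent back to the model automatically.

In the CLI: thinking text (reasoning tokens) is now shown in grey, so it is distinguished by color from the final response text. Thinking text goes to stderr, so it does not pollute piped output. The new -R/--no-reasoning flag suppresses the display of reasoning tokens entirely. This is the only CLI-facing change in this release.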

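Response serialization mechanism

The old SQLite persistence mechanism was not flexible enough. The new to_dict() / from_dict() methods let you export a response as a JSON-style dictionary (in practice a TypedDict defined in the llm/serialization.py module), store it in any backend, and restore it later.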

serializable = response.to_dict()
response = Response.from_dict(serializable)

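Next steps

This is an alpha release, intended to test the new design in real-world use. Simon expects the stable 0.32 to be very similar to the alpha.

The big remaining task is to redesign the SQLite logging system so that it captures the finer-grained detail returned by the new abstraction. The ideal approach is to model conversation storage as a graph, which supports chat completions API style conversation extension without creating duplicates in the database. It is not yet decided whether this lands in 0.32 or 0.33.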

Notable quotes

“The original abstraction—of text input that returns text output—was no longer able to represent everything I needed it to.”
“LLM needs to evolve to better handle the diversity of input and output types that can be processed by today’s frontier models.”
“Surprisingly that ended up being the only CLI-facing change in this release.”

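Actionable advice

Try the alpha: pip install llm==0.32a0, test the new messages API and streaming events API in a real project, and send design feedback to Simon.
Migrate conversation code: if you have code that uses model.conversation(), you can start moving it to the messages=[] pattern for more flexibility.
Use the serialization mechanism: if you previously avoided persisting conversations because of the SQLite coupling, you can now plug in your own storage layer with to_dict() / from_dict().
Plugin authors: pay attention to the new streaming events protocol, and refer to the updates to the llm-anthropic plugin.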

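Resources

LLM 0.32a0 is a major backwards-compatible refactor

29 April 2026

I just released LLM 0.32a0, an alpha release of my LLM Python library and CLI tool, with some significant changes that I've been planning for quite a while.

Previous versions of LLM modeled the world in terms of prompts and responses. Send the model a text prompt, get back a text response.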

import llm

model = llm.get_model("gpt-5.5")
response = model.prompt("Capital of France?")
print(response.text())

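This made sense when I started working on the library back in April 2023. A lot has changed since then!

LLM provides an abstraction over thousands of different models via its plugin system. The original abstraction, text input that returns text output, was no longer able to represent everything I needed it to.

Over time LLM itself has grown attachments to handle image, audio, and video input, then schemas for outputting structured JSON, then tools for executing tool calls. Meanwhile the LLMs themselves have continued to evolve, adding reasoning, the ability to return images, and all sorts of other interesting capabilities.

LLM needs to evolve to better handle the diversity of input and output types that can be processed by today's frontier models.

The 0.32a0 alpha has two key changes: model inputs can be represented as a sequence of messages, and model responses can be composed of a stream of differently typed parts.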

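Prompts as a sequence of messages

LLMs accept text as input, but ever since ChatGPT demonstrated the value of a two-way conversational interface, the most common way to prompt them has been to treat that input as a sequence of conversational turns.

The first turn might look like this: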

user: Capital of France?
assistant:

(The model then fills in the assistant's reply.)

But each subsequent turn needs to replay the entire conversation so far, a little like a script:

user: Capital of France?
assistant: Paris
user: Germany?
assistant:

Most of the JSON APIs from the major vendors follow this pattern. Here is the example above using the OpenAI chat completions API, which has been widely imitated by other providers:

curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {
        "role": "user",
        "content": "Capital of France?"
      },
      {
        "role": "assistant",
        "content": "Paris"
      },
      {
        "role": "user",
        "content": "Germany?"
      }
    ]
  }'
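The same request body can be assembled programmatically, which shows why replaying a conversation is just a matter of building a longer list. The sketch below only constructs and prints the JSON payload and sends nothing over the network; build_chat_payload is a hypothetical helper written for this illustration, not part of LLM or any OpenAI client library.

```python
import json
from typing import Any, Dict, List, Tuple


def build_chat_payload(model: str, turns: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Turn (role, content) pairs into a chat-completions-style request body."""
    messages = [{"role": role, "content": content} for role, content in turns]
    return {"model": model, "messages": messages}


# Every request replays the whole conversation so far, ending with the new user turn.
payload = build_chat_payload(
    "gpt-5.5",
    [
        ("user", "Capital of France?"),
        ("assistant", "Paris"),
        ("user", "Germany?"),
    ],
)
print(json.dumps(payload, indent=2))
```

Because the server keeps no state between calls in this pattern, the client is responsible for storing earlier turns and sending them again each time.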

Prior to 0.32, LLM modeled these multi-turn exchanges as conversations:

model = llm.get_model("gpt-5.5")

conversation = model.conversation()
r1 = conversation.prompt("Capital of France?")
print(r1.text())
# Outputs "Paris"

r2 = conversation.prompt("Germany?")
print(r2.text())
# Outputs "Berlin"

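This works if you are building a conversation with the model from scratch, but it does not provide a way to start with an existing conversation already injected. That made tasks such as building an emulation of the OpenAI chat completions API much harder than they should have been.

The llm CLI tool works around this with a custom mechanism for persisting and restoring conversations using SQLite, but that never became a stable part of the LLM API, and there are plenty of situations where you might want to use the Python library without committing to SQLite as the storage layer.

The new alpha now supports this: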

import llm
from llm import user, assistant

model = llm.get_model("gpt-5.5")

response = model.prompt(messages=[
    user("Capital of France?"),
    assistant("Paris"),
    user("Germany?"),
])
print(response.text())

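llm.user() and llm.assistant() are new builder functions designed to be used within that messages=[] array.

The previous prompt= option still works, but LLM upgrades it to a single-element messages array behind the scenes.

You can now also reply to a response, as an alternative to building a conversation: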

response2 = response.reply("How about Hungary?")
print(response2)  # default __str__() calls .text()

Streaming parts

The other major new interface in the alpha concerns streaming results back from a prompt.

Previously, LLM supported streaming like this:

response = model.prompt("Generate an SVG of a pelican riding a bicycle")
for chunk in response:
    print(chunk, end="")

Or the async variant:

import asyncio
import llm

model = llm.get_async_model("gpt-5.5")
response = model.prompt("Generate an SVG of a pelican riding a bicycle")

async def run():
    async for chunk in response:
        print(chunk, end="", flush=True)

asyncio.run(run())

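Many of today's models return mixed types of content. A prompt run against Claude might return reasoning output, then text, then a JSON request for a tool call, then more text.

Some models can even execute tools on the server side, for example OpenAI's code interpreter tool or Anthropic's web search. This means the results returned by a model can mix text, tool calls, tool outputs, and other formats.

Multi-modal output models are starting to emerge as well, and these can return images or even audio snippets interleaved in a streaming response.

The new LLM alpha models these as a stream of typed message parts. Here is what that looks like to a consumer of the Python API: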

import asyncio
import llm

model = llm.get_model("gpt-5.5")
prompt = "invent 3 cool dogs, first talk about your motivations"

def describe_dog(name: str, bio: str) -> str:
    """Record the name and biography of a hypothetical dog."""
    return f"{name}: {bio}"

def sync_example():
    response = model.prompt(
        prompt,
        tools=[describe_dog],
    )
    for event in response.stream_events():
        if event.type == "text":
            print(event.chunk, end="", flush=True)
        elif event.type == "tool_call_name":
            print(f"\nTool call: {event.chunk}(", end="", flush=True)
        elif event.type == "tool_call_args":
            print(event.chunk, end="", flush=True)

async def async_example():
    model = llm.get_async_model("gpt-5.5")
    response = model.prompt(
        prompt,
        tools=[describe_dog],
    )
    async for event in response.astream_events():
        if event.type == "text":
            print(event.chunk, end="", flush=True)
        elif event.type == "tool_call_name":
            print(f"\nTool call: {event.chunk}(", end="", flush=True)
        elif event.type == "tool_call_args":
            print(event.chunk, end="", flush=True)

sync_example()
asyncio.run(async_example())

Sample output (from just the first, synchronous example):

My motivation: create three memorable dogs with distinct "cool" styles—one cinematic, one adventurous, and one charmingly chaotic—so each feels like they could star in their own story.
Tool call: describe_dog({"name": "Nova Jetpaw", "bio": "A sleek silver-gray whippet who wears tiny aviator goggles and loves sprinting along moonlit beaches. Nova is fearless, elegant, and rumored to outrun drones just for fun."}
Tool call: describe_dog({"name": "Mochi Thunderbark", "bio": "A fluffy corgi with a dramatic black-and-gold bandana and the confidence of a rock star. Mochi is short, loud, loyal, and leads a neighborhood 'security patrol' made entirely of squirrels."}
Tool call: describe_dog({"name": "Atlas Snowfang", "bio": "A massive white husky with ice-blue eyes and a backpack full of trail snacks. Atlas is calm, heroic, and always knows the way home—even during blizzards, fog, or confusing camping trips."}
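Instead of printing chunks as they arrive, a consumer may prefer to accumulate them into complete tool calls. The sketch below does that over a stand-in event type. Event is a hypothetical dataclass that only mirrors the .type and .chunk attributes used in the example above, and the sketch assumes each tool_call_name event starts a new call whose argument JSON arrives in the tool_call_args chunks that follow it; neither assumption is confirmed by the source.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class Event:
    """Hypothetical stand-in with the .type and .chunk attributes used above."""
    type: str
    chunk: str


def collect_parts(events: Iterable[Event]) -> Tuple[str, List[Dict[str, Any]]]:
    """Gather streamed chunks into the full text plus a list of parsed tool calls."""
    text_parts: List[str] = []
    raw_calls: List[Dict[str, str]] = []
    for event in events:
        if event.type == "text":
            text_parts.append(event.chunk)
        elif event.type == "tool_call_name":
            # Assumption: a name event opens a new tool call.
            raw_calls.append({"name": event.chunk, "raw_args": ""})
        elif event.type == "tool_call_args" and raw_calls:
            # Assumption: argument chunks belong to the most recently opened call.
            raw_calls[-1]["raw_args"] += event.chunk
    calls = [
        {"name": call["name"], "arguments": json.loads(call["raw_args"] or "{}")}
        for call in raw_calls
    ]
    return "".join(text_parts), calls


demo = [
    Event("text", "My motivation: "),
    Event("text", "cool dogs."),
    Event("tool_call_name", "describe_dog"),
    Event("tool_call_args", '{"name": "Nova Jetpaw", '),
    Event("tool_call_args", '"bio": "A sleek whippet."}'),
]
text, calls = collect_parts(demo)
print(text)
print(calls)
```

Parsing the arguments only after the stream ends avoids trying to decode partial JSON, at the cost of not being able to act on a tool call until it is complete.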

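At the end of the response you can call response.execute_tool_calls() to actually run the functions that were requested, or send a response.reply() to have those tools called and their return values sent back to the model: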

print(response.reply("Tell me about the dogs"))

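This new mechanism for streaming different token types means the CLI tool can now display "thinking" text in a different color from the text in the final response. Thinking text goes to stderr, so it does not affect results piped into other tools.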

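This example uses Claude Sonnet 4.6 (with a version of the llm-anthropic plugin updated for streaming events), because Anthropic's models return their reasoning text as part of the response: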

llm -m claude-sonnet-4.6 'Think about 3 cool dogs then describe them' \
  -o thinking_display 1

Animated demo. It starts by showing ~/dev/scratch/llm-anthropic % uv run llm -m claude-sonnet-4.6 'Think about 3 cool dogs then describe them' -o thinking_display 1, and text first streams in grey: The user wants me to think about 3 cool dogs and then describe them. Let me come up with 3 interesting, cool dogs and describe them. It then switches to the regular color for the output text describing the dogs.

You can suppress the output of reasoning tokens with the new -R/--no-reasoning flag. Surprisingly, that ended up being the only CLI-facing change in this release.
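The stdout/stderr split described above can be sketched in a few lines. This is not the CLI's actual implementation: the "reasoning" event type name, the render function, and its show_reasoning parameter are assumptions made for illustration, and events are represented as plain (type, chunk) tuples.

```python
import io
import sys
from typing import Iterable, TextIO, Tuple

GREY = "\033[90m"   # ANSI escape for bright black, rendered as grey by most terminals
RESET = "\033[0m"   # ANSI escape that restores the default color


def render(
    events: Iterable[Tuple[str, str]],
    out: TextIO = sys.stdout,
    err: TextIO = sys.stderr,
    show_reasoning: bool = True,
) -> None:
    """Write answer text to `out` and grey reasoning text to `err`."""
    for kind, chunk in events:
        if kind == "reasoning":  # assumed type name for thinking text
            if show_reasoning:
                err.write(f"{GREY}{chunk}{RESET}")
        elif kind == "text":
            out.write(chunk)
    out.flush()
    err.flush()


# Capture both streams in memory to show that only answer text reaches `out`.
captured_out, captured_err = io.StringIO(), io.StringIO()
render(
    [("reasoning", "Let me think about dogs. "), ("text", "Here are 3 cool dogs.")],
    out=captured_out,
    err=captured_err,
)
print(captured_out.getvalue())
```

Since a shell pipe only carries stdout, anything written to the error stream stays on the terminal, which is why thinking text handled this way does not end up in the input of the next tool.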

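A mechanism for serializing and deserializing responses

As mentioned earlier, LLM currently has some fairly inflexible code for persisting conversations to SQLite. I've added a new mechanism in 0.32a0 that should give Python API users a way to roll their own alternative: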

serializable = response.to_dict()
# serializable is a JSON-style dictionary
# store it anywhere you like, then inflate it:
response = Response.from_dict(serializable)

The dictionary this returns is actually a TypedDict defined in the new llm/serialization.py module.
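Because the exported value is a JSON-style dictionary, any backend that can hold a string should be able to store it. The sketch below shows the round trip with a tiny in-memory store. JsonStore is a hypothetical class, and example_dict is a placeholder whose keys are invented here; the real keys are defined by the TypedDict in llm/serialization.py and are not reproduced.

```python
import json
from typing import Any, Dict


class JsonStore:
    """Hypothetical key-value backend that keeps JSON strings in memory."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def save(self, key: str, data: Dict[str, Any]) -> None:
        # Serialize to text so the value could equally go to a file or database.
        self._rows[key] = json.dumps(data, ensure_ascii=False)

    def load(self, key: str) -> Dict[str, Any]:
        return json.loads(self._rows[key])


# Placeholder standing in for the result of response.to_dict(); keys are invented.
example_dict = {"placeholder_text": "Berlin", "placeholder_model": "gpt-5.5"}

store = JsonStore()
store.save("response-1", example_dict)
restored = store.load("response-1")
print(restored == example_dict)
```

In real use the restored dictionary would then be handed to Response.from_dict(), as in the snippet above.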

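What's next

I'm releasing this as an alpha so I can upgrade various plugins and try the new design out in real-world use for a few days. I expect the stable 0.32 release to be very similar to this alpha, unless alpha testing reveals design flaws in how I've put it together.

There's one big task remaining: I want to redesign the SQLite logging system to better capture the finer-grained detail returned by this new abstraction.

Ideally I'd like to model this as a graph, to best support situations like an OpenAI-style chat completions API where the same conversation is constantly extended and then repeated with every prompt. I want to be able to store those conversations without duplication in the database.

I haven't decided yet whether this should be a feature of 0.32 or whether I should hold it for 0.33.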
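One way to picture the graph idea is a content-addressed tree in which each message node points at its parent, so a conversation that extends an earlier one reuses the nodes already stored. This is only a sketch of the general technique and not the planned LLM design; ConversationGraph and its methods are hypothetical names.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple

Turn = Tuple[str, str]  # (role, content)


class ConversationGraph:
    """Stores each message once, keyed by a hash of (parent id, role, content)."""

    def __init__(self) -> None:
        # node id -> (parent id or None, role, content)
        self.nodes: Dict[str, Tuple[Optional[str], str, str]] = {}

    def add_conversation(self, turns: List[Turn]) -> Optional[str]:
        """Insert a full conversation and return the id of its last message."""
        parent: Optional[str] = None
        for role, content in turns:
            key = json.dumps([parent, role, content]).encode("utf-8")
            node_id = hashlib.sha256(key).hexdigest()
            # A shared prefix hashes to the same ids, so nothing is stored twice.
            self.nodes.setdefault(node_id, (parent, role, content))
            parent = node_id
        return parent

    def history(self, node_id: Optional[str]) -> List[Turn]:
        """Walk parent links back to the root and return the turns in order."""
        turns: List[Turn] = []
        while node_id is not None:
            parent, role, content = self.nodes[node_id]
            turns.append((role, content))
            node_id = parent
        return list(reversed(turns))


graph = ConversationGraph()
first = [("user", "Capital of France?"), ("assistant", "Paris"), ("user", "Germany?")]
extended = first + [("assistant", "Berlin"), ("user", "How about Hungary?")]
graph.add_conversation(first)
leaf = graph.add_conversation(extended)
print(len(graph.nodes))
```

Storing the three-turn conversation and then its five-turn extension leaves five nodes rather than eight, because the shared prefix is stored once, and graph.history(leaf) rebuilds the full extended conversation.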