来源：https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents 爬取日期：2026-03-22

Effective Context Engineering for AI Agents

Overview

Context engineering represents a fundamental shift in how developers build with large language models. Rather than focusing solely on crafting perfect prompts, engineers must thoughtfully manage the information entering an LLM’s finite attention budget.

Key Definitions

Context refers to the set of tokens included when sampling from an LLM. Context engineering encompasses strategies for curating and maintaining the optimal token set during inference, including system instructions, tools, external data, and message history.

This differs from prompt engineering, which focuses specifically on writing effective instructions. As teams build more sophisticated agents operating over multiple turns, managing the entire context state becomes essential.

Why Context Matters

LLMs exhibit performance degradation as context length increases—a phenomenon called “context rot.” Like humans with limited working memory, models have an “attention budget” that depletes with each new token. This stems from transformer architecture constraints where every token attends to every other token, creating n² relationships.

The practical implication: context must be treated as a finite resource requiring careful curation.

Effective Context Components

System Prompts: Should be extremely clear and specific enough to guide behavior effectively, yet flexible enough to allow models to develop strong heuristics. Avoid both brittle hardcoded logic and vague high-level guidance.

Tools: Must promote efficiency through token-economical returns and encourage efficient agent behaviors. Minimal, well-defined tool sets prevent ambiguity and support long-term maintainability.

Examples: Few-shot prompting remains valuable. Curate diverse, canonical examples rather than exhaustive edge case lists.

Runtime Context Retrieval

Modern agents increasingly use “just in time” context strategies rather than pre-loading all data. Agents maintain lightweight identifiers (file paths, URLs, queries) and dynamically load relevant information using tools. This mirrors human cognition—we use external indexing systems rather than memorizing entire corpuses.

Progressive disclosure allows agents to incrementally discover context through exploration, maintaining only essential information in working memory.

Managing Long-Horizon Tasks

Three techniques address context window limitations:

Compaction: Summarizing conversation contents and reinitializing with compressed context. Balancing recall and precision prevents losing subtle but critical details.

Structured Note-Taking: Agents write persistent notes outside the context window, retrieving them later. This enables multi-hour coherence across summarization steps.

Sub-agent Architectures: Specialized agents handle focused tasks with clean context windows, returning condensed summaries (1,000-2,000 tokens) to the main agent. This separates detailed search context from high-level synthesis.

Implementation Philosophy

The guiding principle: find “the smallest set of high-signal tokens that maximize the likelihood of your desired outcome.” As model capabilities improve, agents require progressively less prescriptive engineering, allowing greater autonomy.

Developers should continue treating context as precious regardless of window size improvements.