Effective context engineering for AI agents
Overview
The key to building efficient and reliable AI agents is to treat context as a finite, valuable resource and to manage and optimize it meticulously.
1. The Evolution from "Prompt Engineering" to "Context Engineering"
- Prompt Engineering: Primarily focuses on how to write and organize instructions (especially system prompts) for LLMs to obtain optimal single-turn outputs.
- Context Engineering: A broader concept concerned with managing and maintaining all information entering the LLM's "context window" throughout its entire operation cycle. This includes system prompts, tools, external data, conversation history, etc. It is a continuous, iterative optimization process.
2. Context is a Finite and Critical Resource
- LLMs, like humans, have a limited "attention budget".
- When there is too much information (tokens) in the context window, model performance degrades, leading to the "context rot" phenomenon, where the model struggles to accurately recall or utilize the information within it.
- Therefore, information entering the context must be carefully curated. The goal is to use the smallest, most efficient set of information (high-signal tokens) that maximizes the likelihood of achieving the desired outcome.
3. Structure of Effective Context
- Principle: At any given moment, include the "smallest yet highest-signal" set of tokens to maximize the probability of achieving the goal.
- System Prompts: Find the "right altitude," specific enough to guide behavior without resorting to fragile hard-coded logic; use structured sections (background, instructions, tool guidance, output format); start with a minimal viable version, then refine based on failure modes (see the sketch after this list).
- Tool Design: Fewer but better tools with clear boundaries, unambiguous parameters, and token-efficient returned information; avoid functional overlap and selection ambiguity.
- Example Selection: A small number of diverse, canonical few-shot examples are more effective than cramming with rules and edge cases; examples serve as efficient "behavioral pictures."
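To make these pieces concrete, here is a minimal sketch of how such a context might be assembled in Python. The section headings, the `search_tickets` tool, and the example pairs are illustrative assumptions, not a format prescribed by the article; the tool schema simply follows the JSON-Schema style common to LLM tool-calling APIs.

```python
# Minimal sketch: a structured system prompt, one compact tool, and a few canonical examples.
# All names and wording below are illustrative, not prescribed by the article.

SYSTEM_PROMPT = """\
## Background
You are a support agent for an internal ticketing system.

## Instructions
- Resolve the user's request in as few steps as possible.
- Ask a clarifying question only when the request is ambiguous.

## Tool guidance
- Use `search_tickets` to look up existing tickets before creating new ones.

## Output format
Reply with a short summary, followed by the ticket ID if one exists.
"""

# One clearly bounded tool with unambiguous parameters and a token-efficient return value.
SEARCH_TICKETS_TOOL = {
    "name": "search_tickets",
    "description": "Search existing tickets by keyword. Returns at most 5 matches "
                   "as (ticket_id, title) pairs, not full ticket bodies.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "Keyword to search for."}},
        "required": ["query"],
    },
}

# A small, diverse set of canonical examples instead of an exhaustive rulebook.
FEW_SHOT_EXAMPLES = [
    {"user": "My VPN drops every hour.", "assistant": "Filed as a network issue. Ticket: NET-1042"},
    {"user": "How do I reset my password?", "assistant": "Self-service: use the reset link on the login page. No ticket needed."},
]
```

Keeping the tool's return value to (ticket_id, title) pairs rather than full ticket bodies is what keeps each turn token-efficient.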
4. Dynamic and Hybrid Context Retrieval
- The article advocates for a shift from "pre-loading all information" to a "just-in-time" context retrieval strategy.
- Agents should not load all potentially relevant data into the context at once. Instead, they should use tools (such as file-system reads or database queries) to dynamically and autonomously retrieve information as needed (see the sketch after this list).
- This approach mimics human cognition (we don't remember everything, but we know where to find it) and enables "progressive disclosure", keeping the agent more focused and efficient. In practice, a hybrid strategy combining pre-loading with just-in-time retrieval often works best.
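As a rough illustration of the hybrid approach, the sketch below pre-loads only a lightweight index of available documents and exposes a `read_document` tool for on-demand retrieval. The `docs/` directory layout, the function names, and the truncation limit are assumptions made for the example.

```python
# Minimal sketch of hybrid context retrieval: pre-load only a lightweight index,
# and let the agent pull full documents on demand via a tool call.
from pathlib import Path

DOCS_DIR = Path("docs")

def build_index() -> str:
    """Pre-loaded context: just file names and first lines, not full contents."""
    lines = []
    for path in sorted(DOCS_DIR.glob("*.md")):
        first_line = path.read_text(encoding="utf-8").splitlines()[0] if path.stat().st_size else ""
        lines.append(f"- {path.name}: {first_line}")
    return "Available documents:\n" + "\n".join(lines)

def read_document(name: str, max_chars: int = 4000) -> str:
    """Just-in-time retrieval: exposed to the model as a tool, called only when needed."""
    path = DOCS_DIR / name
    if not path.is_file():
        return f"Error: no document named {name!r}."
    # Truncate so the tool's return stays token-efficient.
    return path.read_text(encoding="utf-8")[:max_chars]
```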
5. Three Key Strategies for Long-horizon Tasks
For complex, long-term tasks that exceed the capacity of a single context window, the article proposes three key techniques:
- Compaction:
  - Method: When the conversation history nears the context window limit, the model is tasked with summarizing and compressing it. A new conversation window is then started from this refined summary.
  - Purpose: To maintain task continuity by preserving core information (e.g., decisions, unresolved issues) while discarding redundant content.
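A minimal sketch of compaction, assuming a generic `call_model` client and a `count_tokens` helper (both placeholders for whatever API and tokenizer are actually in use):

```python
# Compaction sketch: when the transcript nears a token budget, ask the model for a
# summary and restart the conversation from it.

COMPACTION_PROMPT = (
    "Summarize the conversation so far for your future self: key decisions, "
    "open questions, and exact identifiers (file names, IDs) still needed. Be concise."
)

def maybe_compact(messages: list[dict], call_model, count_tokens, budget: int = 150_000) -> list[dict]:
    """Return the message list unchanged, or a fresh one seeded with a summary."""
    if sum(count_tokens(m["content"]) for m in messages) < budget:
        return messages
    summary = call_model(messages + [{"role": "user", "content": COMPACTION_PROMPT}])
    # Start a new window: the summary replaces the raw history.
    return [{"role": "user", "content": f"Summary of prior work:\n{summary}"}]
```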
- Structured Note-taking / Agentic Memory:
  - Method: The agent is instructed to regularly write key information, to-do items, progress, etc., to an external "memory" (e.g., a NOTES.md file) during task execution, and to read from it when needed.
  - Purpose: To provide the agent with persistent memory, enabling it to maintain long-term tracking and planning capabilities for a task even across multiple context resets.
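A minimal sketch of note-taking tools an agent could call; the NOTES.md file name comes from the article, while the function signatures and timestamp format are illustrative:

```python
# Agentic note-taking sketch: two small tools the agent can call to persist and
# recall progress across context resets.
from datetime import datetime, timezone
from pathlib import Path

NOTES_PATH = Path("NOTES.md")

def write_note(note: str) -> str:
    """Append a timestamped entry (decision, to-do, progress) to persistent memory."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    with NOTES_PATH.open("a", encoding="utf-8") as f:
        f.write(f"- [{stamp}] {note}\n")
    return "Noted."

def read_notes(max_chars: int = 2000) -> str:
    """Return the most recent notes, truncated so re-reading them stays cheap."""
    if not NOTES_PATH.exists():
        return "(no notes yet)"
    return NOTES_PATH.read_text(encoding="utf-8")[-max_chars:]
```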
- Sub-agent Architectures:
  - Method: A complex task is broken down. A main agent is responsible for high-level planning and coordination, delegating specific, in-depth subtasks to specialized sub-agents. Each sub-agent works within its own independent context and returns only a refined summary to the main agent upon completion.
  - Purpose: To achieve "separation of concerns," preventing the main agent's context from being overwhelmed by massive details, thereby efficiently handling complex research and analysis tasks.
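A minimal sketch of the orchestrator/sub-agent split, again assuming a placeholder `call_model` client; the subtasks and the summary-length cap are invented for illustration:

```python
# Sub-agent sketch: the orchestrator plans, each sub-agent runs in its own fresh
# context, and only a short summary flows back to the main agent.

SUBTASKS = [
    "Survey how the codebase handles authentication; report key files and risks.",
    "Review the database schema for the billing tables; report unclear columns.",
]

def run_subagent(call_model, task: str, max_summary_chars: int = 1500) -> str:
    """Each sub-agent starts with an empty, independent context."""
    messages = [
        {"role": "user", "content": f"{task}\n\nReturn only a concise summary of findings."}
    ]
    # Cap what flows back so the orchestrator's context stays small.
    return call_model(messages)[:max_summary_chars]

def orchestrate(call_model) -> str:
    """The main agent sees only the distilled summaries, not the sub-agents' raw work."""
    summaries = [f"## {task}\n{run_subagent(call_model, task)}" for task in SUBTASKS]
    plan_prompt = "Synthesize these sub-agent reports into a final plan:\n\n" + "\n\n".join(summaries)
    return call_model([{"role": "user", "content": plan_prompt}])
```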