Context Engineering Replaced Prompt Engineering

For about two years, every job board had postings for “prompt engineer.” Most of those have quietly been replaced by listings for “context engineer,” or “AI systems engineer,” or “agent engineer.” The job title shift tracks a real shift in what the work is. Prompt engineering was about phrasing — how do I word this so the model does what I want.

Context engineering is about information architecture — what does the model need to see, when, in what form, with what other things present, and what does it need to not see.

These sound similar, and they share some skills, but they’re not the same discipline. Prompt engineering treats the prompt as an input you sculpt. Context engineering treats the model’s context window as a scarce resource that gets composed, dynamically, from many sources, and that has properties — bandwidth, attention distribution, contamination effects — you have to manage. It’s much closer to systems engineering than to writing.

The shape of the new discipline

Let me sketch the shape of the discipline.

Retrieval design. Almost every agent fetches things — files, documents, prior conversations, schemas, examples. How that fetching is structured shapes the agent’s behavior more than the prompt does. Should retrieval be one-shot (pull everything up front) or iterative (pull as needed)? Should it return raw text or summarized chunks? How many results, in what order, with what scoring? Should the agent be allowed to retrieve again if the first results are insufficient? These decisions feel like backend concerns but they directly shape what the model can reason about. Bad retrieval design produces agents that can’t find what they need and don’t know they can’t. Good retrieval design produces agents that look smarter than the model they’re built on.

Focus management. A context window is a finite stage. Anything irrelevant on the stage competes with the relevant content for the model’s attention. Focus management is the work of keeping the stage clean: surfacing the right material, archiving stale conversation, collapsing repetitive content, summarizing long histories. Some of this is mechanical (drop tool call results older than N turns). Some is judgment (which parts of the user’s project description still matter twenty minutes into the trajectory?). A surprising amount of agent quality comes down to focus discipline.

Memory hierarchy. Agents have stratified memory now, and deciding what goes where is part of context engineering. Working memory lives in the current context window — it’s fast, limited, expensive per turn. Short-term memory is the recent trajectory and its summaries — fast to retrieve, useful for continuity. Episodic memory is past trajectories with similar tasks — useful for analogy and pattern-matching. Semantic memory is durable facts about the user, the project, the domain — accessed by retrieval. Long-term memory is the skill library and other procedural knowledge. The engineering question is what gets promoted from which layer to which other layer, and when. This is a real, hard design problem that the field is still working out.

Context lifecycle. Information that enters the context has to eventually leave it. The naive approach is “leave it all until you run out of room, then evict whatever’s oldest.” This is the equivalent of FIFO cache eviction, and it works poorly for the same reason — it doesn’t weight by relevance. Better approaches summarize aggressively, evict by relevance score, and keep a long-running structured summary as a substitute for the raw history. The point is that context isn’t permanent — the same information appears, transforms, and disappears over the course of a long trajectory. Designing those transitions is part of the job.

Information shape. Models read different formats differently. The same content as a flat list versus a structured table versus a JSON object versus prose produces measurably different behaviors. Context engineers think hard about how information is shaped before it hits the model. A schema is often more useful than the data it describes. A diff is often more useful than the two files it compares. A summary is often more useful than the document it summarizes. The shape of information is part of the design surface.

Contamination control. This one is less appreciated but increasingly important. When you load a document into the context window, you’re not just loading the information — you’re also loading whatever stylistic and tonal patterns are in the document. The model picks these up. Load three emails written in a frustrated tone and the agent starts writing in a frustrated tone. Load a piece of poorly-written code and the agent starts emitting similarly poor code.

Context engineering has to take this seriously: what’s loaded affects how the model behaves, not just what it knows. Sometimes you want this effect (load examples of the style you want); sometimes you don’t (you’re loading data, not style). Knowing the difference is craft.

Adversarial context. When the model receives content from external sources — web pages, emails, documents that came in from the world — that content may contain instructions, intentional or otherwise. Context engineering now includes thinking about which parts of context are trusted and which are not, and structuring the prompt to make those distinctions clear to the model. This is where prompt injection defense actually happens. It’s not a separate layer; it’s a property of how context is composed.

Information flow decides more than phrasing

The thing all these have in common is that they’re decisions about how information moves through the system, not decisions about how to phrase a sentence. The phrasing decisions still exist — short paragraphs are easier to attend to, structured headers help models navigate, certain framings work better than others — but they’re a smaller fraction of the total work than they used to be. Most of the difference between a great agent and a mediocre one comes down to context engineering choices that are made invisibly, behind the prompt, in the harness and the retrieval system and the memory hierarchy.

The advantage moved from query to schema

There’s a comparison worth making. In database land, the SQL query is often less important to performance than the schema, the indexes, and the query plan. A bad query on a well-designed schema is usually faster than a great query on a bad schema. The same is true for agents: a careful prompt on top of a sloppy context architecture is usually worse than a sloppy prompt on top of a careful context architecture. The advantage has moved.

Why prompt engineering lost prestige

This is also why prompt engineering, as a standalone discipline, has lost prestige. It was always going to be a transitional craft. The skills that mattered most in 2023 — careful phrasing, knowing which incantations worked — were the skills that had to compensate for an underdeveloped substrate. As the substrate matured, those skills got less valuable, the way obscure HTML hacks got less valuable as CSS matured. They didn’t become useless. They became specialized, embedded in tools, and rarely the bottleneck. Context engineering is what you do once the substrate is mature enough that the prompt isn’t the limiting factor anymore.

The remaining question is what you actually grant agents the autonomy to do once you’ve engineered their context well. The next post is about why “as much as possible” turned out to be the wrong answer, and why the systems that succeeded built collaborative loops rather than chasing full autonomy.