The most important shift in AI engineering between 2023 and 2025 was not bigger context windows, not multimodal input, not even the arrival of reasoning models. It was an inversion in where the intelligence was assumed to live. We stopped trying to put it in the prompt and started building it into the environment around the model.
This sounds abstract until you look at a before-and-after pair. In the prompt-centric world, if you wanted an AI to help users debug their code, you’d write a long system prompt instructing the model on how to diagnose problems, what questions to ask, when to suggest fixes, how to format the response. The model received a description of the world and a description of what to do in it. In the runtime-centric world, you instead give the model a shell, access to the repo, the ability to run tests, and a roughly twenty-line description of its role. The model isn’t told how to debug — it can read the code, run the tests, see what fails, and form hypotheses the same way a person would. The behavior emerges from the situation, not from the instructions.
The instructions get shorter as the environment gets richer. That’s the inversion.
Procedural prose is a bad medium for procedures
It’s worth being precise about why this works, because “richer environment” can sound like a slogan rather than an architectural claim. Natural-language descriptions of procedures are a bad medium for procedures. If I tell you “to fix a failing test, first look at the error message, then find the relevant code, then form a hypothesis about why it’s failing, then…”, I am producing prose that compresses a great deal of detail and assumes a great deal of context. The prose is fine for a colleague who already knows what tests are. It’s terrible as a specification, because every word leaves slack that has to be resolved at execution time. Models reading procedural prose are constantly making interpretive guesses, and because each guess builds on the ones before it, the errors compound the further down the procedure the model gets.
Compare this to actually having a shell and a test runner. The shell isn’t a description of an action; it’s the action. The test runner isn’t a hint about what failure looks like; it produces the actual failure. The model doesn’t have to imagine the world it’s operating in — it can probe the world directly. Every interpretive guess that the prompt era forced the model to make is replaced by a tool call that returns ground truth.
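To make "a tool call that returns ground truth" concrete, here is a minimal sketch in Python. The names `ToolResult` and `run_command` are hypothetical, invented for this illustration, and the "failing test" is a one-line stand-in script rather than a real test suite. The point is only that what comes back is the observed exit code and error output, not a description of what failure might look like.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ToolResult:
    """What the model sees after a tool call: observed facts, not prose."""
    exit_code: int
    stdout: str
    stderr: str


def run_command(argv: list[str], timeout_s: float = 30.0) -> ToolResult:
    """Hypothetical 'shell' tool: run a command and report what actually happened."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    return ToolResult(proc.returncode, proc.stdout, proc.stderr)


# Stand-in for a failing test: a one-line script whose assertion fails.
failing_test = [sys.executable, "-c", "assert 1 + 1 == 3, 'expected 3'"]
result = run_command(failing_test)

# The model reads the real failure instead of guessing at it.
print("exit code:", result.exit_code)
print("last stderr line:", result.stderr.strip().splitlines()[-1])
```

A real harness would likely add sandboxing and output truncation; this sketch leaves both out to keep the contrast with procedural prose visible.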
What changes structurally is the locus of intelligence in the system. In the prompt-centric approach, the smart part is the description we hand the model, and the model’s job is to follow it. In the runtime-centric approach, the smart part is the environment we put the model in (the affordances it offers, the constraints it enforces, the feedback it produces), and the model’s job is to navigate it. The model is a participant in a situation, not the recipient of a configuration file.
The environment is an operating system, not a string
You can see why teams resisted this shift initially. Building the environment is much more work than writing the prompt. To go from prompt-centric to runtime-centric you have to ship: a tool surface (which means thinking carefully about what the model should be able to do), a permission model (which means thinking carefully about what it shouldn’t), a context manager (which means thinking carefully about what information enters and exits the model’s view), a memory system (which means thinking carefully about persistence), and a verification loop (which means thinking carefully about how to know if anything worked). The prompt was a string. The environment is an operating system. Of course people preferred the string for a while.
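The five pieces named above can be sketched as a toy runtime to show how they fit together. Everything here is an assumption made for illustration: the `Runtime` class, its field names, the in-memory `files` dict standing in for a repo, and the scripted `steps` list standing in for decisions a model would make. It is not any particular framework's design.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Runtime:
    """Toy runtime; all names are illustrative, not a real framework's API."""
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)  # tool surface
    allowed: set[str] = field(default_factory=set)                        # permission model
    context: deque = field(default_factory=lambda: deque(maxlen=4))       # context manager
    memory: dict[str, str] = field(default_factory=dict)                  # memory system

    def call(self, tool: str, arg: str) -> str:
        if tool not in self.tools:
            raise KeyError(f"unknown tool: {tool}")
        if tool not in self.allowed:
            raise PermissionError(f"tool not permitted: {tool}")
        result = self.tools[tool](arg)
        # Only the most recent observations stay in view (bounded deque).
        self.context.append(f"{tool}({arg!r}) -> {result!r}")
        return result

    def run_until_verified(self, steps: list[tuple[str, str]],
                           verify: Callable[[], bool]) -> bool:
        """Verification loop: act, then check whether anything actually worked."""
        for tool, arg in steps:
            self.call(tool, arg)
            if verify():
                self.memory["last_outcome"] = "verified"
                return True
        self.memory["last_outcome"] = "unverified"
        return False


# In-memory stand-in for a repo.
files = {"answer.txt": "41"}

def read_file(name: str) -> str:
    return files[name]

def write_answer(value: str) -> str:
    files["answer.txt"] = value
    return "ok"

def delete_file(name: str) -> str:
    del files[name]
    return "deleted"

rt = Runtime(
    tools={"read": read_file, "write_answer": write_answer, "delete": delete_file},
    allowed={"read", "write_answer"},  # "delete" exists but is not permitted
)

# Scripted stand-in for the model's choices.
steps = [("read", "answer.txt"), ("write_answer", "42"), ("read", "answer.txt")]
ok = rt.run_until_verified(steps, verify=lambda: files["answer.txt"] == "42")
print(ok, rt.memory["last_outcome"], len(rt.context))
```

Even at this size the asymmetry with a prompt shows: the permission check and the verification step are enforced by code that runs, not by sentences the model may or may not follow.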
But the string had a ceiling, and the environment didn’t. The runtime-centric approach got more capable as you invested in it. Adding a new tool made new behaviors possible without rewriting any existing logic. Hardening a permission scope made the agent safer without making it stupider. Improving the context manager lifted the performance of every skill that ran on top of it.
The platform exhibited the kind of compounding that platforms do — each improvement raised the floor for everything above it. The prompt did not compound. It just got longer.
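One way to picture "adding a new tool without rewriting any existing logic" is a registry that the dispatch code consults at call time. This is a minimal sketch under that assumption; the `tool` decorator, the `TOOLS` registry, and `dispatch` are hypothetical names, not part of any specific harness.

```python
from typing import Callable

# Registry of available tools, keyed by name (hypothetical structure).
TOOLS: dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Register a function as a tool under `name`."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return decorator


def dispatch(name: str, **kwargs) -> str:
    """Generic dispatch: written once, not edited when tools are added."""
    if name not in TOOLS:
        return f"error: no such tool {name!r}"
    return TOOLS[name](**kwargs)


@tool("word_count")
def word_count(text: str) -> str:
    return str(len(text.split()))


# Before the new tool exists, the capability is simply absent.
before = dispatch("reverse", text="abc")


# Adding the tool is purely additive; dispatch() is untouched.
@tool("reverse")
def reverse(text: str) -> str:
    return text[::-1]


after = dispatch("reverse", text="abc")
print(before)
print(after)
```

The same additive pattern does not hold for a long prompt, where a new instruction can interact with every instruction already there.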
The CGI era of AI
There’s a useful historical analogy here. Early web applications were structured as CGI scripts: a request came in, a script ran, generated some HTML, exited. State lived in the URL, in cookies, in whatever the developer had hand-rolled. As the web matured, applications moved into runtimes — servlet containers, then application servers, then frameworks like Rails — that handled sessions, routing, persistence, security, and rendering as primitives. The application code shrank because it was sitting on a platform that knew how to do the standard things. Nobody looks back at CGI and says “ah yes, the golden age.” We look back at it as a stage we had to go through before the right abstractions emerged.
The prompt era is going to be remembered the same way. Useful in its moment, formative for the field, and structurally a dead end. What replaced it was not a more elaborate prompt but the recognition that AI applications need a runtime — a place where models, tools, memory, permissions, and skills live and interact — and that this runtime is where almost all of the engineering effort now goes.
The rest of this series is about that runtime. How agents reshaped the unit of intelligence. How harnesses formalized the runtime concept. How skills replaced system prompts. How trajectories replaced outputs. The thread tying it all together is the same inversion: the model is no longer the application. It’s the reasoning engine embedded in one.