Why Small Prompts and Rich Environments Win

The conventional wisdom in 2023 was that more context was better. Bigger windows. Longer prompts. More few-shot examples. More elaborate role definitions. The implicit theory was that the model would behave better the more we told it.

By 2025 we had pretty clear evidence that this was wrong, or at least that the field had overshot. The harnesses that worked best, the ones producing genuinely capable agents in real environments, paired relatively small prompts with relatively rich environments. This is the thin reasoning kernel pattern.

I want to make the strongest version of this argument, because I think it’s correct and it’s still controversial in some corners of the field.

What a thin reasoning kernel actually means

A “thin reasoning kernel” means a small, focused prompt, short enough to sit in the context window with no wasted words, paired with a procedural substrate that handles most of what the prompt used to handle. The substrate provides skills for specific tasks, tools for actions, memory for state, and verification loops for ground truth. The prompt’s job is reduced to what only a prompt can do: frame the role, set the high-level disposition, and point at the substrate.
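To make the division of labor concrete, here is a minimal sketch in Python. Every name in it (`KERNEL`, `Substrate`, `context_for`, the example skills) is hypothetical and invented for illustration; it is not the API of any particular harness. The point is only that the prompt stays a couple of sentences long while skills, tools, memory, and verifiers live in ordinary data structures.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical kernel: role, disposition, and a pointer at the substrate.
KERNEL = (
    "You are a coding assistant. Work in small, verifiable steps. "
    "Consult the listed skills and tools before acting."
)


@dataclass
class Substrate:
    skills: dict[str, str] = field(default_factory=dict)  # task -> procedure
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    memory: dict[str, str] = field(default_factory=dict)  # state across turns
    verifiers: list[Callable[[str], bool]] = field(default_factory=list)

    def context_for(self, task: str) -> str:
        """Kernel plus only the material relevant to this task."""
        parts = [KERNEL]
        if task in self.skills:
            parts.append(f"Skill ({task}): {self.skills[task]}")
        if self.tools:
            parts.append("Tools: " + ", ".join(sorted(self.tools)))
        if self.memory:
            state = "; ".join(f"{k}={v}" for k, v in sorted(self.memory.items()))
            parts.append("Memory: " + state)
        return "\n".join(parts)

    def verified(self, output: str) -> bool:
        """Ground truth comes from checks, not from prompt wording."""
        return all(check(output) for check in self.verifiers)


substrate = Substrate(
    skills={
        "release": "Bump the version, run the tests, tag the commit.",
        "triage": "Reproduce the bug, then label the issue.",
    },
    tools={"run_tests": lambda arg: "ok"},
    memory={"branch": "main"},
    verifiers=[lambda out: "TODO" not in out],
)
print(substrate.context_for("release"))
```

Note that the triage procedure never reaches the model during a release task. In a maximalist prompt, both procedures would be present on every turn.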

This is the inverse of the maximalist prompt, where the system prompt tries to be all of the above at once.

The empirical case for thin kernels is, by now, fairly strong. Claude Code’s system prompt is short. Cursor’s is short. The Aider conventions are short. The harnesses that ship effective agents are not the ones with the longest prompts; they’re the ones with the most carefully designed environments.

Where teams used to brag about the cleverness of their prompts, they now talk about the cleverness of their tool surfaces, their context managers, their skill libraries. The frontier has moved.

Prompts encode procedure in a lossy medium

The theoretical case is also clean. Long prompts have all the failure modes from earlier in this series: instruction collisions, hidden coupling, attention dilution, lost-in-the-middle effects, contamination. Each additional sentence in the prompt creates more chances for the model to get confused, more chances for two instructions to interfere, more tokens spent on guidance the model won’t use this turn. The marginal value of an additional prompt instruction drops off fast. The marginal value of an additional environment capability does not — adding a new tool, a new skill, a new memory tier creates new behavior that compounds with existing ones.

Put differently: prompts encode procedures in natural language, which is a lossy and error-prone medium. Environments encode procedures in code and structured artifacts, which are reliable. Move what you can out of the prompt and into the environment, and the system gets sharper. The thin kernel is what you have left after this move.
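One small instance of this move, sketched in Python under stated assumptions: rather than adding “always respond with valid JSON” to the prompt and hoping, the environment parses the output and feeds the parse error back. The `generate` callable stands in for whatever model call you use; it and `generate_valid_json` are hypothetical names, not part of any library.

```python
import json
from typing import Callable


def generate_valid_json(
    generate: Callable[[str], str], request: str, max_attempts: int = 3
) -> dict:
    """Retry until the model's output parses as a JSON object."""
    feedback = ""
    for _ in range(max_attempts):
        raw = generate(request + feedback)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as err:
            # The parser, not the prompt, is the source of truth.
            feedback = f"\nPrevious output was not valid JSON ({err.msg}). Try again."
            continue
        if isinstance(parsed, dict):
            return parsed
        feedback = "\nPrevious output was valid JSON but not an object. Try again."
    raise ValueError(f"no valid JSON object after {max_attempts} attempts")
```

The procedure (“check, report, retry”) now lives in code that behaves the same way on every turn, and the prompt no longer has to carry it.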

There’s an analogy I find clarifying. Compare two ways of teaching someone to do a job. The first is to write them a fifty-page manual and hand it to them. The second is to put them in a workplace with the right tools, mentors, and feedback systems. Which approach produces a competent worker faster? It’s not even close — the workplace approach wins overwhelmingly, even though it requires less explicit instruction.

Why? Because the workplace embodies the procedural knowledge in artifacts (tools, processes, examples) that the worker can probe, rather than text they have to parse. The fifty-page manual is the maximalist prompt. The workplace is the rich environment.

Answering the three standard objections

Critics of the thin kernel approach usually raise three concerns. Let me address them directly.

“How does the model know what to do without detailed instructions?” It doesn’t know in advance, and it doesn’t need to. The model knows what to do because the environment tells it. The tools have names and descriptions. The skills are discoverable and self-describing. The verification loops produce feedback. The model navigates by sensing, not by following a script. This works for humans too: you don’t tell a new employee how to do every task; you give them an environment in which the tasks make sense, and they figure out the steps.
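Here is a rough sketch of what “self-describing” can mean in practice, again in Python with hypothetical names (`Tool`, `ToolRegistry`). The registry can list what exists, and a wrong call returns an informative message instead of failing silently, which is the feedback the model navigates by.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def describe(self) -> str:
        """What the model sees instead of a page of instructions."""
        ordered = sorted(self._tools.values(), key=lambda t: t.name)
        return "\n".join(f"{t.name}: {t.description}" for t in ordered)

    def call(self, name: str, arg: str) -> str:
        if name not in self._tools:
            # A failed probe still teaches the model what is available.
            available = ", ".join(sorted(self._tools))
            return f"error: unknown tool '{name}'; available: {available}"
        return self._tools[name].run(arg)


registry = ToolRegistry()
registry.register(Tool("grep", "Search files for a pattern.", lambda p: f"matches for {p}"))
registry.register(Tool("run_tests", "Run the test suite.", lambda _: "3 passed"))
print(registry.describe())
print(registry.call("serach", "foo"))
```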

“How do you enforce constraints without putting them in the prompt?” You enforce them in the environment. Permissions enforce what tools can be called. Sandboxes enforce what code can do. Approval gates enforce what actions need human sign-off. The constraint isn’t a prompt instruction the model can fail to follow; it’s a property of the environment, and the violating action is simply not available. This is much stronger than a written rule.
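A minimal sketch of two of these mechanisms, permissions and approval gates, in Python. The class and tool names are hypothetical, and the `approver` callable stands in for a real human sign-off step. Sandboxing is not shown; it usually lives at the process or container level rather than in application code.

```python
from typing import Callable


class PermissionDenied(Exception):
    """The tool is not part of this agent's environment."""


class ApprovalRequired(Exception):
    """A human has not signed off on this action."""


class GatedExecutor:
    def __init__(
        self,
        tools: dict[str, Callable[[str], str]],
        allowed: set[str],
        needs_approval: set[str],
        approver: Callable[[str, str], bool],
    ) -> None:
        # Disallowed tools are dropped here, so there is nothing left
        # for a confused or manipulated model to invoke later.
        self._tools = {n: f for n, f in tools.items() if n in allowed}
        self._needs_approval = needs_approval
        self._approver = approver

    def call(self, name: str, arg: str) -> str:
        if name not in self._tools:
            raise PermissionDenied(name)
        if name in self._needs_approval and not self._approver(name, arg):
            raise ApprovalRequired(name)
        return self._tools[name](arg)


executor = GatedExecutor(
    tools={
        "read_file": lambda path: f"contents of {path}",
        "deploy": lambda env: f"deployed to {env}",
        "delete_repo": lambda repo: f"deleted {repo}",
    },
    allowed={"read_file", "deploy"},
    needs_approval={"deploy"},
    approver=lambda name, arg: arg == "staging",  # stand-in for a human
)
```

No wording in the prompt can make `delete_repo` callable here, and no wording can make a production deploy skip the approver.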

“How does the model maintain a consistent personality or style?” With a short bit of framing in the kernel, plus retrieved style guidance loaded as needed. The character of an agent doesn’t require a 4,000-word persona document — it requires a few sentences of high-level disposition, plus access to specific style examples when relevant. Most of the persona work that used to live in long prompts was actually doing a different job — compensating for the absence of tools, skills, and memory. With those in place, the persona shrinks.

The deeper observation underlying all this is that we’ve been conflating two different things: configuration and capability. The prompt era treated them as the same: you configured the model to be helpful, and that was supposed to give it the capability to help. The substrate era separates them. Configuration is what you put in the prompt. Capability is what you build into the environment. They’re different design surfaces with different properties. The mistake was trying to use one to do the work of the other.

The thin kernel is what survives a model swap

There’s a forward-looking version of this argument too. The thin kernel pattern is what makes a substrate model-agnostic. If the prompt is small, you can swap models more easily, because less of the system depends on one model’s quirks. If the environment is rich, model upgrades improve the whole system without requiring you to rewrite anything. The thin kernel is more portable across models, and the rich environment is the place where most of your investment compounds. This is exactly the right shape if you believe, as I do, that we’re heading into a world with many models, none of them quite dominant, and that your job is to build durable systems on top of a moving target.
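In code, model-agnosticism mostly means the model sits behind a narrow interface and everything else is written against that interface. A sketch in Python, with two toy stand-in models; `Model`, `run_turn`, and both model classes are hypothetical and exist only to show the shape.

```python
from typing import Protocol


class Model(Protocol):
    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Toy stand-in for one vendor's model."""

    def complete(self, prompt: str) -> str:
        return "echo: " + prompt.splitlines()[-1]


class UpperModel:
    """Toy stand-in for a different model."""

    def complete(self, prompt: str) -> str:
        return prompt.splitlines()[-1].upper()


KERNEL = "You are a careful assistant. Use the environment."


def run_turn(model: Model, environment_state: str, request: str) -> str:
    # The kernel and the environment are identical for every model;
    # only the object behind `model` changes on a swap.
    prompt = "\n".join([KERNEL, environment_state, request])
    return model.complete(prompt)


print(run_turn(EchoModel(), "Tools: grep, run_tests", "list failing tests"))
print(run_turn(UpperModel(), "Tools: grep, run_tests", "list failing tests"))
```

The smaller `KERNEL` is, the less there is to re-tune when the object behind `model` changes.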

The endgame of this trend is, in a sense, the model getting smaller. Not literally smaller (the models keep getting larger in parameter count) but smaller in proportion to the procedural environment that surrounds them. Today a trillion-parameter model embedded in a hundred-thousand-token procedural environment is still the dominant pattern. But the environment is doing more and more of the heavy lifting, and the model is being asked to do less and less “remembering what to do.” Its job shrinks to reasoning crisply over the current state. Reasoning is what models are uniquely good at. Everything else (memory, procedure, verification, environment) is better handled by the substrate.

The next post takes this even further. If the substrate is doing the procedural work and the model is doing the reasoning, then “agent” itself may be the wrong unit. We may be heading toward a world of dynamically composed skills, not persistent agent identities. That’s the next argument.