From System Prompts to Skills

If you wanted to teach an AI to do something in 2023, you wrote a longer system prompt. If you wanted to teach it five things, you wrote a much longer system prompt. The implicit theory was that capability lived in instructions, and to add a capability you added instructions. This worked for the first few additions and broke for everything after. By the time you had twelve capabilities you had a 15,000-token prompt that contradicted itself in three places, and you had no idea which capability was responsible for which behavior.

The skill is the architectural response to this problem. A skill is a packaged, named unit of procedural knowledge — typically a SKILL.md file describing when and how to do something, sometimes with associated code, examples, or resources — that can be loaded into the model’s context only when relevant. Skills externalize what the system prompt used to internalize. They take what was a monolithic blob and turn it into a library of modules.

Treat the model as a developer with a library card

The mental model that helps here is: stop thinking of the model as a configured worker, and start thinking of it as a developer who knows how to read documentation. You wouldn’t onboard a new hire by handing them a 200-page binder and telling them to memorize it. You’d give them a slim playbook that explains the high-level structure, and then you’d give them access to detailed documentation that they consult when they hit a relevant task. The slim playbook is the system prompt in the skills era. The detailed documentation is the set of skills.

This is more than a packaging convention. It changes what’s possible.

Four things skills change that prompts couldn’t

The first is composability. Skills can be authored independently and combined freely. The team building the “scheduling a meeting” skill doesn’t need to coordinate with the team building the “writing an email” skill. Each skill specifies its own preconditions, its own steps, its own outputs. When the model needs both, it loads both. You can ship new capabilities by adding skills, not by editing the prompt that everyone shares. This is the difference between giving everyone a library card and rewriting the encyclopedia every time you want to add a new entry.

The second is discoverability. The harness can search the skill set for skills relevant to the current task, surface those to the model with brief descriptions, and let the model decide which to consult in detail. This pattern, progressive disclosure of capabilities, is the subject of the next post. It costs far fewer context tokens than putting every capability description in the system prompt. The model spends its context budget on capabilities it’s likely to use rather than on capabilities it might theoretically need.
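The search-and-surface step can be sketched in a few lines of Python. Everything here is an assumption for illustration: the catalog contents, the `discover` function, and above all the ranking, which is plain word overlap standing in for whatever retrieval a real harness would use.

```python
import re

# Hypothetical catalog: skill name -> brief description.
CATALOG = {
    "schedule-meeting": "Find a meeting time for attendees and send a calendar invite.",
    "write-email": "Draft, edit, and send an email in the user's voice.",
    "expense-report": "Collect receipts and file an expense report.",
}

# Words too common to count as evidence of relevance.
STOPWORDS = {"a", "an", "the", "and", "for", "in", "with", "to", "up"}


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic words in `text`, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def discover(task: str, catalog: dict[str, str], limit: int = 2) -> list[tuple[str, str]]:
    """Return up to `limit` (name, description) pairs ranked by word overlap with the task."""
    task_words = tokens(task)
    scored = []
    for name, description in catalog.items():
        score = len(task_words & (tokens(name) | tokens(description)))
        if score > 0:
            scored.append((score, name, description))
    # Highest score first; ties broken alphabetically by name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, description) for _, name, description in scored[:limit]]


matches = discover("Set up a meeting with the design team", CATALOG)
for name, description in matches:
    print(f"{name}: {description}")
```

Only the short descriptions of the matching skills reach the model at this stage. The full skill body is loaded later, and only if the model decides it needs it.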

The third is versioning and ownership. Skills are artifacts. They have authors, version histories, tests, and lifecycles. You can deprecate a skill, fork a skill, parameterize a skill, A/B test a skill against an alternative. None of this was practical with a monolithic system prompt, because the prompt had no internal boundaries: you could version the whole blob, but not a single capability inside it. The skill brings normal software engineering discipline to procedural knowledge.

The fourth, and maybe most important, is scope isolation. When a skill is loaded, the model gets that skill’s instructions in context, but only for the duration of the task. The skill doesn’t bleed into unrelated work. Compare this to system prompts, where every instruction silently affects every interaction: a constraint added for one use case can warp the model’s behavior in twenty others. Skills are conditional context; system prompts are unconditional context. The difference compounds as the capability library grows.
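Here is a minimal sketch of what “conditional context” means in practice. The prompt text, the skill bodies, and the `build_context` helper are all hypothetical. The point is only that a skill’s constraints appear in the context for tasks that selected it and nowhere else.

```python
SYSTEM_PROMPT = "You are an assistant. Consult a skill when one is provided."

# Hypothetical skill bodies, keyed by name.
SKILLS = {
    "write-email": "Keep emails under 150 words. Always include a subject line.",
    "schedule-meeting": "Never book meetings before 9am in the attendee's time zone.",
}


def build_context(task: str, selected: list[str]) -> str:
    """Assemble context for one task: the shared prompt plus only the selected skills."""
    parts = [SYSTEM_PROMPT]
    for name in selected:
        parts.append(f"[skill: {name}]\n{SKILLS[name]}")
    parts.append(f"[task]\n{task}")
    return "\n\n".join(parts)


email_context = build_context("Reply to Dana about the launch date", ["write-email"])
summary_context = build_context("Summarize this quarterly report", [])

print("150 words" in email_context)    # True
print("150 words" in summary_context)  # False
```

In a monolithic prompt, the 150-word limit would sit in `SYSTEM_PROMPT` and apply to the summary too.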

The format people converged on — a markdown file with a description block, when-to-use guidance, the steps or examples, and any associated files — is deliberately spartan. It’s readable by humans, parseable by tooling, and consumable by models without fuss. There’s a lesson here: the formats that won in the agent era were the ones that didn’t try to be clever. Markdown with frontmatter has beaten more elaborate specifications consistently, because it’s easy to write, easy to read, easy to diff, and easy to put in version control.
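To make the format concrete, here is a minimal sketch of reading such a file in Python. The field names `name` and `description`, the `Skill` class, and `parse_skill` are my assumptions for illustration, not a published spec, and the parser handles only flat `key: value` frontmatter rather than full YAML.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str  # when-to-use guidance, steps, examples


def parse_skill(text: str) -> Skill:
    """Split a skill file into frontmatter metadata and a markdown body.

    Assumes a leading block delimited by '---' lines that contains flat
    'key: value' pairs. Nested or multi-line YAML is not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' frontmatter line")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter is never closed") from None
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return Skill(meta["name"], meta["description"], body)


EXAMPLE = """---
name: schedule-meeting
description: Find a time that works for all attendees and send an invite.
---
## When to use
The user asks to set up, move, or cancel a meeting.

## Steps
1. Collect attendees and constraints.
2. Propose a time.
"""

skill = parse_skill(EXAMPLE)
print(skill.name)
print(skill.description)
```

That the whole parser fits in about twenty lines of standard-library code is part of why the format is easy for tooling to adopt.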

Capability authoring moves to domain experts

What’s underappreciated about skills is how much they shift the labor of building an AI product. In the prompt era, building a capable AI product meant writing prompts. Writing prompts is a fundamentally lonely activity — one person, a text editor, a sense of incantation. Skills can be authored the way documentation is authored: by domain experts, in their own time, in their own files, reviewed through normal code review processes. The bottleneck moves from “do we have a prompt engineer” to “do we have people who understand the domain and can describe what they do.” That’s a much more scalable kind of bottleneck.

It also resolves an awkward issue with the prompt era, which was that all the procedural knowledge was implicit. Nobody outside the prompt team could read the prompt and know what the system did. With skills, the capability surface is explicit. You can list it. You can audit it. You can show a customer which behaviors they’re getting. The system becomes legible in a way it wasn’t before.

Procedure beside the model, not inside it

There’s a deeper philosophical move here that I think matters. In the prompt era we tried to put the procedure inside the model. With skills, we put the procedure beside the model. The model retrieves and follows it; it doesn’t have to remember it. This is the same move that happened in computing decades ago, when we stopped requiring the entire program to be in memory and started using paged virtual memory. It turns out you don’t need everything in the working set; you need a way to fetch the right thing at the right time. Skills are that mechanism for procedural knowledge.
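The paging analogy can be taken fairly literally. The sketch below is an illustration of the idea, not a description of how any particular harness works: `SkillPager`, its `capacity`, and the in-memory `BACKING_STORE` are all hypothetical. It keeps a bounded number of skill bodies resident, fetches the rest on demand, and evicts the least recently used one when full.

```python
from collections import OrderedDict
from typing import Callable


class SkillPager:
    """Keep at most `capacity` skill bodies resident; fetch the rest on demand."""

    def __init__(self, fetch: Callable[[str], str], capacity: int = 2) -> None:
        self._fetch = fetch              # loads a skill body from backing storage
        self._capacity = capacity
        self._resident = OrderedDict()   # name -> body, least recently used first
        self.faults = 0                  # number of loads from backing storage

    def get(self, name: str) -> str:
        if name in self._resident:
            self._resident.move_to_end(name)    # mark as most recently used
            return self._resident[name]
        self.faults += 1
        body = self._fetch(name)
        self._resident[name] = body
        if len(self._resident) > self._capacity:
            self._resident.popitem(last=False)  # evict least recently used
        return body

    def resident(self) -> list[str]:
        return list(self._resident)


# Stand-in for files on disk; in practice `fetch` would read a SKILL.md.
BACKING_STORE = {
    "write-email": "steps for email...",
    "schedule-meeting": "steps for scheduling...",
    "expense-report": "steps for expenses...",
}

pager = SkillPager(BACKING_STORE.__getitem__, capacity=2)
pager.get("write-email")
pager.get("schedule-meeting")
pager.get("write-email")       # already resident, no fetch
pager.get("expense-report")    # evicts schedule-meeting
print(pager.resident())        # ['write-email', 'expense-report']
print(pager.faults)            # 3
```

The library can hold far more skills than the resident set, just as a program’s address space can exceed physical memory.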

The other things skills make possible (capability gating, layered context, controlled disclosure) are important enough to get their own post. That’s next. Skills are the unit; progressive disclosure is the runtime pattern that uses the unit well.