Chains Were Our First Attempt at Structure

If the prompt era was about cramming the entire application into a single string, the chain era was about admitting that didn’t work and reaching for the most familiar tool in the engineering cupboard: a pipeline.

LangChain didn’t invent chains, but it gave them a name and a mascot, and for a while every AI demo on the internet was a directed acyclic graph of prompts.

Take user input. Pass it to a “classifier” prompt to decide what kind of question it is. Pass that to a router. Use the router’s output to pick a “retrieval” prompt. Embed the question. Hit a vector store. Get documents. Feed documents and original question into a “synthesis” prompt. Format the result with a “formatter” prompt. Ship it.

This was an enormous improvement over a single mega-prompt, and it deserves credit for that. Chains gave teams something they could reason about: discrete steps, each with a defined input and output, each independently testable. You could swap one prompt without rewriting the others. You could log intermediate results. You could, for the first time, look at an AI product and see its data flow on a whiteboard.
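To make that concrete, here is a minimal sketch of the chain shape in plain Python. It is not LangChain’s API: every name in it (`fake_llm`, `classify`, `run_chain`, the toy corpus) is a hypothetical stand-in I made up for illustration, and the “model” is a stub that just echoes part of its prompt.

```python
from typing import Callable, Optional

State = dict
Step = Callable[[State], State]


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model call; a real chain would call an LLM here.
    return f"[model output for: {prompt[:40]}]"


def classify(state: State) -> State:
    # A keyword check stands in for the "classifier" prompt.
    kind = "billing" if "invoice" in state["question"].lower() else "general"
    return {**state, "kind": kind}


def retrieve(state: State) -> State:
    # A dict lookup stands in for the embed-and-search step against a vector store.
    corpus = {
        "billing": ["Invoices are emailed monthly."],
        "general": ["See the help center."],
    }
    return {**state, "docs": corpus[state["kind"]]}


def synthesize(state: State) -> State:
    prompt = f"Docs: {state['docs']}\nQuestion: {state['question']}"
    return {**state, "draft": fake_llm(prompt)}


def format_answer(state: State) -> State:
    return {**state, "answer": state["draft"].strip()}


# The chain author, not the model, decides the order of operations.
CHAIN = [classify, retrieve, synthesize, format_answer]


def run_chain(steps: list, state: State, log: Optional[list] = None) -> State:
    for step in steps:
        state = step(state)
        if log is not None:
            # Intermediate results are just values, so logging them is trivial.
            log.append((step.__name__, state))
    return state
```

Each step is an ordinary function from state to state, so you can unit-test `classify` on its own, replace `synthesize` without touching the rest, and read the data flow straight off the `CHAIN` list.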

The chain was the moment AI development started to look like normal software engineering. Which is to say, it was the moment we tried to make a probabilistic system pretend to be a deterministic one.

The chain was a leash with structured slots

The core move was honest about what it was doing: models were unreliable, so we’d compensate by externalizing the control flow. The chain author decided what would happen next; the model just produced the substance at each step. You’d never trust the model to choose its own sequence of operations, because it would forget steps, hallucinate variables, or wander off into a monologue. The chain was a leash with structured slots cut into it.

For a class of problems this worked very well. Retrieval-augmented question answering, for instance, is fundamentally a four-step recipe: embed, search, stuff, generate. Once you have a chain that does those four steps reliably, you have a useful product. RAG took off largely because it had a natural chain shape and the chain shape suited what the models of the time could do.
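The four-step recipe is small enough to sketch end to end. This is a toy version under loud assumptions: the “embedding” is a bag-of-words count rather than a learned vector, the “vector store” is a three-item list, and `generate` is a placeholder where a real model call would go. None of these names come from a real library.

```python
import math
from collections import Counter

# Hypothetical three-document knowledge base.
DOCS = [
    "Refunds are processed within five business days.",
    "Our office is closed on public holidays.",
    "Passwords can be reset from the account settings page.",
]


def embed(text: str) -> Counter:
    # Step 1, embed: toy bag-of-words vector (a real system would use an embedding model).
    cleaned = text.lower().replace(".", "").replace("?", "")
    return Counter(cleaned.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query_vec: Counter, k: int = 1) -> list:
    # Step 2, search: rank documents by similarity to the query vector.
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def stuff(question: str, docs: list) -> str:
    # Step 3, stuff: paste the retrieved text into the prompt.
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def generate(prompt: str) -> str:
    # Step 4, generate: placeholder for the model call.
    return f"[model output for a {len(prompt)}-character prompt]"


def rag(question: str) -> str:
    return generate(stuff(question, search(embed(question))))
```

Notice that the control flow in `rag` is a single line of function composition. The model never chooses what happens next, which is exactly why this shape was so dependable.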

Dynamic tasks don’t fit static pipelines

The trouble started when teams tried to fit dynamic tasks into static pipelines. Consider an agent that’s supposed to do customer support. Sometimes it should answer from a knowledge base. Sometimes it should look up the customer’s order. Sometimes it should escalate. Sometimes it should ask a clarifying question.

The natural shape of this task is a tree of decisions, not a fixed sequence. You can hammer it into a chain by adding routers, but every router is just another prompt asking the model to make a decision — which is the thing you were trying to avoid by using a chain in the first place.
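Here is a rough sketch of what that looks like for the support example. The route names, `fake_router_model`, and the handlers are all hypothetical; the keyword rules only stand in for a model whose answer, in a real system, would not be guaranteed to land on the menu.

```python
ROUTES = ("knowledge_base", "order_lookup", "escalate", "clarify")


def fake_router_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call. It only looks at the user's
    # message, which the router prompt places after "Message:".
    message = prompt.split("Message:", 1)[1].lower()
    if "order" in message:
        return "order_lookup"
    if "human" in message or "manager" in message:
        return "escalate"
    if len(message.split()) < 3:
        return "clarify"
    return "knowledge_base"


def route(message: str, model=fake_router_model) -> str:
    # The router is just another prompt asking the model to decide.
    prompt = f"Pick exactly one of: {', '.join(ROUTES)}.\nMessage: {message}"
    choice = model(prompt).strip()
    # The chain still has to guard against an answer that is off the menu.
    return choice if choice in ROUTES else "clarify"


HANDLERS = {
    "knowledge_base": lambda m: f"KB answer for: {m}",
    "order_lookup": lambda m: f"Looking up the order mentioned in: {m}",
    "escalate": lambda m: "Handing you over to a human agent.",
    "clarify": lambda m: "Could you tell me a bit more?",
}


def handle(message: str) -> str:
    return HANDLERS[route(message)](message)
```

The static part of the design is the `HANDLERS` table. The part that actually determines behavior is `route`, and that is a model judgment wearing a pipeline costume.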

What you got were chains whose branching factor exploded. Each new behavior added another router, another retry path, another fallback. The graph grew until nobody could read it. Teams started building visual editors for their LangChain graphs, then discovered that no visual editor scales past about thirty nodes before becoming wallpaper. The chain was supposed to be the structure that tamed the model; instead, the chain itself became the thing you couldn’t reason about.

A tax on capability the model was starting to have

There was also a deeper problem. Chains assumed the model didn’t know what to do. Every transition was hard-coded because we didn’t trust the model to make it. But as models got better, this assumption became more and more wasteful. Why route the user’s question through a classifier when the model could just read the question and figure out where to send it? Why force a fixed retrieval step when the model could decide whether retrieval was even necessary? Chains were paying a tax on every request to do work the model could increasingly do for itself, and the tax wasn’t getting cheaper.
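One way to see the tax is to count retrieval calls. In this sketch, `Retriever`, `fake_model_needs_retrieval`, and both answer functions are hypothetical; the “model’s decision” is faked with a short list of greetings, purely to show the difference in shape between a hard-coded step and a step the model can skip.

```python
class Retriever:
    """Hypothetical retriever that counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, question: str) -> list:
        self.calls += 1
        return ["(retrieved passage)"]


def fake_model_needs_retrieval(question: str) -> bool:
    # Stand-in for a capable model judging whether a lookup is needed at all.
    return question.lower().strip("!. ") not in {"hi", "hello", "thanks"}


def fixed_chain(question: str, retrieve: Retriever) -> str:
    # Chain style: retrieval is hard-coded, so every request pays for it.
    docs = retrieve(question)
    return f"answer to {question!r} using {len(docs)} doc(s)"


def model_decides(question: str, retrieve: Retriever) -> str:
    # Model-driven style: retrieval happens only when the model asks for it.
    docs = retrieve(question) if fake_model_needs_retrieval(question) else []
    return f"answer to {question!r} using {len(docs)} doc(s)"
```

Run both over a mix of small talk and one real question, and the fixed chain retrieves every time while the model-driven version retrieves only for the question that needs it.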

Scaffolding, not architecture

The framing that helped me make sense of this later was: chains were scaffolding, not architecture. Scaffolding is the right thing to build when you don’t yet know what shape the building wants to be. It lets you move and supports your weight. But you take it down once the structure stands on its own. The chain era was scaffolding for an industry that didn’t yet have load-bearing patterns. The mistake — the one that produced a lot of dead repos around 2024 — was treating the scaffolding as the finished building.

The right reading of LangChain in retrospect isn’t that it was wrong. It was a perfectly reasonable response to the conditions of 2023: models that couldn’t be trusted with control flow, no shared vocabulary for tools, no runtime to host them in. The framework made an entire generation of developers think clearly about input/output contracts between AI components, which is a contribution that outlived its specific abstractions.

What came next wasn’t a better chain. It was the recognition that we’d been compensating for the wrong thing. The model didn’t need a more elaborate cage; it needed a richer world to operate in, and the freedom to navigate that world itself. The next post is about why that started to seem plausible, and why “prompt engineering” hit a wall right around the same time.