The word “agent” became fashionable around 2023 and has been used to mean almost anything since. A chatbot that can call a weather API is called an agent. A LangChain script that hits three endpoints in a row is called an agent. A research project that runs for forty-five minutes and modifies a codebase is called an agent. These are not the same kind of thing, and the field would be clearer if we admitted it.

The cleanest distinction I’ve found is this: a chatbot with tools waits to be told what to do next. An agent owns the loop.

An agent owns the loop.

A chatbot waits, an agent keeps going

Look at what happens in a tool-augmented chat. The user asks a question. The model decides whether a tool call would help. If yes, it emits a tool call. The harness executes it, returns the result, and gives control back to the model. The model formulates a reply, addresses the user, and stops.

The next move is the user’s. The conversational thread is the spine; tools are accessories along it.

This is a useful pattern. It’s how most LLM products got useful between 2023 and 2024. But it’s also a shallow integration. The model is never out of the user’s sight. Every step is mediated. The model is not pursuing a goal; it’s answering a question that happens to involve a quick lookup.

An agent works the other way around. The user gives a goal. The model — or rather the loop the model is running inside of — keeps going on its own. It plans, takes an action, observes the result, decides what to do next, takes another action, possibly several dozen times, until either the goal is met or the system decides to stop. The user isn’t in the loop. The user is a stakeholder watching from outside, possibly approving high-stakes steps, possibly only seeing the result. The spine of the interaction isn’t the conversation; it’s the goal and the sequence of actions toward it.

The architecture follows from the loop

This distinction is conceptual, but it has concrete architectural consequences. A tool-using chatbot needs a function-calling API and a way to render results. An agent needs all of that plus: a planner that can decompose goals into steps, a memory system that persists across turns, a recovery mechanism for when steps fail, a stopping criterion that prevents infinite loops, a permission system so that the agent can act without re-prompting the user for every step, and an observability layer so that the user can review what happened. These are not optional additions to a chatbot. They are the substance of what makes something an agent.

The clearest test I’ve seen for whether you’re looking at an agent or a fancy chatbot is the question: if the user walked away from the screen for ten minutes, what would the system do? A chatbot would wait, idle, doing nothing. An agent would continue, because the goal it was given hadn’t been met and the next step was its own to take.

You can feel the categorical difference

This isn’t a gradient. It’s a categorical difference, and you can feel it the moment you watch an agent work. There’s a quality of independent forward motion that chatbots simply don’t have.

The agent makes decisions you didn’t sign off on. It encounters problems and solves them on its own. It produces a trajectory you can review afterward, where each step has its own rationale. The artifact is recognizably not just “an answer to a question.” It’s a piece of work that was done while you weren’t watching.

Calling this distinction “operational agency” sounds fancier than it is. What it amounts to is: the model is allowed to act on the world repeatedly without checking back in. Everything else flows from that. If the model can act repeatedly, you need to constrain what it can act on (permissions). If you don’t want to micromanage every step, you need to give it goals not instructions (planning). If something will go wrong eventually, you need it to detect and recover (verification loops). If you’re going to trust it with this independence, you need to be able to audit what it did (trajectories). The whole architecture follows from the decision to grant the model agency in the original sense of the word: the ability to be the one acting.

The unit of work shifts from question to job

The reason this distinction matters commercially is that the value created by these two kinds of systems is very different. A chatbot with tools makes a user’s individual interactions more efficient — they get a better answer with less effort. The value is per-conversation.

An agent gets things done while the user is doing something else. The value is per-task, and tasks can be much bigger than conversations. The unit of work shifts from “the question” to “the job.” Almost every consequential business case for AI in the medium term is in this second category, which is why “agent” became the word everyone wanted to claim, even for systems that were really just chatbots with a function-calling sticker on them.

The word “agent” is about to fracture

I think we’re approaching the moment when the term “agent” will get fractured into more precise sub-terms, the way “AI” eventually fractured into “machine learning” and “computer vision” and “robotics” and so on. We’ll start distinguishing between research agents and execution agents, between long-running and short-running, between supervised and unsupervised, between agents that own a workspace and agents that traverse other people’s systems. Each of these wants different primitives, different harnesses, different evaluation methods. The single word “agent” is already doing too much work.

But the foundational split — between systems that wait to be told and systems that pursue goals — is the one to get right first. Everything else, including the harness that this series is about, is built on top of it.