We often mistake the “Agent” in AI for style: a persona, a tone, a conversational flow, as if that’s where all the magic happens. But real innovation lies underneath, in how these systems structure memory, reasoning, and invocation. Anthropic’s new Agent Skills give us a concrete bridge between architecture and economics.

We’re now making agents efficiently intelligent, so they reason only where ambiguity lies, recall only what’s relevant, and outsource the rest to robust procedural systems.

The problem: context bottleneck

Any model lives inside a context window. Everything it “knows” at a given moment, from the prompt and conversation to tool specs and domain instructions, must be tokenised into that window. But tokens are expensive. In real-world enterprise settings, you’ll see agents hauling around tens of thousands of tokens just to maintain tool schemas or workflow instructions, many of which won’t even be used in that session. That’s a fragile model: intelligence built on waste.

The architecture can’t treat memory as infinite. The agent economy won’t scale on brute model size alone. The key is cognitive austerity: the ability to scale thought without exploding token cost. Efficiency is no longer a luxury; it’s the ground on which agent systems must compete.

Agent Skills: procedural memory, replatformed!

Anthropic’s Agent Skills are filesystem-backed directories containing:

  • A SKILL.md with YAML frontmatter (name + description),
  • Further instruction and resource files (e.g. reference, forms, templates),
  • Optional executable scripts.

At agent startup, only the metadata (name + description) of each installed skill is loaded into the system prompt. Claude “knows” that these skills exist, but not their full bodies. That’s level-1 progressive disclosure. 
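The level-1 step can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic’s loader: the function names `read_frontmatter` and `build_skill_index` are hypothetical, the skills root is assumed to hold one sub-directory per skill, and the frontmatter is parsed as flat `key: value` pairs where a real loader would use a proper YAML parser.

```python
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict:
    """Parse the leading '---' delimited block as flat 'key: value' pairs.

    Simplification (assumption): real frontmatter is YAML and should be
    read with a YAML parser; this handles single-line values only.
    """
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the body is never inspected
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def build_skill_index(skills_root: Path) -> str:
    """Return one line per installed skill: name and description only."""
    entries = []
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        if "name" in meta and "description" in meta:
            entries.append(f"- {meta['name']}: {meta['description']}")
    return "\n".join(entries)
```

Only the string returned by `build_skill_index` would go into the system prompt; the rest of each SKILL.md stays on disk until it is asked for.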

When Claude judges that a skill might help with the current task, it reads the skill’s SKILL.md into context via bash. That’s level 2. If deeper files are needed, they’re read only then. And if a script is to be executed, that script’s code never enters the token stream; only its output does. This design dramatically shrinks the token footprint. Intelligence becomes indexable, not hoarded. Yay!
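To make the last two levels concrete, here is a small Python sketch. Everything in it is an assumption for illustration: `Context` is a hypothetical stand-in for the context window (it counts characters, not tokens), and `load_skill_body` and `run_skill_script` are invented names, not part of any Anthropic API. The point it shows is that a script is executed as a subprocess, so its source never touches the context while its stdout does.

```python
import subprocess
import sys
from pathlib import Path


class Context:
    """Hypothetical stand-in for the model's context window."""

    def __init__(self):
        self.chunks = []

    def add(self, text: str) -> None:
        self.chunks.append(text)

    def size_chars(self) -> int:
        # Characters are used as a rough proxy; real systems count tokens.
        return sum(len(chunk) for chunk in self.chunks)


def load_skill_body(ctx: Context, skill_dir: Path) -> None:
    """Level 2: pull the full SKILL.md in only once the skill looks relevant."""
    ctx.add((skill_dir / "SKILL.md").read_text(encoding="utf-8"))


def run_skill_script(ctx: Context, script: Path, *args: str) -> str:
    """Run a bundled script; only its stdout is added to the context."""
    result = subprocess.run(
        [sys.executable, str(script), *args],
        capture_output=True,
        text=True,
        check=True,  # raise if the script exits with a non-zero status
    )
    ctx.add(result.stdout)
    return result.stdout
```

A script of several thousand lines and a script of three lines cost the same here: whatever they print.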

This matters because knowing when to consult procedural memory is as important as having it.

The external nervous system

If Agent Skills are the agent’s procedural memory, then the Model Context Protocol (MCP) is the interface between cognition and the external world. While Anthropic’s latest announcement focuses less on MCP than on Skills, it hints at integration and complementarity: Skills will “complement MCP servers by teaching agents more complex workflows involving external tools and software.”

MCP is how agents query external data, systems, and APIs, bringing context in via structured calls, not via token ingestion. Those queries return summaries, JSON, or typed payloads. MCP becomes the boundary of what the agent can know, when, and under which policies. What Anthropic’s Skills architecture does is make MCP-style externalisation not optional, but foundational. The agent no longer carries the world; it asks for it when needed.
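For a sense of what a “structured call” looks like on the wire, here is a hedged Python sketch. MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method; beyond that, everything here is illustrative: the tool name `lookup_invoice`, its arguments, and the helper names `make_tool_call` and `extract_text` are hypothetical, and transport and session setup are left out entirely.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def extract_text(response: dict) -> str:
    """Join the text items of a tool result; non-text items are skipped."""
    items = response.get("result", {}).get("content", [])
    return "\n".join(item["text"] for item in items if item.get("type") == "text")


# Hypothetical tool and arguments, for illustration only.
request = make_tool_call(1, "lookup_invoice", {"invoice_id": "INV-001"})
wire = json.dumps(request)  # what would be sent to the server
```

The agent sends a small typed request and receives a compact result, which is all that needs to enter its context.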

This design is powerful: it reintroduces governance, auditability, and operational alignment into agent cognition. The introspection of intelligence becomes as tractable as the introspection of APIs.

Token Economics: a layered architecture of thought

Bringing together Skills and MCP gives us a cognitive stack, and a new token model:

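| Layer           | Role                             | Token Cost | Interpretation                       |
|-----------------|----------------------------------|------------|--------------------------------------|
| Model Reasoning | Judgment, abstraction, synthesis | High       | Where ambiguity lives                |
| Agent Skills    | Domain heuristics, procedure     | Medium     | Procedural memory fetched on demand  |
| MCP             | External context, data access    | Near-zero  | External memory boundary             |
| Code Execution  | Deterministic operation          | Zero       | Reflexive action layer               |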

In traditional agents, every action, from reading to formatting to accessing, is tokenised. But with Skills + MCP, the costly token stream is reserved for uncertainty: the parts that require reasoning. Everything else is delegated: procedures, data retrieval, deterministic code. In Anthropic’s paradigm, this can slash token usage by 70–90% per session. But it’s about more than cost: it reframes where intelligence lives. In practice, you could budget cognition like CPU or memory. You’d optimise not for raw throughput but for guidance efficiency.
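A back-of-the-envelope calculation shows how this kind of budgeting works. The Python sketch below compares loading every skill in full against loading metadata for all skills plus bodies for only the skills actually used. All of the figures (20 skills, 50 metadata tokens, 2,000 body tokens, 2 skills used) are invented for illustration; they are not measurements and say nothing about any real deployment.

```python
def eager_cost(n_skills: int, meta_tokens: int, body_tokens: int) -> int:
    """Tokens spent if every skill is loaded in full at startup."""
    return n_skills * (meta_tokens + body_tokens)


def progressive_cost(
    n_skills: int, meta_tokens: int, body_tokens: int, skills_used: int
) -> int:
    """Tokens spent if only metadata is preloaded and bodies load on demand."""
    return n_skills * meta_tokens + skills_used * body_tokens


def saving(eager: int, progressive: int) -> float:
    """Fraction of tokens saved relative to the eager baseline."""
    return 1 - progressive / eager


# Hypothetical session: 20 skills installed, 2 actually needed.
eager = eager_cost(20, 50, 2000)                 # 41000
progressive = progressive_cost(20, 50, 2000, 2)  # 5000
print(f"{saving(eager, progressive):.0%} fewer tokens")
```

The saving depends entirely on the ratio of installed skills to used skills and on how heavy the bodies are, so the same arithmetic with different inputs can give a much smaller figure.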

Are we heading towards modular minds?

Anthropic hints at what comes next: Skills will become first-class libraries, versioned, shareable, composable. Agents will begin to author their own skills, capturing strategies they’ve used successfully. Meanwhile, MCP may evolve from a data plumbing layer into an inter-agent communication fabric, where agents exchange not just payloads but intent.

We’re shifting from monolithic, token-heavy generalists to modular, token-efficient specialists. That’s the true frontier: architects of minds, not bigger models.
