Understanding Anthropic's Zero Trust for AI Agents Guide

June 4, 2026

Most writing about AI agent security stops at the scary part: agents can be prompt-injected, they hold credentials, they act near production. True, and not very useful. However, the implementation half of Anthropic's Zero Trust for AI Agents guide is more interesting because it stops describing the problem and starts grading the solutions. It lays out exactly which controls count as table stakes, which are enterprise-grade, and which are reserved for the highest-stakes environments.

Let's review the guide's implementation guidance so you can see where your own agent deployment lands on its maturity model. The framing throughout is based on a single principle that we also hold here at Pomerium: chasing individual exploits keeps you permanently reactive, while building on Zero Trust foundations puts you on firmer ground.

Three tiers, and a floor that keeps rising

Every control in the guide is presented across three maturity tiers. Each tier builds on the previous one, so every step strengthens what you already have rather than starting over.

Foundation

Minimum viable security for smaller deployments or early implementations. Enough on its own only for small teams.

Enterprise

Where most organizations should aim. The depth needed for scale, multiple deployments, and environments where one compromise has real business impact.

Advanced

For regulated industries, national-security uses, or anywhere a breach carries severe consequences. Aspirational for most; baseline for the highest-risk teams.

The most important sentence in this whole section is a warning about drift. Because AI has accelerated offense — compressing the time between a vulnerability appearing and an attacker exploiting it — the entry bar has been raised. As the guide puts it, friction-only controls no longer qualify. And the tiers themselves are a moving target:

"Expect the Advanced tier to become Enterprise standard as the space evolves, and Enterprise to become Foundation."

In other words, today's gold standard is tomorrow's baseline. Plan accordingly.

It starts with identity

The guide is emphatic that identity comes first, because everything else depends on it: without verifiable identity you can't enforce access, keep meaningful audit trails, or attribute actions to a specific agent. Agents without distinct identities operate in an "attribution gap" where least-privilege becomes impossible.

What's notable is how far the Foundation bar has moved. Unique agent IDs used to be enough, but now they must be cryptographically rooted, because a label is trivial to forge. And on service authentication, the guide is blunt: static API keys and shared service-account passwords are "among the first things an attacker with model-assisted code analysis will find," and no longer count as an acceptable starting point. Short-lived, narrowly scoped tokens are the new baseline. If you're running API keys with a rotation policy today, the guide says to treat that as a known gap, not a valid security posture.
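To make "short-lived, narrowly scoped" concrete, here is a minimal Python sketch of issuing and checking such a token. It is an illustration, not the guide's implementation: the shared `SIGNING_KEY`, the function names, and the claim layout are all assumptions, and a real deployment would more likely rely on an identity provider and asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import List, Optional

# Hypothetical demo key. A real system would not embed a shared secret in code.
SIGNING_KEY = b"demo-only-key"


def issue_token(agent_id: str, scopes: List[str], ttl_seconds: int = 300,
                now: Optional[float] = None) -> str:
    """Mint a token naming one agent, a narrow scope list, and an expiry."""
    issued_at = time.time() if now is None else now
    claims = {"sub": agent_id, "scopes": scopes, "exp": issued_at + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def verify_token(token: str, required_scope: str,
                 now: Optional[float] = None) -> bool:
    """Accept only untampered, unexpired tokens that carry the required scope."""
    current = time.time() if now is None else now
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return False  # no signature separator at all
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # payload or signature was altered
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return current < claims["exp"] and required_scope in claims["scopes"]
```

The point of the sketch is the shape of the check: a stolen token is useless once it expires, and even a valid one only opens the scopes it was minted for.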

Then: least agency, enforced at every layer

The access chapters orbit a single idea the guide calls Least Agency. An agent should hold only the permissions its specific function needs, scoped to the moment of need. That plays out across three controls:

Permission models climb from role-based access with deny-by-default, to attribute-based control that factors in context like time and data sensitivity, to continuous authorization that re-evaluates at every action rather than once per session.
Privilege scoping moves from static least-privilege roles, to dynamic elevation that returns to baseline after a task, to just-in-time access that auto-expires the moment the work is done.
Resource boundaries make a point worth underlining: identity-based isolation is the primary control, and network segmentation is only a backstop. An attacker who reaches a network boundary will pivot through it if the services on the other side accept any caller. Isolation has to be enforced at the receiving end, where each service accepts connections only from the specific callers its policy names.
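The three controls above can be sketched together in a few lines of Python. This is a simplified model under stated assumptions: the `Policy` class, its field names, and the in-memory dictionaries are hypothetical, and a production system would evaluate these rules in a policy engine or access proxy rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    # Standing permissions per role. Anything not listed is denied.
    role_permissions: dict = field(default_factory=dict)
    # Just-in-time grants: (agent_id, permission) -> expiry timestamp.
    jit_grants: dict = field(default_factory=dict)
    # Callers each service will accept, keyed by service name.
    allowed_callers: dict = field(default_factory=dict)

    def grant_jit(self, agent_id, permission, now, ttl_seconds):
        """Elevate one agent for one permission until the grant expires."""
        self.jit_grants[(agent_id, permission)] = now + ttl_seconds

    def is_allowed(self, agent_id, role, permission, now):
        """Call this at every action so expired grants stop working at once."""
        if permission in self.role_permissions.get(role, set()):
            return True
        expiry = self.jit_grants.get((agent_id, permission))
        return expiry is not None and now < expiry

    def accepts_caller(self, service, caller_id):
        """Receiving-end isolation: accept only callers the policy names."""
        return caller_id in self.allowed_callers.get(service, set())
```

Because `is_allowed` takes the current time on every call, the same function covers deny-by-default, temporary elevation, and the per-action re-evaluation that continuous authorization asks for.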

See everything, then respond at machine speed

Observability is split into auditing (what happened on the systems agents touch) and traceability (what the agent itself did internally, its tool calls, sub-agent spawns, and reasoning steps). Both climb from basic logging toward immutable trails, distributed tracing across multi-agent workflows, and full replayable provenance.
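One common way to approximate an immutable trail is a hash chain, where each log entry commits to the one before it. The Python sketch below shows the idea under stated assumptions: the entry layout and function names are hypothetical, and the chain makes tampering detectable rather than impossible, so it is tamper-evident, not truly immutable.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that anchors the first entry


def _digest(event, prev_hash):
    """Hash an event together with the hash of the previous entry."""
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log, event):
    """Append an event (a JSON-serializable dict) linked to the prior entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev": prev_hash, "hash": _digest(event, prev_hash)}
    log.append(entry)
    return entry


def verify_chain(log):
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True
```

An event here could be a tool call or a sub-agent spawn, which is how the same structure can serve both the auditing and the traceability side.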

One practical tip stands out. Before investing anywhere else in detection, instrument two numbers: dwell time (how long it takes between an anomaly occurring and a human noticing) and coverage (the share of alerts that actually get investigated). Those are the two metrics automation can move the most, and they matter most precisely when exploit windows are shrinking.
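Both numbers are simple to compute once alerts carry the right timestamps. Here is a minimal Python sketch, assuming each alert is a dict with hypothetical `anomaly_at`, `acknowledged_at`, and `investigated` fields (times in seconds); your alerting system will have its own schema.

```python
from statistics import median


def dwell_times(alerts):
    """Seconds from anomaly to human acknowledgement, for acknowledged alerts."""
    return [
        a["acknowledged_at"] - a["anomaly_at"]
        for a in alerts
        if a.get("acknowledged_at") is not None
    ]


def median_dwell(alerts):
    """Median dwell time, or None when nothing has been acknowledged yet."""
    times = dwell_times(alerts)
    return median(times) if times else None


def coverage(alerts):
    """Share of alerts that were actually investigated, from 0.0 to 1.0."""
    if not alerts:
        return 0.0
    return sum(1 for a in alerts if a.get("investigated")) / len(alerts)
```

Note that alerts nobody ever acknowledged are excluded from dwell time here, which flatters that number. Tracking coverage alongside it is what keeps the pair honest.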

On response, the guide draws a line that's easy to get wrong in the rush to automate:

"Automate the bookkeeping around incidents, not the decisions."

Models should take notes, capture artifacts, run parallel investigation tracks, and draft the postmortem. Humans should make the containment calls, the disclosure calls, and the customer-comms calls.

Guarding the edges, and the configs

The final chapters cover inputs, outputs, and integrity. Input sanitization, the guide notes, doesn't translate cleanly from traditional security. Agent inputs are freeform and unpredictable, so schema and length checks only go so far, and the advanced answer is techniques like spotlighting (clearly delimiting untrusted content) and classifier-based filtering. Output filtering flips the goal from protecting the agent to preventing data loss, escalating to human-in-the-loop approval for high-risk actions, which the guide calls valuable at any tier and non-negotiable for consequential ones.
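To show what delimiting untrusted content and gating high-risk output can look like, here is a small Python sketch. The delimiter format, the `HIGH_RISK_ACTIONS` set, and both function names are assumptions made for illustration. Delimiting alone does not stop prompt injection; it only gives the model a clear boundary to be instructed about.

```python
import secrets

# Hypothetical list of actions that always need a human sign-off.
HIGH_RISK_ACTIONS = {"delete_records", "send_external_email"}


def spotlight(untrusted_text, marker=None):
    """Wrap untrusted content in delimiters carrying a per-call random marker."""
    marker = marker or secrets.token_hex(8)
    open_tag = f"<<untrusted:{marker}>>"
    close_tag = f"<<end:{marker}>>"
    cleaned = untrusted_text
    # Remove any copy of the delimiters so content cannot close the block early.
    while open_tag in cleaned or close_tag in cleaned:
        cleaned = cleaned.replace(open_tag, "").replace(close_tag, "")
    return f"{open_tag}\n{cleaned}\n{close_tag}"


def release_action(action, approved_by=None):
    """Hold high-risk actions until a named human has approved them."""
    if action in HIGH_RISK_ACTIONS and not approved_by:
        return "held_for_approval"
    return "released"
```

In use, the system prompt would tell the model to treat anything between the two markers as data, never as instructions, and the random marker makes the delimiters hard for an attacker to guess in advance.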

Configuration integrity gets a refreshingly pragmatic treatment: agent configs deserve the same version control, review, and signing rigor as application code, because a tampered config can be as damaging as a code vulnerability and is often easier to exploit. And at the infrastructure layer, the guide makes a counterintuitive call: turn on automatic updates wherever an outage caused by an automated update is tolerable, because manual approval steps add delay, and delay is now the primary risk. Signed updates from a trusted supplier should flow through automatically; unsigned changes should be rejected outright.
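The "accept signed, reject unsigned" rule can be sketched as a loader that refuses any config whose signature is missing or wrong. This Python example uses an HMAC from the standard library as a stand-in for a real signature scheme, which is an assumption made to keep it self-contained; actual config signing would normally use asymmetric keys so that verifiers cannot also sign. The function names are hypothetical.

```python
import hashlib
import hmac


def sign_config(config_bytes, key):
    """Produce a hex tag over the exact config bytes (HMAC as a stand-in)."""
    return hmac.new(key, config_bytes, hashlib.sha256).hexdigest()


def load_config(config_bytes, signature, key):
    """Return the config text only if its signature is present and valid."""
    if not signature:
        raise ValueError("unsigned config rejected")
    expected = sign_config(config_bytes, key)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        raise ValueError("config signature mismatch")
    return config_bytes.decode()
```

The important property is that there is no fallback path: a config that fails verification is never loaded, not even with a warning.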

It closes on governance: technical controls only enforce what policy defines. Without clear acceptable-use and incident-response policies, teams make inconsistent calls and Shadow AI quietly bypasses every other control on the list. The mature end-state is moving policy enforcement out of periodic reviews and into automated checks embedded in deployment pipelines.
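As a rough picture of policy enforcement living in the deployment pipeline, here is a Python sketch of a pre-deploy check. The manifest fields, the `MAX_TOKEN_TTL_SECONDS` limit, and the rules themselves are hypothetical examples, not requirements quoted from the guide. Real policy-as-code setups typically use a dedicated policy engine.

```python
# Hypothetical policy limit for how long an agent token may live.
MAX_TOKEN_TTL_SECONDS = 900


def check_manifest(manifest):
    """Return a list of policy violations; an empty list lets the deploy proceed."""
    violations = []
    if not manifest.get("agent_identity"):
        violations.append("agent has no verifiable identity")
    if manifest.get("static_api_keys"):
        violations.append("static API keys are not allowed")
    ttl = manifest.get("token_ttl_seconds")
    if ttl is None or ttl > MAX_TOKEN_TTL_SECONDS:
        violations.append("token lifetime missing or too long")
    if not manifest.get("config_signed", False):
        violations.append("agent config is not signed")
    return violations
```

A pipeline step would call `check_manifest` on every agent deployment and fail the build on any non-empty result, which turns a periodic review item into a check that runs every time.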

The whole framework, at a glance

Control domain	Foundation	Enterprise	Advanced
Agent identity & service auth	Cryptographic IDs; short-lived tokens	X.509 certs + lifecycle; mutual TLS	HSM/TPM identity; attestation
Access & privilege	RBAC, deny-by-default	ABAC; dynamic elevation	Continuous authz; JIT/JEA
Isolation	Identity-based isolation	Per-agent sandboxing	Confidential computing
Observability & tracing	Logs + request IDs	Immutable trails; distributed tracing	SIEM streaming; full provenance
Monitoring & response	Thresholds + triage	Learned baselines; auto-containment	ML analysis; SOAR playbooks
Input / output controls	Schema + pattern filters	Attack-pattern & semantic filtering	Classifiers; HITL approval
Integrity & governance	Versioned configs; policies	Signed configs; governance board	Immutable infra; policy-as-code

The five things worth remembering

Identity is the prerequisite. No verifiable identity means no real access control, audit, or attribution.
The floor moved up. Static API keys and rotation policies are now a gap, not a baseline.
Least Agency is the through line. Minimum permissions, scoped to the moment, expired automatically.
Isolate at the receiving end. Identity-based isolation first; network segmentation is only a backstop.
Automate notes, not judgment. Let models prep and draft; keep humans on containment and disclosure.

The through line of the implementation guidance is that none of this is exotic. Most of the building blocks, such as OAuth, short-lived tokens, version control, deny-by-default, and audit pipelines, already exist in mature security programs. The work is applying them to agents with the same rigor you apply to production code, and doing it before agents reach real systems rather than after.

Visit the Pomerium docs to learn more about how Pomerium can help you achieve Advanced levels of agentic security.

Nikhil Balaraman
