Understanding Anthropic's Zero Trust for AI Agents Guide

June 4, 2026
Share on Bluesky

Most writing about AI agent security stops at the scary part: agents can be prompt-injected, they hold credentials, they act near production. True, and not very useful. However, the implementation half of Anthropic's Zero Trust for AI Agents guide is more interesting because it stops describing the problem and starts grading the solutions. It lays out exactly which controls count as table stakes, which are enterprise-grade, and which are reserved for the highest-stakes environments.

Let's review their implementation guidance, so you can see where your own agent deployment lands on their maturity model. The framing throughout is a based on a single principle which we also believe here at Pomerium: chasing individual exploits keeps you permanently reactive, while building on Zero Trust foundations puts you on firmer ground.

Three tiers, and a floor that keeps rising

Every control in the guide is presented across three maturity tiers. Each tier builds upon the previous, so with each step, you strengthen what you already have, and you're not starting over.

Foundation

Minimum viable security for smaller deployments or early implementations. Enough on its own only for small teams.

Enterprise

Where most organizations should aim. The depth needed for scale, multiple deployments, and environments where one compromise has real business impact.

Advanced

For regulated industries, national-security uses, or anywhere a breach carries severe consequences. Aspirational for most; baseline for the highest-risk teams.

The most important sentence in this whole section is a warning about drift. Because AI has accelerated offense — compressing the time between a vulnerability appearing and an attacker exploiting it — the entry bar has been raised. As the guide puts it, friction-only controls no longer qualify. And the tiers themselves are a moving target:

"Expect the Advanced tier to become Enterprise standard as the space evolves, and Enterprise to become Foundation."

In other words, today's gold standard is tomorrow's baseline. Plan accordingly.

It starts with identity

The guide is emphatic that identity comes first, because everything else depends on it: without verifiable identity you can't enforce access, keep meaningful audit trails, or attribute actions to a specific agent. Agents without distinct identities operate in an "attribution gap" where least-privilege becomes impossible.

What's notable is how far the Foundation bar has moved. Unique agent IDs used to be enough, but now they must be cryptographically rooted, because a label is trivial to forge. And on service authentication, the guide is blunt: static API keys and shared service-account passwords are "among the first things an attacker with model-assisted code analysis will find," and no longer count as a legitimate entry point. Short-lived, narrowly-scoped tokens are the new baseline. If you're running API keys with a rotation policy today, the guide says to treat that as a known gap, not a valid security posture.

Then: least agency, enforced at every layer

The access chapters orbit a single idea the guide calls Least Agency. An agent should hold only the permissions its specific function needs, scoped to the moment of need. That plays out across three controls:

  • Permission models climb from role-based access with deny-by-default, to attribute-based control that factors in context like time and data sensitivity, to continuous authorization that re-evaluates at every action rather than once per session.

  • Privilege scoping moves from static least-privilege roles, to dynamic elevation that returns to baseline after a task, to just-in-time access that auto-expires the moment the work is done.

  • Resource boundaries make a point worth underlining: identity-based isolation is the primary control, and network segmentation is only a backstop. An attacker who reaches a network boundary will pivot through it if the services on the other side accept any caller. Isolation has to be enforced at the receiving end, where each service accepts connections only from the specific callers its policy names.

See everything, then respond at machine speed

Observability is split into auditing (what happened on the systems agents touch) and traceability (what the agent itself did internally, its tool calls, sub-agent spawns, and reasoning steps). Both climb from basic logging toward immutable trails, distributed tracing across multi-agent workflows, and full replayable provenance.

One practical tip stands out: before investing anywhere else in detection, instrument two numbers: dwell time (how long between an anomaly and a human noticing) and coverage (the share of alerts that actually get investigated). Those are the two metrics automation can move the most, and they matter most precisely when exploit windows are shrinking.

On response, the guide draws a line that's easy to get wrong in the rush to automate:

"Automate the bookkeeping around incidents, not the decisions."

Models should take notes, capture artifacts, run parallel investigation tracks, and draft the postmortem. Humans should make the containment calls, the disclosure calls, and the customer-comms calls.

Guarding the edges, and the configs

The final chapters cover inputs, outputs, and integrity. Input sanitization, the guide notes, doesn't translate cleanly from traditional security. Agent inputs are freeform and unpredictable, so schema and length checks only go so far, and the advanced answer is techniques like spotlighting (clearly delimiting untrusted content) and classifier-based filtering. Output filtering flips the goal from protecting the agent to preventing data loss, escalating to human-in-the-loop approval for high-risk actions, which the guide calls valuable at any tier and non-negotiable for consequential ones.

Configuration integrity gets a refreshingly pragmatic treatment: agent configs deserve the same version control, review, and signing rigor as application code, because a tampered config can be as damaging as a code vulnerability and is often easier to exploit. And at the infrastructure layer, the guide makes a counterintuitive call to turn on automatic updates wherever an automated update outage is tolerable, because manual approval steps add delay, and delay is now the primary risk. Signed updates from a trusted supplier should flow through automatically; unsigned changes should be rejected outright.

It closes on governance: technical controls only enforce what policy defines. Without clear acceptable-use and incident-response policies, teams make inconsistent calls and Shadow AI quietly bypasses every other control on the list. The mature end-state is moving policy enforcement out of periodic reviews and into automated checks embedded in deployment pipelines.

The whole framework, at a glance

Control domain

Foundation

Enterprise

Advanced

Agent identity & service auth

Cryptographic IDs; short-lived tokens

X.509 certs + lifecycle; mutual TLS

HSM/TPM identity; attestation

Access & privilege

RBAC, deny-by-default

ABAC; dynamic elevation

Continuous authz; JIT/JEA

Isolation

Identity-based isolation

Per-agent sandboxing

Confidential computing

Observability & tracing

Logs + request IDs

Immutable trails; distributed tracing

SIEM streaming; full provenance

Monitoring & response

Thresholds + triage

Learned baselines; auto-containment

ML analysis; SOAR playbooks

Input / output controls

Schema + pattern filters

Attack-pattern & semantic filtering

Classifiers; HITL approval

Integrity & governance

Versioned configs; policies

Signed configs; governance board

Immutable infra; policy-as-code

The five things worth remembering

  • Identity is the prerequisite. No verifiable identity means no real access control, audit, or attribution.

  • The floor moved up. Static API keys and rotation policies are now a gap, not a baseline.

  • Least Agency is the through line. Minimum permissions, scoped to the moment, expired automatically.

  • Isolate at the receiving end. Identity-based isolation first; network segmentation is only a backstop.

  • Automate notes, not judgment. Let models prep and draft; keep humans on containment and disclosure.

The through line of the implementation guidance is that none of this is exotic. Most of the building blocks like OAuth, short-lived tokens, version control, deny-by-default, and audit pipelines, already exist in mature security programs. The work is applying them to agents with the same rigor you apply to production code, and doing it before agents are reaching real systems rather than after.

Visit the Pomerium docs to learn more about how Pomerium can help you achieve Advanced levels of agentic security.

Share: Share on Bluesky

Stay Connected

Stay up to date with Pomerium news and announcements.

More Blog Posts

See All Blog Posts
Blog
Another GlobalProtect bypass, another reminder that the VPN is the wrong place to put your trust
Blog
When the Web Becomes the Attacker: AI Agent Traps and the Case for Identity-Aware Access

Revolutionize
Your Security

Embrace Seamless Resource Access, Robust Zero Trust Integration, and Streamlined Compliance with Our App.