Most writing about AI agent security stops at the scary part: agents can be prompt-injected, they hold credentials, they act near production. True, and not very useful. However, the implementation half of Anthropic's Zero Trust for AI Agents guide is more interesting because it stops describing the problem and starts grading the solutions. It lays out exactly which controls count as table stakes, which are enterprise-grade, and which are reserved for the highest-stakes environments.
Let's review their implementation guidance, so you can see where your own agent deployment lands on their maturity model. The framing throughout is a based on a single principle which we also believe here at Pomerium: chasing individual exploits keeps you permanently reactive, while building on Zero Trust foundations puts you on firmer ground.
Every control in the guide is presented across three maturity tiers. Each tier builds upon the previous, so with each step, you strengthen what you already have, and you're not starting over.
Foundation
Minimum viable security for smaller deployments or early implementations. Enough on its own only for small teams.
Enterprise
Where most organizations should aim. The depth needed for scale, multiple deployments, and environments where one compromise has real business impact.
Advanced
For regulated industries, national-security uses, or anywhere a breach carries severe consequences. Aspirational for most; baseline for the highest-risk teams.
The most important sentence in this whole section is a warning about drift. Because AI has accelerated offense — compressing the time between a vulnerability appearing and an attacker exploiting it — the entry bar has been raised. As the guide puts it, friction-only controls no longer qualify. And the tiers themselves are a moving target:
"Expect the Advanced tier to become Enterprise standard as the space evolves, and Enterprise to become Foundation."
In other words, today's gold standard is tomorrow's baseline. Plan accordingly.
The guide is emphatic that identity comes first, because everything else depends on it: without verifiable identity you can't enforce access, keep meaningful audit trails, or attribute actions to a specific agent. Agents without distinct identities operate in an "attribution gap" where least-privilege becomes impossible.
What's notable is how far the Foundation bar has moved. Unique agent IDs used to be enough, but now they must be cryptographically rooted, because a label is trivial to forge. And on service authentication, the guide is blunt: static API keys and shared service-account passwords are "among the first things an attacker with model-assisted code analysis will find," and no longer count as a legitimate entry point. Short-lived, narrowly-scoped tokens are the new baseline. If you're running API keys with a rotation policy today, the guide says to treat that as a known gap, not a valid security posture.
The access chapters orbit a single idea the guide calls Least Agency. An agent should hold only the permissions its specific function needs, scoped to the moment of need. That plays out across three controls:
Permission models climb from role-based access with deny-by-default, to attribute-based control that factors in context like time and data sensitivity, to continuous authorization that re-evaluates at every action rather than once per session.
Privilege scoping moves from static least-privilege roles, to dynamic elevation that returns to baseline after a task, to just-in-time access that auto-expires the moment the work is done.
Resource boundaries make a point worth underlining: identity-based isolation is the primary control, and network segmentation is only a backstop. An attacker who reaches a network boundary will pivot through it if the services on the other side accept any caller. Isolation has to be enforced at the receiving end, where each service accepts connections only from the specific callers its policy names.
Observability is split into auditing (what happened on the systems agents touch) and traceability (what the agent itself did internally, its tool calls, sub-agent spawns, and reasoning steps). Both climb from basic logging toward immutable trails, distributed tracing across multi-agent workflows, and full replayable provenance.
One practical tip stands out: before investing anywhere else in detection, instrument two numbers: dwell time (how long between an anomaly and a human noticing) and coverage (the share of alerts that actually get investigated). Those are the two metrics automation can move the most, and they matter most precisely when exploit windows are shrinking.
On response, the guide draws a line that's easy to get wrong in the rush to automate:
"Automate the bookkeeping around incidents, not the decisions."
Models should take notes, capture artifacts, run parallel investigation tracks, and draft the postmortem. Humans should make the containment calls, the disclosure calls, and the customer-comms calls.
The final chapters cover inputs, outputs, and integrity. Input sanitization, the guide notes, doesn't translate cleanly from traditional security. Agent inputs are freeform and unpredictable, so schema and length checks only go so far, and the advanced answer is techniques like spotlighting (clearly delimiting untrusted content) and classifier-based filtering. Output filtering flips the goal from protecting the agent to preventing data loss, escalating to human-in-the-loop approval for high-risk actions, which the guide calls valuable at any tier and non-negotiable for consequential ones.
Configuration integrity gets a refreshingly pragmatic treatment: agent configs deserve the same version control, review, and signing rigor as application code, because a tampered config can be as damaging as a code vulnerability and is often easier to exploit. And at the infrastructure layer, the guide makes a counterintuitive call to turn on automatic updates wherever an automated update outage is tolerable, because manual approval steps add delay, and delay is now the primary risk. Signed updates from a trusted supplier should flow through automatically; unsigned changes should be rejected outright.
It closes on governance: technical controls only enforce what policy defines. Without clear acceptable-use and incident-response policies, teams make inconsistent calls and Shadow AI quietly bypasses every other control on the list. The mature end-state is moving policy enforcement out of periodic reviews and into automated checks embedded in deployment pipelines.
Control domain | Foundation | Enterprise | Advanced |
|---|---|---|---|
Agent identity & service auth | Cryptographic IDs; short-lived tokens | X.509 certs + lifecycle; mutual TLS | HSM/TPM identity; attestation |
Access & privilege | RBAC, deny-by-default | ABAC; dynamic elevation | Continuous authz; JIT/JEA |
Isolation | Identity-based isolation | Per-agent sandboxing | Confidential computing |
Observability & tracing | Logs + request IDs | Immutable trails; distributed tracing | SIEM streaming; full provenance |
Monitoring & response | Thresholds + triage | Learned baselines; auto-containment | ML analysis; SOAR playbooks |
Input / output controls | Schema + pattern filters | Attack-pattern & semantic filtering | Classifiers; HITL approval |
Integrity & governance | Versioned configs; policies | Signed configs; governance board | Immutable infra; policy-as-code |
Identity is the prerequisite. No verifiable identity means no real access control, audit, or attribution.
The floor moved up. Static API keys and rotation policies are now a gap, not a baseline.
Least Agency is the through line. Minimum permissions, scoped to the moment, expired automatically.
Isolate at the receiving end. Identity-based isolation first; network segmentation is only a backstop.
Automate notes, not judgment. Let models prep and draft; keep humans on containment and disclosure.
The through line of the implementation guidance is that none of this is exotic. Most of the building blocks like OAuth, short-lived tokens, version control, deny-by-default, and audit pipelines, already exist in mature security programs. The work is applying them to agents with the same rigor you apply to production code, and doing it before agents are reaching real systems rather than after.
Visit the Pomerium docs to learn more about how Pomerium can help you achieve Advanced levels of agentic security.
Stay up to date with Pomerium news and announcements.
Embrace Seamless Resource Access, Robust Zero Trust Integration, and Streamlined Compliance with Our App.