
Your Employees Are Already Dumping Company Data to LLMs (Here’s What To Do About It)


It's happening right now, in your organization. That senior developer just pasted your global auth tokens into ChatGPT to debug a tricky race condition. Your data analyst uploaded last quarter's customer churn data to Claude to help write their board presentation. Your product manager is feeding competitive analysis docs to Gemini to brainstorm feature ideas.

They're not being malicious. They're being practical. The productivity gains are simply too compelling to ignore.

I've been tracking this phenomenon across dozens of security and engineering organizations, and the pattern is remarkably consistent: AI adoption happens from the bottom up, completely bypassing IT governance. By the time leadership notices, hundreds of employees are already deep into their workflows with these tools.

Last month, I sat in on a security review at a Fortune 500 tech company. The CISO proudly showed off their new policy blocking ChatGPT at the corporate firewall. Meanwhile, I watched an engineer at the next table paste code into Claude on their personal laptop, tethered to their phone.

Simon Willison, reflecting on his own experience, notes that the new de facto workflow is to “start a new chat by dumping in existing code to seed that context” — then iteratively have the LLM modify or analyze it. New features even encourage this: Claude's latest Projects mode lets you import entire GitHub repos into context (a capability Willison says he's “using a lot”). In short: people aren't waiting for official tooling. They're hacking the process themselves to get answers now.

Each of these involves sharing sensitive internal data with external AI services. Your intellectual property, customer data, and trade secrets are already flowing through these models. The question isn't whether this is happening – it's whether you have any visibility or control over it.

The Inevitable Leak — and Why Bans Backfire

When an employee pastes proprietary code or sensitive customer data into a public model, that data has left the building. This is the “Shadow AI” problem that keeps CISOs up at night. The CISO of JPMorgan Chase recently warned organizations to “track what data is being shared with which models,” because right now, most companies are flying blind.

But bans don't just fail — they make the problem worse:

  1. Usage goes underground. Employees switch to personal devices and accounts, creating zero audit trail.

  2. Security practices degrade. People email code to their personal accounts or use sketchy third-party tools to get the job done.

  3. Morale takes a hit. Nothing says "we don't trust you" like blocking the most significant productivity tool in a decade.

And data leakage is only half the story. The other risk is what the LLM does with the data once it has it. Andrej Karpathy has described advanced LLMs as “people spirits”—incredibly capable but lacking human judgment, prone to manipulation, and slavishly obedient. A clever prompt injection attack can turn a helpful assistant into a malicious agent, tricking it into revealing secrets or executing unauthorized actions.

Every uncontrolled integration point is an attack surface. The traditional security response — ban it, block it, write a stern policy — is theater. It's like trying to ban Google in 2005. The tools are too useful. The workarounds are too easy. Your employees will find a way.

Implementation Patterns That Work

I've helped several organizations implement the pattern described in the rest of this post: a secure gateway for LLM access. Here's what works:

  • Start with a pilot program. Pick a team that's already using LLMs (they'll be your champions) and build the gateway around their use cases. One fintech started with their data science team; within three months, ten times as many teams wanted in.

  • Make it faster than the alternatives. If your secure gateway adds friction, people will bypass it. One trick: pre-negotiate enterprise agreements with OpenAI/Anthropic for higher rate limits. Your gateway can perform better than personal accounts.

  • Build on what you already have. You likely have an identity provider and logging infrastructure. Extend these instead of starting from scratch.

So let's talk about the infrastructure that makes those patterns possible.

Security Infrastructure for Our Agentic Reality

A minimum viable architecture looks like this (for both requests and responses):

[User] → [CLI/IDE Plugin] → [Agentic Access Gateway] → [Policy Engine] → [LLM Provider]
                                        ↓                      ↓
                                  [Audit Logs]         [DLP / PPL / OPA]

This is an access proxy, similar to the identity-aware proxies many orgs use for internal apps, but purpose-built for LLM context loading. All requests from an LLM (or an LLM-powered agent/tool) to internal data must funnel through this gateway, which enforces the following (a minimal code sketch follows the list):

  • Strong Authentication & Authorization: Every request must carry a verified identity (of the human and/or service behind it) and be checked against permissions. In Karpathy's vision of future AI infrastructure, a key requirement is to “authenticate AI agents and verify their permissions.” If Alice can access the data, her AI assistant can. If not, it can’t.

  • Scoped, Least-Privilege Access: Only allow what’s necessary. If an LLM needs to see one database record or file, it shouldn't get the whole system. This requires fine-grained access control. As one proposal for an LLM proxy put it, we need “policy-based access control to individual [AI] tools and methods” exposed to the model. Your gateway should enforce those scopes rigorously.

  • Comprehensive Audit Logging: Every piece of data that flows to an LLM must leave an auditable trail. This is non-negotiable for compliance. The JPMC CISO explicitly calls for logging these AI interactions – companies should “retain visibility into which internal data was shared with an AI, by whom, and when.” The secure gateway should log each context load: which user or service requested it, what was retrieved, and which model it was sent to.

  • Content Safety Checks: Ideally, the gateway can also perform on-the-fly scanning or filtering. Even simple guardrails, such as automatically stripping API keys or customer PII from prompts, can prevent accidental leaks. The gateway is a natural point to enforce such data loss prevention rules before the data ever reaches the model.

  • Rate Limiting & Abuse Prevention: The gateway can throttle and sandbox the AI's access to prevent overload or abuse. Karpathy mentions the need to “implement rate limiting and resource controls to prevent abuse” in these AI integrations. This protects backend systems and contains the blast radius of any prompt injection attack.
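
To make those checks concrete, here is a minimal Python sketch of the request path such a gateway implements. It is illustrative only: the Request shape, the in-memory rate limiter, the regex scrubber, and forward_to_llm are stand-ins for your identity-aware proxy, policy engine (OPA, PPL, or similar), DLP tooling, and provider SDK.

import logging
import re
import time
from dataclasses import dataclass

# Audit trail: in production this would ship to your SIEM, not stdout.
logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("llm-gateway.audit")

# Crude content-safety patterns; real DLP rules would be far richer.
SECRET_PATTERNS = [
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),   # things that look like API keys
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # US SSN-shaped strings
]

@dataclass
class Request:
    user: str        # verified identity, e.g. "alice@example.com"
    scopes: set      # permissions granted to that identity
    resource: str    # what the prompt wants to touch, e.g. "crm:churn-report"
    prompt: str      # the text headed for the LLM
    model: str = "gpt-4o"

# Naive in-memory rate limiter: at most RATE_LIMIT requests per user per minute.
RATE_LIMIT = 30
_recent: dict = {}

def within_rate_limit(user: str) -> bool:
    now = time.time()
    window = [t for t in _recent.get(user, []) if now - t < 60]
    _recent[user] = window + [now]
    return len(window) < RATE_LIMIT

def check_policy(req: Request) -> bool:
    """Least-privilege check: the identity must hold a scope for the resource."""
    return req.resource in req.scopes        # placeholder for an OPA/PPL query

def scrub(prompt: str) -> str:
    """Content-safety pass: redact anything that looks like a secret."""
    for pattern in SECRET_PATTERNS:
        prompt = pattern.sub("[REDACTED]", prompt)
    return prompt

def forward_to_llm(prompt: str, model: str) -> str:
    raise NotImplementedError("wire this to your LLM provider's SDK")

def handle(req: Request) -> str:
    if not within_rate_limit(req.user):
        raise PermissionError("rate limit exceeded")
    if not check_policy(req):
        audit_log.warning("DENY user=%s resource=%s", req.user, req.resource)
        raise PermissionError("not authorized for this resource")
    clean_prompt = scrub(req.prompt)
    audit_log.info("ALLOW user=%s resource=%s model=%s chars=%d",
                   req.user, req.resource, req.model, len(clean_prompt))
    return forward_to_llm(clean_prompt, req.model)

Every branch either denies the request or leaves an audit record before anything reaches the model; that symmetry is the whole point.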

In short, the secure gateway acts as a chaperone between the AI and your crown jewels. If that sounds analogous to a zero-trust agentic access gateway, that's because it is! We're applying proven security principles to this new context-loading interface.

How This Looks in Practice

This isn't theoretical. Forward-thinking orgs are already doing this.

  • Case 1: A FinTech Startup. They discovered developers were pasting production queries into ChatGPT to optimize them. They built a simple gateway that automatically sanitized the queries, replacing real customer IDs and financial data with synthetic data before sending them to the LLM (a sketch of this kind of sanitization follows the case list).
    Result: Faster debugging with zero data leakage.

  • Case 2: A Healthcare Tech Company. Bound by strict HIPAA compliance, they routed all LLM requests through a gateway that enforced access to their HIPAA-compliant Azure OpenAI deployment and logged every interaction for audit.
    Result: Maintained compliance while improving clinical documentation quality by 40%.

  • Case 3: A B2B SaaS Platform. Their gateway integrates with their identity provider (Okta). When an engineer's AI assistant requests access to a code repository, the gateway enforces the same read/write permissions that the engineer has.
    Result: The security team has a single pane of glass to monitor all AI-driven access, making threat modeling much easier.
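
Case 1's sanitization step is worth sketching, because it is simpler than it sounds. The snippet below is a hypothetical illustration rather than that team's implementation: the customer-ID pattern and token format are assumptions, and a real deployment would derive its patterns from actual schemas. The idea is just to swap real identifiers for consistent synthetic tokens so the model sees query structure, never customer data.

# Hypothetical query sanitizer in the spirit of Case 1: replace real customer
# IDs and dollar amounts with consistent synthetic placeholders before the
# query text is sent to an external LLM. The patterns below are assumptions.
import re
from itertools import count

CUSTOMER_ID = re.compile(r"\bcust_[0-9]{6,}\b")      # assumed ID format
DOLLAR_AMOUNT = re.compile(r"\$\d[\d,]*(?:\.\d{2})?")

def sanitize(query: str) -> tuple[str, dict]:
    """Return (sanitized_query, mapping) so the same real value always maps
    to the same synthetic token and can be mapped back inside the gateway."""
    mapping: dict[str, str] = {}
    counter = count(1)

    def replace(match: re.Match, prefix: str) -> str:
        original = match.group(0)
        if original not in mapping:
            mapping[original] = f"{prefix}_{next(counter):04d}"
        return mapping[original]

    query = CUSTOMER_ID.sub(lambda m: replace(m, "CUST"), query)
    query = DOLLAR_AMOUNT.sub(lambda m: replace(m, "AMT"), query)
    return query, mapping

if __name__ == "__main__":
    q = "SELECT * FROM payments WHERE customer = 'cust_4821907' AND amount > $1,250.00"
    clean, mapping = sanitize(q)
    print(clean)    # synthetic tokens in place of real identifiers
    print(mapping)  # stays inside the gateway, never sent to the model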

Common Pitfalls to Avoid

  • The Optional Gateway. If the secure gateway is opt-in, it will fail. The path of least resistance must be the secure path. This means blocking direct access once the gateway is stable and proven.

  • The Friction Tax. If your gateway adds significant latency or a frustrating auth loop, your developers—the most creative problem-solvers you have—will find a way around it. Invest in making it fast and seamless.

  • The "Boil the Ocean" Plan. Don't try to build the perfect, all-encompassing system from day one. Start with simple authentication and logging for a single, high-leverage team. Iterate from there.

Your Action Plan for Next Week

If this resonates, here’s what to do:

  • Day 1: Investigate. Talk to your teams. Ask the top five people on each team how they really use LLMs. Don't be judgmental; be curious. You'll be surprised.

  • Day 2: Architect. Whiteboard what a minimum viable gateway would look like using your existing identity provider (Okta, Azure AD) and identity-aware gateway (Pomerium, etc.). A small verification sketch follows this list.

  • Day 3: Propose a Pilot. Identify one friendly team and one internal data source to protect with a proof-of-concept. Your goal is a quick win that demonstrates value.
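
For Day 2, if you already run an identity-aware proxy in front of internal apps, the main design question is how the gateway consumes the identity the proxy forwards. As one hedged example: Pomerium's documented default is to pass the authenticated user as a signed JWT in the X-Pomerium-Jwt-Assertion header, verifiable against its /.well-known/pomerium/jwks.json endpoint. The sketch below checks that assertion with PyJWT; confirm the header and endpoint against the docs for your version, and treat the hostnames, audience, and algorithm as assumptions to adapt.

# Sketch for "Day 2": trust identity only when the gateway can verify the
# signed assertion injected by the identity-aware proxy in front of it.
import jwt                      # pip install pyjwt[crypto]
from jwt import PyJWKClient

JWKS_URL = "https://authenticate.internal.example.com/.well-known/pomerium/jwks.json"
AUDIENCE = "llm-gateway.internal.example.com"   # the route host you configure

_jwks = PyJWKClient(JWKS_URL)

def identity_from_request(headers: dict) -> dict:
    """Verify the proxy's assertion and return its claims (email, groups, ...).

    Raises jwt.PyJWTError if the header is missing, expired, or not signed by
    the proxy -- in which case the gateway should refuse to call any LLM.
    """
    token = headers.get("X-Pomerium-Jwt-Assertion")
    if not token:
        raise jwt.InvalidTokenError("no identity assertion header")
    signing_key = _jwks.get_signing_key_from_jwt(token)
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["ES256"],   # match your proxy's signing key type
        audience=AUDIENCE,
    )

# Usage inside the gateway handler:
#   claims = identity_from_request(request.headers)
#   user, groups = claims["email"], claims.get("groups", [])
#   ...then feed user/groups into the policy check from the earlier sketch.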

The mandate for engineering leaders and security teams is blunt: your colleagues are already feeding company data to AI tools – you can't stop that tidal wave, so you need to channel it safely. Hoping that employees won't use ChatGPT or that contractors won't run an LLM on that client dataset is wishful thinking at best.

This is an industry-wide call to action. We have an opportunity right now to establish safe patterns before more serious incidents occur. As Karpathy and others have emphasized, partial autonomy and AI agents can be tremendously powerful – but they absolutely must be bounded by proper guardrails. "Keep AI on a leash," as Karpathy put it in a recent talk, not to limit its usefulness, but to ensure it works for us and not against us. In practice, "the leash" is an identity-aware gateway with strict policies.

So if you're a developer or architect excited about integrating LLMs deeper into your stack: go for it, but do it safely. Build that secure gateway into your design from day one. If your organization doesn’t have a plan for this, be the one to kick off that conversation. The longer you wait, the more likely that sensitive data will slip through uncontrolled channels or that an AI agent will make an unsupervised misstep.

Given how quickly this technology is moving, "later" is not an option: the secure pathway needs to be implemented now. You can't dam the flow of data into AI, but you can route it through a secure, well-monitored channel. Let’s choose the secure path before it chooses us.


What's your experience with LLM adoption in your organization? I'm collecting patterns and anti-patterns for a follow-up post. Drop me a note.
