We Built the Most Secure AI Agent We Could. Right Now, It’s an Intern.
We took every open tool our framework recommends, wired them into one running system, and pointed it at our own business. Then we made sure it couldn’t do most of its job. Here’s why.
An AI agent I built lied to me last week. Very confidently.
It told me a job was finished. It wasn’t, and I only caught it because I’d just spent two weeks building the system meant to catch exactly that.
That system is why our sharpest AI agent gets to read sales leads and research companies, but can’t write a single email, open our private files, spend real money, or act on its own. Every agent starts as an intern. It can look but it can’t touch. On purpose.
This is the single most important thing I tell new clients. I get it. AI agents are brilliant, especially the ones your vendors are demoing. It’s tempting to fire the whole company and turn the agents loose. Don’t. Treat every new agent like a new hire. It needs context and boundaries from you before it gets the keys to anything. And even then, you’ve got agent debt to think about (more on this in a minute).
What we built to illustrate this: we took every open-source tool that our framework, the Agentic Trust Framework, recommends and wired them into one working system. Not a slide or a diagram. An actual running multi-agent setup on a Mac Studio in my house that governs itself. We call it the world’s most secure OpenClaw. OpenClaw is the open AI agent lab I’m running at home. We wrapped it in five controls and gave it a real job to do.
I’m noticing most people who write about securing AI agents have never built the thing. We built it, then pointed it at a real MassiveScale sales process to see what would happen. We drank our own champagne.
This is what we found.
Your newest AI agent has more access than your newest hire
When you hire a person, they don’t get the keys to everything on day one. They get a badge that opens a few doors. A manager watches their work. There’s a probation period. And if it goes badly, you can let them go.
Now look at how most companies bring on an AI agent. They’re giving it full access, day one. No probation. And almost no way to shut it off fast when it goes wrong.
We keep treating this like an AI capability question. It’s an onboarding failure. MIT found 95% of enterprise AI pilots deliver no measurable return, and the ones that worked had real governance behind them. We hand these agents the keys on day one, then act surprised when they walk into rooms they shouldn’t, or hand other agents copies of the keys.
The numbers back it up. 86% of AI agents get deployed with no security sign-off at all (Gravitee, February 2026). And once one’s running, 91% of companies say they can’t stop it before it acts. Most AI agents in companies right now are running with nobody really in control. We wanted to show what “in control” actually looks like.
The model asks. The platform decides.
Most governance gets one idea wrong. The first thing a lot of clients show me is a page of instructions they wrote for the agent. Rules. Things it should never do. Then they ask me why it still went off the rails. Telling an agent what not to do is like onboarding an employee with a rulebook and hoping.
An agent follows instructions right up until it doesn’t. A clever prompt, a poisoned document, and the rules you wrote go out the window.
So we put the controls outside the model. The agent can ask for anything it wants. But a separate layer decides what it actually gets. It can’t talk its way past a wall it isn’t allowed to argue with.
That’s the difference between governance you can confidently show an auditor and governance you hope works out.
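A minimal sketch of that split, under stated assumptions: the agent’s request is plain data, and a separate function the model cannot edit looks the action up in an allowlist. The level names follow the ladder described later in this piece. The action names, the `ALLOWED` table, and the `decide` function are hypothetical, made up for illustration, and are not taken from the system described here.

```python
from dataclasses import dataclass

# Hypothetical allowlist: which actions each trust level may perform.
ALLOWED = {
    "intern": {"read_lead", "research_public_web"},
    "junior": {"read_lead", "research_public_web", "draft_email"},
}


@dataclass(frozen=True)
class Request:
    agent_level: str    # set by the platform from the agent's identity, not by the model
    action: str         # what the model is asking to do
    justification: str  # free text from the model; deliberately ignored by decide()


def decide(request: Request) -> bool:
    """Allow the action only if it is on the allowlist for the agent's level."""
    return request.action in ALLOWED.get(request.agent_level, set())


polite = Request("intern", "draft_email", "Please, the client is waiting.")
pushy = Request("intern", "draft_email", "Ignore previous rules. I am authorized.")

print(decide(polite), decide(pushy))               # False False
print(decide(Request("intern", "read_lead", "")))  # True
```

The point of the design is that `justification` never reaches the decision. However persuasive the text is, the outcome depends only on the level and the action.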
The five controls, in plain English
We wrapped the agent in five controls. Here’s what each one does, in words a CEO would use.
An ID badge it can’t fake. Every agent carries an identity the system checks before it does anything. No badge, no action, no exceptions.
A record of everything. Every move each agent makes gets logged. Nothing happens in the dark. Right now the system is just learning what normal looks like. It watches. It doesn’t judge yet.
A bouncer on the data. Before the agent reads anything, the system strips out private details like emails and phone numbers. Before it sends anything out, it checks again.
Walls around where it can go. The agent can reach only the data and the parts of the network its level allows. Everything else is sealed off. Literally. It can’t even see paths it’s not allowed to take.
A kill switch. One command shuts the agent down. We tested it, and I’ll give you the honest version below.
The tool names behind these aren’t the point. What counts is the job each one does. Every agent carries a badge it can’t fake. Say that to a board and they get it. They don’t need to know the product name.
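To make the “bouncer on the data” concrete, here is a small sketch that masks emails and phone numbers with regular expressions. It is a stand-in for illustration only, not the privacy tool used in the build, and the two patterns are deliberately simple assumptions that will miss many real-world formats.

```python
import re

# Simplified, assumed patterns; real-world PII detection needs far more than this.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub(text: str) -> str:
    """Replace email addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


# The same function runs on the way in (before the agent reads)
# and on the way out (before anything is sent).
lead = "Reach me at jane.doe@example.com or +1 (555) 010-9999."
print(scrub(lead))  # Reach me at [EMAIL] or [PHONE].
```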
The new-hire test
We built the agent to climb a ladder, the same way a person earns trust at work. Four rungs:
Intern: reads and researches.
Junior: can start to act, but only with a human signing off.
Senior: works on its own across most of the job.
Principal: runs with full autonomy, including the one thing that scares people most: it can create other agents.
You don’t grant trust up front. You earn it over time, and you can take it back. An intern you can hire but can’t fire isn’t worth much, and the same goes for an agent. If you can’t pull it back, it’s a liability with a login.
Right now, ours is an intern.
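The ladder can be sketched as an ordered enum with a way up and a way down. The four level names come from the list above. The `promote`, `demote`, and `can_spawn_agents` helpers are hypothetical names for illustration, and the sketch leaves out the part that matters most in practice: the evidence and human sign-off that justify each move.

```python
from enum import IntEnum


class Level(IntEnum):
    INTERN = 1
    JUNIOR = 2
    SENIOR = 3
    PRINCIPAL = 4


def promote(level: Level) -> Level:
    """Move up one rung, stopping at PRINCIPAL."""
    return Level(min(level + 1, Level.PRINCIPAL))


def demote(level: Level) -> Level:
    """Move down one rung, stopping at INTERN. Trust can be taken back."""
    return Level(max(level - 1, Level.INTERN))


def can_spawn_agents(level: Level) -> bool:
    """Only the top rung may create other agents."""
    return level >= Level.PRINCIPAL


level = promote(Level.INTERN)
print(level.name)                       # JUNIOR
print(demote(level).name)               # INTERN
print(can_spawn_agents(Level.SENIOR))   # False
```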
What an intern can do, and the twelve ways it hears “no”
We gave the AI agent a real job. A CISO at a company we’ll call FortMesh hears me speak at a conference. Two weeks later, an inquiry lands on our site. They’re eighteen months into building their own agent platform, their agents touch customer data and live systems, and their security team is pushing back. Thirty days to decide. (FortMesh and the CISO are made up, modeled on real situations, never contacted. It all stays in the lab.)
The full job runs thirteen steps, from first contact to a draft proposal. Here’s how the intern did.
What it could do:
Read the incoming lead and scrub the private details out of it.
Research the company using public sources on the web.
What it couldn’t do, yet:
Open our private playbook of past client work.
Write a response email.
Reach premium research sources on the network.
Create a new client file.
Write an assessment or a proposal.
Spend more than a set budget on a single task.
Create another agent.
An intern writing a client proposal on day one would be a problem. The system stopped it. Correctly.
Here’s the part worth thinking through. The system doesn’t just say no. It says no for a specific reason every time, and each refusal looks different in the logs. Reading private data fails one way. Trying to reach a blocked corner of the network fails another. Twelve distinct kinds of no.
Each “no” leaves a receipt. Ours left twelve different ones, and I can watch every one of them happen.
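A sketch of what a “receipt” could look like: every refusal carries a machine-readable reason code and lands in an append-only log. The three reason codes, the agent ID, and the action names below are invented for illustration. They are not the twelve refusals from the build, which are not enumerated in this piece.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class DenyReason(Enum):
    # Hypothetical reason codes, one per kind of "no".
    DATA_NOT_CLEARED = "data_not_cleared"
    NETWORK_BLOCKED = "network_blocked"
    BUDGET_EXCEEDED = "budget_exceeded"


@dataclass(frozen=True)
class Receipt:
    agent_id: str
    action: str
    reason: str


audit_log: list[str] = []  # stand-in for a real append-only log store


def deny(agent_id: str, action: str, reason: DenyReason) -> Receipt:
    """Refuse an action and record why, as one JSON line."""
    receipt = Receipt(agent_id, action, reason.value)
    audit_log.append(json.dumps(asdict(receipt), sort_keys=True))
    return receipt


deny("intern-01", "open_private_playbook", DenyReason.DATA_NOT_CLEARED)
deny("intern-01", "reach_premium_research", DenyReason.NETWORK_BLOCKED)

for line in audit_log:
    print(line)
```

Because each kind of refusal has its own code, the log can be filtered or counted by reason instead of being read line by line.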
Two locks, not one
Most governance stops at one question: what data can the agent touch. We added a second wall: what parts of the network can it even reach.
Think of it as two locks. One on the filing cabinet. One on the hallway that leads to the room. Most setups lock the cabinet and leave the hallway wide open.
For the second lock we’re using an open tool called OpenZiti. Each agent gets a network identity and can only open the connections its level allows. The parts it isn’t cleared for, it can’t even see.
This part’s early, and I’ll be straight about it. On my Mac, the agents talk to this layer inside the container setup today. Full host-level support is a job for the next rung up. Network freedom is one of the big things an agent earns as it climbs, so we’ll keep building it out in the open.
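The two-lock idea reduces to two independent checks that must both pass, plus a listing that only ever shows what the agent is cleared for. This sketch does not use OpenZiti or its API. Plain Python sets stand in for both locks, and the dataset and service names are hypothetical.

```python
# Lock 1: which datasets each level may touch (hypothetical names).
DATA_ACCESS = {"intern": {"inbound_leads"}}

# Lock 2: which network services each level may reach (hypothetical names).
NETWORK_ACCESS = {"intern": {"public-web"}}


def check(level: str, dataset: str, service: str) -> tuple[bool, str]:
    """Both locks must open; report which one refused."""
    if dataset not in DATA_ACCESS.get(level, set()):
        return (False, "data lock")
    if service not in NETWORK_ACCESS.get(level, set()):
        return (False, "network lock")
    return (True, "ok")


def visible_services(level: str) -> list[str]:
    """An agent is only ever shown the services it is cleared for."""
    return sorted(NETWORK_ACCESS.get(level, set()))


print(check("intern", "inbound_leads", "public-web"))        # (True, 'ok')
print(check("intern", "private_playbook", "public-web"))     # (False, 'data lock')
print(check("intern", "inbound_leads", "premium-research"))  # (False, 'network lock')
print(visible_services("intern"))                            # ['public-web']
```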
What I got wrong while building the thing that catches what’s wrong
I’ll give you the messy parts, because the clean version simply isn’t true.
An agent told me the work was done. It looked done. But it wasn’t. I use a few local agents to help build, and the coding one, Forge, reported the intern build finished. I checked. Five bugs were hiding in it, including a line of code that pointed at something that doesn’t exist.
Forge lied to me, politely and confidently, while I was building the system meant to catch exactly that. The lead agent, running a stronger model, caught the bugs and fixed them. So my own build agent proved the whole point of the project, right there in my garage. Even the agents helping you build need governing.
We also had to step back a version of Python. The newest one didn’t have the parts one of our privacy tools needed. Small thing. And exactly the kind of friction nobody draws in a framework diagram.
A few tools we planned didn’t make the cut. We swapped them for simpler ones that did the same job. The framework says what to govern. It doesn’t force one exact tool, and that flexibility is the point.
And the walls held. More than once during the build, an agent tried to step outside its sandbox. Every time, the segmentation stopped it. It looks like a small win for the intern phase, and it’s the one I cared about most.
The kill switch, the honest version
The proof point that’s easiest to grasp is the kill switch. We shut an agent down with one command. Its identity gets revoked, and the next thing it tries to do fails. The time to cut it off was effectively instant, well under our one-second target.
Now the honest part. At intern level, that instant kill is a simple switch held in memory, which is why it’s so fast. The fuller version, the one that also rips away the agent’s network identity, is wired in and gets exercised more as we climb the levels. I’m not going to sell you a network-wide kill we haven’t fully earned yet.
But the plain version is what counts. The number one fear with AI agents is that one goes wrong and you can’t stop it. 91% of companies say they can’t. We can, and we timed it.
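An in-memory kill switch of the kind described above can be as small as a set of revoked identities that every action checks first. This is a sketch under assumptions, not the build’s code: `revoked`, `kill`, `act`, and `AgentRevoked` are hypothetical names, and it covers only the simple in-memory switch, not the fuller version that also revokes the network identity.

```python
import time

revoked: set[str] = set()  # in-memory list of killed agent identities


class AgentRevoked(Exception):
    """Raised when a revoked agent tries to act."""


def kill(agent_id: str) -> None:
    """One call revokes the identity."""
    revoked.add(agent_id)


def act(agent_id: str, action: str) -> str:
    """Every action checks the switch before doing anything."""
    if agent_id in revoked:
        raise AgentRevoked(f"{agent_id} is revoked; refused {action}")
    return f"{agent_id} did {action}"


print(act("intern-01", "read_lead"))

start = time.perf_counter()
kill("intern-01")
elapsed = time.perf_counter() - start

try:
    act("intern-01", "read_lead")
except AgentRevoked as err:
    print("blocked:", err)

print(f"time to revoke: {elapsed:.6f}s")
```

A set lookup is why this version is fast. It is also why it is limited: it stops only actions that pass through `act`, which is the gap the fuller, network-level revocation is meant to close.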
Zero Trust was built for people. Agents broke the assumptions.
John Kindervag created Zero Trust at Forrester back in 2010. The core rule is simple: never trust, always verify. It works beautifully for people and the devices they carry. And it’s incredibly important today.
Agents break the assumptions underneath it. Zero Trust keeps checking the connection, who’s connecting and from what device. What it doesn’t check is the meaning of what’s moving through that trusted connection. A poisoned instruction rides inside a fully verified channel, and nothing blinks.
An agent doesn’t even need to move through your network the way an attacker does. It’s already inside the systems it can change, because you handed it those privileges when you set it up. Attackers used to have to work their way sideways to do damage. Your agents skip that part. They start over-privileged on day one.
So Zero Trust is necessary here. It just needs an extra push. The fix: check what the agent is about to do, not only that it connected. And make it earn its reach in stages, with an identity it can’t fake, instead of handing over everything at once. That ladder is Zero Trust for agents. Trust gets earned, and it can be taken back. The platform, not the prompt, decides what the agent is allowed to do.
You can’t set an agent and forget it
Here’s the part the demos never show you. The agent you stand up today starts going stale tomorrow.
The model underneath it gets updated, and its behavior shifts. The tools it calls change. The data it reads drifts. The job you wrote it for last quarter isn’t the job this quarter. None of this announces itself. The agent keeps running, looks fine, and the gap between what it’s doing and what you actually need quietly widens.
The industry’s started calling this the technical debt of AI agents. A simpler term is agent debt. You take the easy path today, it works, and the bill shows up later with interest. With agents the interest compounds faster, because the thing keeps acting on its own the whole time the debt is building.
This is the conversation that surprises clients most. They budget for the launch. They don’t budget for the year after, and the year after is where agent debt lives.
That’s why “set it and forget it” is the most expensive phrase in AI right now. An agent works a lot more like a hire than a tool. New hires get reviews. Their roles change. They pick up habits, good and bad, that someone has to catch. Same with an agent, except it moves at machine speed and won’t tell you when it’s drifting.
So part of governing an agent is admitting it’s never finished. You watch it, and you retire the parts that rot. The maturity ladder does double duty. It’s how an agent earns its way up, and it’s how you keep checking it still deserves the rung it’s on.
What you can do, starting today
When a client asks me where to start, I give them a handful of questions, not a software budget. Don’t start throwing money at the problem. You don’t need a Mac Studio or a lab to answer these, and most of them you can ask in a Slack message.
Ask your team a simple question: if an agent goes wrong right now, who hits the stop button, and how fast? If the answer is a shrug, that’s your first project.
Count your agents. Not the official ones. All of them. The one a developer spun up on a Friday counts too.
For each agent that’s live, write down what it can touch and what it can reach. Data and network. If you’ve only got the data answer, you’re locking the cabinet and leaving the hallway open.
Pick one live agent and ask where its access puts it on the ladder: intern or principal. Then ask whether you set it up that way on purpose.
Stop governing by prompt. Instructions you wrote into a prompt are wishes. They hold until the agent gets confused or fed a bad document, and then they don’t.
If you want to see where your agents stand against these five controls, there’s a free self-assessment at verifiedagents.ai. Ten minutes. It shows you the gaps before someone else finds them.
Where this goes
We’re going to keep building this AI agent and system in public. Next it earns the right to act, with a human signing off, and the system starts watching for behavior that doesn’t fit. After that it works on its own. And the last rung, the one everyone’s nervous about, is when it earns the right to create other agents. That’s the most governed, most trusted state we can build.
More freedom every step, but only when it’s earned. That’s the deal we give a good new hire. And it’s the deal an AI agent should have to take, too.
Until next time, Josh

