Zero Trust Launch Day: Search Over Prediction
How Claude Code turned a chaotic rollout into a one-hour fix
Last month we switched our first client team from our legacy VPN to Cloudflare Zero Trust Enterprise. We knew the launch would break things. We just couldn’t predict what.
Zero Trust is aggressive by design. It inspects traffic, blocks unknown destinations, enforces policies. Great for security. Brutal for day-one discovery.
Within minutes of go-live, agents started reporting tasks getting stuck. Some couldn’t reach client portals. Others hit walls on third-party integrations we’d never inventoried.
The Prediction Problem
We ran the traditional playbook. Tested with a small group. Built spreadsheets. Asked teams what tools they used.
The list was incomplete almost immediately.
This client team spans multiple lines of business. Our agents are spread across the Philippines and LATAM, hitting different data centers for the same tools. A test group in Iloilo doesn’t surface what breaks for someone in GDL connecting to a different CDN edge.
People don’t remember every SaaS tool they touch. They don’t know the CDN behind the vendor portal. They can’t tell you which analytics pixel fires on that client dashboard.
Search > Prediction
Rich Sutton’s Bitter Lesson holds that the approaches that scale best with computation are search and learning. We leaned into search.
We’d built a Claude Code skill wrapping Cloudflare’s Zero Trust APIs. When blocks started hitting, I asked Claude: “What third parties are getting blocked?”
tsx tools/cloudflare-tools/src/cli.ts zt-blocks 1
This queries Cloudflare’s Gateway logs for the last hour, filters to blocked requests only, groups them by destination host, and sorts by frequency. The data exists in Cloudflare’s dashboard. It’s just buried across multiple screens with filters that don’t remember your preferences. The skill flattens that into one command.
Output: grouped list of blocked hosts, sorted by frequency, with affected users.
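The grouping step itself is simple. Here is a minimal sketch; the `LogEntry` shape and field names are assumptions for illustration, not Cloudflare's actual log schema:

```typescript
// Sketch of the grouping step: filter to blocks, group by host,
// sort by frequency. LogEntry is an assumed shape, not Cloudflare's schema.
interface LogEntry {
  host: string;
  user: string;
  action: "block" | "allow";
}

interface BlockSummary {
  host: string;
  count: number;
  users: string[];
}

function summarizeBlocks(entries: LogEntry[]): BlockSummary[] {
  const byHost = new Map<string, { count: number; users: Set<string> }>();
  for (const e of entries) {
    if (e.action !== "block") continue; // blocked requests only
    const bucket = byHost.get(e.host) ?? { count: 0, users: new Set<string>() };
    bucket.count += 1;
    bucket.users.add(e.user);
    byHost.set(e.host, bucket);
  }
  // Most-blocked host first
  return [...byHost.entries()]
    .map(([host, b]) => ({ host, count: b.count, users: [...b.users] }))
    .sort((a, b) => b.count - a.count);
}
```

The heavy lifting is the plumbing around this: authenticating, paging through the logs, and handing Claude a result it can reason about.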
In minutes, we had the actual blocklist. Not a predicted blocklist. The real one, from production traffic.
The Recovery Loop
Agents had a safety valve: toggle WARP off, continue operations, report the issue. This meant blocks didn’t halt work entirely. But we wanted shields back up fast.
We had a go-live team on site and online. The loop:
Agent reports stuck task
Claude searches gateway logs for that user’s blocks
I review the blocked hosts
Claude adds legitimate services to the split tunnel exclude list
Agent re-enables WARP
The skill could read and write. We already knew what needed to be allowlisted from the search results. Might as well have Claude update the policy directly.
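The write side is the same idea in reverse: merge the approved hosts into the existing exclude list, then push it back through the split tunnel API in one call. A sketch of the merge step; the `ExcludeEntry` shape loosely mirrors Cloudflare's split tunnel entries, but treat the exact fields as an assumption:

```typescript
// Sketch of merging approved hosts into a split tunnel exclude list.
// ExcludeEntry's fields (host + description) are an assumed shape.
interface ExcludeEntry {
  host: string;
  description: string;
}

function mergeExcludes(
  current: ExcludeEntry[],
  approvedHosts: string[],
  note: string
): ExcludeEntry[] {
  const existing = new Set(current.map((e) => e.host));
  const additions = approvedHosts
    .filter((h) => !existing.has(h)) // skip hosts already excluded
    .map((host) => ({ host, description: note }));
  return [...current, ...additions];
}
```

Deduplicating before the write matters: the exclude list is replaced wholesale, so a sloppy merge can silently drop or duplicate entries.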
Time from first report to full resolution: under one hour.
Without the skill, I’d be clicking through Cloudflare’s dashboard, filtering logs manually, cross-referencing users, then navigating to a different screen to update policies. With it, Claude did the search and the writes. I made the judgment calls.
The Skill Architecture
Our cloudflare-tools skill is two layers:
Instruction layer (SKILL.md): tells Claude when to invoke and what commands exist
Execution layer (TypeScript CLI): wraps Cloudflare’s APIs for gateway logs, split tunnels, WAF rules
When I mention “Zero Trust blocks,” Claude matches the skill and knows to run the CLI. The tool queries Cloudflare’s GraphQL API, groups results by host, surfaces the patterns.
User prompt → Skill match → CLI execution → API query → Grouped results
Claude interprets. I decide. Policies update. Agents resume.
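A stripped-down version of the instruction layer might look like this; the names and wording are illustrative, not our actual SKILL.md:

```markdown
---
name: cloudflare-tools
description: Query Cloudflare Zero Trust gateway logs and manage split
  tunnel policies. Use when the user mentions Zero Trust blocks, WARP
  issues, or gateway logs.
---

# Cloudflare Zero Trust tools

List blocked hosts from the last N hours, grouped by destination:

    tsx tools/cloudflare-tools/src/cli.ts zt-blocks <hours>
```

The frontmatter description is what Claude matches against, so it carries both what the skill does and when to reach for it.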
What Worked
Giving agents the escape hatch. Toggle-off capability meant blocks were friction, not failures. They could report accurately because they could continue working.
Searching production, not predicting requirements. An hour of real traffic taught us more than weeks of auditing.
Claude as the search layer. The skill turned “what’s broken?” into a 30-second query instead of a 30-minute dashboard crawl.
What’s Next
First client team is fully launched. We’re on to the second one now, with a more complicated setup: split tunneling requirements, DLP policies, tighter controls. Different constraints, same playbook.
This is going company-wide. A couple dozen device profiles, security profiles, client-specific policies. Each rollout will surface new blocks. The skill captures what we learn and makes it searchable for the next round.
The broader pattern: stop trying to predict everything. Build the instrumentation to search production, and launch with safeguards. Let real traffic tell you what you missed, then fix it fast.


