When Your Harness Becomes Your BI Team for a Day
Building a local data lake in a day
Yesterday I needed to answer a question that should have been simple: “How much does it cost us to serve each client?”
Engineering costs are scattered across seven services: Anthropic, OpenAI, MongoDB Atlas, AWS, Vercel, GitHub Actions, Cloudflare. Some bill by API key. Some by project. Some by account. MongoDB is a mix: single-tenant clusters for some clients, and a shared cluster where the others split one database by document-count ratios.
The traditional path: stand up Redshift or Astronomer. Build DAGs. Write extractors for each API. Wait for it to make it through the priority list. Then figure out attribution logic for the shared services.
I had the first report running in a day.
The Setup
This isn’t magic. It’s the same medallion architecture that data teams have used for years: bronze (raw) → silver (curated) → gold (presentation). The difference is what does the work.
Each data source gets a skill. The skill knows how to authenticate, fetch usage data, and wrap it in a standard envelope:
{
  "event_id": "uuid",
  "event_time": "2026-01-27T10:15:00Z",
  "source": "openai.usage-api",
  "period": { "start": "2026-01-01", "end": "2026-01-31" },
  "payload": { /* raw API response */ }
}
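In Python (my choice here; the post doesn’t prescribe a language), the wrap step is a few lines. `wrap_envelope` and the stubbed payload are illustrative, not the actual skill code:

```python
import json
import uuid
from datetime import datetime, timezone

def wrap_envelope(source: str, period_start: str, period_end: str, payload: dict) -> dict:
    """Wrap a raw API response in the standard event envelope."""
    # isoformat() emits "+00:00"; swap for "Z" to match the envelope convention
    now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    return {
        "event_id": str(uuid.uuid4()),
        "event_time": now,
        "source": source,
        "period": {"start": period_start, "end": period_end},
        "payload": payload,
    }

# Example with a stubbed usage response
envelope = wrap_envelope("openai.usage-api", "2026-01-01", "2026-01-31", {"total_usd": 123.45})
print(json.dumps(envelope, indent=2))
```

Because every source emits the same envelope, downstream layers never need source-specific plumbing to find the period or the raw payload.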
Bronze layer: raw responses go to /raw/openai/usage/2026-01-usage.json. Immutable. Append-only. The skill checks if data exists before fetching and skips if it does.
Silver layer: normalizes everything into a common schema. Client, service, cost, breakdown. Attribution rules live in a single client-mappings.json file that maps API keys to clients, project IDs to clients, AWS accounts to clients.
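A sketch of the normalization step. The mapping structure and client names here are hypothetical; the real `client-mappings.json` layout is whatever the skills agreed on:

```python
# Hypothetical client-mappings.json content: each section maps one kind of
# service identifier to a client.
MAPPINGS = {
    "openai_projects": {"proj_alpha": "acme"},
    "aws_accounts": {"111122223333": "globex"},
}

def normalize(source: str, record: dict, mappings: dict) -> dict:
    """Turn one bronze record into the common silver row: client, service, cost."""
    if source == "openai.usage-api":
        client = mappings["openai_projects"].get(record["project_id"], "unattributed")
        return {"client": client, "service": "openai", "cost_usd": record["cost_usd"]}
    if source == "aws.cost-explorer":
        client = mappings["aws_accounts"].get(record["account_id"], "unattributed")
        return {"client": client, "service": "aws", "cost_usd": record["cost_usd"]}
    raise ValueError(f"no normalizer for {source}")

row = normalize("openai.usage-api", {"project_id": "proj_alpha", "cost_usd": 42.0}, MAPPINGS)
```

Keeping all identifier-to-client mappings in one file means an attribution change is a one-line config edit, not a code change across extractors.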
Gold layer: reports pull from silver, calculate totals, and look for anomalies.
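The gold step reduces to a group-by and a threshold. A toy version, with the anomaly rule (spend far above the per-client mean) as an assumption, not the report’s actual heuristic:

```python
from collections import defaultdict

def gold_report(silver_rows: list, anomaly_factor: float = 2.0):
    """Total cost per client; flag clients whose spend exceeds factor x the mean."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["client"]] += row["cost_usd"]
    mean = sum(totals.values()) / max(len(totals), 1)
    anomalies = [c for c, t in totals.items() if t > anomaly_factor * mean]
    return dict(totals), anomalies
```

Because gold only reads silver, swapping in a different anomaly rule, or a different report entirely, never touches the extractors.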
The Pagination Bug
Same day, I noticed OpenAI costs seemed low. I knew roughly what our spend should be from the dashboard. The numbers didn’t match.
The OpenAI usage API paginates. My first skill fetched page one and stopped. Seven days of data instead of thirty-one.
I told Claude Code: “The OpenAI costs look low, check the traces.” It looked at the skill, found the bug, fixed it, re-ran. Done.
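The fix is the standard cursor loop. A sketch against a stand-in endpoint; the field names (`data`, `next_page`) are assumptions, not OpenAI’s exact response shape:

```python
def fetch_all_pages(fetch_page) -> list:
    """Follow the pagination cursor until the API stops returning one.

    fetch_page(cursor) returns {"data": [...], "next_page": cursor_or_None}.
    """
    results, cursor = [], None
    while True:
        page = fetch_page(cursor)
        results.extend(page["data"])
        cursor = page.get("next_page")
        if cursor is None:  # the original bug: the skill stopped after page one
            break
    return results

# Demo against two fake pages
pages = {
    None: {"data": [1, 2], "next_page": "p2"},
    "p2": {"data": [3], "next_page": None},
}
rows = fetch_all_pages(lambda cursor: pages[cursor])
```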
This is the part that’s hard to explain to people who haven’t experienced it. The feedback loop compresses from the days it takes to spot anomalies in dashboards to minutes. Not because AI is faster at writing code, but because the context is already loaded. The skill, the data structure, the expected output, the anomaly: everything needed to debug is in scope.
Attribution Is the Hard Part
Every BI project I’ve seen stumbles on attribution. Engineering infrastructure is messy:
Single-tenant: Some clients have dedicated MongoDB clusters. Every dollar goes directly to them.
Multi-tenant with tagging: OpenAI projects map to clients. Project IDs in the mapping file.
Shared with usage splits: The main MongoDB cluster hosts multiple clients in one database. We calculate storage ratios from document counts, then apply those percentages to the cluster cost.
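The shared-cluster case reduces to a ratio calculation. A sketch, with document counts standing in for the storage metric and the client names invented:

```python
def split_shared_cost(cluster_cost: float, doc_counts: dict) -> dict:
    """Apportion one shared cluster's cost by each client's document-count share."""
    total = sum(doc_counts.values())
    return {client: round(cluster_cost * n / total, 2) for client, n in doc_counts.items()}

# acme holds 60% of the documents, globex 30%, initech 10%,
# so they carry the cluster cost in the same proportions
split_shared_cost(300.0, {"acme": 600, "globex": 300, "initech": 100})
```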
In a traditional pipeline, each attribution method would be a separate transformation. Different teams own different services. Someone has to coordinate the logic.
With skills, the attribution logic is co-located with the data fetch. Change the mapping, re-run, see results. The iteration speed changes what’s possible.
The Numbers
I’ve captured the major cost drivers: Compute (AWS), Storage (MongoDB), and AI (OpenAI, Anthropic). There are still smaller shared costs to ingest, but this covers the bulk.
December 2025 showed one client running 141K messages with healthy unit economics. A pilot client ran just 12 messages, so its infrastructure overhead wasn’t amortized yet. That’s the kind of insight that takes a while to surface through traditional BI. I had it on day one.
The report also flagged opportunities: transfer costs to audit, three low-usage databases as consolidation candidates, data egress to investigate.
What This Isn’t
I want to be clear about what didn’t happen.
I didn’t learn each API’s documentation upfront. I let Claude Code set up the extractors, then compared the output against each service’s billing dashboard to verify. When numbers matched, I moved on.
I didn’t build something production-grade. There’s no scheduling, no alerting, no retry logic. It’s an engineering-costs data lake living on my machine.
I didn’t replace a BI team. This answers one question. A real data platform answers thousands.
But that’s the point. The question I needed answered didn’t require a real data platform. It didn’t require transferring my requirements to a different team and fitting into their work queue. It required pulling seven APIs into a common format and doing some math.
The Process
I started with one planning document. Wrote out what I thought I needed: data sources, attribution types, directory structure, skill architecture.
Then iterative feedback. Build the Anthropic skill. See what works. Update the plan. Build OpenAI. Discover the pagination issue. Fix it. Add MongoDB. Realize I hadn’t articulated the shared cluster complexity. Add split logic. Update the plan again.
The plan document became a living spec. Requirements I failed to articulate upfront got discovered and captured as I went.
What’s Next
Moving to S3 will give BI teams access to the bronze/silver layers. They can build their own gold reports without waiting on me.
For scheduling, I’m planning to use Claude’s agent SDK with additional skills: Slack for notifications, GitHub for filing issues. When a job fails, it won’t just dump log traces. It’ll include what broke and why.
This isn’t something most engineering teams need. It’s a business function that faces outward from Engineering toward Leadership and Commercial. COGS by client feeds margin analysis, pricing decisions, resource allocation. The audience is finance and leadership, not developers.
The pattern generalizes. Skills as extractors. Medallion for structure. Attribution as configuration. I’ve used the same approach for Cloudflare Zero Trust rollouts (skills wrapping their API for policy management) and compliance evidence gathering (skills pulling audit logs from multiple systems into a common format). Different domains, same harness.
Hit reply if you want to see the implementation plan.


