Note: This is a work in progress documenting our evolving approach to managing AI-accelerated development.
We are shipping code more quickly as tools like Cursor and Claude Code become part of everyday development. Speed is helpful, but it uncovers a fresh set of bottlenecks that aren't obvious until the release train is already moving.
Code Reviews
When pull requests arrive faster than people can read them, review time becomes the new constraint. To help codeowners focus their attention, we've started tagging every PR with a simple risk assessment. The ultimate goal is to have low-risk changes (tiny refactors, copy tweaks, documentation) merge automatically once tests pass. Everything else waits for human review. This way, attention goes where it's most useful rather than being spread thin over every line of code.
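As a rough sketch of how that gate might work in CI, assuming GitHub Actions and a "risk:low"-style label convention (the label names and the exit-code contract here are illustrative, not our finished setup):

```ts
// auto-merge-gate.ts: decides whether a PR can skip human review.
// The label scheme (risk:low, docs, copy) is a hypothetical convention.
import { readFileSync } from "node:fs";

const LOW_RISK_LABELS = new Set(["risk:low", "docs", "copy"]); // assumed labels

interface Label { name: string }
interface PullRequestEvent { pull_request: { labels: Label[]; draft: boolean } }

function canAutoMerge(labels: Label[], draft: boolean): boolean {
  if (draft) return false;
  // Every label must be low-risk; an unlabeled PR always waits for a human.
  return labels.length > 0 && labels.every((l) => LOW_RISK_LABELS.has(l.name));
}

// GitHub Actions exposes the triggering event payload at GITHUB_EVENT_PATH.
const eventPath = process.env.GITHUB_EVENT_PATH;
if (!eventPath) throw new Error("Not running inside GitHub Actions");
const event: PullRequestEvent = JSON.parse(readFileSync(eventPath, "utf8"));

// Exit 0 to let a later workflow step enable auto-merge, 1 to require review.
process.exit(canAutoMerge(event.pull_request.labels, event.pull_request.draft) ? 0 : 1);
```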
We're also looking at ways to augment the review process beyond automation. Acceptance still means verifying that a change actually does what was intended, without side effects. One approach we're exploring runs reviews on the development machine, with an AI agent creating and executing test plans via Playwright.
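To give a flavor of what such an agent-generated test plan could contain, here is a hedged Playwright example against a hypothetical signup flow (the URL, selectors, and copy are made up; the point is checking the intended behavior plus one cheap side-effect guard):

```ts
import { test, expect } from "@playwright/test";

// Hypothetical acceptance check an agent might generate for a PR that
// touches the signup form: verify the intended change and watch for an
// obvious side effect (console errors) at the same time.
test("signup form accepts a valid email and shows confirmation", async ({ page }) => {
  const consoleErrors: string[] = [];
  page.on("console", (msg) => {
    if (msg.type() === "error") consoleErrors.push(msg.text());
  });

  await page.goto("http://localhost:3000/signup"); // assumed local dev URL
  await page.getByLabel("Email").fill("reviewer@example.com");
  await page.getByRole("button", { name: "Sign up" }).click();

  await expect(page.getByText("Check your inbox")).toBeVisible();
  expect(consoleErrors).toEqual([]); // crude side-effect guard
});
```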
Feature Flags
Wrapping new code in a feature flag lets us merge early without activating changes for users. Flags are cheap to add, so we use them by default. Eventually, a change that sits behind a flag and is marked low risk will be able to skip manual review entirely.
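A minimal sketch of the pattern, assuming a simple env-var-backed flag store (in practice this would be a flag service or remote config):

```ts
// Minimal flag gate; the flag names and env-var convention are hypothetical.
type FlagName = "new-billing-page" | "fast-search";

export function isEnabled(flag: FlagName): boolean {
  // e.g. FEATURE_FLAGS="new-billing-page,fast-search" (assumed convention)
  const enabled = (process.env.FEATURE_FLAGS ?? "").split(",").map((s) => s.trim());
  return enabled.includes(flag);
}

// New code ships merged but stays dormant until the flag is flipped.
export function renderBillingPage(): string {
  return isEnabled("new-billing-page") ? renderNewBillingPage() : renderLegacyBillingPage();
}

function renderNewBillingPage(): string { return "new billing page"; }
function renderLegacyBillingPage(): string { return "legacy billing page"; }
```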
The key is enforcement: we break the build when developers don't use feature flags properly. That lets changes reach production as early as possible while we test them in a controlled way with development, alpha, or beta users.
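One way to make that enforcement concrete is a build step that fails when a flag referenced in code isn't declared in a central registry. This is a sketch under assumptions: the isEnabled("...") call convention from the snippet above, a flags.json registry of known flags with owners and expiry dates, and source under src/.

```ts
// check-flags.ts: one possible build-breaking check. It scans the source
// tree for isEnabled("...") calls and fails if a flag isn't declared in
// flags.json. The file layout and registry format are assumptions.
import { readFileSync, readdirSync, statSync } from "node:fs";
import { join } from "node:path";

const registry: Record<string, { owner: string; expires: string }> =
  JSON.parse(readFileSync("flags.json", "utf8"));

function sourceFiles(dir: string): string[] {
  return readdirSync(dir).flatMap((name) => {
    const path = join(dir, name);
    if (statSync(path).isDirectory()) return sourceFiles(path);
    return path.endsWith(".ts") ? [path] : [];
  });
}

const unknown: string[] = [];
for (const file of sourceFiles("src")) {
  for (const match of readFileSync(file, "utf8").matchAll(/isEnabled\("([^"]+)"\)/g)) {
    if (!(match[1] in registry)) unknown.push(`${file}: ${match[1]}`);
  }
}

if (unknown.length > 0) {
  console.error("Unregistered feature flags:\n" + unknown.join("\n"));
  process.exit(1); // breaks the build
}
```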
Tests and Types
A reliable test suite and strict type checks are the guardrails that make faster reviews possible. They also act as natural evaluation criteria for model-generated code. I'm using "evals" loosely here, but the process is quite similar to hill climbing on an eval suite. I've found that when Opus works test-first (TDD), it often nails the implementation in zero to a few shots.
Generative tools write many tests for us. While some are imperfect, overall coverage improves significantly. We treat type checker failures the same as broken builds, keeping mistakes visible and forcing immediate fixes.
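As a small illustration of the test-first loop, this is the kind of failing spec I'd write before asking the model for an implementation. Vitest and the parseRiskLabel helper are hypothetical stand-ins; the point is that the test is the eval the generated code has to climb.

```ts
import { describe, expect, it } from "vitest";
// Hypothetical helper the model is asked to implement; this spec is written
// first and acts as the acceptance criterion for the generated code.
import { parseRiskLabel } from "./parseRiskLabel";

describe("parseRiskLabel", () => {
  it("maps known labels to a risk level", () => {
    expect(parseRiskLabel("risk:low")).toBe("low");
    expect(parseRiskLabel("risk:high")).toBe("high");
  });

  it("treats unknown or missing labels as high risk by default", () => {
    expect(parseRiskLabel("docs")).toBe("high");
    expect(parseRiskLabel("")).toBe("high");
  });
});
```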
Operational Support
More releases mean more chances for something to break in production. We've connected runtime logs and traces to an analysis Slack bot via MCP. The backbone is a private Grafana instance that receives log drains from all our services. The MCP tool pulls error traces directly into context, so most of our time to resolution is now spent detecting issues rather than diagnosing them.
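For a feel of what the tool does under the hood, here is a hedged sketch of a log lookup against Loki's query_range HTTP API (the endpoint shape follows Loki's documented API as I understand it; the URL, label scheme, and error filter are assumptions, and auth is elided):

```ts
// Sketch of the kind of lookup the Slack bot's MCP tool performs: pull
// recent error lines for a service from Loki behind Grafana.
const LOKI_URL = "https://logs.internal.example.com"; // hypothetical host

export async function recentErrors(service: string, minutes = 15): Promise<string[]> {
  const end = new Date();
  const start = new Date(end.getTime() - minutes * 60_000);
  const query = `{service="${service}"} |= "level=error"`; // assumed label scheme

  const params = new URLSearchParams({
    query,
    start: start.toISOString(),
    end: end.toISOString(),
    limit: "50",
  });
  const res = await fetch(`${LOKI_URL}/loki/api/v1/query_range?${params}`);
  if (!res.ok) throw new Error(`Loki query failed: ${res.status}`);

  const body = await res.json();
  // Each stream carries [timestamp, line] pairs; flatten to raw log lines.
  return body.data.result.flatMap((stream: { values: [string, string][] }) =>
    stream.values.map(([, line]) => line)
  );
}
```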
Keeping the Codebase Cohesive
As velocity climbs, it's easy for the architecture to drift and for the team to lose sight of what it should be. We limit that drift in two ways:
Short summaries on every PR: A bot posts a brief "what changed and why" note in chat, so the whole team can stay informed without digging through diffs (a sketch of the posting step follows this list).
Standard code actions: Pre-commit tools adjust imports, logging, and formatting to ensure both human and AI authors follow the same conventions.
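Here is a sketch of the summary bot's posting step, assuming a Slack incoming webhook; the "what changed and why" text itself is produced upstream (from the diff, by a model) and passed in:

```ts
// Posting step of the PR summary bot. The webhook secret name and the
// message format are assumptions for the sketch.
const SLACK_WEBHOOK_URL = process.env.SLACK_WEBHOOK_URL!;

export async function postPrSummary(opts: {
  repo: string;
  prNumber: number;
  title: string;
  summary: string; // model-written "what changed and why"
}): Promise<void> {
  const text = [
    `*${opts.repo}#${opts.prNumber}: ${opts.title}*`,
    opts.summary,
  ].join("\n");

  const res = await fetch(SLACK_WEBHOOK_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) throw new Error(`Slack webhook returned ${res.status}`);
}
```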
Moving Forward
Speed is only useful when it helps teams deliver stable, understandable software. We're not racing toward a brick wall. By trimming review queues, leaning on flags, and automating as much busywork as we can, we're trying to keep our focus on the problems that matter most to our users rather than on the process friction that naturally builds up in engineering teams.
Comments welcome—I'd love to hear what's slowing your team down and what you've tried.