Last week, I realized I was spending more time in Claude Code & Cursor than in the suite of tools we use outside of these two. This wasn't because I was writing more code; it was because I'd found more productive workflows in these tools that cover many of the tasks around software development.
Most of the focus is on using tools like Claude Code and Cursor for writing code and debugging, but I've found three specific workflows where they've become indispensable for our engineering operations. These aren't about generating code; they're about accelerating the processes that often become bottlenecks in software teams.
1. PR Review Augmentation & Automation
In my post about sidestepping the next bottlenecks, I identified PR reviews as an emerging constraint. We went from doing ~40 PR reviews per week to ~80 per week with the same team. In that time we added Cursor as an approved IDE. Some growing pains to say the least: reviewers now have more AI-generated changes to look at, so more of their attention is required.
We've built a PR review script that changes how we approach code reviews:
The Review Framework
Our Claude Code command analyzes every PR across four risk dimensions:
Security Risk (0-10): Vulnerabilities, data exposure, auth issues
Product Risk (0-10): User impact, feature completeness, backwards compatibility
Engineering Risk (0-10): Code quality, maintainability, performance
Feature Flag Coverage (0-10): How well changes are protected by feature flags
What This Looks Like in Practice
Our abbreviated review Claude Code command:
# PR Review Command
Perform a comprehensive pull request review that includes both code quality assessment and risk analysis.
## Review Process
1. **Analyze Changed Files**
- Get list of changed files in the PR. Focus on actual changes, not entire files
- Classify change types (docs, tests, code)
2. **Feature Flag Analysis**
- Detect feature flag usage patterns:
- `checkFeatureFlag()` calls
- [Other rules around feature flags]
3. **Risk Assessment**
- **Security Risk (0-10)**: Vulnerabilities, data exposure, auth issues
- 0: No security risk (docs, tests, non-production config)
- 5: Most PRs should be around this level
- 10: High risk (new endpoints, auth changes, data model changes)
- **Product Risk (0-10)**: User impact, feature completeness, backwards compatibility
- [Scoring rules for Product Risk]
- **Engineering Risk (0-10)**: Code quality, maintainability, performance
- [Scoring rules for Engineering Risk]
- **Feature Flag Coverage (0-10)**: How well changes are protected
4. **Code Quality Review**
- Follow repository coding standards (call out deviations from established patterns)
- Check for Svelte 5 migration patterns (preferred) vs Svelte 4 (avoid)
- Verify proper TypeScript usage
- Assess test coverage and quality
## Output Format
<code>markdown
## PR Review Results
### Risk Assessment
- **Security Risk**: [score]/10 - [brief explanation]
- **Product Risk**: [score]/10 - [brief explanation]
- **Engineering Risk**: [score]/10 - [brief explanation]
- **Feature Flag Coverage**: [score]/10 - [brief explanation]
### Feature Flag Analysis
- **Detected Flags**: [list of FF_* flags found or "None"]
- **Coverage Assessment**: [none/partial/good/excellent]
### Code Quality Feedback
[Detailed review focusing on code quality, potential bugs, performance, security, tests]
### Action Items
- [ ] [Specific actionable feedback items]
</code>
## Key Considerations
- **Feature Flag Impact**: Well-protected changes (score 7+) can reduce other risk scores
- **High-Risk Patterns**: New endpoints, auth changes, data model modifications
- **Svelte 5 Migration**: Encourage modern patterns over legacy Svelte 4
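To give a sense of what that last point looks for, here's roughly the pattern shift between Svelte 4 stores and Svelte 5 runes. This is an illustrative sketch, not code from our repo.
<code>typescript
// Illustrative only: the Svelte 4 store style the command flags as legacy,
// next to the Svelte 5 runes style it encourages (runes require a
// .svelte.ts module compiled by Svelte 5).

// Svelte 4 style (avoid): module-level writable store
import { writable } from "svelte/store";
export const legacyCount = writable(0);
export const legacyIncrement = () => legacyCount.update((n) => n + 1);

// Svelte 5 style (prefer): $state rune behind a small factory
export function createCounter() {
  let count = $state(0);
  return {
    get count() {
      return count;
    },
    increment() {
      count += 1;
    },
  };
}
</code>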
Instead of reviewers starting from scratch, they get a structured analysis:
## PR Review Results
### Risk Assessment
- **Security Risk**: 3/10 - New API endpoint but uses existing auth patterns
- **Product Risk**: 6/10 - Changes user-facing form validation messages
- **Engineering Risk**: 4/10 - Refactors existing utility functions
- **Feature Flag Coverage**: 8/10 - All UI changes protected by FF_NEW_VALIDATION
### Feature Flag Analysis
- **Detected Flags**: FF_NEW_VALIDATION, FF_ENHANCED_ERRORS
- **Coverage Assessment**: Good - user-facing changes are gated
- **Recommendations**: Consider adding flag for API response format change
### Code Quality Feedback
[Detailed analysis of Svelte 5 migration compliance, TypeScript usage, test coverage]
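For context, a change that scores well on flag coverage looks something like the sketch below. `checkFeatureFlag()` and `FF_NEW_VALIDATION` mirror the names used above; the flag client and validation logic are made-up stand-ins.
<code>typescript
// Hypothetical sketch of a flag-gated, user-facing change; the flag client is a
// stand-in for our real one, and the validation messages are invented.
const checkFeatureFlag = (flag: string): boolean =>
  flag === "FF_NEW_VALIDATION"; // stand-in: the real client reads flag state at runtime

function newValidationMessage(email: string): string | null {
  return email.includes("@") ? null : "Please enter a valid email address.";
}

function legacyValidationMessage(email: string): string | null {
  return email.includes("@") ? null : "Invalid email.";
}

export function validationMessage(email: string): string | null {
  // The user-facing change stays dark until FF_NEW_VALIDATION is enabled,
  // so it can be switched off without a rollback if something breaks.
  return checkFeatureFlag("FF_NEW_VALIDATION")
    ? newValidationMessage(email)
    : legacyValidationMessage(email);
}
</code>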
Why This Approach Works for Us
First off, it's tailored to our team's needs. If you use the script above, you may not find it useful, so I highly recommend building your own; feel free to use ours as a starting point (it's abbreviated for this post). It calls out specific focus areas, for example [Any changes on these routes are high risk].
The second reason is that it's a great way to get new engineers up to speed: it captures the internal monologue codeowners run through when reviewing PRs, and it gets newcomers familiar with the codebase and the team's standards.
A bit of history: we ran the first three assessments for about six months before adding the feature flag assessment. We've been using it for about a month, and the intention is for good flag coverage to lower the scores on the other assessments, since being able to turn off a breaking feature significantly reduces the risk of a change. Shout out to Gene Kim and The DevOps Handbook for the inspiration.
The Impact
This approach has two immediate benefits:
Faster Reviews: Reviewers focus on high-risk areas instead of scanning everything, with direct relative links provided right in Claude Code.
Consistent Standards: Every PR gets evaluated against the same criteria, along with the mind-meld that happens when codeowners get together.
The next step is automated approval for low-risk PRs: documentation updates, test additions, and well-flagged features that score consistently low across all risk dimensions.
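We haven't built this yet, but the gate will most likely be a threshold check over the scores the command already emits. A rough sketch, where the thresholds and field names are placeholders rather than a settled policy:
<code>typescript
// Rough sketch of an auto-approval gate; thresholds and field names are
// placeholders, not a finished policy.
interface RiskScores {
  security: number;
  product: number;
  engineering: number;
  featureFlagCoverage: number;
}

type ChangeType = "docs" | "tests" | "code";

function canAutoApprove(scores: RiskScores, changeTypes: ChangeType[]): boolean {
  // Documentation- and test-only PRs are the easiest candidates.
  const docsOrTestsOnly = changeTypes.every((t) => t !== "code");

  // Otherwise, require consistently low risk plus good flag coverage
  // (7+, matching the "Key Considerations" note above).
  const lowRisk =
    scores.security <= 2 && scores.product <= 2 && scores.engineering <= 2;
  const wellFlagged = scores.featureFlagCoverage >= 7;

  return docsOrTestsOnly || (lowRisk && wellFlagged);
}
</code>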
2. Task Enrichment
The second workflow addresses what the Shape Up methodology calls shaping: transforming raw problems into well-defined work. We shoehorned this to mean "enriching" tasks. This is especially critical for what we call "Goalie tasks": urgent issues, bugs, or errors that arrive with incomplete information.
The Enrichment Process
When a bug report or feature request lands in our system, our Claude Code command:
Extracts the task ID from the ClickUp URL
Analyzes error logs, stack traces, or technical descriptions
Searches the codebase for relevant context
Updates the task with structured information
I put these in the context of Claude Code, but all these processes can be implemented in Cursor Rules, Windsurf, Roo Code, Codex. Pick your flavor; I'm trying to get across the shift in mindset and some practical examples.
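As a rough illustration of the first step, pulling the task ID out of the URL is just string parsing. This sketch assumes ClickUp task URLs shaped like `https://app.clickup.com/t/<task_id>`, which may not match every workspace:
<code>typescript
// Sketch only: assumes task URLs of the form https://app.clickup.com/t/<task_id>.
export function extractTaskId(url: string): string | null {
  const match = url.match(/app\.clickup\.com\/t\/([A-Za-z0-9_-]+)/);
  return match ? match[1] : null;
}

// e.g. extractTaskId("https://app.clickup.com/t/abc123") returns "abc123"
</code>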
From This...
"Login not working for some users"
To This...
## Problem:
Authentication failing for users with special characters in email addresses.
Error occurs in email validation regex at src/auth/validation.ts:34.
## Task:
Update email validation to handle international characters and edge cases.
### Acceptance Criteria:
- [ ] Fix regex pattern to accept RFC 5322 compliant email addresses
- [ ] Add unit tests for edge cases (unicode, plus signs, etc.)
- [ ] Update validation error messages for clarity
- [ ] Verify no existing users are affected by changes
### Code Examples:
<code>typescript
// Current problematic code (src/auth/validation.ts:34)
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
</code>
<code>typescript
// Proposed solution
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
</code>
### Shaped Work:
- [ ] Update validation regex (~2 hours)
- [ ] Write comprehensive tests (~3 hours)
- [ ] Update error messages (~1 hour)
- [ ] Test with existing user data (~2 hours)
Why This Matters
Developers can now pick up tasks and start working immediately instead of spending the first hour figuring out what needs to be done. The context search finds related code, recent changes, and potential causes automatically.
With cleaner PRs and better-shaped tasks, our focus shifts to the operational side of things.
3. Operations Intelligence
The third workflow connects AI tools with monitoring infrastructure through MCP (Model Context Protocol) servers. Instead of writing LogQL queries or digging through Grafana dashboards, our operations team can have natural language conversations with the monitoring data.
Simply asking the model to categorize the issues from the last few hours is enough to get a good start. It allows for targeted investigation, and the reasoning models are great at prioritizing which issues matter most.
The Operations Workflow
When an alert fires or metrics look concerning:
Engineer opens Claude Code or Cursor with our Grafana MCP connected
We wrote this MCP in-house to connect with our Grafana instance and work around the auth issues currently plaguing the MCP ecosystem.
Describes the issue in natural language
The tool translates the request into LogQL (see the sketch after this list)
You should spend time here making sure it can consistently find the data you're expecting.
The Agent pulls relevant logs, identifies patterns, and suggests next steps
Engineer can immediately create properly formatted tasks, escalate to appropriate teams, or fix the issue themselves.
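To make the translation step concrete, here's the flavor of mapping we expect. The label names in the query are assumptions about a Loki setup, not something to copy verbatim.
<code>typescript
// Illustrative only: the labels (env, service) are assumptions about the Loki
// setup, and the ask is just an example of the phrasing engineers use.
const ask = "Which services logged the most errors in the last hour?";

// The kind of LogQL the agent should consistently produce for that ask:
const logql =
  'sum by (service) (count_over_time({env="production"} |= "error" [1h]))';
</code>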
The MCP Advantage
This isn't just log analysis; it's the connection between operations and engineering. Those handoffs are often fraught with friction, and a custom MCP and workflow is a great way to get ops and engineering the data they need to make decisions. The AI can prioritize issues based on user impact, correlate problems across different services, generate runbooks for common issues and incidents, and create detailed incident reports automatically.
The Pattern
All three workflows share a common pattern: they use AI to accelerate processes that traditionally require significant human time and expertise. They're not replacing human judgment; they're providing structured information so humans can make better decisions faster.
When you look at the last 6 months from 50,000 feet (or lines of code), the human is still the best at making decisions and at bringing taste, context, and agency. The AI is great at finding patterns in data, search, and memorization. The pesky bitter lesson again.
What's Coming Next
I expect these workflows to evolve rapidly:
PR Reviews: Full automation for low-risk changes
Task Enrichment: Real-time shaping as issues are created
Operations: Predictive alerting and automated incident response
The teams that figure out these workflows will have advantages in delivery speed and operational efficiency.
I set out to share what's working for us in Engineering, but we haven't applied these patterns only to engineering problems. These three workflows represent a shift in how we think about engineering productivity. We've applied similar patterns to SEO optimization, security log analysis, IT ticket creation, and financial reconciliation, each time focusing on removing expertise barriers rather than replacing human judgment.
The teams that master these AI-augmented workflows won't just ship code faster; they'll operate on fundamentally different problems with less friction across their entire org.
We'd traditionally held Engineering to be mainly Product-focused, and it still is, but the lines are blurring. Every technical decision now has a build-or-buy component because the cost of creating custom solutions is lower than ever, and SO SO much can be done in agent-powered IDEs.
What other business processes do you think could benefit from AI augmentation? I'd love to hear about workflows you're experimenting with.