How I stopped hand-fixing generated code and started building a system of AI agents that improve themselves
Table of contents
- The Core: Fix Prompts, Not Results
- My Current Setup: The System in Action
- 1. Planning Phase: Setting the Foundation
- 2. Execution Phase: Turning Plans into Code
- 3. Verification Phase: Catching Issues Before They Compound
- Getting Started: Essential Prompts
- Scaling Beyond Basic Workflows
- The Power of Composition
- The Secret Sauce: Parallel Development with Git Worktrees
- I’m Working on Several Improvements
- The Bottom Line
- Advanced Prompts & Shortcuts
- 🎯 Planning & Architecture Prompts
- 🚀 Execution & Implementation Prompts
- ✅ Verification & Quality Assurance Prompts
- 🛠️ Development Workflow Prompts
- 🔧 Advanced Claude Code CLI Commands
- 📋 Custom Slash Commands (Add to .claude/commands/)
- 🎨 UI/Frontend Development Prompts
- 🔐 Security & Best Practices Prompts
- ⚡ Automation & Hooks Configuration
- 🚨 Emergency & Recovery Commands
- 💡 Pro Tips & Keyboard Shortcuts
- 🎮 Advanced Workflow Combinations
The Core: Fix Prompts, Not Results
After months of wrestling with AI-generated code, I figured out something that changed everything for me.
The breakthrough: When something breaks, don’t patch the results. Don’t argue with Claude. Instead:
- Fix the plan
- Fix the prompts
- Fix the agent mix
So the next run works correctly from the start.
Basically, the problem is always me, not the machine. But then again, I’m the one who taught the machine to be problematic in the first place. So really, it’s me all the way down.
AI could truly replace engineers. So I built a system where AI agents manage other AI agents. Now I’m basically a middle manager for robots. (I hope they don’t find out.)
The Minecraft Analogy
Think of it like building redstone contraptions in Minecraft. Your goal isn’t to manually mine each block—it’s to build automated systems that mine, sort, and craft items for you.
The same principle applies to AI development: build a system of agents that can:
- Produce code
- Verify it
- Improve themselves over time
Why This Matters
Results are disposable, but plans and prompts compound.
When you debug at the source, those improvements scale across every future task. It transforms agents from simple code printers into self-improving colleagues.
Key benefits:
- No more hitting the same roadblock repeatedly
- Faster scaling through systematic improvements
- Experimental approach to find what works best
My Current Setup: The System in Action
I keep several Claude Code windows open, each running on its own git worktree. Here’s the team:
The Players:
- o3 and Sonnet 4: Create detailed implementation plans
- Sonnet 3.7 or Sonnet 4: Execute the plans and write code
- o3: Checks results against original requirements (uncompromising, even rude)
- Custom MCPs: Handle specialized tasks like style enforcement and library integration
AI could truly replace engineers, and many other jobs for that matter. So now I spend my days refereeing fights between different AI models about code quality. o3 is the harsh teacher who fails everything, while Claude just wants everyone to be happy.
The beauty: Any issues found get fed back into the plan template, not fixed inline. The system literally improves itself.
Managing Expectations
I often feel like it’s going slow because we’ve all been trained to expect things at the click of a button.
But it’s more like talking to a coworker and working out what’s going to happen, then reviewing the plan you both came up with.
The reality:
- Still takes time
- Way faster than me planning and coding everything myself
- Catches issues before anything gets written into the codebase
1. Planning Phase: Setting the Foundation
I start by giving Claude Code a high-level task, which calls over to o3 to generate a comprehensive plan.
Why o3 excels at planning:
- Asks clarifying questions to nail down job requirements
- Easiest to work with so far
- Thinks of things I miss
- Still need to guide it through the codebase and pull in focused files
Plan Output Structure
The output is a `<task>-plan.md` file containing:
- My original request
- Detailed implementation plan
- Potential edge cases and considerations
- Things it thinks I’ve missed
Real example: When I asked for a form-block.vue processing update, o3’s plan included:
- GraphQL updates
- Advanced Custom Fields updates
- Learned from previous requests that there are several codebases to review
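For concreteness, here's a hypothetical skeleton of what that plan file might contain for the form-block.vue request (the section names are my own, not a fixed format):

```markdown
# form-block-plan.md

## Original Request
Update form-block.vue processing to handle the new field types.

## Implementation Plan
1. Update the GraphQL schema and queries
2. Update the Advanced Custom Fields definitions
3. Adjust form-block.vue submission handling

## Edge Cases & Considerations
- Several codebases consume this block; review each before merging

## Things Possibly Missed
- Migration for existing stored submissions
```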
2. Execution Phase: Turning Plans into Code
Step 1: Sonnet 4 reads the plan, verifies it makes sense, and breaks it into actionable tasks
Step 2: Claude Code executes the plan using either Sonnet 3.7 or Sonnet 4
- I use Sonnet 4 when I need a great thinking model to ensure it comes out right
- I feel this post will be outdated soon as new models come out, but I digress—a reason to update the post, I guess
Critical Safety Feature
Critical instruction: Claude writes commits for each task step
Benefits:
- Either Claude or I can revert to any previous state if something goes sideways
- Lifesaver feature—always have a way to roll back
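The rollback safety net is plain git underneath. A minimal sketch (task names and file contents are made up for the demo): each completed step lands as its own commit, so a bad step can be reverted without losing the rest.

```shell
set -e
cd "$(mktemp -d)" && git init -q            # throwaway repo for the demo
git config user.email demo@example.com && git config user.name demo
# Claude commits after each completed task step:
echo "schema v2" > schema.graphql
git add schema.graphql && git commit -q -m "task 1: update GraphQL schema"
echo "broken handler" > form-block.vue
git add form-block.vue && git commit -q -m "task 2: wire form-block.vue processing"
# Task 2 went sideways: undo just that step, keeping task 1 intact
git revert --no-edit HEAD
git log --oneline
```

Because every step is isolated, either Claude or I can target exactly the commit that went wrong instead of resetting the whole run.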
3. Verification Phase: Catching Issues Before They Compound
Once code is generated, I run it through two verification layers:
- Sonnet 4: Verifies code against the original plan
- o3: Verifies against both the plan AND the original ask
Why Dual Verification Works
This dual verification catches different types of issues:
Claude’s tendencies:
- Wants to please
- Might keep unnecessary backwards compatibility code
- Loves adding lint ignore flags
o3’s approach:
- Ruthlessly calls out problems
- Demands removal of unnecessary code
- Flags problematic lint ignores
I can laugh because this is me all over the place (facepalm).
AI could truly replace engineers. So I built a good cop/bad cop system where Claude writes friendly code and o3 tears it apart. I’m basically running a digital therapy session.
Key principle: Any issue either model finds gets baked back into the plan template—not fixed inline. They gotta fix their own mess-ups.
Getting Started: Essential Prompts
Here are simple prompts I've bound to keyboard shortcuts to make these steps faster:
Planning:
Create a comprehensive plan for [TASK]. Include implementation requirements, potential edge cases, and dependencies. Output to [TASK]-plan.md
Execution:
Execute the plan in [TASK]-plan.md step by step. Write a commit for each completed task.
Verification:
Verify the implementation against both [TASK]-plan.md AND the original requirements. Flag any discrepancies.
Helpful Resource
One thing I’ve found useful is Prompts.chat. This has worked magic, and I thank everyone who contributed to it.
What it does:
- Helps you get started on any project
- Talk to the AI and assign it a role
- Not just for programming
Scaling Beyond Basic Workflows
Specialized Agents for Complex Tasks
I’ve started encoding more complex workflows with specific agents behind MCPs (servers that use the Model Context Protocol to connect AI agents with tools and external systems):
Style Enforcement Agent:
- Sweeps all generated code
- Applies local style rules
- Catches issues once Claude gets into the lint/test/debug cycle
Library Integration Agent:
- Reviews generated code
- Replaces generic patterns (like hand-rolled retries and `Thread.sleep` calls)
- Uses internal retry library
How it works: Claude might write code that works but uses basic patterns like manual loops. Your Library Integration Agent scans that code and says “Hey, we have a better way to do this” and replaces it with your team’s established libraries and patterns. It’s like having a senior developer review junior code and say “Don’t reinvent the wheel - use our existing tools.”
API Integration Composer:
- Takes API documentation and internal business cases
- Orchestrates multiple agents to build:
- Integrations
- Tests
- Documentation
Instead of you juggling multiple tasks and making sure everything works together, this agent acts as the conductor orchestrating all the moving pieces to deliver a complete, tested, documented integration. It’s basically automating the entire “add new API integration” workflow that normally takes multiple people or multiple days.
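If you want to wire up agents like these yourself, Claude Code can load project-scoped MCP servers from a `.mcp.json` file at the repo root. A minimal sketch, assuming local Node entry points (the server names and paths here are hypothetical; verify the exact schema against the MCP docs for your version):

```json
{
  "mcpServers": {
    "style-enforcer": {
      "command": "node",
      "args": ["./agents/style-enforcer/index.js"]
    },
    "library-integrator": {
      "command": "node",
      "args": ["./agents/library-integrator/index.js"]
    }
  }
}
```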
The Power of Composition
By building a collection of small, focused agents, I can compose them into more complex workflows.
The benefits:
- Each agent handles one specific task exceptionally well
- Together they can tackle substantial features without manual intervention
- I’ve basically created a team to help me be better—or at least that’s the idea
- I’m all about trying to move faster and understand the tools I use
The Secret Sauce: Parallel Development with Git Worktrees
Here’s what makes this approach powerful: it’s essentially free to fire off a dozen attempts at a task, so I do exactly that.
Git worktrees let me open multiple Claude Code instances side by side, each building different features simultaneously.
How Parallel Development Works
Each window runs independently with its own:
- Conversation history
- Context
- Feature focus
Example setup:
- Window 1: Finishing user authentication
- Window 2: Building payment integration
- Window 3: Fixing bugs in the dashboard
I still merge manually, but I’m no longer babysitting a single agent.
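The three-window layout above can be reproduced with stock git (paths and branch names hypothetical): one worktree per feature, each hosting its own Claude Code session.

```shell
set -e
base=$(mktemp -d)
git init -q "$base/project"
cd "$base/project"
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m "initial commit"
# One checkout per parallel feature; each gets its own Claude Code window:
git worktree add -b user-auth     "$base/project-user-auth"
git worktree add -b payments      "$base/project-payments"
git worktree add -b dashboard-fix "$base/project-dashboard-fix"
git worktree list
```

Each worktree is a full checkout on its own branch, so the agents never trip over each other's uncommitted changes.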
AI could truly replace engineers. So I built a system where multiple AIs work on different features at the same time. It’s like having a team of interns who never get tired, never complain, and never ask for raises or complain about work-life balance.
The Core Philosophy
The key insight: I resist the urge to fix results. Instead, I fix the prompts.
That loop IS the system.
- The code itself is disposable
- The instructions and agents are the real assets
- Taking time to plan and work out prompts is everything
Just like you’d spend time training a new team member rather than constantly fixing their work—invest in the foundation, not the output.
I’m Working on Several Improvements
Better Coordination
- Right now I kick things off manually
- Want automated workflow management
- Need dependency handling between agents
Document Alignment
- Changing how we capture information
- Moving to higher-level abstractions that agents can use more effectively
- Focusing on use cases rather than low-level implementation details
Complex Workflow Expansion
- Current setup handles pretty complex workflows
- Want to push further with:
- More agents
- Better coordination
- More sophisticated interactions
The Bottom Line
The AI handles the boring stuff, and I get to pretend I’m a visionary architect instead of someone who used to spend three hours debugging missing semicolons.
The tools will keep getting better, but the main idea stays the same: fix the instructions, not the output.
When you stop manually fixing what the AI spits out and start teaching it to do better work from the beginning, something cool happens.
You’re not just coding faster—you’re building a system that actually learns and improves itself.
That’s when things get interesting.
Advanced Prompts & Shortcuts
If you want to dive deeper, here’s a collection of more specialized prompts and Claude Code shortcuts that can handle complex workflows…
🎯 Planning & Architecture Prompts
# Initial Planning (with o3 or Claude)
"Create a comprehensive plan for [TASK]. Include:
- Implementation requirements
- Potential edge cases I might have missed
- Dependencies across codebases
- Architecture considerations
- Security implications
Output to [TASK]-plan.md"
# Plan Verification
"Read the plan in [TASK]-plan.md, verify it makes sense, and break it into actionable tasks with clear success criteria"
# Architecture Review
"Review the current architecture and suggest improvements for scalability, maintainability, and performance"
# Dependency Analysis
"Analyze all dependencies in this project and identify:
- Outdated packages
- Security vulnerabilities
- Unused dependencies
- Optimization opportunities"
🚀 Execution & Implementation Prompts
# Step-by-Step Implementation
"Execute the plan in [TASK]-plan.md step by step. Write a commit for each completed task. Use descriptive commit messages following conventional commits format"
# Safe Mode Execution
"Implement [FEATURE] but show me each change before applying it. Explain your reasoning for each modification"
# Test-Driven Development
"First write comprehensive tests for [FEATURE], then implement the feature to pass all tests"
# Refactoring with Safety
"Refactor [CODE/FILE] to improve [METRIC]. Create a backup first, then show me the diff before applying changes"
✅ Verification & Quality Assurance Prompts
# Dual Verification
"Verify the implementation against both [TASK]-plan.md AND the original requirements. Flag any discrepancies or missing features"
# Code Review
"Review this code for:
- Logic errors
- Security vulnerabilities
- Performance issues
- Code style violations
- Unnecessary backwards compatibility
Remove any lint ignore flags unless absolutely necessary"
# Test Coverage Analysis
"Analyze test coverage and write additional tests for uncovered edge cases"
# Performance Audit
"Profile this code and identify performance bottlenecks. Suggest optimizations with benchmarks"
🛠️ Development Workflow Prompts
# Project Onboarding
"Analyze this codebase and create a comprehensive CLAUDE.md file with:
- Project overview
- Architecture patterns
- Key commands
- Development workflow
- Coding conventions"
# Documentation Generation
"Generate comprehensive documentation for [MODULE/API] including:
- Usage examples
- API reference
- Common patterns
- Troubleshooting guide"
# Migration Planning
"Create a migration plan from [OLD_TECH] to [NEW_TECH] with:
- Step-by-step process
- Rollback strategy
- Risk assessment
- Testing checklist"
# Debugging Assistant
"Debug this issue: [DESCRIPTION]. Use console.log strategically, check error boundaries, and trace the execution flow"
🔧 Advanced Claude Code CLI Commands
# Session Management
claude --resume # Resume last session
/clear # Reset context
/summarize # Summarize long conversations (custom command)
# Planning Mode
Shift+Tab (twice) # Enter planning mode
/plan # Alternative planning command (custom command)
# File Operations
/add **/*.js # Add all JS files to context
/remove tests/ # Remove test files from context (custom command)
/lint # Run linter on modified files (custom command)
# Git Integration
claude diff # Show current changes
claude commit # Smart commit with message
"create a pr" # Ask in plain language to create a pull request
# Custom commands
/init # Bootstrap CLAUDE.md
/memory # Edit project memory
/hooks # Configure automation hooks
/permissions # Manage Claude's permissions
# Model Selection
/model opus # Switch to Opus for complex tasks
/model sonnet # Switch to Sonnet for speed
# Productivity Shortcuts
Ctrl+C # Stop current operation
Escape # Interrupt and redirect
Double Escape # Jump back in history
Tab # Autocomplete filenames
📋 Custom Slash Commands (Add to .claude/commands/)
analyze-pr.md
Please analyze PR #$ARGUMENTS:
1. Use 'gh pr view' to get details
2. Review code changes for bugs and security issues
3. Check test coverage
4. Verify against coding standards
5. Provide concise, actionable feedback
quick-fix.md
Fix the issue: $ARGUMENTS
1. Identify the root cause
2. Implement minimal necessary changes
3. Add tests if needed
4. Commit with descriptive message
5. Verify fix doesn't break other features
optimize-performance.md
Optimize performance for: $ARGUMENTS
1. Profile current performance
2. Identify bottlenecks
3. Implement optimizations
4. Benchmark improvements
5. Document changes
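Registering one of these is just dropping a markdown file into the directory named in this section's title. For example, the quick-fix command above could be installed like this (run from your project root; the demo below uses a throwaway directory):

```shell
set -e
cd "$(mktemp -d)"                # stand-in for your project root
mkdir -p .claude/commands
# The filename (minus .md) becomes the slash command, e.g. /quick-fix:
cat > .claude/commands/quick-fix.md <<'EOF'
Fix the issue: $ARGUMENTS
1. Identify the root cause
2. Implement minimal necessary changes
3. Add tests if needed
4. Commit with descriptive message
5. Verify fix doesn't break other features
EOF
ls .claude/commands/
```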
🎨 UI/Frontend Development Prompts
# Component from Mockup
"[Image] Build this UI component with:
- Semantic HTML
- Tailwind CSS (core utilities only)
- Proper accessibility
- Responsive design
- Smooth animations"
# Design System Integration
"Create a component that follows our design system. Reference CLAUDE.md for tokens and patterns"
# Interactive Prototyping
"Build an interactive prototype with:
- Realistic data
- Error states
- Loading states
- Edge case handling"
🔐 Security & Best Practices Prompts
# Security Audit
"Perform a security audit focusing on:
- Input validation
- Authentication/authorization
- Data sanitization
- SQL injection prevention
- XSS protection"
# Code Quality Check
"Review code quality:
- SOLID principles adherence
- DRY violations
- Code complexity
- Naming conventions
- Documentation completeness"
⚡ Automation & Hooks Configuration
# Pre-commit Hook
{
"matcher": "Bash(git commit:*)",
"preToolUse": "npm run lint && npm test"
}
# Post-edit Hook
{
"matcher": "Edit",
"postToolUse": "npm run format"
}
# Documentation Update Hook
{
"matcher": "Edit(*.js|*.ts)",
"postToolUse": "npm run docs:generate"
}
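One caveat: the snippets above are shorthand. In Claude Code's settings file (`.claude/settings.json`), hooks are registered per event name with a nested `hooks` array, roughly as below. This reflects the hooks documentation at the time of writing; verify the shape and matcher syntax against your version before relying on it.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash(git commit:*)",
        "hooks": [
          { "type": "command", "command": "npm run lint && npm test" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit",
        "hooks": [
          { "type": "command", "command": "npm run format" }
        ]
      }
    ]
  }
}
```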
🚨 Emergency & Recovery Commands
# Quick Rollback
"Revert all changes since last commit"
# Skip All Permission Prompts (use with extreme caution)
claude --dangerously-skip-permissions
# Snapshot Creation
"Create a git stash with descriptive name before making major changes"
# Conflict Resolution
"Resolve merge conflicts favoring [ours/theirs] and explain each decision"
💡 Pro Tips & Keyboard Shortcuts
- Stage Early, Stage Often: Use `git add` frequently to create restore points
- Multiple Worktrees: `git worktree add ../project-feature-x` for parallel Claude sessions
- Headless Mode: `claude -p "prompt" --output-format json` for CI/CD integration
- Auto-accept Mode: Shift+Tab to toggle autonomous work
- Context Preservation: Use `/summarize` (custom command) or the built-in `/compact` before hitting token limits
- Custom Themes: `/config` to match your terminal theme
🎮 Advanced Workflow Combinations
# Full Feature Development Flow
"1. Plan feature X → 2. Implement with TDD → 3. Review & optimize → 4. Document → 5. Create PR"
# Rapid Prototyping
"Build MVP for [IDEA] with basic functionality. Focus on core features, skip polish"
# Bug Hunt Mode
"Systematically find and fix all bugs in [MODULE]. Create test for each bug found"
# Refactoring Sprint
"Refactor for [GOAL] while maintaining 100% test coverage. Show before/after metrics"