How I stopped hand-fixing generated code and started building a system of AI agents that improve themselves
Table of contents
- The Core: Fix Prompts, Not Results
- My Current Setup: The System in Action
- 1. Planning Phase: Setting the Foundation
- 2. Execution Phase: Turning Plans into Code
- 3. Verification Phase: Catching Issues Before They Compound
- Getting Started: Essential Prompts
- Scaling Beyond Basic Workflows
- The Power of Composition
- The Secret Sauce: Parallel Development with Git Worktrees
- I’m Working on Several Improvements
- The Bottom Line
- Advanced Prompts & Shortcuts
- 🎯 Planning & Architecture Prompts
- 🚀 Execution & Implementation Prompts
- ✅ Verification & Quality Assurance Prompts
- 🛠️ Development Workflow Prompts
- 🔧 Advanced Claude Code CLI Commands
- 📋 Custom Slash Commands (Add to .claude/commands/)
- 🎨 UI/Frontend Development Prompts
- 🔐 Security & Best Practices Prompts
- ⚡ Automation & Hooks Configuration
- 🚨 Emergency & Recovery Commands
- 💡 Pro Tips & Keyboard Shortcuts
- 🎮 Advanced Workflow Combinations
The Core: Fix Prompts, Not Results
After months of wrestling with AI-generated code, I figured out something that changed everything for me.
The breakthrough: When something breaks, don’t patch the results. Don’t argue with Claude. Instead:
- Fix the plan
- Fix the prompts
- Fix the agent mix
So the next run works correctly from the start.
Basically, the problem is always me, not the machine. But then again, I’m the one who taught the machine to be problematic in the first place. So really, it’s me all the way down.
AI could truly replace engineers. So I built a system where AI agents manage other AI agents. Now I’m basically a middle manager for robots. (I hope they don’t find out.)
The Minecraft Analogy
Think of it like building redstone contraptions in Minecraft. Your goal isn’t to manually mine each block—it’s to build automated systems that mine, sort, and craft items for you.
The same principle applies to AI development: build a system of agents that can:
- Produce code
- Verify it
- Improve themselves over time
Why This Matters
Results are disposable, but plans and prompts compound.
When you debug at the source, those improvements scale across every future task. It transforms agents from simple code printers into self-improving colleagues.
Key benefits:
- No more hitting the same roadblock repeatedly
- Faster scaling through systematic improvements
- Experimental approach to find what works best
My Current Setup: The System in Action
I keep several Claude Code windows open, each running on its own git worktree. Here’s the team:
The Players:
- o3 and Sonnet 4: Create detailed implementation plans
- Sonnet 3.7 or Sonnet 4: Execute the plans and write code
- o3: Checks results against original requirements (uncompromising, even rude)
- Custom MCPs: Handle specialized tasks like style enforcement and library integration
AI could truly replace engineers, and many other jobs for that matter. So now I spend my days refereeing fights between different AI models about code quality. o3 is the harsh teacher who fails everything, while Claude just wants everyone to be happy.
The beauty: Any issues found get fed back into the plan template, not fixed inline. The system literally improves itself.
Managing Expectations
I often feel like it’s going slow because we’ve all been trained to expect things at the click of a button.
But it’s more like talking to a coworker and working out what’s going to happen, then reviewing the plan you both came up with.
The reality:
- Still takes time
- Way faster than me planning and coding everything myself
- Catches issues before anything gets written into the codebase
1. Planning Phase: Setting the Foundation
I start by giving Claude Code a high-level task, which calls over to o3 to generate a comprehensive plan.
Why o3 excels at planning:
- Asks clarifying questions to nail down job requirements
- Easiest to work with so far
- Thinks of things I miss
- Still need to guide it through the codebase and pull in focused files
Plan Output Structure
The output is a `<task>-plan.md` file containing:
- My original request
- Detailed implementation plan
- Potential edge cases and considerations
- Things it thinks I’ve missed
Real example: When I asked for a form-block.vue processing update, o3’s plan included:
- GraphQL updates
- Advanced Custom Fields updates
- Learned from previous requests that there are several codebases to review
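For concreteness, here's a hypothetical skeleton of what that plan file might contain for the form-block.vue request (the section names are my own, not a fixed format):

```markdown
# form-block-plan.md

## Original Request
Update form-block.vue processing to handle the new field types.

## Implementation Plan
1. Update the GraphQL schema and queries
2. Update the Advanced Custom Fields definitions
3. Adjust form-block.vue submission handling

## Edge Cases & Considerations
- Several codebases consume this block; review each before merging

## Things Possibly Missed
- Migration for existing stored submissions
```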
2. Execution Phase: Turning Plans into Code
Step 1: Sonnet 4 reads the plan, verifies it makes sense, and breaks it into actionable tasks
Step 2: Claude Code executes the plan using either Sonnet 3.7 or Sonnet 4
- I use Sonnet 4 when I need a great thinking model to ensure it comes out right
- I feel this post will be outdated soon as new models come out, but I digress—a reason to update the post, I guess
Critical Safety Feature
Critical instruction: Claude writes commits for each task step
Benefits:
- Either Claude or I can revert to any previous state if something goes sideways
- Lifesaver feature—always have a way to roll back
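The rollback safety net is plain git underneath. A minimal sketch (task names and file contents are made up for the demo): each completed step lands as its own commit, so a bad step can be reverted without losing the rest.

```shell
set -e
cd "$(mktemp -d)" && git init -q            # throwaway repo for the demo
git config user.email demo@example.com && git config user.name demo
# Claude commits after each completed task step:
echo "schema v2" > schema.graphql
git add schema.graphql && git commit -q -m "task 1: update GraphQL schema"
echo "broken handler" > form-block.vue
git add form-block.vue && git commit -q -m "task 2: wire form-block.vue processing"
# Task 2 went sideways: undo just that step, keeping task 1 intact
git revert --no-edit HEAD
git log --oneline
```

Because every step is isolated, either Claude or I can target exactly the commit that went wrong instead of resetting the whole run.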
3. Verification Phase: Catching Issues Before They Compound
Once code is generated, I run it through two verification layers:
- Sonnet 4: Verifies code against the original plan
- o3: Verifies against both the plan AND the original ask
Why Dual Verification Works
This dual verification catches different types of issues:
Claude’s tendencies:
- Wants to please
- Might keep unnecessary backwards compatibility code
- Loves adding lint ignore flags
o3’s approach:
- Ruthlessly calls out problems
- Demands removal of unnecessary code
- Flags problematic lint ignores
I can laugh because this is me all over the place (facepalm).
AI could truly replace engineers. So I built a good cop/bad cop system where Claude writes friendly code and o3 tears it apart. I’m basically running a digital therapy session.
Key principle: Any issue either model finds gets baked back into the plan template—not fixed inline. They gotta fix their own mess-ups.
Getting Started: Essential Prompts
Here are simple prompts I've bound to keyboard shortcuts to make these steps faster:
Planning:
Create a comprehensive plan for [TASK]. Include implementation requirements, potential edge cases, and dependencies. Output to [TASK]-plan.md
Execution:
Execute the plan in [TASK]-plan.md step by step. Write a commit for each completed task.
Verification:
Verify the implementation against both [TASK]-plan.md AND the original requirements. Flag any discrepancies.
Helpful Resource
One thing I’ve found useful is Prompts.chat. This has worked magic, and I thank everyone who contributed to it.
What it does:
- Helps you get started on any project
- Talk to the AI and assign it a role
- Not just for programming
Scaling Beyond Basic Workflows
Specialized Agents for Complex Tasks
I’ve started encoding more complex workflows with specific agents behind MCPs (servers that use the Model Context Protocol to connect AI agents with tools and external systems):
Style Enforcement Agent:
- Sweeps all generated code
- Applies local style rules
- Catches issues once Claude gets into the lint/test/debug cycle
Library Integration Agent:
- Reviews generated code
- Replaces generic patterns (like hand-rolled retries and `Thread.sleep` calls)
- Uses internal retry library
How it works: Claude might write code that works but uses basic patterns like manual loops. Your Library Integration Agent scans that code and says “Hey, we have a better way to do this” and replaces it with your team’s established libraries and patterns. It’s like having a senior developer review junior code and say “Don’t reinvent the wheel - use our existing tools.”
API Integration Composer:
- Takes API documentation and internal business cases
- Orchestrates multiple agents to build:
- Integrations
- Tests
- Documentation
Instead of you juggling multiple tasks and making sure everything works together, this agent acts as the conductor orchestrating all the moving pieces to deliver a complete, tested, documented integration. It’s basically automating the entire “add new API integration” workflow that normally takes multiple people or multiple days.
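If you want to wire up agents like these yourself, Claude Code can load project-scoped MCP servers from a `.mcp.json` file at the repo root. A minimal sketch, assuming local Node entry points (the server names and paths here are hypothetical; verify the exact schema against the MCP docs for your version):

```json
{
  "mcpServers": {
    "style-enforcer": {
      "command": "node",
      "args": ["./agents/style-enforcer/index.js"]
    },
    "library-integrator": {
      "command": "node",
      "args": ["./agents/library-integrator/index.js"]
    }
  }
}
```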
The Power of Composition
By building a collection of small, focused agents, I can compose them into more complex workflows.
The benefits:
- Each agent handles one specific task exceptionally well
- Together they can tackle substantial features without manual intervention
- I’ve basically created a team to help me be better—or at least that’s the idea
- I’m all about trying to move faster and understand the tools I use
The Secret Sauce: Parallel Development with Git Worktrees
Here’s what makes this approach powerful: it’s essentially free to fire off a dozen attempts at a task, so I do exactly that.
Git worktrees let me open multiple Claude Code instances side by side, each building different features simultaneously.
How Parallel Development Works
Each window runs independently with its own:
- Conversation history
- Context
- Feature focus
Example setup:
- Window 1: Finishing user authentication
- Window 2: Building payment integration
- Window 3: Fixing bugs in the dashboard
I still merge manually, but I’m no longer babysitting a single agent.
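The three-window layout above can be reproduced with stock git (paths and branch names hypothetical): one worktree per feature, each hosting its own Claude Code session.

```shell
set -e
base=$(mktemp -d)
git init -q "$base/project"
cd "$base/project"
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m "initial commit"
# One checkout per parallel feature; each gets its own Claude Code window:
git worktree add -b user-auth     "$base/project-user-auth"
git worktree add -b payments      "$base/project-payments"
git worktree add -b dashboard-fix "$base/project-dashboard-fix"
git worktree list
```

Each worktree is a full checkout on its own branch, so the agents never trip over each other's uncommitted changes.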
AI could truly replace engineers. So I built a system where multiple AIs work on different features at the same time. It’s like having a team of interns who never get tired, never complain, and never ask for raises or complain about work-life balance.
The Core Philosophy
The key insight: I resist the urge to fix results. Instead, I fix the prompts.
That loop IS the system.
- The code itself is disposable
- The instructions and agents are the real assets
- Taking time to plan and work out prompts is everything
Just like you’d spend time training a new team member rather than constantly fixing their work—invest in the foundation, not the output.
I’m Working on Several Improvements
Better Coordination
- Right now I kick things off manually
- Want automated workflow management
- Need dependency handling between agents
Document Alignment
- Changing how we capture information
- Moving to higher-level abstractions that agents can use more effectively
- Focusing on use cases rather than low-level implementation details
Complex Workflow Expansion
- Current setup handles pretty complex workflows
- Want to push further with:
- More agents
- Better coordination
- More sophisticated interactions
The Bottom Line
The AI handles the boring stuff, and I get to pretend I’m a visionary architect instead of someone who used to spend three hours debugging missing semicolons.
The tools will keep getting better, but the main idea stays the same: fix the instructions, not the output.
When you stop manually fixing what the AI spits out and start teaching it to do better work from the beginning, something cool happens.
You’re not just coding faster—you’re building a system that actually learns and improves itself.
That’s when things get interesting.
Advanced Prompts & Shortcuts
If you want to dive deeper, here’s a collection of more specialized prompts and Claude Code shortcuts that can handle complex workflows…
🎯 Planning & Architecture Prompts
# Initial Planning (with o3 or Claude)
"Create a comprehensive plan for [TASK]. Include:
- Implementation requirements
- Potential edge cases I might have missed
- Dependencies across codebases
- Architecture considerations
- Security implications
Output to [TASK]-plan.md"
# Plan Verification
"Read the plan in [TASK]-plan.md, verify it makes sense, and break it into actionable tasks with clear success criteria"
# Architecture Review
"Review the current architecture and suggest improvements for scalability, maintainability, and performance"
# Dependency Analysis
"Analyze all dependencies in this project and identify:
- Outdated packages
- Security vulnerabilities
- Unused dependencies
- Optimization opportunities"
🚀 Execution & Implementation Prompts
# Step-by-Step Implementation
"Execute the plan in [TASK]-plan.md step by step. Write a commit for each completed task. Use descriptive commit messages following conventional commits format"
# Safe Mode Execution
"Implement [FEATURE] but show me each change before applying it. Explain your reasoning for each modification"
# Test-Driven Development
"First write comprehensive tests for [FEATURE], then implement the feature to pass all tests"
# Refactoring with Safety
"Refactor [CODE/FILE] to improve [METRIC]. Create a backup first, then show me the diff before applying changes"
✅ Verification & Quality Assurance Prompts
# Dual Verification
"Verify the implementation against both [TASK]-plan.md AND the original requirements. Flag any discrepancies or missing features"
# Code Review
"Review this code for:
- Logic errors
- Security vulnerabilities
- Performance issues
- Code style violations
- Unnecessary backwards compatibility
Remove any lint ignore flags unless absolutely necessary"
# Test Coverage Analysis
"Analyze test coverage and write additional tests for uncovered edge cases"
# Performance Audit
"Profile this code and identify performance bottlenecks. Suggest optimizations with benchmarks"
🛠️ Development Workflow Prompts
# Project Onboarding
"Analyze this codebase and create a comprehensive CLAUDE.md file with:
- Project overview
- Architecture patterns
- Key commands
- Development workflow
- Coding conventions"
# Documentation Generation
"Generate comprehensive documentation for [MODULE/API] including:
- Usage examples
- API reference
- Common patterns
- Troubleshooting guide"
# Migration Planning
"Create a migration plan from [OLD_TECH] to [NEW_TECH] with:
- Step-by-step process
- Rollback strategy
- Risk assessment
- Testing checklist"
# Debugging Assistant
"Debug this issue: [DESCRIPTION]. Use console.log strategically, check error boundaries, and trace the execution flow"
🔧 Advanced Claude Code CLI Commands
# Session Management
claude --resume # Resume last session
/clear # Reset context
/summarize # Summarize long conversations (custom command)
# Planning Mode
Shift+Tab (twice) # Enter planning mode
/plan # Alternative planning command (custom command)
# File Operations
/add **/*.js # Add all JS files to context
/remove tests/ # Remove test files from context (custom command)
/lint # Run linter on modified files (custom command)
# Git Integration
claude diff # Show current changes
claude commit # Smart commit with message
"create a pr" # Ask in plain language to create a pull request
# Custom commands
/init # Bootstrap CLAUDE.md
/memory # Edit project memory
/hooks # Configure automation hooks
/permissions # Manage Claude's permissions
# Model Selection
/model opus # Switch to Opus for complex tasks
/model sonnet # Switch to Sonnet for speed
# Productivity Shortcuts
Ctrl+C # Stop current operation
Escape # Interrupt and redirect
Double Escape # Jump back in history
Tab # Autocomplete filenames
📋 Custom Slash Commands (Add to .claude/commands/)
analyze-pr.md
Please analyze PR #$ARGUMENTS:
1. Use 'gh pr view' to get details
2. Review code changes for bugs and security issues
3. Check test coverage
4. Verify against coding standards
5. Provide concise, actionable feedback
quick-fix.md
Fix the issue: $ARGUMENTS
1. Identify the root cause
2. Implement minimal necessary changes
3. Add tests if needed
4. Commit with descriptive message
5. Verify fix doesn't break other features
optimize-performance.md
Optimize performance for: $ARGUMENTS
1. Profile current performance
2. Identify bottlenecks
3. Implement optimizations
4. Benchmark improvements
5. Document changes
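Registering one of these is just dropping a markdown file into the directory named in this section's title. For example, the quick-fix command above could be installed like this (run from your project root; the demo below uses a throwaway directory):

```shell
set -e
cd "$(mktemp -d)"                # stand-in for your project root
mkdir -p .claude/commands
# The filename (minus .md) becomes the slash command, e.g. /quick-fix:
cat > .claude/commands/quick-fix.md <<'EOF'
Fix the issue: $ARGUMENTS
1. Identify the root cause
2. Implement minimal necessary changes
3. Add tests if needed
4. Commit with descriptive message
5. Verify fix doesn't break other features
EOF
ls .claude/commands/
```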
🎨 UI/Frontend Development Prompts
# Component from Mockup
"[Image] Build this UI component with:
- Semantic HTML
- Tailwind CSS (core utilities only)
- Proper accessibility
- Responsive design
- Smooth animations"
# Design System Integration
"Create a component that follows our design system. Reference CLAUDE.md for tokens and patterns"
# Interactive Prototyping
"Build an interactive prototype with:
- Realistic data
- Error states
- Loading states
- Edge case handling"
🔐 Security & Best Practices Prompts
# Security Audit
"Perform a security audit focusing on:
- Input validation
- Authentication/authorization
- Data sanitization
- SQL injection prevention
- XSS protection"
# Code Quality Check
"Review code quality:
- SOLID principles adherence
- DRY violations
- Code complexity
- Naming conventions
- Documentation completeness"
⚡ Automation & Hooks Configuration
# Pre-commit Hook
{
"matcher": "Bash(git commit:*)",
"preToolUse": "npm run lint && npm test"
}
# Post-edit Hook
{
"matcher": "Edit",
"postToolUse": "npm run format"
}
# Documentation Update Hook
{
"matcher": "Edit(*.js|*.ts)",
"postToolUse": "npm run docs:generate"
}
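One caveat: the snippets above are shorthand. In Claude Code's settings file (`.claude/settings.json`), hooks are registered per event name with a nested `hooks` array, roughly as below. This reflects the hooks documentation at the time of writing; verify the shape and matcher syntax against your version before relying on it.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash(git commit:*)",
        "hooks": [
          { "type": "command", "command": "npm run lint && npm test" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit",
        "hooks": [
          { "type": "command", "command": "npm run format" }
        ]
      }
    ]
  }
}
```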
🚨 Emergency & Recovery Commands
# Quick Rollback
"Revert all changes since last commit"
# Skip All Permission Prompts (use with extreme caution)
claude --dangerously-skip-permissions
# Snapshot Creation
"Create a git stash with descriptive name before making major changes"
# Conflict Resolution
"Resolve merge conflicts favoring [ours/theirs] and explain each decision"
💡 Pro Tips & Keyboard Shortcuts
- Stage Early, Stage Often: Use `git add` frequently to create restore points
- Multiple Worktrees: `git worktree add ../project-feature-x` for parallel Claude sessions
- Headless Mode: `claude -p "prompt" --output-format json` for CI/CD integration
- Auto-accept Mode: Shift+Tab to toggle autonomous work
- Context Preservation: Use `/summarize` (custom command) or the built-in `/compact` before hitting token limits
- Custom Themes: `/config` to match your terminal theme
🎮 Advanced Workflow Combinations
# Full Feature Development Flow
"1. Plan feature X → 2. Implement with TDD → 3. Review & optimize → 4. Document → 5. Create PR"
# Rapid Prototyping
"Build MVP for [IDEA] with basic functionality. Focus on core features, skip polish"
# Bug Hunt Mode
"Systematically find and fix all bugs in [MODULE]. Create test for each bug found"
# Refactoring Sprint
"Refactor for [GOAL] while maintaining 100% test coverage. Show before/after metrics"