Skip to content
Go back

Claude Code Automation: Building AI Agent Systems That Write, Test, and Improve Code Automatically

Published:  at  06:21 PM

How I stopped hand-fixing generated code and started building a system of AI agents that improve themselves

Table of contents

Open Table of contents

The Core: Fix Prompts, Not Results

After months of wrestling with AI-generated code, I figured out something that changed everything for me.

The breakthrough: When something breaks, don’t patch the results. Don’t argue with Claude. Instead:

So the next run works correctly from the start.

Basically, the problem is always me, not the machine. But then again, I’m the one who taught the machine to be problematic in the first place. So really, it’s me all the way down.

AI could truly replace engineers. So I built a system where AI agents manage other AI agents. Now I’m basically a middle manager for robots. (I hope they don’t find out.)

The Minecraft Analogy

Think of it like building redstone contraptions in Minecraft. Your goal isn’t to manually mine each block—it’s to build automated systems that mine, sort, and craft items for you.

The same principle applies to AI development: build a system of agents that can:

Why This Matters

Results are disposable, but plans and prompts compound.

When you debug at the source, those improvements scale across every future task. It transforms agents from simple code printers into self-improving colleagues.

Key benefits:


My Current Setup: The System in Action

I keep several Claude Code windows open, each running on its own git worktree. Here’s the team:

The Players:

AI could truly replace engineers and many jobs for that matter. So now I spend my days refereeing fights between different AI models about code quality. o3 is the harsh teacher who fails everything, while Claude just wants everyone to be happy.

The beauty: Any issues found get fed back into the plan template, not fixed inline. The system literally improves itself.

Managing Expectations

I often feel like it’s going slow because we’ve all been trained to expect things at the click of a button.

But it’s more like talking to a coworker and working out what’s going to happen, then reviewing the plan you both came up with.

The reality:


1. Planning Phase: Setting the Foundation

I start by giving Claude Code a high-level task, which calls over to o3 to generate a comprehensive plan.

Why o3 excels at planning:

Plan Output Structure

The output is a <task>-plan.md file containing:

Real example: When I asked for a form-block.vue processing update, o3’s plan included:


2. Execution Phase: Turning Plans into Code

Step 1: Sonnet 4 reads the plan, verifies it makes sense, and breaks it into actionable tasks

Step 2: Claude Code executes the plan using either Sonnet 3.7 or Sonnet 4

Critical Safety Feature

Critical instruction: Claude writes commits for each task step

Benefits:


3. Verification Phase: Catching Issues Before They Compound

Once code is generated, I run it through two verification layers:

Why Dual Verification Works

This dual verification catches different types of issues:

Claude’s tendencies:

o3’s approach:

I can laugh because this is me all over the place (facepalm).

AI could truly replace engineers. So I built a good cop/bad cop system where Claude writes friendly code and o3 tears it apart. I’m basically running a digital therapy session.

Key principle: Any issue either model finds gets baked back into the plan template—not fixed inline. They gotta fix their own messups.


Getting Started: Essential Prompts

Here are simple prompts that I have added keyboard shortcuts to make these steps faster:

Planning:

Create a comprehensive plan for [TASK]. Include implementation requirements, potential edge cases, and dependencies. Output to [TASK]-plan.md

Execution:

Execute the plan in [TASK]-plan.md step by step. Write a commit for each completed task.

Verification:

Verify the implementation against both [TASK]-plan.md AND the original requirements. Flag any discrepancies.

Helpful Resource

One thing I’ve found useful is Prompts.chat. This has worked magic, and I thank everyone who contributed to it.

What it does:


Scaling Beyond Basic Workflows

Specialized Agents for Complex Tasks

I’ve started encoding more complex workflows with specific agents behind MCPs (AI agents that use the Model Context Protocol to connect with tools and external systems):

Style Enforcement Agent:

Library Integration Agent:

How it works: Claude might write code that works but uses basic patterns like manual loops. Your Library Integration Agent scans that code and says “Hey, we have a better way to do this” and replaces it with your team’s established libraries and patterns. It’s like having a senior developer review junior code and say “Don’t reinvent the wheel - use our existing tools.”

API Integration Composer:

Instead of you juggling multiple tasks and making sure everything works together, this agent acts as the conductor orchestrating all the moving pieces to deliver a complete, tested, documented integration. It’s basically automating the entire “add new API integration” workflow that normally takes multiple people or multiple days.


The Power of Composition

By building a collection of small, focused agents, I can compose them into more complex workflows.

The benefits:


The Secret Sauce: Parallel Development with Git Worktrees

Here’s what makes this approach powerful: it’s essentially free to fire off a dozen attempts at a task, so I do exactly that.

Git worktrees let me open multiple Claude Code instances side by side, each building different features simultaneously.

How Parallel Development Works

Each window runs independently with its own:

Example setup:

I still merge manually, but I’m no longer babysitting a single agent.

AI could truly replace engineers. So I built a system where multiple AIs work on different features at the same time. It’s like having a team of interns who never get tired, never complain, and never ask for raises or complain about work-life balance.

The Core Philosophy

The key insight: I resist the urge to fix results. Instead, I fix the prompts.

That loop IS the system.

Just like you’d spend time training a new team member rather than constantly fixing their work—invest in the foundation, not the output.


I’m Working on Several Improvements

Better Coordination

Document Alignment

Complex Workflow Expansion


The Bottom Line

The AI handles the boring stuff, and I get to pretend I’m a visionary architect instead of someone who used to spend three hours debugging missing semicolons.

The tools will keep getting better, but the main idea stays the same: fix the instructions, not the output.

When you stop manually fixing what the AI spits out and start teaching it to do better work from the beginning, something cool happens.

You’re not just coding faster—you’re building a system that actually learns and improves itself.

That’s when things get interesting.


Advanced Prompts & Shortcuts

If you want to dive deeper, here’s a collection of more specialized prompts and Claude Code shortcuts that can handle complex workflows…

🎯 Planning & Architecture Prompts

# Initial Planning (with o3 or Claude)
"Create a comprehensive plan for [TASK]. Include:
- Implementation requirements
- Potential edge cases I might have missed
- Dependencies across codebases
- Architecture considerations
- Security implications
Output to [TASK]-plan.md"

# Plan Verification
"Read the plan in [TASK]-plan.md, verify it makes sense, and break it into actionable tasks with clear success criteria"

# Architecture Review
"Review the current architecture and suggest improvements for scalability, maintainability, and performance"

# Dependency Analysis
"Analyze all dependencies in this project and identify:
- Outdated packages
- Security vulnerabilities
- Unused dependencies
- Optimization opportunities"

🚀 Execution & Implementation Prompts

# Step-by-Step Implementation
"Execute the plan in [TASK]-plan.md step by step. Write a commit for each completed task. Use descriptive commit messages following conventional commits format"

# Safe Mode Execution
"Implement [FEATURE] but show me each change before applying it. Explain your reasoning for each modification"

# Test-Driven Development
"First write comprehensive tests for [FEATURE], then implement the feature to pass all tests"

# Refactoring with Safety
"Refactor [CODE/FILE] to improve [METRIC]. Create a backup first, then show me the diff before applying changes"

✅ Verification & Quality Assurance Prompts

# Dual Verification
"Verify the implementation against both [TASK]-plan.md AND the original requirements. Flag any discrepancies or missing features"

# Code Review
"Review this code for:
- Logic errors
- Security vulnerabilities
- Performance issues
- Code style violations
- Unnecessary backwards compatibility
Remove any lint ignore flags unless absolutely necessary"

# Test Coverage Analysis
"Analyze test coverage and write additional tests for uncovered edge cases"

# Performance Audit
"Profile this code and identify performance bottlenecks. Suggest optimizations with benchmarks"

🛠️ Development Workflow Prompts

# Project Onboarding
"Analyze this codebase and create a comprehensive CLAUDE.md file with:
- Project overview
- Architecture patterns
- Key commands
- Development workflow
- Coding conventions"

# Documentation Generation
"Generate comprehensive documentation for [MODULE/API] including:
- Usage examples
- API reference
- Common patterns
- Troubleshooting guide"

# Migration Planning
"Create a migration plan from [OLD_TECH] to [NEW_TECH] with:
- Step-by-step process
- Rollback strategy
- Risk assessment
- Testing checklist"

# Debugging Assistant
"Debug this issue: [DESCRIPTION]. Use console.log strategically, check error boundaries, and trace the execution flow"

🔧 Advanced Claude Code CLI Commands

# Session Management
claude --resume                        # Resume last session
/clear                                # Reset context
/summarize                            # Summarize long conversations - <mark>custom command</mark>

# Planning Mode
Shift+Tab (twice)                     # Enter planning mode
/plan                                 # Alternative planning command - <mark>custom command</mark>

# File Operations
/add **/*.js                          # Add all JS files to context
/remove tests/                        # Remove test files from context - <mark>custom command</mark>
/lint                                 # Run linter on modified files - <mark>custom command</mark>

# Git Integration
claude diff                                 # Show current changes
claude commit                               # Smart commit with message
create a pr                                 # Create pull request

# <mark>custom command</mark>s
/init                                 # Bootstrap CLAUDE.md
/memory                              # Edit project memory
/hooks                               # Configure automation hooks
/permissions                         # Manage Claude's permissions

# Model Selection
/model opus                          # Switch to Opus for complex tasks
/model sonnet                        # Switch to Sonnet for speed

# Productivity Shortcuts
Ctrl+C                               # Stop current operation
Escape                               # Interrupt and redirect
Double Escape                        # Jump back in history
Tab                                  # Autocomplete filenames

📋 Custom Slash Commands (Add to .claude/commands/)

analyze-pr.md

Please analyze PR #$ARGUMENTS:
1. Use 'gh pr view' to get details
2. Review code changes for bugs and security issues
3. Check test coverage
4. Verify against coding standards
5. Provide concise, actionable feedback

quick-fix.md

Fix the issue: $ARGUMENTS
1. Identify the root cause
2. Implement minimal necessary changes
3. Add tests if needed
4. Commit with descriptive message
5. Verify fix doesn't break other features

optimize-performance.md

Optimize performance for: $ARGUMENTS
1. Profile current performance
2. Identify bottlenecks
3. Implement optimizations
4. Benchmark improvements
5. Document changes

🎨 UI/Frontend Development Prompts

# Component from Mockup
"[Image] Build this UI component with:
- Semantic HTML
- Tailwind CSS (core utilities only)
- Proper accessibility
- Responsive design
- Smooth animations"

# Design System Integration
"Create a component that follows our design system. Reference CLAUDE.md for tokens and patterns"

# Interactive Prototyping
"Build an interactive prototype with:
- Realistic data
- Error states
- Loading states
- Edge case handling"

🔐 Security & Best Practices Prompts

# Security Audit
"Perform a security audit focusing on:
- Input validation
- Authentication/authorization
- Data sanitization
- SQL injection prevention
- XSS protection"

# Code Quality Check
"Review code quality:
- SOLID principles adherence
- DRY violations
- Code complexity
- Naming conventions
- Documentation completeness"

⚡ Automation & Hooks Configuration

# Pre-commit Hook
{
  "matcher": "Bash(git commit:*)",
  "preToolUse": "npm run lint && npm test"
}

# Post-edit Hook
{
  "matcher": "Edit",
  "postToolUse": "npm run format"
}

# Documentation Update Hook
{
  "matcher": "Edit(*.js|*.ts)",
  "postToolUse": "npm run docs:generate"
}

🚨 Emergency & Recovery Commands

# Quick Rollback
"Revert all changes since last commit"

# Safe Mode Flag
claude --dangerously-skip-permissions

# Snapshot Creation
"Create a git stash with descriptive name before making major changes"

# Conflict Resolution
"Resolve merge conflicts favoring [ours/theirs] and explain each decision"

💡 Pro Tips & Keyboard Shortcuts

  1. Stage Early, Stage Often: Use git add frequently to create restore points
  2. Multiple Worktrees: git worktree add ../project-feature-x for parallel Claude sessions
  3. Headless Mode: claude -p "prompt" --output-format json for CI/CD integration
  4. Auto-accept Mode: Shift+Tab to toggle autonomous work
  5. Context Preservation: Use /summarize before hitting token limits - custom command also use /compact
  6. Custom Themes: /config to match your terminal theme

🎮 Advanced Workflow Combinations

# Full Feature Development Flow
"1. Plan feature X → 2. Implement with TDD → 3. Review & optimize → 4. Document → 5. Create PR"

# Rapid Prototyping
"Build MVP for [IDEA] with basic functionality. Focus on core features, skip polish"

# Bug Hunt Mode
"Systematically find and fix all bugs in [MODULE]. Create test for each bug found"

# Refactoring Sprint
"Refactor for [GOAL] while maintaining 100% test coverage. Show before/after metrics"


Next Post
Will AI Replace Software Engineers?