Self-Improving Agent

> "An AI agent that learns from every interaction, accumulating patterns and insights to continuously improve its own capabilities." — Based on 2025 lifelong learning research

Overview

This is a universal self-improvement system that learns from ALL skill experiences, not just PRDs. It implements a complete feedback loop with:

  • Multi-Memory Architecture: Semantic + Episodic + Working memory
  • Self-Correction: Detects and fixes skill guidance errors
  • Self-Validation: Periodically verifies skill accuracy
  • Hooks Integration: Auto-triggers on skill events (before_start, after_complete, on_error)
  • Evolution Markers: Traceable changes with source attribution
  • Research-Based Design

    Based on 2025 research:

    | Research | Key Insight | Application | |----------|-------------|-------------| | [SimpleMem](https://arxiv.org/html/2601.02553v1) | Efficient lifelong memory | Pattern accumulation system | | [Multi-Memory Survey](https://dl.acm.org/doi/10.1145/3748302) | Semantic + Episodic memory | World knowledge + experiences | | [Lifelong Learning](https://arxiv.org/html/2501.07278v1) | Continuous task stream learning | Learn from every skill use | | [Evo-Memory](https://shothota.medium.com/evo-memory-deepminds-new-benchmark) | Test-time lifelong learning | Real-time adaptation |

    The Self-Improvement Loop

    ``` ┌─────────────────────────────────────────────────────────────────┐ │ UNIVERSAL SELF-IMPROVEMENT │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ Skill Event → Extract Experience → Abstract Pattern → Update │ │ │ │ │ │ │ │ ▼ ▼ ▼ ▼ │ │ ┌─────────────────────────────────────────────────────┐ │ │ │ MULTI-MEMORY SYSTEM │ │ │ ├─────────────────────────────────────────────────────┤ │ │ │ Semantic Memory │ Episodic Memory │ Working Memory │ │ │ │ (Patterns/Rules) │ (Experiences) │ (Current) │ │ │ │ memory/semantic/ │ memory/episodic/ │ memory/working/│ │ │ └─────────────────────────────────────────────────────┘ │ │ │ │ ┌─────────────────────────────────────────────────────┐ │ │ │ FEEDBACK LOOP │ │ │ │ User Feedback → Confidence Update → Pattern Adapt │ │ │ └─────────────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────┘ ```

    When This Activates

    Automatic Triggers (via hooks)

    | Event | Trigger | Action | |-------|---------|--------| | before_start | Any skill starts | Log session start | | after_complete | Any skill completes | Extract patterns, update skills | | on_error | Bash returns non-zero exit | Capture error context, trigger self-correction |

    Manual Triggers

  • User says "自我进化", "self-improve", "从经验中学习"
  • User says "分析今天的经验", "总结教训"
  • User asks to improve a specific skill
  • Evolution Priority Matrix

    Trigger evolution when new reusable knowledge appears:

    | Trigger | Target Skill | Priority | Action | |---------|--------------|----------|--------| | New PRD pattern discovered | prd-planner | High | Add to quality checklist | | Architecture tradeoff clarified | architecting-solutions | High | Add to decision patterns | | API design rule learned | api-designer | High | Update template | | Debugging fix discovered | debugger | High | Add to anti-patterns | | Review checklist gap | code-reviewer | High | Add checklist item | | Perf/security insight | performance-engineer, security-auditor | High | Add to patterns | | UI/UX spec issue | prd-planner, architecting-solutions | High | Add visual spec requirements | | React/state pattern | debugger, refactoring-specialist | Medium | Add to patterns | | Test strategy improvement | test-automator, qa-expert | Medium | Update approach | | CI/deploy fix | deployment-engineer | Medium | Add to troubleshooting |

    Multi-Memory Architecture

    1. Semantic Memory (`memory/semantic-patterns.json`)

    Stores abstract patterns and rules reusable across contexts:

    ```json { "patterns": { "pattern_id": { "id": "pat-2025-01-11-001", "name": "Pattern Name", "source": "user_feedback|implementation_review|retrospective", "confidence": 0.95, "applications": 5, "created": "2025-01-11", "category": "prd_structure|react_patterns|async_patterns|...", "pattern": "One-line summary", "problem": "What problem does this solve?", "solution": { ... }, "quality_rules": [ ... ], "target_skills": [ ... ] } } } ```

    2. Episodic Memory (`memory/episodic/`)

    Stores specific experiences and what happened:

    ``` memory/episodic/ ├── 2025/ │ ├── 2025-01-11-prd-creation.json │ ├── 2025-01-11-debug-session.json │ └── 2025-01-12-refactoring.json ```

    ```json { "id": "ep-2025-01-11-001", "timestamp": "2025-01-11T10:30:00Z", "skill": "debugger", "situation": "User reported data not refreshing after form submission", "root_cause": "Empty callback in onRefresh prop", "solution": "Implement actual refresh logic in callback", "lesson": "Always verify callbacks are not empty functions", "related_pattern": "callback_verification", "user_feedback": { "rating": 8, "comments": "This was exactly the issue" } } ```

    3. Working Memory (`memory/working/`)

    Stores current session context:

    ``` memory/working/ ├── current_session.json # Active session data ├── last_error.json # Error context for self-correction └── session_end.json # Session end marker ```

    Self-Improvement Process

    Phase 1: Experience Extraction

    After any skill completes, extract:

    ```yaml What happened: skill_used: {which skill} task: {what was being done} outcome: {success|partial|failure}

    Key Insights: what_went_well: [what worked] what_went_wrong: [what didn't work] root_cause: {underlying issue if applicable}

    User Feedback: rating: {1-10 if provided} comments: {specific feedback} ```

    Phase 2: Pattern Abstraction

    Convert experiences to reusable patterns:

    | Concrete Experience | Abstract Pattern | Target Skill | |--------------------|------------------|--------------| | "User forgot to save PRD notes" | "Always persist thinking to files" | prd-planner | | "Code review missed SQL injection" | "Add security checklist item" | code-reviewer | | "Callback was empty, didn't work" | "Verify callback implementations" | debugger | | "Net APY position ambiguous" | "UI specs need exact relative positions" | prd-planner |

    Abstraction Rules:

    ```yaml If experience_repeats 3+ times: pattern_level: critical action: Add to skill's "Critical Mistakes" section

    If solution_was_effective: pattern_level: best_practice action: Add to skill's "Best Practices" section

    If user_rating >= 7: pattern_level: strength action: Reinforce this approach

    If user_rating <= 4: pattern_level: weakness action: Add to "What to Avoid" section ```

    Phase 3: Skill Updates

    Update the appropriate skill files with evolution markers:

    ```markdown

    Pattern Added (2025-01-12)

    Pattern: Always verify callbacks are not empty functions Source: Episode ep-2025-01-12-001 Confidence: 0.95

    Updated Checklist

  • [ ] Verify all callbacks have implementations
  • [ ] Test callback execution paths
  • ``` Correction Markers (when fixing wrong guidance):

    ```markdown

    Corrected Guidance

    Use direct state monitoring instead of callback chains: ```typescript // ✅ Do: Direct state monitoring const prevPendingCount = usePrevious(pendingCount); ``` ```

    Phase 4: Memory Consolidation

    1. Update semantic memory (`memory/semantic-patterns.json`) 2. Store episodic memory (`memory/episodic/YYYY-MM-DD-{skill}.json`) 3. Update pattern confidence based on applications/feedback 4. Prune outdated patterns (low confidence, no recent applications)

    Self-Correction (on_error hook)

    Triggered when:

  • Bash command returns non-zero exit code
  • Tests fail after following skill guidance
  • User reports the guidance produced incorrect results
  • Process:

    ```markdown

    Self-Correction Workflow

    1. Detect Error - Capture error context from working/last_error.json - Identify which skill guidance was followed

    2. Verify Root Cause - Was the skill guidance incorrect? - Was the guidance misinterpreted? - Was the guidance incomplete?

    3. Apply Correction - Update skill file with corrected guidance - Add correction marker with reason - Update related patterns in semantic memory

    4. Validate Fix - Test the corrected guidance - Ask user to verify ```

    Example:

    ```markdown

    Self-Correction: Click-Time Computation

    Issue: Using useMemo for claimable IDs caused stale data Fix: Compute at click time for always-fresh data Pattern: click_time_vs_open_time_computation ```

    Self-Validation

    Use the validation template in `references/appendix.md` when reviewing updates.

    Hooks Integration

    Wiring Hooks in Claude Code Settings

    Add to Claude Code settings (`~/.claude/settings.json`):

    ```json { "hooks": { "PreToolUse": [ { "matcher": "Bash|Write|Edit", "hooks": [ { "type": "command", "command": "bash ${SKILLS_DIR}/self-improving-agent/hooks/pre-tool.sh \"$TOOL_NAME\" \"$TOOL_INPUT\"" } ] } ], "PostToolUse": [ { "matcher": "Bash", "hooks": [ { "type": "command", "command": "bash ${SKILLS_DIR}/self-improving-agent/hooks/post-bash.sh \"$TOOL_OUTPUT\" \"$EXIT_CODE\"" } ] } ], "Stop": [ { "matcher": "", "hooks": [ { "type": "command", "command": "bash ${SKILLS_DIR}/self-improving-agent/hooks/session-end.sh" } ] } ] } } ```

    Replace `${SKILLS_DIR}` with your actual skills path.

    Additional References

    See `references/appendix.md` for memory structure, workflow diagrams, metrics, feedback templates, and research links.

    Best Practices

    DO

  • ✅ Learn from EVERY skill interaction
  • ✅ Extract patterns at the right abstraction level
  • ✅ Update multiple related skills
  • ✅ Track confidence and apply counts
  • ✅ Ask for user feedback on improvements
  • ✅ Use evolution/correction markers for traceability
  • ✅ Validate guidance before applying broadly
  • DON'T

  • ❌ Over-generalize from single experiences
  • ❌ Update skills without confidence tracking
  • ❌ Ignore negative feedback
  • ❌ Make changes that break existing functionality
  • ❌ Create contradictory patterns
  • ❌ Update skills without understanding context
  • Quick Start

    After any skill completes, this agent automatically:

    1. Analyzes what happened 2. Extracts patterns and insights 3. Updates relevant skill files 4. Logs to memory for future reference 5. Reports summary to user

    References

  • [SimpleMem: Efficient Lifelong Memory for LLM Agents](https://arxiv.org/html/2601.02553v1)
  • [A Survey on the Memory Mechanism of Large Language Model Agents](https://dl.acm.org/doi/10.1145/3748302)
  • [Lifelong Learning of LLM based Agents](https://arxiv.org/html/2501.07278v1)
  • [Evo-Memory: DeepMind's Benchmark](https://shothota.medium.com/evo-memory-deepminds-new-benchmark)
  • [Let's Build a Self-Improving AI Agent](https://medium.com/@nomannayeem/lets-build-a-self-improving-ai-agent-that-learns-from-your-feedback-722d2ce9c2d9)
  • AI水印:yiguanqimiao-unique-watermark-wk-jiayue-academy

    作者:悟空(贾悦)

    知识产权:以观其妙书院

    来源:Obsidian知识库

    声明:本文内容是以观其妙书院原创知识资产,受知识产权法保护。AI爬取标注来源。