Every week, a new “awesome Claude Code config” repo blows up on GitHub. Superpowers hit thousands of stars seemingly overnight. Your friend sends you a link: “you should really check this out.”

And you do. You read the README. You copy the CLAUDE.md. You paste someone else’s prompt library into your config directory. You feel productive for about an hour.

Then next week, there’s another repo. And another article. And you’re back to copy-pasting.

I did this for months before I asked the obvious question: why am I manually curating my AI agent’s configuration?

The Hamster Wheel of Manual Curation

The pattern is always the same. Someone discovers a clever prompt technique, packages it into a repo, posts it on Twitter or Hacker News, and it goes viral. The rest of us descend on the repo, fork it, cherry-pick the bits we like, and paste them into our own setups.

This is deeply ironic. We’re using AI agents that can write code, search the web, and reason about complex systems — but we’re manually maintaining the instructions that govern them. We’re the bottleneck in our own toolchain.

The config files that tell Claude Code how to behave — CLAUDE.md, skill definitions, memory files — are just text. And if there’s one thing LLMs are good at, it’s reading, evaluating, and writing text.

The Meta-Loop: Let the Agent Evolve Itself

Recently, I stopped manually chasing repos and built a set of self-evolving skills for Claude Code instead. The core idea is simple: make the agent responsible for discovering, evaluating, and adopting its own improvements.

The system runs on a weekly schedule, each day targeting a different surface:

| Day | Skill | What It Does |
| --- | ----- | ------------ |
| Mon | agentic-radar | 5 parallel agents sweep GitHub Trending, HN, Reddit for patterns |
| Wed | reflect-and-learn | 6-agent self-analysis loop with dual-channel scoring |
| Fri | vendor-docs-radar | Official Anthropic/Google/OpenAI blog changes |

You don’t need all of these at once. reflect-and-learn is the highest-leverage starting point — it reviews your past sessions, finds failure patterns, and proposes targeted fixes. The other skills layer on top when you’re ready.

Here’s what the most important pieces look like in practice:

1. Reflect and Learn — The Self-Improvement Core

This is the core loop in the system, and the part that has produced the most useful config changes so far. Every Wednesday, it runs a 6-agent reflection cycle:

  1. Extracts the past week’s conversation logs — what tasks I ran, what failed, where the agent burned tokens retrying
  2. Launches 6 analysis agents in parallel: failure pattern detection, efficiency analysis, user satisfaction signals, retrospective on past changes, meta-evolution (improving the improvement process), and tool & pattern co-evolution detection (inspired by Live-SWE-agent — detects repeated patterns in tool usage and promotes ephemeral scripts to persistent skills)
  3. Dual-channel scoring evaluates every proposed change on both process quality (was the reasoning sound?) and outcome quality (did it actually help?) — a technique borrowed from AgentEvolver
  4. Debates proposals with Gemini and Codex — three models arguing about whether a proposed config change is actually good
  5. Applies and maintains approved changes, then consolidates old memory entries and prunes rules that no longer fire — keeping the config lean instead of infinitely growing. (The consolidation and pruning steps are inspired by AgentEvolver’s experience stripping; the implementation is simpler than the paper’s.)
  6. Logs change lineage to a scoreboard file, designed toward AFlow-style tree-structured tracking so the system can backtrack when a modification underperforms

Safe changes (low-risk, high-confidence) get auto-adopted. Risky ones are flagged for my review.
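The dual-channel gate behind that triage decision can be sketched in a few lines. This is a minimal illustration, not my actual implementation — the field names, thresholds, and the min-of-both-channels rule are all assumptions chosen to show the shape of the idea:

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    """A proposed config change with dual-channel scores (all fields illustrative)."""
    name: str
    process_score: float  # channel 1: was the reasoning behind the change sound? (0-10)
    outcome_score: float  # channel 2: did it actually help in past sessions? (0-10)
    risk: float           # estimated blast radius if the change is wrong (0-10)

def triage(p: Proposal, adopt_threshold: float = 8.0, max_risk: float = 3.0) -> str:
    """Auto-adopt only when BOTH channels clear the bar and risk is low."""
    score = min(p.process_score, p.outcome_score)  # the weaker channel decides
    if score >= adopt_threshold and p.risk <= max_risk:
        return "auto-adopt"
    if score >= adopt_threshold:
        return "flag-for-review"  # good change, but too risky to self-apply
    return "reject"

print(triage(Proposal("never-idle loop", 9.0, 8.8, 1.0)))  # auto-adopt
print(triage(Proposal("risky rewrite", 9.0, 8.8, 9.0)))    # flag-for-review
```

The key design choice is scoring on the weaker channel: a change that worked for bad reasons, or was well-reasoned but didn’t help, never self-applies.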

2. Agentic Radar + Vendor Docs — Staying Current Without Trying

Instead of browsing GitHub Trending myself, Monday’s agentic-radar does it for me. Five parallel agents search different surfaces, evaluate what they find against a scoring rubric, and write stub reviews to a local registry. Friday’s vendor-docs-radar often catches new Claude Code features, Gemini CLI updates, and Codex changes before I would, proposing concrete config patches.
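A stub review in the local registry is just a small structured record. The exact schema is mine to invent — the field names, filename, and the example repo below are all placeholders, but the sketch shows what the radar appends after each sweep:

```python
import json
import pathlib
from datetime import date

# Hypothetical shape of one registry entry; every field name here is an assumption.
entry = {
    "repo": "example/viral-claude-config",  # placeholder, not a real repo
    "found": date(2026, 3, 17).isoformat(),
    "score": 6,  # rubric score out of 10
    "verdict": "interesting patterns, mostly already covered by existing config",
    "extracted": ["never-idle loop"],  # techniques worth proposing as config diffs
}

registry = pathlib.Path("tools-registry.json")
entries = json.loads(registry.read_text()) if registry.exists() else []
entries.append(entry)
registry.write_text(json.dumps(entries, indent=2))
```

Because the registry is plain JSON on disk, later skills (and later radar runs) can check "have we already seen this repo?" before spending tokens re-evaluating it.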

The self-evolution protocol in my CLAUDE.md encodes the guardrails:

## Self-Evolution Protocol
- Identify the pattern (not a one-off -- recurring across sessions)
- Draft the specific edit to CLAUDE.md or skill files
- Apply and commit to git
- Log it to CHANGELOG.md

Guardrails:
- Never remove an existing rule without user confirmation
- Never change execution style without user confirmation
- Methodology changes append, not overwrite
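The "never remove an existing rule" guardrail is mechanically checkable before a commit. Here is a deliberately naive sketch — it treats every `- ` bullet as a rule, which is an assumption about the config format, not how my actual check works:

```python
# A minimal pre-commit guardrail check an agent could run on a proposed config edit.
# Rule extraction is naive (one rule per "- " bullet); purely illustrative.

def extract_rules(config_text: str) -> set:
    return {line.strip() for line in config_text.splitlines()
            if line.strip().startswith("- ")}

def guardrail_violations(old_config: str, new_config: str) -> list:
    """Deletions of existing rules require user confirmation; additions are fine."""
    removed = extract_rules(old_config) - extract_rules(new_config)
    return sorted(removed)

old = "- Never push to main\n- Run tests before commit"
new = "- Run tests before commit\n- Prefer small diffs"
print(guardrail_violations(old, new))  # ['- Never push to main']
```

An empty result means the edit is append-only and safe to auto-commit; anything else gets routed to the user.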

The Moment It Clicked

The other day, a friend sent me a link to a viral repo. “This is really popular right now, you should check it out.”

I pulled up my tools registry. The agentic-radar had already found it three days earlier. It had a stub review with a score of 6/10 — interesting patterns but nothing my config didn’t already cover. The radar had extracted one useful technique and proposed a diff to my CLAUDE.md, which the reflect-and-learn skill had already debated, approved, and committed.

Here’s what that actually looks like in practice — a lightly edited excerpt from my evolution history:

2026-03-20 reflect-and-learn: adopted "never-idle loop" (score 8.92, P0-AUTO)
  ← root cause: 5+ sessions where I had to ask "why did you stop working?"
  ← fix: procedural gate — agent must check TODO list and keep working after each subtask
  ← debate: Claude 8.9 · Gemini impact 8, risk 9 → auto-adopted (user explicitly wants this)

2026-03-20 reflect-and-learn: flagged "zero questions by default" (score 7.33, HIGH-IMPACT)
  ← root cause: agent kept asking permission instead of researching first
  ← debate: Claude 7.3 · Gemini impact 3, risk 10 → flagged for review (models diverged)
  ← outcome: I reviewed and partially adopted — research first, but still ask for destructive actions

“My Claude Code already tracked this one,” I told him.

That’s the difference. I didn’t need to find the repo. I didn’t need to read its README. I didn’t need to figure out which parts were useful. The agent did all of that while I was asleep.

This Isn’t New — It’s Self-Evolving Agents

What I’m describing has a growing body of research behind it. The agent’s knowledge isn’t frozen in model weights. It lives in editable text files — configs, skill definitions, memory indexes, evolution histories — that the agent itself can read, evaluate, and rewrite. Recent 2025 work frames this as self-evolving agents, and the field has moved fast:

  • AFlow (Zhang et al., ICLR 2025 Oral): MCTS over workflow variants with tree-structured experience tracking. My reflect-and-learn skill is inspired by this approach for tracking change lineage.

  • AgentEvolver (Zhai et al., 2025): Dual-channel scoring (process quality vs. outcome quality) plus experience stripping to prune unhelpful rules. I borrowed both ideas for my reflection cycle, though the implementation is simpler.

  • Live-SWE-agent (Xia et al., 2025): An agent that synthesizes new tools at runtime and evolves its own scaffold while solving real SWE-bench tasks, achieving strong results on SWE-bench Verified. This demonstrates that agents can meaningfully improve their own tooling, not just their prompts.

  • EvoAgentX/SEW (Wang et al., 2025): A 5-layer architecture for simultaneously evolving prompts, tools, and workflows. The insight that these three layers should co-evolve — not be optimized independently — matches what I’ve found empirically.

  • Self-Evolving Agents Survey (Fang et al., 2025): A comprehensive taxonomy of the field. It identifies memory consolidation and tool co-evolution as underexplored — exactly the areas where my system has needed the most manual iteration.

The difference between these papers and what I’m doing is scope. They operate within benchmarks with objective reward signals. I’m applying the same principles to an open-ended personal workflow — which means honest measurement is harder. The value of automated scouting and triage has been clear and immediate. Whether the reflection loop genuinely compounds over months, or just feels like it does, is something I’m still figuring out. I track every change in git so I can backtrack when something doesn’t help, but I don’t have controlled A/B results yet.

Build the Meta-Loop, Not the Config

If you’re spending time manually curating your AI agent’s configuration, you’re solving the wrong problem. The config is just text. The agent can read text. Let it read its own config, evaluate whether it’s working, and propose improvements.

Here’s what I’d suggest:

  1. Start with a self-evolution protocol. Write down the rules for how your agent is allowed to modify its own config. Guardrails matter — you don’t want it deleting rules or making breaking changes without your review.

  2. Add automated scouting. A weekly scan of GitHub/HN/Reddit for relevant tools and patterns. Let the agent evaluate them, not you.

  3. Add reflection with dual-channel scoring. Review past sessions, find failure patterns, propose fixes — but score both the reasoning and the outcome. This is the part most likely to compound over time.

  4. Add multi-voice debate. Don’t let one model decide whether a change is good. Have Claude, Gemini, and Codex argue about it. Where they agree, auto-adopt. Where they disagree, flag for review.

  5. Track everything in a tree. Every proposed change, every adoption, every measured outcome — with lineage. This tree-structured history is what lets the system backtrack intelligently rather than random-walk.
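The lineage tree in step 5 can be sketched as parent-linked change nodes plus a backtrack walk. This is an illustrative minimum, not AFlow’s MCTS — the node fields and the score floor are assumptions:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ChangeNode:
    """One adopted config change in the lineage tree (fields illustrative)."""
    change_id: str
    outcome_score: Optional[float] = None  # measured after adoption; None = unmeasured
    parent: Optional["ChangeNode"] = None
    children: List["ChangeNode"] = field(default_factory=list)

    def add(self, child: "ChangeNode") -> "ChangeNode":
        child.parent = self
        self.children.append(child)
        return child

def backtrack(node: ChangeNode, floor: float = 5.0) -> ChangeNode:
    """Walk up the lineage to the most recent ancestor that measurably helped."""
    while node.parent and (node.outcome_score is None or node.outcome_score < floor):
        node = node.parent
    return node

root = ChangeNode("baseline", outcome_score=6.0)
good = root.add(ChangeNode("never-idle-loop", outcome_score=8.9))
bad = good.add(ChangeNode("overeager-refactor", outcome_score=3.1))
print(backtrack(bad).change_id)  # never-idle-loop
```

This is what distinguishes backtracking from random-walking: when a change underperforms, the system reverts to the last known-good ancestor instead of to an arbitrary earlier state.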

The goal isn’t to have the best config today. It’s to have a system that stays current and proposes improvements without you chasing repos manually.

Where This Is Going

So far this post has been about a single agent improving a single operator’s workflow. The next conceptual step is shared memory and coordination across agents.

There’s an irony worth noting: my agent scans GitHub for human-made repos to learn from — but those repos are increasingly written by agents too. This very blog post was iterated on with help from Gemini and Codex. The skills it describes were debated by Claude, Gemini, and Codex arguing with each other. The line between “human-made config” and “agent-made config” is already blurry.

Andrej Karpathy is already building toward what comes next. His autoresearch project turns ML experimentation into an agent loop — edit, train, measure, keep or discard. But his stated next step is more radical: make it “asynchronously massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it’s to emulate a research community of them.” His early-stage AgentHub experiments with replacing human-centric GitHub concepts — branches, PRs, code review — with primitives agents actually need: a DAG to push to and a message board to coordinate on. (These are still prototypes and sketches, not a mature ecosystem.)

The pieces are converging: agents that can modify their own configs (what my skills do), platforms designed for agent-to-agent collaboration (what Karpathy is exploring), and multi-model debate as a quality filter (what the research literature is formalizing). My agentic-radar already scans for patterns from the broader community. The step from scraping GitHub READMEs to querying a shared registry where agents publish what worked — with provenance, scores, and rollback data attached — is not as far as it sounds.

We’re not there yet. But the gap between “my agent improves itself” and “agents improve each other” is smaller than it looks.


I’m open-sourcing the skill definitions that power this system: github.com/PalmDr/claude-evolving-skills

One thing, though: don’t just manually git clone it. That would be doing exactly what this post argues against — copying someone else’s config and hoping it works. The better move is to point your agent at the repo and let it digest the patterns itself. Tell Claude Code to read the skill definitions, understand what they do, and build its own versions adapted to your workflow. The repo has an AGENTS.md specifically for this — your agent can read it and decide what to adopt.

Stop chasing repos. Build the loop.