darwin-skill

Darwin Skill 2.0 is an autonomous skill optimizer that evaluates and improves SKILL.md files using a 9-dimension rubric based on Microsoft Research SkillLens and SkillOpt. It employs a hill-climbing approach with git version control, independent judge agents for blind evaluation, and human-in-the-loop checkpoints to ensure quality improvements.

Installs: 8.3K · Use cases: 5 · Quality: 9/10

Is darwin-skill safe to install?

Review the source first

Our audit of darwin-skill's source files found 8 shell commands, 6 external URLs, and file reads and writes (high risk). Every command and URL listed appears verbatim in the skill's source. The skill executes shell commands, performs git operations, and reads and writes local files to optimize and manage skill documentation.


Who is this skill for?

Developers and AI agent users who need to maintain, optimize, and validate the quality of their agent skills.

What can you do with it?

  • Automatically optimize SKILL.md files for better performance.
  • Evaluate skill quality using a 9-dimension rubric.
  • Perform blind evaluations of skill improvements using independent judge agents.
  • Validate skill improvements through test prompts and automated regression testing.
  • Generate visual result cards for skill optimization reports.

How good is this skill?

Quality score: 9/10. The skill provides a comprehensive, well-documented framework for autonomous optimization with clear academic references and robust safety mechanisms like git version control and human-in-the-loop checkpoints.

What does the skill file contain?

SKILL.md
# Darwin Skill 2.0

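> **v2.0 · 2026-05-28**: Incorporates the 9-dimension scoring recipe from Microsoft Research SkillLens (arXiv 2605.23899), the validation-gated verification mechanism from SkillOpt (arXiv 2605.23904), and human-in-the-loop three-layer gatekeeping.
>
> Borrows the autonomous experiment loop from Karpathy's autoresearch to continuously optimize skills.
> Core idea: **evaluate → improve → validate with real tests → human confirmation → keep or roll back → generate a result card**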
> GitHub: https://github.com/alchaincyf/darwin-skill

---

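## Design philosophy

The essence of autoresearch:
1. **Single editable asset**: only one SKILL.md is changed at a time
2. **Dual evaluation**: structural scoring (static analysis) plus effectiveness validation (run the tests and inspect the output)
3. **Ratchet mechanism**: keep only improvements and automatically roll back regressions
4. **Independent scoring**: scoring is done by a sub-agent, avoiding the bias of "grading your own changes"
5. **Human in the loop**: pause after each skill is optimized and continue only once the user confirms

How this differs from a purely structural review: it looks not only at whether the SKILL.md is well-formed, but also at whether **the results actually produced after the change are better**.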
...

Frequently asked questions

How does the skill evaluate quality?

It uses a 9-dimension rubric covering structure, effectiveness, and meta-skills, combined with independent judge agents to perform blind evaluations and test prompt validation.

Can I use this to optimize a single skill?

Yes, you can specify a single skill for optimization, and the agent will perform the evaluation and improvement cycle for that specific target.

What happens if an optimization makes the skill worse?

The skill uses git version control to revert changes if the new score is not strictly higher than the previous score.
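
The keep-or-revert rule described above can be sketched as a small hill-climbing loop. This is only an illustration of the "strictly higher" ratchet, not the skill's actual code: the names `ratchet`, `RoundResult`, and `judge` are hypothetical, candidates are plain strings rather than files, and the git commit and revert steps are represented by comments.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class RoundResult:
    candidate: str  # the proposed SKILL.md text (hypothetical stand-in)
    score: float    # score returned by the judge for this candidate
    kept: bool      # True if the candidate replaced the current best


def ratchet(
    baseline: str,
    candidates: Iterable[str],
    judge: Callable[[str], float],
) -> tuple[str, float, list[RoundResult]]:
    """Hill-climb over candidate edits, keeping only strict improvements."""
    best_text, best_score = baseline, judge(baseline)
    history: list[RoundResult] = []
    for candidate in candidates:
        score = judge(candidate)
        kept = score > best_score  # strictly higher; a tie is rejected
        if kept:
            # Assumed to correspond to committing the change in git.
            best_text, best_score = candidate, score
        # Otherwise the candidate is dropped (assumed: a git revert).
        history.append(RoundResult(candidate, score, kept))
    return best_text, best_score, history


# Toy judge: look up a made-up score for each version label.
scores = {"v0": 70.0, "v1": 75.0, "v2": 75.0, "v3": 72.0, "v4": 80.0}
best, best_score, history = ratchet("v0", ["v1", "v2", "v3", "v4"], scores.__getitem__)
print(best, best_score)            # v4 80.0
print([r.kept for r in history])   # [True, False, False, True]
```

Note that `v2` ties the current best at 75.0 and is rejected, which is what "not strictly higher" implies: a change that merely matches the previous score is not kept.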

Does it require human input?

Yes, the skill includes human-in-the-loop checkpoints at critical stages, such as after baseline evaluation and after each individual skill optimization.
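
As a rough sketch of the two checkpoints mentioned above (after the baseline evaluation, and after each skill is optimized), the loop below pauses and asks for confirmation at both points. The function and parameter names (`optimize_with_checkpoints`, `evaluate`, `optimize`, `confirm`) are hypothetical, and how the skill actually prompts the user is an assumption here.

```python
from typing import Callable, Iterable


def optimize_with_checkpoints(
    skills: Iterable[str],
    evaluate: Callable[[str], float],
    optimize: Callable[[str], float],
    confirm: Callable[[str], bool],
) -> dict[str, float]:
    """Run the loop, pausing for confirmation at each checkpoint.

    Returns a score per skill; skills that were never reached keep
    their baseline score.
    """
    names = list(skills)
    baseline = {name: evaluate(name) for name in names}
    # Checkpoint 1: show the baseline and ask before changing anything.
    if not confirm(f"Baseline scores: {baseline}. Start optimizing?"):
        return baseline
    results = dict(baseline)
    for name in names:
        results[name] = optimize(name)
        # Checkpoint 2: pause after each skill; stop if the user declines.
        if not confirm(f"{name}: {baseline[name]} -> {results[name]}. Continue?"):
            break
    return results


# Scripted answers stand in for a real user: approve the baseline,
# approve skill "a", then stop after skill "b".
answers = iter([True, True, False])
base = {"a": 1.0, "b": 2.0, "c": 3.0}
out = optimize_with_checkpoints(
    ["a", "b", "c"],
    evaluate=base.__getitem__,
    optimize=lambda name: base[name] + 10,
    confirm=lambda message: next(answers),
)
print(out)  # {'a': 11.0, 'b': 12.0, 'c': 3.0}
```

In an interactive session, `confirm` could instead wrap Python's built-in `input`, for example `lambda message: input(message + " [y/N] ").strip().lower() == "y"`. Passing it in as a parameter keeps the loop testable without a terminal.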

Data sourced from alchaincyf/darwin-skill on GitHub. Install counts from skills.sh. The summary and security audit are derived from the skill's source files: every command and URL listed appears verbatim in the source.

Related skills

find-skills

2.3M

Users seeking to extend agent capabilities with specialized tools, workflows, or knowledge packages

The find-skills skill enables agents to search for, discover, and install modular packages from the open agent skills ecosystem using the Skills CLI.

high · cli · package-manager · vercel-labs

video-edit

338.7K

Users of the RunComfy CLI who need to automate video editing tasks like restyling, background swapping, or motion transfer

The video-edit skill acts as a router for the RunComfy CLI, selecting between Wan 2.7 Edit-Video, Kling 2.6 Pro Motion Control, and Lucy Edit Restyle models based on user intent to perform video transformations.

high · video-editing · ai-agent · agentspace-so

lark-doc

305.2K

Users who need to automate document management, content updates, and media handling within the Lark/Feishu ecosystem

The lark-doc skill enables agents to read, create, and edit Lark (Feishu) documents, including Docx and Wiki formats. It supports content manipulation via XML or Markdown, media handling, and resource management for document covers. The skill integrates with other Lark skills by identifying and delegating operations for embedded objects like spreadsheets, databases, and mind notes.

high · Lark · Feishu · larksuite

supabase-postgres-best-practices

264.3K

Developers and database administrators working with Postgres and Supabase

This skill provides a structured set of Postgres performance optimization rules and best practices maintained by Supabase. It guides developers in writing, reviewing, and optimizing SQL queries, schema designs, and database configurations.

low · postgres · supabase · supabase