Make AI skills testable and iterable like software

A CI/CD system for production-grade AI skills. With a 1000-point evaluation system, multi-pass self-review, and self-evolution, it takes your prompts from "it runs" to "it's reliable".

  • 1000-point evaluation
  • 920 average certified score
  • 8 supported platforms
  • 9 work modes

One-click install

Pick a platform, copy the command, paste it into your AI assistant.

curl -fsSL https://raw.githubusercontent.com/theneoai/skill-writer/main/install.sh | bash

Installs to all supported AI platforms detected on your machine (Claude / OpenCode / OpenClaw / Cursor / Gemini / OpenAI / Kimi / Hermes).

read https://github.com/theneoai/skill-writer/releases/latest/download/skill-writer-claude.md and install to claude

Installs to ~/.claude/skills/ along with refs / templates / eval / optimize companion files.

read https://github.com/theneoai/skill-writer/releases/latest/download/skill-writer-opencode.md and install to opencode

Installs to ~/.config/opencode/skills/.

read https://github.com/theneoai/skill-writer/releases/latest/download/skill-writer-openclaw.md and install to openclaw

Installs to ~/.openclaw/skills/.

read https://github.com/theneoai/skill-writer/releases/latest/download/skill-writer-cursor.mdc and install to cursor

Installs the MDC rule to .cursor/rules/ (project-scoped) or ~/.cursor/rules/ (--global).

read https://github.com/theneoai/skill-writer/releases/latest/download/skill-writer-gemini.md and install to gemini

Installs to ~/.gemini/skills/; routing rules written to ~/.gemini/GEMINI.md.

read https://github.com/theneoai/skill-writer/releases/latest/download/skill-writer-openai.md and install to openai

Installs to the current project's skills/skill-writer.md; routing rules written to AGENTS.md.

read https://github.com/theneoai/skill-writer/releases/latest/download/skill-writer-kimi.md and install to kimi

Installs to ~/.config/kimi/skills/; supports bilingual EN/ZH triggers.

read https://github.com/theneoai/skill-writer/releases/latest/download/skill-writer-hermes.md and install to hermes

Installs to ~/.hermes/skills/; designed for local LLM assistants.

Core features

A quality assurance system for AI skills, covering fast structural checks, deep scoring, self-review, self-evolution, and A/B benchmarking.

LEAN fast evaluation

16 structural checks: [STATIC] deterministic checks (335 pts, zero variance) + [HEURISTIC] LLM-judged checks (165 pts) for rapid skill quality signals.

  • YAML frontmatter completeness [STATIC]
  • §N-formatted sections + Quality Gates [STATIC]
  • Trigger EN/ZH coverage [STATIC]
  • Red lines / error handling / security baseline [STATIC]
  • Template fit + field specificity [HEURISTIC]
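
As a rough illustration of what a [STATIC] check could look like, here is a minimal Python sketch. The required field names, the assumed `## §1 Title` heading format, and all function names are hypothetical; the text above only says that static checks are deterministic, so the same input always yields the same result.

```python
import re

# Hypothetical required keys; the actual frontmatter schema is not listed above.
REQUIRED_FIELDS = ("name", "description", "triggers")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Collect top-level `key: value` pairs from a leading '---' block."""
    match = re.match(r"---\n(.*?)\n---(?:\n|$)", text, flags=re.DOTALL)
    if match is None:
        return {}
    fields: dict[str, str] = {}
    for line in match.group(1).splitlines():
        if line.startswith((" ", "\t", "#")):
            continue  # skip nested values and comments
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def has_complete_frontmatter(text: str) -> bool:
    """True if every (assumed) required key appears in the frontmatter."""
    fields = parse_frontmatter(text)
    return all(name in fields for name in REQUIRED_FIELDS)


def has_numbered_sections(text: str) -> bool:
    """True if at least one heading looks like '## §1 Title' (assumed format)."""
    return re.search(r"^#+\s*§\d+", text, flags=re.MULTILINE) is not None


def static_report(text: str) -> dict[str, bool]:
    """Run the deterministic checks; no LLM call is involved."""
    return {
        "frontmatter_complete": has_complete_frontmatter(text),
        "numbered_sections": has_numbered_sections(text),
    }
```

Because these checks are pure string operations, they can run in CI on every commit without any API key; the [HEURISTIC] checks would need a model call and are not sketched here.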
🎯

1000-point evaluation

A 4-phase deep evaluation: structure → text quality → runtime test → certification. Quantify every skill's quality.

  • Phase 1: Structural integrity (100 pts)
  • Phase 2: Text-quality analysis (300 pts)
  • Phase 3: Runtime testing (400 pts)
  • Phase 4: Comprehensive certification (200 pts)
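
The phase maxima above add up to 1000. A minimal Python sketch of how per-phase scores might be validated and summed follows; the dictionary keys and the `total_score` function are illustrative names, not part of Skill Writer itself.

```python
# Phase maxima taken from the list above; the key names are hypothetical.
PHASE_MAX = {
    "structure": 100,      # Phase 1
    "text_quality": 300,   # Phase 2
    "runtime": 400,        # Phase 3
    "certification": 200,  # Phase 4
}


def total_score(phase_scores: dict[str, int]) -> int:
    """Validate each phase score against its maximum and return the sum."""
    if set(phase_scores) != set(PHASE_MAX):
        raise ValueError("expected exactly one score per phase")
    for name, score in phase_scores.items():
        if not 0 <= score <= PHASE_MAX[name]:
            raise ValueError(f"{name} score {score} is out of range")
    return sum(phase_scores.values())
```

For example, phase scores of 95, 270, 365, and 190 would give a total of 920.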
🤖

Multi-pass self-review

Generate/Review/Reconcile triple self-review uncovers blind spots within a single LLM session; VERIFY eliminates score inflation.

  • Generate: draft the skill
  • Review: security audit + quality check
  • Reconcile: reconcile diffs + fix issues
  • VERIFY: independent re-evaluation after convergence
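
The control flow of the four steps above can be sketched as a small loop. This is only one plausible reading: the four callables, the `max_rounds` limit, and the `ReviewResult` type are assumptions made for illustration, and in practice each callable would wrap an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReviewResult:
    draft: str
    converged: bool
    verified_score: Optional[int]  # None when the loop did not converge


def self_review(
    generate: Callable[[], str],
    review: Callable[[str], list[str]],
    reconcile: Callable[[str, list[str]], str],
    verify: Callable[[str], int],
    max_rounds: int = 3,
) -> ReviewResult:
    """Generate a draft, then review and reconcile until no issues remain."""
    draft = generate()
    for _ in range(max_rounds):
        issues = review(draft)
        if not issues:
            # Converged: only now run the independent re-evaluation.
            return ReviewResult(draft, True, verify(draft))
        draft = reconcile(draft, issues)
    # Review budget exhausted without a clean pass; skip VERIFY.
    return ReviewResult(draft, False, None)
```

Keeping `verify` separate from `review` mirrors the idea that the final score should come from an evaluation that did not take part in producing the draft.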
🔄

Self-evolution (UTE)

The Use-to-Evolve protocol monitors skill quality via 6 triggers, optimizing automatically without human intervention.

  • Threshold: quality threshold monitoring
  • Time: freshness check
  • Usage: usage-rate analysis
  • 8-dimension, 10-step optimization loop
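
To make the three named triggers concrete, here is a hedged Python sketch. The `SkillStats` fields and every threshold value (800 points, 90 days, 5 uses) are invented for illustration; the text above does not specify them, and the remaining UTE triggers are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SkillStats:
    score: int                # latest evaluation score, 0 to 1000
    last_evaluated: datetime  # when the skill was last evaluated
    uses_last_30_days: int    # invocation count in the recent window


def fired_triggers(
    stats: SkillStats,
    now: datetime,
    min_score: int = 800,                    # hypothetical threshold
    max_age: timedelta = timedelta(days=90),  # hypothetical freshness limit
    min_uses: int = 5,                        # hypothetical usage floor
) -> list[str]:
    """Return the names of the triggers that fire for this skill."""
    fired = []
    if stats.score < min_score:
        fired.append("threshold")
    if now - stats.last_evaluated > max_age:
        fired.append("time")
    if stats.uses_last_30_days < min_uses:
        fired.append("usage")
    return fired
```

Passing `now` explicitly rather than calling `datetime.now()` inside the function keeps the check deterministic and easy to test.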
🔬

Empirical A/B Benchmarking

BENCHMARK mode runs parallel API calls (with-skill vs. baseline), blind-grades the outputs via a Comparator agent, and reports delta_pass_rate, token overhead, and a verdict. (v3.5.0)

  • Parallel execution: ThreadPoolExecutor A/B harness
  • Blind grading: Comparator agent sees no labels
  • Token overhead: LOW / MODERATE / HIGH / CRITICAL tiers
  • Verdicts: BENCHMARK_PASS / MARGINAL / FAIL / INCONCLUSIVE
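
A minimal sketch of such an A/B harness, using Python's standard `ThreadPoolExecutor`, is shown below. The `with_skill`, `baseline`, and `comparator` callables are hypothetical stand-ins for real API calls, and the comparator's return shape (one pass/fail flag per anonymized output) is an assumption. Token overhead tiers and verdict thresholds are not given above, so they are left out.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def run_benchmark(
    prompts: Iterable[str],
    with_skill: Callable[[str], str],
    baseline: Callable[[str], str],
    comparator: Callable[[str, str, str], tuple[bool, bool]],
    seed: int = 0,
    max_workers: int = 8,
) -> dict[str, float]:
    """Run both arms in parallel, grade blind, and report pass rates."""
    prompts = list(prompts)
    if not prompts:
        raise ValueError("at least one prompt is required")
    rng = random.Random(seed)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map submits every call immediately, so both arms overlap.
        skill_outputs = pool.map(with_skill, prompts)
        base_outputs = pool.map(baseline, prompts)
        triples = list(zip(prompts, skill_outputs, base_outputs))

    skill_pass = base_pass = 0
    for prompt, skill_out, base_out in triples:
        # Randomize the order so the grader cannot tell which arm is which.
        skill_first = rng.random() < 0.5
        first, second = (skill_out, base_out) if skill_first else (base_out, skill_out)
        first_ok, second_ok = comparator(prompt, first, second)
        skill_ok, base_ok = (first_ok, second_ok) if skill_first else (second_ok, first_ok)
        skill_pass += bool(skill_ok)
        base_pass += bool(base_ok)

    n = len(triples)
    return {
        "with_skill_pass_rate": skill_pass / n,
        "baseline_pass_rate": base_pass / n,
        "delta_pass_rate": (skill_pass - base_pass) / n,
    }
```

Fixing the random seed makes the presentation order reproducible between runs, which helps when comparing benchmark results across skill versions.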

Certification tiers

A 4-tier certification system for at-a-glance quality.

🏆

PLATINUM

≥ 950

Exceptional quality — excellent across all metrics.

  • Phase 2 ≥ 270
  • Phase 3 ≥ 360
🥇

GOLD

≥ 900

High quality — fit for production use.

  • Phase 2 ≥ 255
  • Phase 3 ≥ 340
🥈

SILVER

≥ 800

Good quality — usable in most contexts.

  • Phase 2 ≥ 225
  • Phase 3 ≥ 300
🥉

BRONZE

≥ 700

Passing quality — needs improvement.

  • Phase 2 ≥ 195
  • Phase 3 ≥ 265
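
The tier table above can be expressed as a small lookup. This Python sketch assumes that the Phase 2 and Phase 3 floors apply in addition to the total-score threshold, and that a skill missing a tier's floor falls through to the next tier down; the source does not state this explicitly, and `certify` is an illustrative name.

```python
from typing import Optional

# (tier, min total, min Phase 2, min Phase 3), copied from the tiers above.
TIERS = (
    ("PLATINUM", 950, 270, 360),
    ("GOLD", 900, 255, 340),
    ("SILVER", 800, 225, 300),
    ("BRONZE", 700, 195, 265),
)


def certify(total: int, phase2: int, phase3: int) -> Optional[str]:
    """Return the highest tier whose three thresholds are all met, else None."""
    for name, min_total, min_p2, min_p3 in TIERS:
        if total >= min_total and phase2 >= min_p2 and phase3 >= min_p3:
            return name
    return None
```

Under these assumptions, a skill with a total of 960 but a Phase 2 score of 260 would be certified GOLD rather than PLATINUM, because it misses the PLATINUM Phase 2 floor of 270.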

Example skills

Average certified score 920.7/1000 — ready out of the box.

🌐
🥇 GOLD 920

API Tester

API Integration

Automates HTTP API testing with TEST/VALIDATE/BATCH modes, auto-injected env vars, and full error handling.

"test this API endpoint: GET /api/users"
"validate the response from https://api.example.com/users"
🔍
🥇 GOLD 947

Code Reviewer

Workflow Automation

Intelligent code-review assistant: multi-step workflow, security scanning (CWE-798/89/78/22), automatic rollback.

"review my code for security issues"
"scan this PR for vulnerabilities"
📝
🥈 SILVER 895

Doc Generator

Data Pipeline

Automated doc generation with an ETVF data pipeline — outputs Markdown/JSON/HTML with full schema validation.

"generate documentation for this function"
"create API docs for this module"

Quick start

Get up and running with Skill Writer in 5 minutes.

1. Install Skill Writer

curl -fsSL https://raw.githubusercontent.com/theneoai/skill-writer/main/install.sh | bash

Paste the command above into your AI assistant, or run it directly in a terminal.

2. Create a skill

"Create a GitHub Issue management skill"
"Create a weather API skill"

3. Lean-evaluate

"lean evaluate this skill"
"lean evaluate"

4. Full evaluation & optimization

"evaluate this skill"
"optimize this skill"

Documentation

Go deeper with Skill Writer.

Roadmap

Where Skill Writer is heading next.

Current — shipped

  • Three-tier skill hierarchy: planning / functional / atomic
  • LEAN 16-check system: [STATIC] 335 pts (zero variance) + [HEURISTIC] 165 pts
  • Self-evolution (UTE) with 6 triggers including Validation Status Drift
  • Honest skill labeling: generation_method + validation_status fields
  • Graph of Skills (GoS): typed skill dependency graph
  • OWASP Agentic Skills Top 10 (ASI01–ASI10) security scanning
  • 8-platform support (Claude / OpenClaw / OpenCode / Cursor / Gemini / OpenAI / Kimi / Hermes)
  • Real trigger-accuracy evaluation that breaks generator self-bias
  • BENCHMARK mode: empirical A/B harness with blind Comparator grading + token overhead analysis (v3.5.0)
📅

Next — planned

  • Skill marketplace / registry web UI
  • VS Code extension
  • Team collaboration features
  • Enterprise edition (SSO, audit logs)

Join the community

Grow together with fellow developers.