Eval: ponytail decision-ladder hooks + skills for CF dev workflow #2

Open
opened 2026-07-02 11:20:43 -07:00 by pyr0ball · 0 comments
Owner

Source

https://github.com/DietrichGebert/ponytail — MIT license

What it is

Ponytail is a "minimal code generation" constraint system for AI coding assistants
(Claude Code, Codex, Copilot CLI, Cursor, Windsurf, and 12+ others).

It implements a 7-rung decision ladder that runs before any code is written:

  1. Does this need to exist? → Skip it (YAGNI)
  2. Already in this codebase? → Reuse
  3. Stdlib does it? → Use stdlib
  4. Native platform feature? → Use native
  5. Installed dependency? → Use existing package
  6. One line? → Write one line
  7. Otherwise → Write the minimum that works
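
One way to picture the ladder is as a first-match rule list: walk the rungs in order and stop at the first one that applies. The sketch below is only an illustration of that idea, not ponytail's actual implementation; the `facts` object and its field names (`needed`, `inCodebase`, and so on) are hypothetical.

```javascript
// Hypothetical sketch of the 7-rung ladder as a first-match rule list.
// The field names on `facts` are illustrative and not part of ponytail.
const LADDER = [
  { rung: 1, applies: (f) => f.needed === false, action: "Skip it (YAGNI)" },
  { rung: 2, applies: (f) => f.inCodebase, action: "Reuse" },
  { rung: 3, applies: (f) => f.inStdlib, action: "Use stdlib" },
  { rung: 4, applies: (f) => f.nativeFeature, action: "Use native" },
  { rung: 5, applies: (f) => f.installedDependency, action: "Use existing package" },
  { rung: 6, applies: (f) => f.oneLine, action: "Write one line" },
  // The last rung always applies, so the search below never comes back empty.
  { rung: 7, applies: () => true, action: "Write the minimum that works" },
];

// Returns the first rung whose condition holds for the given facts.
function decide(facts) {
  const hit = LADDER.find((step) => step.applies(facts));
  return { rung: hit.rung, action: hit.action };
}

// Stdlib (rung 3) wins over an installed dependency (rung 5).
console.log(decide({ needed: true, inStdlib: true, installedDependency: true }));
```

The ordering is the point: a cheaper answer higher up the ladder always wins over a more expensive one further down.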

"Lazy about the solution, never about reading."

Six skills included

  • ponytail-review — finds over-engineering in current diffs, returns deletion list
  • ponytail-audit — scans whole repo for over-engineering
  • ponytail-debt — converts deferred ponytail: shortcuts into a ledger
  • ponytail-gain — displays benchmark impact scorecard
  • ponytail-help — quick command reference
  • ponytail [lite|full|ultra|off] — mode switching

Two Node.js lifecycle hooks

The hooks provide always-on activation and mode switching, so the user does not
have to invoke ponytail explicitly on each turn.
They require node on PATH (Heimdall has this).
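
To make the hook idea concrete, here is a minimal sketch of what per-turn hook logic could look like. It is not ponytail's code. It assumes the hook receives a JSON payload with a `prompt` field and can return text to add to the model's context; the function names and the reminder wording are hypothetical.

```javascript
// Hypothetical always-on hook logic; NOT ponytail's actual implementation.
const MODES = ["lite", "full", "ultra", "off"];
const MODE_COMMAND = new RegExp(`^\\s*ponytail\\s+(${MODES.join("|")})\\s*$`, "i");

// Returns the requested mode if the prompt is a mode-switch command, else null.
function parseModeCommand(prompt) {
  const match = MODE_COMMAND.exec(prompt);
  return match ? match[1].toLowerCase() : null;
}

// Takes the hook input (a JSON string, assumed to carry a `prompt` field) and
// the current mode. Returns the next mode and the context text for this turn.
function handlePrompt(inputJson, currentMode) {
  const { prompt = "" } = JSON.parse(inputJson);
  const mode = parseModeCommand(prompt) ?? currentMode;
  const context =
    mode === "off"
      ? ""
      : `ponytail mode: ${mode}. Apply the decision ladder before writing code.`;
  return { mode, context };
}

console.log(handlePrompt(JSON.stringify({ prompt: "ponytail ultra" }), "full"));
```

In a real hook, the JSON would be read from stdin and the context written to stdout. How ponytail persists the mode between turns is not described in this issue, so that part is left out.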

Benchmarks (12 tasks, FastAPI + React repo, Haiku 4.5, n=4)

Metric          Ponytail   Baseline
Lines of code   -54%       (reference)
Tokens          -22%       (reference)
Cost            -20%       (reference)
Time            -27%       (reference)
Safety          100%       100%

CF relevance

CF's existing CLAUDE.md already states: "Don't add features, refactor, or introduce
abstractions beyond what the task requires." Ponytail enforces this mechanically at
the AI layer rather than relying on instruction following.

CF uses Claude Code heavily across 20+ products. If the benchmark result carries
over to CF's workloads, a 20% cost reduction (22% fewer tokens) adds up at that scale.
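
As a rough illustration of the scale argument, the arithmetic can be sketched as below. The issue gives no spend figures for CF, so every input in the example call is a made-up placeholder; only the 20% figure comes from the benchmark above.

```javascript
// All inputs are hypothetical; the issue gives no CF spend figures.
function projectedSavings(monthlySpendPerProduct, productCount, costReduction) {
  const baseline = monthlySpendPerProduct * productCount;
  return {
    baseline,
    savings: baseline * costReduction,
    after: baseline * (1 - costReduction),
  };
}

// Placeholder numbers: 100 (any currency) per product per month, 20 products.
console.log(projectedSavings(100, 20, 0.2));
```

The result scales linearly with product count, so the absolute saving matters more the more products run Claude Code sessions.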

The hooks pattern fits directly into circuitforge-hooks. The skills could be added
to the CF skills library alongside existing ones.

License

MIT — clean for any CF use including internal tooling and distribution in
circuitforge-hooks.

Recommended eval steps

  1. Install ponytail skills in a Claude Code test session on one active product (kiwi
    or waxwing are good candidates: active but bounded in scope)
  2. Run a few feature tickets with ponytail full enabled
  3. Compare token usage and code output against recent sessions without it
  4. If the hooks work well, integrate the two Node.js hooks into circuitforge-hooks
  5. If the ponytail-review skill is useful, add it to the CF skills library
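
For step 3, the comparison could be as simple as a percent change of per-session means. The sketch below assumes each session has been recorded as a plain object; the `tokens` and `linesAdded` field names and the sample values are hypothetical.

```javascript
// Hypothetical session records; field names are illustrative only.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Percent change of the mean of `metric` from baseline sessions to ponytail sessions.
function percentChange(baselineSessions, ponytailSessions, metric) {
  const base = mean(baselineSessions.map((s) => s[metric]));
  const test = mean(ponytailSessions.map((s) => s[metric]));
  return ((test - base) / base) * 100;
}

// Builds one percent-change figure per requested metric.
function compare(baselineSessions, ponytailSessions, metrics) {
  return Object.fromEntries(
    metrics.map((m) => [m, percentChange(baselineSessions, ponytailSessions, m)])
  );
}

// Made-up numbers, only to show the shape of the input and output.
const baseline = [{ tokens: 1000, linesAdded: 200 }, { tokens: 3000, linesAdded: 400 }];
const withPonytail = [{ tokens: 1500, linesAdded: 150 }, { tokens: 1700, linesAdded: 150 }];
console.log(compare(baseline, withPonytail, ["tokens", "linesAdded"]));
```

With only a few tickets per side the result will be noisy, so it is worth comparing tickets of similar size and treating small differences as inconclusive.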

Note

The decision ladder aligns with CF's CLAUDE.md but doesn't replace it — ponytail
handles the "write minimum code" dimension; CF's rules cover privacy, safety,
accessibility, and tier architecture. They compose cleanly.

Reference: Circuit-Forge/circuitforge-hooks#2