Eval: ponytail decision-ladder hooks + skills for CF dev workflow #2

Open

opened 2026-07-02 11:20:43 -07:00 by pyr0ball · 0 comments

pyr0ball commented

2026-07-02 11:20:43 -07:00

Owner

Source

https://github.com/DietrichGebert/ponytail — MIT license

What it is

Ponytail is a "minimal code generation" constraint system for AI coding assistants
(Claude Code, Codex, Copilot CLI, Cursor, Windsurf, and 12+ others).

It implements a 7-rung decision ladder that runs before any code is written:

1. Does this need to exist? → Skip it (YAGNI)
2. Already in this codebase? → Reuse
3. Stdlib does it? → Use stdlib
4. Native platform feature? → Use native
5. Installed dependency? → Use existing package
6. One line? → Write one line
7. Otherwise → Write the minimum that works
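
One way to picture the ladder is as an ordered list of checks where the first rung that applies decides the outcome. The sketch below is an illustration only, not ponytail's implementation: the `Task` fields, `LADDER`, and `decide` are hypothetical names, and the boolean flags stand in for judgments the assistant would make by reading the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical facts about a requested change (illustrative only)."""
    needed: bool = True
    exists_in_codebase: bool = False
    in_stdlib: bool = False
    native_feature: bool = False
    installed_dependency: bool = False
    fits_one_line: bool = False


# Ordered rungs: (question, predicate, action). The first matching rung wins.
# Rung 1 fires when the answer to its question is "no".
LADDER = [
    ("Does this need to exist?", lambda t: not t.needed, "Skip it (YAGNI)"),
    ("Already in this codebase?", lambda t: t.exists_in_codebase, "Reuse"),
    ("Stdlib does it?", lambda t: t.in_stdlib, "Use stdlib"),
    ("Native platform feature?", lambda t: t.native_feature, "Use native"),
    ("Installed dependency?", lambda t: t.installed_dependency, "Use existing package"),
    ("One line?", lambda t: t.fits_one_line, "Write one line"),
    ("Otherwise", lambda t: True, "Write the minimum that works"),
]


def decide(task: Task) -> str:
    """Walk the rungs top to bottom and return the first action that applies."""
    for _question, applies, action in LADDER:
        if applies(task):
            return action
    raise AssertionError("unreachable: the last rung always applies")
```

Because the rungs are ordered, a task that could be solved by both the stdlib and an installed package resolves to "Use stdlib": the earlier rung takes precedence.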

"Lazy about the solution, never about reading."

Six skills included

- `ponytail-review`: finds over-engineering in current diffs, returns deletion list
- `ponytail-audit`: scans whole repo for over-engineering
- `ponytail-debt`: converts deferred `ponytail:` shortcuts into a ledger
- `ponytail-gain`: displays benchmark impact scorecard
- `ponytail-help`: quick command reference
- `ponytail [lite|full|ultra|off]`: mode switching

Two Node.js lifecycle hooks

The hooks provide always-on activation and mode switching, so the user does not
have to invoke ponytail explicitly on each turn. They require node on PATH
(Heimdall has this).
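
To make the mode-switching idea concrete, here is a rough sketch of the logic such a hook might contain. It is not ponytail's hook code (which is Node.js) and it does not model any Claude Code hook API. The function names are hypothetical, and the command syntax is an assumption based on the `ponytail [lite|full|ultra|off]` skill listed above, with an optional leading slash.

```python
import re
from typing import Optional

# Assumption: a mode command is the word "ponytail" followed by one mode,
# optionally prefixed with "/", and nothing else on the prompt.
_COMMAND = re.compile(r"^\s*/?ponytail\s+(lite|full|ultra|off)\s*$", re.IGNORECASE)


def parse_mode_command(prompt: str) -> Optional[str]:
    """Return the requested mode if the prompt is a mode command, else None."""
    match = _COMMAND.match(prompt)
    return match.group(1).lower() if match else None


def next_mode(current: str, prompt: str) -> str:
    """Switch modes on an explicit command; otherwise keep the current mode."""
    requested = parse_mode_command(prompt)
    return requested if requested is not None else current
```

The point of the sketch is the "always-on" behaviour: an ordinary prompt leaves the current mode untouched, so the constraint stays active from turn to turn until a command changes it.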

Benchmarks (12 tasks, FastAPI + React repo, Haiku 4.5, n=4)

| Metric | Ponytail | Baseline |
|--------|----------|----------|
| Lines of code | -54% | n/a |
| Tokens | -22% | n/a |
| Cost | -20% | n/a |
| Time | -27% | n/a |
| Safety | 100% | 100% |

CF relevance

CF's existing CLAUDE.md already states: "Don't add features, refactor, or introduce
abstractions beyond what the task requires." Ponytail enforces this mechanically at
the AI layer rather than relying on instruction following.

CF uses Claude Code heavily across 20+ products. If the benchmark's 20% cost
reduction holds on CF workloads, the savings add up quickly at that scale.

The hooks pattern fits directly into circuitforge-hooks. The skills could be added
to the CF skills library alongside existing ones.

License

MIT — clean for any CF use including internal tooling and distribution in
circuitforge-hooks.

Recommended eval steps

1. Install ponytail skills in a Claude Code test session on one active product
   (kiwi or waxwing are good candidates: active, but bounded in scope)
2. Run a few feature tickets with `ponytail full` enabled
3. Compare token usage and code output against recent sessions without it
4. If the hooks work well, integrate the two Node.js hooks into circuitforge-hooks
5. If the `ponytail-review` skill is useful, add it to the CF skills library
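
The comparison in step 3 could be done with a few lines of arithmetic. The sketch below assumes each session can be summarized as a dict with `tokens` and `lines_of_code` keys; that shape, the function names, and the numbers are all placeholders for illustration, not measurements from any CF session.

```python
from statistics import mean


def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate vs. baseline, in percent (negative = reduction)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


def compare_sessions(baseline_sessions, ponytail_sessions):
    """Compare mean tokens and mean lines of code across two groups of sessions."""
    report = {}
    for metric in ("tokens", "lines_of_code"):
        base = mean(s[metric] for s in baseline_sessions)
        pony = mean(s[metric] for s in ponytail_sessions)
        report[metric] = round(percent_change(base, pony), 1)
    return report


# Placeholder numbers for illustration only; not real measurements.
baseline = [
    {"tokens": 40_000, "lines_of_code": 200},
    {"tokens": 60_000, "lines_of_code": 300},
]
with_ponytail = [
    {"tokens": 30_000, "lines_of_code": 100},
    {"tokens": 50_000, "lines_of_code": 150},
]

print(compare_sessions(baseline, with_ponytail))
# prints {'tokens': -20.0, 'lines_of_code': -50.0}
```

With only a few tickets per group, the result is a rough signal rather than a benchmark, so it helps to compare tickets of similar size and to read the generated code alongside the numbers.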

Note

The decision ladder aligns with CF's CLAUDE.md but doesn't replace it — ponytail
handles the "write minimum code" dimension; CF's rules cover privacy, safety,
accessibility, and tier architecture. They compose cleanly.


Reference: Circuit-Forge/circuitforge-hooks#2
