docs: circuitforge-hooks design — gitleaks-based secret + PII scanning

Centralised pre-commit/pre-push hook repo design covering the token leak
root causes: unactivated hooksPath and insufficient regex coverage.
pyr0ball 2026-03-07 12:23:54 -08:00
parent a620c87310
commit 3441924929

@@ -0,0 +1,161 @@
# CircuitForge Hooks — Secret & PII Scanning Design
**Date:** 2026-03-07
**Scope:** All CircuitForge repos (Peregrine first; others on public release)
**Status:** Approved, ready for implementation
## Problem
A live Forgejo API token was committed in `docs/plans/2026-03-03-feedback-button-plan.md`
and required emergency history scrubbing via `git-filter-repo`. Root causes:
1. `core.hooksPath` was never configured — the existing `.githooks/pre-commit` ran on zero commits
2. The token format (`FORGEJO_API_TOKEN=<hex>`) matched none of the hook's three regexes
3. No pre-push safety net existed
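
Root cause 1 is cheap to detect mechanically. The following is a minimal sketch, not part of the existing tooling: `hooks_path_active` is a hypothetical helper name, and it assumes only stock `git config`. It reports whether a repo has `core.hooksPath` set and pointing at a directory that exists.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the original repo): succeed only when the repo
# at $1 has core.hooksPath configured and the target directory exists.
hooks_path_active() {
  local repo="${1:-.}" path
  # `git config --get` exits non-zero when the key is unset
  path="$(git -C "$repo" config --get core.hooksPath)" || return 1
  # A relative hooksPath is resolved against the top of the working tree
  case "$path" in
    /*) ;;
    *) path="$repo/$path" ;;
  esac
  [ -d "$path" ]
}
```

A setup script or release checklist could call `hooks_path_active /path/to/repo || echo "hooks not wired"` so that an unactivated hook directory is noticed before the first commit rather than after a leak.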
## Solution
Centralised hook repo (`circuitforge-hooks`) shared across all products.
Each repo activates it with one command. The heavy lifting is delegated to
`gitleaks`, an actively maintained binary with 150+ built-in secret patterns,
native Forgejo/Gitea token detection, and a clean allowlist system.
## Repository Structure
```
/Library/Development/CircuitForge/circuitforge-hooks/
├── hooks/
│   ├── pre-commit     # gitleaks --staged scan (fast, every commit)
│   ├── commit-msg     # conventional commits enforcement
│   └── pre-push       # gitleaks full-branch scan (safety net)
├── gitleaks.toml      # shared base config
├── install.sh         # wires core.hooksPath in the calling repo
├── tests/
│   └── test_hooks.sh  # migrated + extended from Peregrine
└── README.md
```
Forgejo remote: `git.opensourcesolarpunk.com/pyr0ball/circuitforge-hooks`
## Hook Behaviour
### pre-commit
- Runs `gitleaks protect --staged` — scans only the staged diff
- Sub-second on typical commits
- Blocks commit and prints redacted match on failure
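- Uses the per-repo `.gitleaks.toml` if present; that file extends the shared base via `[extend]`, so repo-specific allowlist entries are layered on top of the base rules. Otherwise the shared base config is used directly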
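
A minimal sketch of how `hooks/pre-commit` could implement this behaviour. The function names and the `CF_HOOKS_BASE_CONFIG` override variable are assumptions made for illustration; `--staged`, `--redact` and `--config` are existing gitleaks flags. Releases that ship `gitleaks git` should also accept `gitleaks git --pre-commit --staged` in place of `protect --staged`, but verify that against the installed version.

```bash
#!/usr/bin/env bash
# Sketch of hooks/pre-commit. CF_HOOKS_BASE_CONFIG is a hypothetical override;
# the default is the shared config path given in this document.
BASE_CONFIG="${CF_HOOKS_BASE_CONFIG:-/Library/Development/CircuitForge/circuitforge-hooks/gitleaks.toml}"

# Print the config gitleaks should load for the repo rooted at $1:
# the per-repo .gitleaks.toml when it exists, else the shared base config.
resolve_gitleaks_config() {
  local repo_root="$1"
  if [ -f "$repo_root/.gitleaks.toml" ]; then
    printf '%s\n' "$repo_root/.gitleaks.toml"
  else
    printf '%s\n' "$BASE_CONFIG"
  fi
}

# Scan only the staged diff. A non-zero exit status makes git abort the commit.
run_pre_commit_scan() {
  local repo_root config
  repo_root="$(git rev-parse --show-toplevel)" || return 1
  config="$(resolve_gitleaks_config "$repo_root")"
  gitleaks protect --staged --redact --config "$config"
}
```

The hook file itself would end with a bare `run_pre_commit_scan` call. Keeping config resolution in its own function lets pre-push reuse it and lets the test suite exercise it without gitleaks installed.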
### pre-push
- Runs `gitleaks git`, restricted via `--log-opts` to the branch commits that are not yet on the remote
- Catches anything committed with `--no-verify` or before hooks were wired
- Same config resolution as pre-commit
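
Limiting the scan to unpushed commits means turning the ref information git passes to the hook into a `git log` range. The sketch below shows one way to do that. The function names are hypothetical, and it assumes gitleaks forwards the `--log-opts` string to `git log` split on spaces, which should be confirmed against the installed version.

```bash
#!/usr/bin/env bash
# Sketch of the range selection for hooks/pre-push (function names are assumptions).

# True when the object name is all zeros (git's marker for "no such object").
is_zero_sha() {
  case "$1" in
    *[!0]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Print git-log arguments covering the commits being pushed, or return 1
# when there is nothing to scan (a branch deletion).
push_range() {
  local local_sha="$1" remote_sha="$2"
  if is_zero_sha "$local_sha"; then
    return 1
  elif is_zero_sha "$remote_sha"; then
    # New branch: commits reachable from it that no remote-tracking ref has
    printf '%s --not --remotes\n' "$local_sha"
  else
    printf '%s..%s\n' "$remote_sha" "$local_sha"
  fi
}

# git feeds the hook one line per pushed ref on stdin:
#   <local ref> <local sha> <remote ref> <remote sha>
scan_pushed_refs() {
  local config="$1" local_ref local_sha remote_ref remote_sha range
  while read -r local_ref local_sha remote_ref remote_sha; do
    range="$(push_range "$local_sha" "$remote_sha")" || continue
    gitleaks git --redact --config "$config" --log-opts="$range" || return 1
  done
}
```

The hook would finish with `scan_pushed_refs "$config"`, where `$config` is resolved the same way as in pre-commit. Matching on "all zeros" rather than a fixed 40-character string keeps the check valid for SHA-256 repositories as well.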
### commit-msg
- Enforces conventional commits format (`type(scope): subject`)
- Migrated unchanged from `peregrine/.githooks/commit-msg`
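
For orientation, a conventional-commits check can be as small as one regex match on the subject line. This is an illustrative sketch, not the Peregrine hook that is being migrated: the list of accepted types and the function names are assumptions.

```bash
#!/usr/bin/env bash
# Illustrative commit-msg check. The accepted type list is an assumption,
# not taken from peregrine/.githooks/commit-msg.

# Succeed when the subject matches `type(scope): subject` (scope optional).
is_conventional() {
  local subject="$1"
  local re='^(build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test)(\([A-Za-z0-9._/-]+\))?!?: .+'
  [[ "$subject" =~ $re ]]
}

# git passes the path of the commit message file as the hook's first argument;
# only the first line (the subject) is checked.
check_commit_msg_file() {
  local subject
  subject="$(head -n 1 "$1")"
  if ! is_conventional "$subject"; then
    echo "commit message must look like 'type(scope): subject'" >&2
    return 1
  fi
}
```

A hook built on this would end with `check_commit_msg_file "$1"`. Note that a pattern this strict also rejects git's auto-generated `Merge ...` subjects, so a real hook may want to exempt them.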
## gitleaks Config
### Shared base (`circuitforge-hooks/gitleaks.toml`)
```toml
title = "CircuitForge secret + PII scanner"
[extend]
useDefault = true # inherit all 150+ built-in rules
[[rules]]
id = "cf-generic-env-token"
description = "Generic KEY=<token> in env-style assignment"
regex = '''(?i)(token|secret|key|password|passwd|pwd|api_key)\s*[=:]\s*['\"]?[A-Za-z0-9\-_]{20,}['\"]?'''
[rules.allowlist]
regexes = ['api_key:\s*ollama', 'api_key:\s*any']
[[rules]]
id = "cf-phone-number"
description = "US phone number in source or config"
regex = '''\b(\+1[\s\-.]?)?\(?\d{3}\)?[\s\-.]?\d{3}[\s\-.]?\d{4}\b'''
[rules.allowlist]
regexes = ['555-\d{4}', '555\.\d{4}', '5550', '1234567890', '0000000000']
[[rules]]
id = "cf-personal-email"
description = "Personal email address in source/config (not .example files)"
regex = '''[a-zA-Z0-9._%+\-]+@(gmail|yahoo|icloud|hotmail|outlook|proton)\.(com|me)'''
[rules.allowlist]
paths = ['.*\.example$', '.*test.*', '.*docs/.*']
[allowlist]
description = "CircuitForge global allowlist"
paths = [
'.*\.example$',
'docs/reference/.*',
'gitleaks\.toml$',
]
regexes = [
'sk-abcdefghijklmnopqrstuvwxyz',
'your-forgejo-api-token-here',
]
```
### Per-repo override (e.g. `peregrine/.gitleaks.toml`)
```toml
[extend]
path = "/Library/Development/CircuitForge/circuitforge-hooks/gitleaks.toml"
[allowlist]
regexes = [
'\d{10}\.html', # Craigslist listing IDs (10-digit, look like phone numbers)
]
```
## Activation Per Repo
Each repo's `setup.sh` or `manage.sh` calls:
```bash
bash /Library/Development/CircuitForge/circuitforge-hooks/install.sh
```
`install.sh` does exactly one thing:
```bash
git config core.hooksPath /Library/Development/CircuitForge/circuitforge-hooks/hooks
```
For Heimdall live deploys (`/devl/<repo>/`), the same line goes in the deploy
script / post-receive hook.
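
If `install.sh` later needs guard rails, a variant along the following lines is one option. It is a sketch under assumptions: `install_hooks` is a hypothetical name, and the existence check and the gitleaks warning are additions beyond the single `git config` call described above.

```bash
#!/usr/bin/env bash
# Sketch of a guarded install step (hypothetical; the documented install.sh
# only runs the single `git config` line).
install_hooks() {
  local repo="$1" hooks_dir="$2"
  if [ ! -d "$hooks_dir" ]; then
    echo "hooks directory not found: $hooks_dir" >&2
    return 1
  fi
  git -C "$repo" config core.hooksPath "$hooks_dir" || return 1
  # gitleaks is needed when the hooks run, not at install time, so only warn
  if ! command -v gitleaks >/dev/null 2>&1; then
    echo "warning: gitleaks is not on PATH; hooks will fail until it is installed" >&2
  fi
  return 0
}
```

It would be called as `install_hooks "$PWD" /Library/Development/CircuitForge/circuitforge-hooks/hooks`. Failing loudly on a missing hooks directory matters because git silently runs no hooks when `core.hooksPath` points at a path that does not exist.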
## Migration from Peregrine
- `peregrine/.githooks/pre-commit` → replaced by gitleaks wrapper
- `peregrine/.githooks/commit-msg` → copied verbatim to hooks repo
- `peregrine/tests/test_hooks.sh` → migrated and extended in hooks repo
- `peregrine/.githooks/` directory → kept temporarily, then removed after cutover
## Rollout Order
1. `circuitforge-hooks` repo — create, implement, test
2. `peregrine` — activate (highest priority, already public)
3. `circuitforge-license` (heimdall) — activate before any public release
4. All subsequent repos — activate as part of their public-release checklist
## Testing
`tests/test_hooks.sh` covers:
- Staged file with live-format token → blocked
- Staged file with phone number → blocked
- Staged file with personal email in source → blocked
- `.example` file with placeholders → allowed
- Craigslist URL with 10-digit ID → allowed (Peregrine allowlist)
- Valid conventional commit message → accepted
- Non-conventional commit message → rejected
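
Each case above reduces to "run a command, compare its exit status". A small helper such as the following could keep `test_hooks.sh` readable. It is a hypothetical sketch, not the existing Peregrine test code.

```bash
#!/usr/bin/env bash
# Hypothetical assertion helper for tests/test_hooks.sh.
PASS=0
FAIL=0

# expect_exit <label> <wanted status> <command...>
# Runs the command quietly and records whether its exit status matched.
expect_exit() {
  local label="$1" want="$2" got
  shift 2
  "$@" >/dev/null 2>&1
  got=$?
  if [ "$got" -eq "$want" ]; then
    PASS=$((PASS + 1))
    echo "ok   - $label"
  else
    FAIL=$((FAIL + 1))
    echo "FAIL - $label (exit $got, wanted $want)"
  fi
}
```

Assuming gitleaks' default exit code of 1 when it finds a leak, a "blocked" case would be written as `expect_exit "live-format token blocked" 1 hooks/pre-commit` inside a scratch repo with the offending file staged, and an "allowed" case would expect status 0. The script can then finish with `[ "$FAIL" -eq 0 ]`.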
## What This Does Not Cover
- Scanning existing history on new repos (run `gitleaks git` manually before
making any repo public — add to the public-release checklist)
- CI/server-side enforcement (future: Forgejo Actions job on push to main)
- Binary files or encrypted secrets at rest