docs: circuitforge-hooks design — gitleaks-based secret + PII scanning
Centralised pre-commit/pre-push hook repo design covering the token leak root causes: unactivated hooksPath and insufficient regex coverage.
This commit is contained in:
parent
a620c87310
commit
3441924929
1 changed files with 161 additions and 0 deletions
161
docs/plans/2026-03-07-circuitforge-hooks-design.md
Normal file
161
docs/plans/2026-03-07-circuitforge-hooks-design.md
Normal file
|
|
@ -0,0 +1,161 @@
|
|||
# CircuitForge Hooks — Secret & PII Scanning Design
|
||||
|
||||
**Date:** 2026-03-07
|
||||
**Scope:** All CircuitForge repos (Peregrine first; others on public release)
|
||||
**Status:** Approved, ready for implementation
|
||||
|
||||
## Problem
|
||||
|
||||
A live Forgejo API token was committed in `docs/plans/2026-03-03-feedback-button-plan.md`
|
||||
and required emergency history scrubbing via `git-filter-repo`. Root causes:
|
||||
|
||||
1. `core.hooksPath` was never configured — the existing `.githooks/pre-commit` ran on zero commits
|
||||
2. The token format (`FORGEJO_API_TOKEN=<hex>`) matched none of the hook's three regexes
|
||||
3. No pre-push safety net existed
|
||||
|
||||
## Solution
|
||||
|
||||
Centralised hook repo (`circuitforge-hooks`) shared across all products.
|
||||
Each repo activates it with one command. The heavy lifting is delegated to
|
||||
`gitleaks` — an actively-maintained binary with 150+ built-in secret patterns,
|
||||
native Forgejo/Gitea token detection, and a clean allowlist system.
|
||||
|
||||
## Repository Structure
|
||||
|
||||
```
|
||||
/Library/Development/CircuitForge/circuitforge-hooks/
|
||||
├── hooks/
|
||||
│ ├── pre-commit # gitleaks --staged scan (fast, every commit)
|
||||
│ ├── commit-msg # conventional commits enforcement
|
||||
│ └── pre-push # gitleaks full-branch scan (safety net)
|
||||
├── gitleaks.toml # shared base config
|
||||
├── install.sh # wires core.hooksPath in the calling repo
|
||||
├── tests/
|
||||
│ └── test_hooks.sh # migrated + extended from Peregrine
|
||||
└── README.md
|
||||
```
|
||||
|
||||
Forgejo remote: `git.opensourcesolarpunk.com/pyr0ball/circuitforge-hooks`
|
||||
|
||||
## Hook Behaviour
|
||||
|
||||
### pre-commit
|
||||
- Runs `gitleaks protect --staged` — scans only the staged diff
|
||||
- Sub-second on typical commits
|
||||
- Blocks commit and prints redacted match on failure
|
||||
- Merges per-repo `.gitleaks.toml` allowlist if present
|
||||
|
||||
### pre-push
|
||||
- Runs `gitleaks git` — scans full branch history not yet on remote
|
||||
- Catches anything committed with `--no-verify` or before hooks were wired
|
||||
- Same config resolution as pre-commit
|
||||
|
||||
### commit-msg
|
||||
- Enforces conventional commits format (`type(scope): subject`)
|
||||
- Migrated unchanged from `peregrine/.githooks/commit-msg`
|
||||
|
||||
## gitleaks Config
|
||||
|
||||
### Shared base (`circuitforge-hooks/gitleaks.toml`)
|
||||
|
||||
```toml
|
||||
title = "CircuitForge secret + PII scanner"
|
||||
|
||||
[extend]
|
||||
useDefault = true # inherit all 150+ built-in rules
|
||||
|
||||
[[rules]]
|
||||
id = "cf-generic-env-token"
|
||||
description = "Generic KEY=<token> in env-style assignment"
|
||||
regex = '''(?i)(token|secret|key|password|passwd|pwd|api_key)\s*[=:]\s*['\"]?[A-Za-z0-9\-_]{20,}['\"]?'''
|
||||
[rules.allowlist]
|
||||
regexes = ['api_key:\s*ollama', 'api_key:\s*any']
|
||||
|
||||
[[rules]]
|
||||
id = "cf-phone-number"
|
||||
description = "US phone number in source or config"
|
||||
regex = '''\b(\+1[\s\-.]?)?\(?\d{3}\)?[\s\-.]?\d{3}[\s\-.]?\d{4}\b'''
|
||||
[rules.allowlist]
|
||||
regexes = ['555-\d{4}', '555\.\d{4}', '5550', '1234567890', '0000000000']
|
||||
|
||||
[[rules]]
|
||||
id = "cf-personal-email"
|
||||
description = "Personal email address in source/config (not .example files)"
|
||||
regex = '''[a-zA-Z0-9._%+\-]+@(gmail|yahoo|icloud|hotmail|outlook|proton)\.(com|me)'''
|
||||
[rules.allowlist]
|
||||
paths = ['.*\.example$', '.*test.*', '.*docs/.*']
|
||||
|
||||
[allowlist]
|
||||
description = "CircuitForge global allowlist"
|
||||
paths = [
|
||||
'.*\.example$',
|
||||
'docs/reference/.*',
|
||||
'gitleaks\.toml$',
|
||||
]
|
||||
regexes = [
|
||||
'sk-abcdefghijklmnopqrstuvwxyz',
|
||||
'your-forgejo-api-token-here',
|
||||
]
|
||||
```
|
||||
|
||||
### Per-repo override (e.g. `peregrine/.gitleaks.toml`)
|
||||
|
||||
```toml
|
||||
[extend]
|
||||
path = "/Library/Development/CircuitForge/circuitforge-hooks/gitleaks.toml"
|
||||
|
||||
[allowlist]
|
||||
regexes = [
|
||||
'\d{10}\.html', # Craigslist listing IDs (10-digit, look like phone numbers)
|
||||
]
|
||||
```
|
||||
|
||||
## Activation Per Repo
|
||||
|
||||
Each repo's `setup.sh` or `manage.sh` calls:
|
||||
|
||||
```bash
|
||||
bash /Library/Development/CircuitForge/circuitforge-hooks/install.sh
|
||||
```
|
||||
|
||||
`install.sh` does exactly one thing:
|
||||
|
||||
```bash
|
||||
git config core.hooksPath /Library/Development/CircuitForge/circuitforge-hooks/hooks
|
||||
```
|
||||
|
||||
For Heimdall live deploys (`/devl/<repo>/`), the same line goes in the deploy
|
||||
script / post-receive hook.
|
||||
|
||||
## Migration from Peregrine
|
||||
|
||||
- `peregrine/.githooks/pre-commit` → replaced by gitleaks wrapper
|
||||
- `peregrine/.githooks/commit-msg` → copied verbatim to hooks repo
|
||||
- `peregrine/tests/test_hooks.sh` → migrated and extended in hooks repo
|
||||
- `peregrine/.githooks/` directory → kept temporarily, then removed after cutover
|
||||
|
||||
## Rollout Order
|
||||
|
||||
1. `circuitforge-hooks` repo — create, implement, test
|
||||
2. `peregrine` — activate (highest priority, already public)
|
||||
3. `circuitforge-license` (heimdall) — activate before any public release
|
||||
4. All subsequent repos — activate as part of their public-release checklist
|
||||
|
||||
## Testing
|
||||
|
||||
`tests/test_hooks.sh` covers:
|
||||
|
||||
- Staged file with live-format token → blocked
|
||||
- Staged file with phone number → blocked
|
||||
- Staged file with personal email in source → blocked
|
||||
- `.example` file with placeholders → allowed
|
||||
- Craigslist URL with 10-digit ID → allowed (Peregrine allowlist)
|
||||
- Valid conventional commit message → accepted
|
||||
- Non-conventional commit message → rejected
|
||||
|
||||
## What This Does Not Cover
|
||||
|
||||
- Scanning existing history on new repos (run `gitleaks git` manually before
|
||||
making any repo public — add to the public-release checklist)
|
||||
- CI/server-side enforcement (future: Forgejo Actions job on push to main)
|
||||
- Binary files or encrypted secrets at rest
|
||||
Loading…
Reference in a new issue