peregrine/docs/developer-guide/contributing.md
pyr0ball 64554dbef1
Some checks failed
CI / test (push) Failing after 19s
feat(#43): numbered SQL migration runner (Rails-style)
- migrations/001_baseline.sql: full schema baseline (all tables/cols)
- scripts/db_migrate.py: apply sorted *.sql files, track in schema_migrations
- Wired into FastAPI startup and Streamlit app.py startup
- Replaces ad-hoc digest_queue CREATE in _startup()
- 6 tests covering apply, idempotency, partial apply, failure rollback
- docs/developer-guide/contributing.md: migration authoring guide
2026-04-04 22:17:42 -07:00

5 KiB

Contributing

Thank you for your interest in contributing to Peregrine. This guide covers the development environment, code standards, test requirements, and pull request process.

!!! note "License" Peregrine uses a dual licence. The discovery pipeline (scripts/discover.py, scripts/match.py, scripts/db.py, scripts/custom_boards/) is MIT. All AI features, the UI, and everything else is BSL 1.1. Do not add Co-Authored-By: trailers or AI-attribution notices to commits — this is a commercial repository.


Fork and Clone

git clone https://git.circuitforge.io/circuitforge/peregrine
cd peregrine

Create a feature branch from main:

git checkout -b feat/my-feature

Dev Environment Setup

Peregrine's Python dependencies are managed with conda. The same job-seeker environment is used for both the legacy personal app and Peregrine.

# Create the environment from the lockfile
conda env create -f environment.yml

# Activate
conda activate job-seeker

Alternatively, install from requirements.txt into an existing Python 3.12 environment:

pip install -r requirements.txt

!!! warning "Keep the env lightweight" Do not add torch, sentence-transformers, bitsandbytes, transformers, or any other CUDA/GPU package to the main environment. These live in separate conda environments (job-seeker-vision for the vision service, ogma for fine-tuning). Adding them to the main env causes out-of-memory failures during test runs.


Running Tests

conda run -n job-seeker python -m pytest tests/ -v

Or with the direct binary (avoids runaway process spawning):

/path/to/miniconda3/envs/job-seeker/bin/pytest tests/ -v

The pytest.ini file scopes collection to the tests/ directory only — do not widen this.

All tests must pass before submitting a PR. See Testing for patterns and conventions.


Code Style

  • PEP 8 for all Python code — use flake8 or ruff to check
  • Type hints preferred on function signatures — not required but strongly encouraged
  • Docstrings on all public functions and classes
  • No print statements in library code (scripts/); use Python's logging module or return status in the return value. print is acceptable in one-off scripts and discover.py-style entry points.

Branch Naming

Prefix Use for
feat/ New features
fix/ Bug fixes
docs/ Documentation only
refactor/ Code reorganisation without behaviour change
test/ Test additions or corrections
chore/ Dependency updates, CI, tooling

Example: feat/add-greenhouse-scraper, fix/email-imap-timeout, docs/add-integration-guide


PR Checklist

Before opening a pull request:

  • All tests pass: conda run -n job-seeker python -m pytest tests/ -v
  • New behaviour is covered by at least one test
  • No new dependencies added to environment.yml or requirements.txt without a clear justification in the PR description
  • Documentation updated if the PR changes user-visible behaviour (update the relevant page in docs/)
  • Config file changes are reflected in the .example file
  • No secrets, tokens, or personal data in any committed file
  • Gitignored files (config/*.yaml, staging.db, aihawk/, .env) are not committed

Database Migrations

Peregrine uses a numbered SQL migration system (Rails-style). Each migration is a .sql file in the migrations/ directory at the repo root, named NNN_description.sql (e.g. 002_add_foo_column.sql). Applied migrations are tracked in a schema_migrations table in each user database.

Adding a migration

  1. Create migrations/NNN_description.sql where NNN is the next sequential number (zero-padded to 3 digits).
  2. Write standard SQL — CREATE TABLE IF NOT EXISTS, ALTER TABLE ADD COLUMN, etc. Keep each migration idempotent where possible.
  3. Do not modify scripts/db.py's legacy _MIGRATIONS lists — those are superseded and will be removed once all active databases have been bootstrapped by the migration runner.
  4. The runner (scripts/db_migrate.py) applies pending migrations at startup automatically (both FastAPI and Streamlit paths call migrate_db(db_path)).

Rollbacks

SQLite does not support transactional DDL for all statement types. Write forward-only migrations. If you need to undo a schema change, add a new migration that reverses it.


What NOT to Do

  • Do not commit config/user.yaml, config/notion.yaml, config/email.yaml, config/adzuna.yaml, or any config/integrations/*.yaml — all are gitignored
  • Do not commit staging.db
  • Do not add torch, bitsandbytes, transformers, or sentence-transformers to the main environment
  • Do not add Co-Authored-By: or AI-attribution lines to commit messages
  • Do not force-push to main

Getting Help

Open an issue on the repository with the question label. Include:

  • Your OS and Docker version
  • The inference_profile from your config/user.yaml
  • Relevant log output from make logs