peregrine/docs/developer-guide/contributing.md
pyr0ball 64554dbef1
Some checks failed
CI / test (push) Failing after 19s
feat(#43): numbered SQL migration runner (Rails-style)
- migrations/001_baseline.sql: full schema baseline (all tables/cols)
- scripts/db_migrate.py: apply sorted *.sql files, track in schema_migrations
- Wired into FastAPI startup and Streamlit app.py startup
- Replaces ad-hoc digest_queue CREATE in _startup()
- 6 tests covering apply, idempotency, partial apply, failure rollback
- docs/developer-guide/contributing.md: migration authoring guide
2026-04-04 22:17:42 -07:00

137 lines
5 KiB
Markdown

# Contributing
Thank you for your interest in contributing to Peregrine. This guide covers the development environment, code standards, test requirements, and pull request process.
!!! note "License"
Peregrine uses a dual licence. The discovery pipeline (`scripts/discover.py`, `scripts/match.py`, `scripts/db.py`, `scripts/custom_boards/`) is MIT. All AI features, the UI, and everything else is BSL 1.1.
Do not add `Co-Authored-By:` trailers or AI-attribution notices to commits — this is a commercial repository.
---
## Fork and Clone
```bash
git clone https://git.circuitforge.io/circuitforge/peregrine
cd peregrine
```
Create a feature branch from `main`:
```bash
git checkout -b feat/my-feature
```
---
## Dev Environment Setup
Peregrine's Python dependencies are managed with conda. The same `job-seeker` environment is used for both the legacy personal app and Peregrine.
```bash
# Create the environment from the lockfile
conda env create -f environment.yml
# Activate
conda activate job-seeker
```
Alternatively, install from `requirements.txt` into an existing Python 3.12 environment:
```bash
pip install -r requirements.txt
```
!!! warning "Keep the env lightweight"
Do not add `torch`, `sentence-transformers`, `bitsandbytes`, `transformers`, or any other CUDA/GPU package to the main environment. These live in separate conda environments (`job-seeker-vision` for the vision service, `ogma` for fine-tuning). Adding them to the main env causes out-of-memory failures during test runs.
---
## Running Tests
```bash
conda run -n job-seeker python -m pytest tests/ -v
```
Or with the direct binary (avoids runaway process spawning):
```bash
/path/to/miniconda3/envs/job-seeker/bin/pytest tests/ -v
```
The `pytest.ini` file scopes collection to the `tests/` directory only — do not widen this.
All tests must pass before submitting a PR. See [Testing](testing.md) for patterns and conventions.
---
## Code Style
- **PEP 8** for all Python code — use `flake8` or `ruff` to check
- **Type hints preferred** on function signatures — not required but strongly encouraged
- **Docstrings** on all public functions and classes
- **No print statements** in library code (`scripts/`); use Python's `logging` module or return status in the return value. `print` is acceptable in one-off scripts and `discover.py`-style entry points.
---
## Branch Naming
| Prefix | Use for |
|--------|---------|
| `feat/` | New features |
| `fix/` | Bug fixes |
| `docs/` | Documentation only |
| `refactor/` | Code reorganisation without behaviour change |
| `test/` | Test additions or corrections |
| `chore/` | Dependency updates, CI, tooling |
Example: `feat/add-greenhouse-scraper`, `fix/email-imap-timeout`, `docs/add-integration-guide`
---
## PR Checklist
Before opening a pull request:
- [ ] All tests pass: `conda run -n job-seeker python -m pytest tests/ -v`
- [ ] New behaviour is covered by at least one test
- [ ] No new dependencies added to `environment.yml` or `requirements.txt` without a clear justification in the PR description
- [ ] Documentation updated if the PR changes user-visible behaviour (update the relevant page in `docs/`)
- [ ] Config file changes are reflected in the `.example` file
- [ ] No secrets, tokens, or personal data in any committed file
- [ ] Gitignored files (`config/*.yaml`, `staging.db`, `aihawk/`, `.env`) are not committed
---
## Database Migrations
Peregrine uses a numbered SQL migration system (Rails-style). Each migration is a `.sql` file in the `migrations/` directory at the repo root, named `NNN_description.sql` (e.g. `002_add_foo_column.sql`). Applied migrations are tracked in a `schema_migrations` table in each user database.
### Adding a migration
1. Create `migrations/NNN_description.sql` where `NNN` is the next sequential number (zero-padded to 3 digits).
2. Write standard SQL — `CREATE TABLE IF NOT EXISTS`, `ALTER TABLE ADD COLUMN`, etc. Keep each migration idempotent where possible.
3. Do **not** modify `scripts/db.py`'s legacy `_MIGRATIONS` lists — those are superseded and will be removed once all active databases have been bootstrapped by the migration runner.
4. The runner (`scripts/db_migrate.py`) applies pending migrations at startup automatically (both FastAPI and Streamlit paths call `migrate_db(db_path)`).
### Rollbacks
SQLite does not support transactional DDL for all statement types. Write forward-only migrations. If you need to undo a schema change, add a new migration that reverses it.
---
## What NOT to Do
- Do not commit `config/user.yaml`, `config/notion.yaml`, `config/email.yaml`, `config/adzuna.yaml`, or any `config/integrations/*.yaml` — all are gitignored
- Do not commit `staging.db`
- Do not add `torch`, `bitsandbytes`, `transformers`, or `sentence-transformers` to the main environment
- Do not add `Co-Authored-By:` or AI-attribution lines to commit messages
- Do not force-push to `main`
---
## Getting Help
Open an issue on the repository with the `question` label. Include:
- Your OS and Docker version
- The `inference_profile` from your `config/user.yaml`
- Relevant log output from `make logs`