Commit graph

6 commits

Author SHA1 Message Date
0311d72e53 feat: dual-backend SQLite/Postgres + multi-tenant source namespacing
- Add app/db/ abstraction layer: Backend enum, DbConn wrapper,
  dialect helper (q() for ? vs %s paramstyle), get_conn(), tenant_id()
- Auto-detect backend from DATABASE_URL; SQLite remains default when
  unset — no config change for local deployments
- Add tenant_id column to all three logical DBs (main, context, incidents);
  idempotent ALTER TABLE migration runs before schema scripts on existing DBs
- All INSERTs inject tenant_id; SELECTs use (tenant_id = ? OR tenant_id = '')
  for backward compat with pre-namespacing rows
- Add docker-compose.yml with named volume turnstone_pgdata (survives rebuilds)
  and optional external Postgres support via DATABASE_URL override
- Add scripts/migrate_sqlite_to_postgres.py — one-shot idempotent migration
  for existing SQLite data; ON CONFLICT DO NOTHING for safe re-runs
- Fix SSH glean path in pipeline.py to use ensure_schema + get_conn
  (was still using raw sqlite3.connect + old _SCHEMA without tenant_id)
- Fix FTS5 JOIN ambiguity: qualify repeat_count as f.repeat_count in search
- Update all tests to use ensure_*_schema fixtures; add row_factory where needed
- 394/394 tests passing

Closes: #42
Closes: #50
2026-06-08 08:37:54 -07:00
1131816666 feat: bundle PII sanitization, onboarding wizard, NL source addition (#51, #52, #53)
Bundle export (#51):
- _redact_text() with 5 compiled regex patterns (IPv4, email, user=, host=, password=)
- build_bundle(sanitize=False) — per-entry redaction at export time
- sent_bundles table tracks every outgoing export (GET and POST /send)
- GET /api/sent-bundles exposes history; SentBundle model added
- BundlesView: Received/Sent tabs, sanitized badge, 5-entry preview, re-download
- IncidentsView: Sanitize PII checkbox next to Send Bundle

Onboarding wizard (#52):
- app/services/discover.py: journald/Docker/file detection (best-effort, safe in containers)
- GET /api/setup/status, /discover, POST /api/setup/write (additive, appends to existing)
- SetupWizard.vue: 3-step Detect → Select → Confirm
  - Step 1 shows grouped summary (journald/file/docker counts)
  - Step 2: collapsible groups with All/None section toggles
    - journald + file: pre-selected; docker: collapsed, none pre-selected
  - Step 3: YAML preview before write
- SourcesView: shows wizard on first run; Add Source button reuses it

NL source addition (#53):
- app/services/nl_source.py: keyword shortcut (13 well-known apps) + LLM fallback
- POST /api/setup/interpret: keyword → LLM → null (graceful fallback)
- NL field in wizard step 2; manual form shown when interpretation fails
- Added sources appear in grouped list immediately
2026-05-29 14:14:28 -07:00
7f49961ec4 fix(db): add timeout=30s to all sqlite3.connect() calls across app
Watcher, REST endpoints, services (search, incidents, blocklist),
MCP server, context retriever, embedder, glean_scheduler, and
doc_upload all used the default 5-second SQLite busy timeout.
During collect glean write phases, watcher flush threads were hitting
'database is locked' errors when the glean held the write lock longer
than 5 seconds.

All connections now use timeout=30.0, matching the pipeline fix
from commit 5a9281a. No logic changes.
2026-05-26 23:12:48 -07:00
828b69768a refactor: rename ingest → glean throughout codebase
Renames the app/ingest/ package to app/glean/ and updates all
references across Python modules, shell scripts, Vue components,
tests, and documentation.

Intentionally preserved:
- SQLite column name ingest_time (avoids schema migration)
- RetrievedEntry.ingest_time field (maps to the column above)
- Any public-facing JSON keys that reference ingest_time

Changes by category:
- app/ingest/ → app/glean/ (full package move, all parsers)
- app/tasks/ingest_scheduler.py → app/tasks/glean_scheduler.py
- scripts/ingest_corpus.py → scripts/glean_corpus.py
- tests/test_ingest_*.py → tests/test_glean_*.py
- Docstrings, log messages, comments: ingest → glean
- Env var: TURNSTONE_INGEST_INTERVAL → TURNSTONE_GLEAN_INTERVAL
- Shell scripts: glean.log, glean_corpus.py references
- README.md: multi-source ingest → multi-source glean
- .env.example: updated env var name
- patterns/: new diagnostic patterns from 2026-05-20 SSH incident
  (service_crash_loop, pkg_daemon_restart, ssh_forward_conflict)
- SourcesView.vue: pipeline label updated
- All test import paths updated to app.glean.*

285 tests passing.
2026-05-20 23:02:55 -07:00
457b4fd7ae feat: incident labeling, bundle export, and push/receive flow
Turnstone incidents now carry an issue_type tag (free-text with datalist
suggestions) used to categorize patterns for signature building.

Backend:
- Incident model gains issue_type; additive ALTER TABLE migration keeps
  existing DBs working without a full schema rebuild
- New received_bundles table stores incoming JSON bundles with indexes on
  bundled_at and issue_type
- build_bundle() assembles incident + related log entries into a versioned
  bundle dict; store_bundle()/list_bundles()/get_bundle() for the receiver
- POST /api/incidents/{id}/send — pushes bundle to TURNSTONE_BUNDLE_ENDPOINT
- GET  /api/incidents/{id}/bundle — export without sending
- POST /api/bundles — receive and store an incoming bundle
- GET  /api/bundles — list all received bundles
- TURNSTONE_SOURCE_HOST and TURNSTONE_BUNDLE_ENDPOINT env vars; auto-set
  source host from hostname in podman-standalone.sh

Frontend:
- Incidents form: issue_type field with datalist suggestions; Type column
  in the table; Send Bundle button + status feedback in the detail drawer
- New BundlesView: collapsible bundle rows, inline JSON parse (no extra
  round-trip), Export JSON download button
- Router and nav updated with /bundles route
2026-05-11 05:23:55 -07:00
d63dc2a714 feat: incident tagging — DB schema, CRUD service, REST API (#1)
- Add `incidents` table to SQLite schema (id, label, started_at, ended_at,
  notes, created_at, severity)
- Extract `ensure_schema()` from ingest pipeline so tables are always
  created at startup, not only during ingest
- New `app/services/incidents.py`: create/list/get/delete + time-window
  entry association (FTS keyword search + raw window fallback)
- New `entries_in_window()` in search.py: plain SQL scan for incident
  detail when keyword FTS returns nothing
- REST endpoints: POST/GET /api/incidents, GET/DELETE /api/incidents/{id}
- Incident detail returns up to 100 associated log entries sorted by
  timestamp, prioritising FTS keyword hits then ERROR/CRITICAL then all
2026-05-09 15:37:14 -07:00