4a2fd0fb0d
feat(pipeline): add TURNSTONE_CLASSIFIER_MODEL env var for Stage 2 ML config
...
Makes the HuggingFace classifier model for Stage 2 configurable via
TURNSTONE_CLASSIFIER_MODEL. When unset (default), Stage 2 falls back
to pattern_tags then regex — no download required on first run.
Also documents TURNSTONE_MULTI_AGENT_DIAGNOSE, TURNSTONE_CLASSIFIER_MODEL,
TURNSTONE_EMBED_BACKEND/MODEL/DEVICE in .env.example.
2026-05-25 19:11:32 -07:00
12cd0a23d5
refactor: rename ingest → glean throughout codebase
...
Renames the app/ingest/ package to app/glean/ and updates all
references across Python modules, shell scripts, Vue components,
tests, and documentation.
Intentionally preserved:
- SQLite column name ingest_time (avoids schema migration)
- RetrievedEntry.ingest_time field (maps to the column above)
- Any public-facing JSON keys that reference ingest_time
Changes by category:
- app/ingest/ → app/glean/ (full package move, all parsers)
- app/tasks/ingest_scheduler.py → app/tasks/glean_scheduler.py
- scripts/ingest_corpus.py → scripts/glean_corpus.py
- tests/test_ingest_*.py → tests/test_glean_*.py
- Docstrings, log messages, comments: ingest → glean
- Env var: TURNSTONE_INGEST_INTERVAL → TURNSTONE_GLEAN_INTERVAL
- Shell scripts: glean.log, glean_corpus.py references
- README.md: multi-source ingest → multi-source glean
- .env.example: updated env var name
- patterns/: new diagnostic patterns from 2026-05-20 SSH incident
(service_crash_loop, pkg_daemon_restart, ssh_forward_conflict)
- SourcesView.vue: pipeline label updated
- All test import paths updated to app.glean.*
285 tests passing.
2026-05-20 23:02:55 -07:00
82977f365b
feat: periodic ingest scheduler + Orchard submission pipeline
...
Adds asyncio-native background scheduler (TURNSTONE_INGEST_INTERVAL,
default 900s) that runs batch ingest then pushes pattern-matched entries
to a remote CF harvest endpoint (TURNSTONE_SUBMIT_ENDPOINT).
- app/tasks/ingest_scheduler.py: IngestState, scheduler_loop, run_once,
submit_matched, _query_matched_since — asyncio.Lock prevents concurrent runs
- app/rest.py: POST /api/ingest/batch (pre-parsed entry receiver),
GET /api/tasks/ingest/status, POST /api/tasks/ingest (manual trigger),
TURNSTONE_INGEST_INTERVAL + TURNSTONE_SUBMIT_ENDPOINT env wiring in lifespan
- docker-compose.submissions.yml: segregated daniel (8536) + xander (8537)
receiving instances on Heimdall, isolated DBs under
/devl/docker/turnstone-submissions/<node>/
- podman-standalone.sh: pass-through for TURNSTONE_SUBMIT_ENDPOINT +
TURNSTONE_SOURCE_HOST
- app/ingest/mqtt_subscriber.py: MQTT log source adapter
- app/ingest/wazuh.py: Wazuh alert JSON adapter
- tests/test_ingest_wazuh.py: Wazuh adapter test suite
2026-05-20 08:57:25 -07:00
16fe5f70a5
feat: Alpha milestone — corpus management, upload ingest, harvester agent
...
Closes #1 (incident tagging — already implemented), #2 , #3 , #5 .
- feat(api): DELETE /api/sources/{id} — purge entries + FTS rows for a source
- feat(api): POST /api/sources/{id}/ingest — re-ingest from sources.yaml
- feat(api): POST /api/ingest/upload — multipart log file upload with auto-detect
- feat(ui): SourcesView reingest + delete buttons and upload file input (#2 )
- feat(harvester): harvester.py push + incident subcommands (#5 )
- feat(harvester): Dockerfile, docker-compose.yml, harvester.sh (containerless)
- feat(config): GPU_SERVER_URL → CF_ORCH_URL resolution + write-back (#20 )
- docs: .env.example, README Configuration table, version bump to 0.5.0
2026-05-19 07:45:58 -07:00