context_facts, context_documents, and context_chunks now live in
turnstone-context.db (sibling of turnstone.db). The glean scheduler
held write locks on the main DB long enough to cause 5-second timeout
failures on context fact inserts; separate files have independent WAL
write locks so they never contend.
Changes:
- pipeline.py: extract _CONTEXT_SCHEMA + ensure_context_schema()
- rest.py: CONTEXT_DB_PATH (TURNSTONE_CONTEXT_DB env var, defaults to
sibling file); init via ensure_context_schema(); all context routes
pass CONTEXT_DB_PATH; diagnose_stream receives context_db_path kwarg
- diagnose/__init__.py: diagnose_stream() accepts context_db_path
(falls back to db_path for backward compat); retrieve_context uses it
- store.py: sqlite3.connect() timeout=30.0 — Python driver retry loop
is independent of PRAGMA busy_timeout; needed for any remaining
contention during test or single-file deployments
Closes: #42
Closes turnstone#22.
## Transport layer (app/glean/ssh.py)
- SSHTransport context manager: key-only auth, paramiko backend
- SSHConnectionError / SSHCommandError exception hierarchy
- exec_stream() generator: yields stdout lines, raises SSHCommandError on
non-zero exit (isinstance(int) guard for test-mock safety)
- Command builders: _build_journald_command, _build_syslog_command,
_build_plaintext_command, _build_docker_command
- 18 unit tests in tests/test_glean_ssh.py
## Pipeline integration (app/glean/pipeline.py)
- _stream_and_write(): per-item error isolation — SSHCommandError skips
one glean item without aborting the rest of the host connection
- _glean_ssh_source(): one SSHTransport per host, dispatches all glean
items (journald/syslog/plaintext/docker); SSHConnectionError aborts host
- glean_sources(): splits local vs SSH sources; local → _glean_files();
SSH → _glean_ssh_source(); shared compiled patterns and DB connection
- glean_ssh_source(): public wrapper for REST use — manages DB connection,
pattern compilation, FTS rebuild lifecycle
- 15 integration tests in tests/test_glean_pipeline_ssh.py
- All 285 tests passing
## REST layer (app/rest.py)
- GET /api/sources/configured: reads sources.yaml and enriches with DB
stats; SSH sources appear before first glean (entry_count=0); sub-source
IDs (rack01/journald, rack01/docker/myapp) aggregated per host entry
- POST /api/sources/{id}/glean: detects transport:ssh and dispatches to
glean_ssh_source() wrapper; local sources unchanged
- Import: glean_ssh_source as _glean_ssh_source
## Frontend (web/src/views/SourcesView.vue)
- Fetches /api/sources/configured (primary) + /api/sources (DB-only) in
parallel; merges into unified SourceRow list
- SSH sources show: ssh badge (with user@host tooltip), glean-type pills
(journald/syslog/docker/etc.), host subtitle
- SSH sub-source IDs (rack01/journald) suppressed from the DB-only list
since they are covered by the parent SSH row
- DB-only sources (uploads) appear below configured sources with 'uploaded'
badge; reglean button disabled (not in sources.yaml)
- Delete zeroes out configured-source stats in-place rather than removing
the row (so the source remains visible for re-gleaning)