feat: SSH remote host glean (#22) #28

Open
pyr0ball wants to merge 0 commits from feat/ssh-remote-glean into main
Owner

Implements Turnstone issue #22 — SSH remote log collection across all three layers.

What ships

Transport layer (app/glean/ssh.py)

  • SSHTransport context manager: key-only auth, paramiko backend
  • SSHConnectionError / SSHCommandError exception hierarchy with two-tier isolation
  • exec_stream() generator: streams stdout line by line without buffering the full output; raises SSHCommandError on non-zero exit once the last line has been yielded
  • Command builders for journald, syslog, plaintext, docker
  • 18 unit tests
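
To make the exit-status behaviour concrete, here is a minimal sketch of a generator in the style of exec_stream(). It is not the code in app/glean/ssh.py: the function signature, the exception fields, and the duck-typed `client` argument are assumptions for illustration. It relies only on the paramiko convention that exec_command() returns (stdin, stdout, stderr) and that stdout.channel.recv_exit_status() reports the exit code.

```python
from typing import Iterator


class SSHCommandError(RuntimeError):
    """Remote command finished with a non-zero exit status."""

    def __init__(self, command: str, exit_status: int) -> None:
        super().__init__(f"{command!r} exited with status {exit_status}")
        self.command = command
        self.exit_status = exit_status


def exec_stream(client, command: str) -> Iterator[str]:
    """Yield stdout lines from `command`, then check the exit status.

    `client` is any object with a paramiko-style exec_command() method
    (an assumption of this sketch, so it can be driven by a test double).
    """
    _stdin, stdout, _stderr = client.exec_command(command)
    for line in stdout:
        yield line.rstrip("\n")
    # Reached only after the caller has consumed every line.
    status = stdout.channel.recv_exit_status()
    # A MagicMock-based double returns a non-int here, so only a real
    # integer status is treated as a failure.
    if isinstance(status, int) and status != 0:
        raise SSHCommandError(command, status)
```

Because the check sits after the loop, a caller that stops iterating early never sees SSHCommandError. Lines yielded before a failure have already been handed to the caller by the time the error is raised.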

Pipeline integration (app/glean/pipeline.py)

  • _stream_and_write(): per-glean-item error isolation
  • _glean_ssh_source(): one connection per host, dispatches all glean: items; SSHConnectionError skips the entire host with a warning log
  • glean_sources(): splits local vs SSH sources; unified FTS rebuild at end
  • glean_ssh_source(): public wrapper for REST use
  • 15 integration tests
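
The two-tier isolation can be sketched as follows. This is an illustration under assumptions, not the pipeline code: glean_host(), its `connect` and `write` callbacks, and the returned line counts are hypothetical names chosen so the control flow can run without a real SSH connection.

```python
import logging
from typing import Callable, Dict, Iterator, List

log = logging.getLogger("glean.sketch")


class SSHConnectionError(RuntimeError):
    """Tier 1: the host could not be reached or authenticated."""


class SSHCommandError(RuntimeError):
    """Tier 2: one remote command exited non-zero."""


Runner = Callable[[str], Iterator[str]]


def glean_host(
    host_id: str,
    commands: List[str],
    connect: Callable[[], Runner],
    write: Callable[[str, str], None],
) -> Dict[str, int]:
    """Run every command over one connection; return lines written per command."""
    written: Dict[str, int] = {}
    try:
        run = connect()  # tier 1: a connection failure skips the whole host
    except SSHConnectionError as exc:
        log.warning("skipping host %s: %s", host_id, exc)
        return written
    for command in commands:
        count = 0
        try:
            for line in run(command):
                write(command, line)
                count += 1
        except SSHCommandError as exc:
            # tier 2: only this item fails; the next command still runs
            log.warning("%s: glean item failed: %s", host_id, exc)
        written[command] = count
    return written
```

In this sketch, lines streamed before a command fails are still written, which matches an exit-status check that fires only after stdout has been drained.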

REST layer (app/rest.py)

  • GET /api/sources/configured: reads sources.yaml, enriches with DB stats — SSH sources visible before first glean; sub-source IDs aggregated per host
  • POST /api/sources/{id}/glean: detects transport:ssh, dispatches to glean_ssh_source() wrapper
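
A rough sketch of the per-host aggregation behind GET /api/sources/configured, assuming sub-source IDs are slash-separated with the configured ID as the first segment (for example rack01/journald). The function name and the plain-dict inputs are hypothetical; the real endpoint reads sources.yaml and the database.

```python
from typing import Dict, Iterable


def aggregate_by_host(
    configured_ids: Iterable[str], db_counts: Dict[str, int]
) -> Dict[str, int]:
    """Sum DB entry counts onto the configured source that owns each row."""
    # Every configured source starts at 0, so it is listed before its first glean.
    totals = {source_id: 0 for source_id in configured_ids}
    for db_id, count in db_counts.items():
        parent = db_id.split("/", 1)[0]
        if db_id in totals:
            totals[db_id] += count
        elif parent in totals:
            totals[parent] += count
        # Rows with no configured owner (uploads) are left out of this view.
    return totals


counts = aggregate_by_host(
    ["rack01", "local-logs"],
    {"rack01/journald": 10, "rack01/docker/myapp": 5, "upload-1": 7},
)
print(counts)  # {'rack01': 15, 'local-logs': 0}
```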

Frontend (web/src/views/SourcesView.vue)

  • Parallel fetch of /api/sources/configured + /api/sources; merged into unified table
  • SSH rows: ssh badge with user@host tooltip, glean-type pills, host subtitle
  • DB-only sources (uploads) show uploaded badge; reglean disabled
  • Delete zeroes out configured-source stats in place

All 285 tests passing.

Closes #22

pyr0ball added 3 commits 2026-05-21 12:38:04 -07:00
Renames the app/ingest/ package to app/glean/ and updates all
references across Python modules, shell scripts, Vue components,
tests, and documentation.

Intentionally preserved:
- SQLite column name ingest_time (avoids schema migration)
- RetrievedEntry.ingest_time field (maps to the column above)
- Any public-facing JSON keys that reference ingest_time

Changes by category:
- app/ingest/ → app/glean/ (full package move, all parsers)
- app/tasks/ingest_scheduler.py → app/tasks/glean_scheduler.py
- scripts/ingest_corpus.py → scripts/glean_corpus.py
- tests/test_ingest_*.py → tests/test_glean_*.py
- Docstrings, log messages, comments: ingest → glean
- Env var: TURNSTONE_INGEST_INTERVAL → TURNSTONE_GLEAN_INTERVAL
- Shell scripts: glean.log, glean_corpus.py references
- README.md: multi-source ingest → multi-source glean
- .env.example: updated env var name
- patterns/: new diagnostic patterns from 2026-05-20 SSH incident
  (service_crash_loop, pkg_daemon_restart, ssh_forward_conflict)
- SourcesView.vue: pipeline label updated
- All test import paths updated to app.glean.*

285 tests passing.

Adds SSH-based log collection from remote hosts via Paramiko.
One SSH connection per host, multiple log types per connection.

New files:
- app/glean/ssh.py: SSHTransport context manager + command builders
  for journald, syslog, plaintext, and docker log types
- tests/test_glean_ssh.py: 18 tests for transport layer (all mocked)
- tests/test_glean_pipeline_ssh.py: 15 tests for pipeline integration

Pipeline changes (app/glean/pipeline.py):
- glean_sources() now splits sources into local-file and SSH categories
- SSH sources use transport: ssh + glean: list schema in sources.yaml
- _glean_ssh_source(): one SSHTransport per host, N commands per connection
- _stream_and_write(): SSHCommandError caught per-item so one bad
  command does not abort the rest of the host's glean items
- SSHConnectionError skips the entire host with a warning log

SSH source schema (sources.yaml):
  - id: rack01
    transport: ssh
    host: 192.168.1.10
    user: admin
    key_path: ~/.ssh/id_ed25519
    glean:
      - type: journald
        args: [--since, 2 hours ago]
      - type: syslog
        path: /var/log/syslog
      - type: plaintext
        path: /var/log/app/error.log
      - type: docker
        containers: [myapp, nginx]
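
As an illustration of how the glean: items above might translate into remote commands, here is a hedged sketch. build_command() is a hypothetical helper, not one of the builders in app/glean/ssh.py, and the choice of `journalctl --no-pager` and `cat` is an assumption. The docker type is left out because it expands to one command per container.

```python
import shlex
from typing import Any, Dict


def build_command(item: Dict[str, Any]) -> str:
    """Turn one glean item (a dict parsed from sources.yaml) into a shell command."""
    kind = item["type"]
    if kind == "journald":
        # Quote each argument so "2 hours ago" reaches journalctl as one word.
        args = " ".join(shlex.quote(str(a)) for a in item.get("args", []))
        return f"journalctl --no-pager {args}".rstrip()
    if kind in ("syslog", "plaintext"):
        return f"cat {shlex.quote(item['path'])}"
    raise ValueError(f"unsupported glean type: {kind!r}")


print(build_command({"type": "journald", "args": ["--since", "2 hours ago"]}))
# journalctl --no-pager --since '2 hours ago'
print(build_command({"type": "syslog", "path": "/var/log/syslog"}))
# cat /var/log/syslog
```

Quoting with shlex.quote() matters here because the values come from a config file and are interpolated into a command line that a remote shell will parse.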

Key design decisions:
- Key-based auth only (no password prompts in daemon context)
- exit-status check fires after all stdout lines yielded; callers
  drain the iterator to trigger it
- Local file sources path unchanged; SSH sources co-exist in same yaml
- Docker multi-container: one exec_stream call per container,
  source_id scoped as host_id/type/container_name

Remaining for #22: REST endpoint, SourcesView UI, sources.yaml docs.
285 tests passing, including 33 new SSH tests (18 transport + 15 pipeline).
Closes turnstone#22.

## Transport layer (app/glean/ssh.py)
- SSHTransport context manager: key-only auth, paramiko backend
- SSHConnectionError / SSHCommandError exception hierarchy
- exec_stream() generator: yields stdout lines, raises SSHCommandError on
  non-zero exit (isinstance(int) guard for test-mock safety)
- Command builders: _build_journald_command, _build_syslog_command,
  _build_plaintext_command, _build_docker_command
- 18 unit tests in tests/test_glean_ssh.py
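
For readers unfamiliar with paramiko, a key-only context manager could look roughly like the sketch below. It is an assumption-laden illustration, not the class in app/glean/ssh.py: the constructor arguments, the timeout default, and the RejectPolicy host-key policy are choices made for this example. The paramiko calls used (SSHClient, load_system_host_keys, set_missing_host_key_policy, connect, close) are part of its public API.

```python
import os


class SSHConnectionError(RuntimeError):
    """The host could not be reached or authenticated."""


class SSHTransport:
    """Open one key-authenticated SSH connection for the duration of a with-block."""

    def __init__(self, host: str, user: str, key_path: str, timeout: float = 10.0):
        self.host = host
        self.user = user
        self.key_path = key_path
        self.timeout = timeout
        self.client = None

    def __enter__(self) -> "SSHTransport":
        import paramiko  # imported lazily so this module loads without it

        client = paramiko.SSHClient()
        client.load_system_host_keys()
        # Assumption: unknown hosts are rejected rather than auto-added.
        client.set_missing_host_key_policy(paramiko.RejectPolicy())
        try:
            client.connect(
                hostname=self.host,
                username=self.user,
                key_filename=os.path.expanduser(self.key_path),
                allow_agent=False,    # no agent fallback
                look_for_keys=False,  # use only the configured key
                timeout=self.timeout,
            )
        except (paramiko.SSHException, OSError) as exc:
            client.close()
            raise SSHConnectionError(f"{self.user}@{self.host}: {exc}") from exc
        self.client = client
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if self.client is not None:
            self.client.close()
            self.client = None
        return False  # never swallow exceptions from the with-block
```

No password is ever passed to connect(), so a daemon using this sketch fails fast with SSHConnectionError instead of waiting on a prompt.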

## Pipeline integration (app/glean/pipeline.py)
- _stream_and_write(): per-item error isolation — SSHCommandError skips
  one glean item without aborting the rest of the host connection
- _glean_ssh_source(): one SSHTransport per host, dispatches all glean
  items (journald/syslog/plaintext/docker); SSHConnectionError aborts host
- glean_sources(): splits local vs SSH sources; local → _glean_files();
  SSH → _glean_ssh_source(); shared compiled patterns and DB connection
- glean_ssh_source(): public wrapper for REST use — manages DB connection,
  pattern compilation, FTS rebuild lifecycle
- 15 integration tests in tests/test_glean_pipeline_ssh.py
- All 285 tests passing
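
The local/SSH split with a single index rebuild can be sketched as below. All names here are hypothetical: split_sources(), glean_all(), and the three callbacks stand in for the real pipeline functions, and the sketch assumes that any source without `transport: ssh` is a local file source.

```python
from typing import Any, Callable, Dict, List, Tuple

Source = Dict[str, Any]


def split_sources(sources: List[Source]) -> Tuple[List[Source], List[Source]]:
    """Partition configured sources into (local, ssh), preserving order."""
    local: List[Source] = []
    ssh: List[Source] = []
    for source in sources:
        if source.get("transport") == "ssh":
            ssh.append(source)
        else:
            local.append(source)
    return local, ssh


def glean_all(
    sources: List[Source],
    glean_local: Callable[[List[Source]], int],
    glean_ssh: Callable[[Source], int],
    rebuild_index: Callable[[], None],
) -> int:
    """Glean both kinds of source, then rebuild the search index once."""
    local, ssh = split_sources(sources)
    total = glean_local(local) if local else 0
    for source in ssh:
        total += glean_ssh(source)  # one call, and one connection, per host
    rebuild_index()  # a single rebuild covers local and SSH entries
    return total
```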

## REST layer (app/rest.py)
- GET /api/sources/configured: reads sources.yaml and enriches with DB
  stats; SSH sources appear before first glean (entry_count=0); sub-source
  IDs (rack01/journald, rack01/docker/myapp) aggregated per host entry
- POST /api/sources/{id}/glean: detects transport:ssh and dispatches to
  glean_ssh_source() wrapper; local sources unchanged
- Import: glean_ssh_source as _glean_ssh_source

## Frontend (web/src/views/SourcesView.vue)
- Fetches /api/sources/configured (primary) + /api/sources (DB-only) in
  parallel; merges into unified SourceRow list
- SSH sources show: ssh badge (with user@host tooltip), glean-type pills
  (journald/syslog/docker/etc.), host subtitle
- SSH sub-source IDs (rack01/journald) suppressed from the DB-only list
  since they are covered by the parent SSH row
- DB-only sources (uploads) appear below configured sources with 'uploaded'
  badge; reglean button disabled (not in sources.yaml)
- Delete zeroes out configured-source stats in-place rather than removing
  the row (so the source remains visible for re-gleaning)
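
The merge rules above are language-neutral, so here they are sketched in Python rather than in the view's own code. merge_rows() and the `badge` / `can_reglean` keys are hypothetical names for this illustration only.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_rows(configured: List[Row], db_sources: List[Row]) -> List[Row]:
    """Configured sources first, then DB-only sources that no SSH row covers."""
    rows = [
        dict(src, badge="ssh" if src.get("transport") == "ssh" else None,
             can_reglean=True)
        for src in configured
    ]
    known = {src["id"] for src in configured}
    ssh_ids = {src["id"] for src in configured if src.get("transport") == "ssh"}
    for src in db_sources:
        if src["id"] in known:
            continue  # already shown as a configured row
        if src["id"].split("/", 1)[0] in ssh_ids:
            continue  # sub-source such as rack01/journald, covered by its host
        # Not in sources.yaml, so there is nothing to re-glean from.
        rows.append(dict(src, badge="uploaded", can_reglean=False))
    return rows


rows = merge_rows(
    [{"id": "rack01", "transport": "ssh"}, {"id": "local-logs"}],
    [{"id": "rack01/journald"}, {"id": "local-logs"}, {"id": "upload-7"}],
)
print([(r["id"], r["badge"], r["can_reglean"]) for r in rows])
# [('rack01', 'ssh', True), ('local-logs', None, True), ('upload-7', 'uploaded', False)]
```
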
pyr0ball force-pushed feat/ssh-remote-glean from e746d55730 to 3e7a1fa064 2026-06-13 21:54:51 -07:00 Compare
pyr0ball force-pushed feat/ssh-remote-glean from 3e7a1fa064 to f7bcc6c9b7 2026-06-13 22:18:28 -07:00 Compare
This branch is already included in the target branch. There is nothing to merge.
Reference: Circuit-Forge/turnstone#28