feat: SSH remote host ingest — pull logs from remote systems over SSH #22

Closed · opened 2026-05-17 14:17:35 -07:00 by pyr0ball (owner)

Summary

Add SSH as a transport layer for log ingestion, so Turnstone can pull logs from remote hosts without requiring a persistent agent on that host. The user registers a remote host with SSH credentials; Turnstone connects, runs the appropriate remote command, and pipes the output through the existing local parsers.

Motivation

All current ingest modules (journald, docker_log, syslog, caddy, etc.) read from the local system only. A common field workflow looks like this: a technician opens Turnstone on their own machine, connects to a remote system over SSH, pulls its logs, and analyzes them locally. Supporting this requires no changes to the remote host, only SSH access and the standard tools (journalctl, docker, etc.) that are already present there.

Design

SSH is a transport wrapper, not a new parser. The existing ingest modules handle parsing; SSH handles delivery.

Remote host
  journalctl -o json --since="1 hour ago"
  docker logs <container>
  cat /var/log/syslog
    ↓ (SSH pipe)
Local Turnstone
  → journald.py parser
  → docker_log.py parser  
  → syslog.py parser
  → SQLite index + FTS
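Because parsing stays separate from delivery, a parser only needs an iterable of text lines and should not care whether they came from a local journalctl process or an SSH channel. The sketch below illustrates that contract with a minimal, hypothetical stand-in for journald.py: the function name parse_journald_json and the LogRecord shape are assumptions, since the real module's interface is not described here. The field names __REALTIME_TIMESTAMP, _SYSTEMD_UNIT, and MESSAGE are standard fields in journalctl -o json output.

```python
import json
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LogRecord:
    # Hypothetical normalized record; the real index schema may differ.
    timestamp: datetime
    unit: Optional[str]
    message: str


def parse_journald_json(lines: Iterable[str]) -> Iterator[LogRecord]:
    """Parse `journalctl -o json` output, one JSON object per line.

    `lines` can be a local file, a subprocess pipe, or an SSH stdout stream;
    the parser only iterates over it, so records are produced incrementally.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        entry = json.loads(line)
        message = entry.get("MESSAGE") or ""
        if isinstance(message, list):
            # journalctl emits non-UTF-8 field values as arrays of byte values.
            message = bytes(message).decode("utf-8", errors="replace")
        # __REALTIME_TIMESTAMP is microseconds since the Unix epoch, as a string.
        usec = int(entry["__REALTIME_TIMESTAMP"])
        yield LogRecord(
            timestamp=datetime.fromtimestamp(usec / 1_000_000, tz=timezone.utc),
            unit=entry.get("_SYSTEMD_UNIT"),
            message=message,
        )
```

Feeding this function an io.StringIO, a local pipe, or the stdout of an SSH session yields the same records, which is the property the transport-wrapper design relies on.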

New module: app/ingest/ssh.py

Wrap paramiko (or subprocess ssh) to:

  • Open an SSH connection to a registered remote host
  • Execute a command string
  • Return stdout as a stream
  • Pass stream to the appropriate existing parser
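A minimal sketch of the transport half follows. It uses the subprocess-ssh option listed above rather than paramiko (which the implementation notes prefer) only because it needs nothing beyond the standard library and the system OpenSSH client. The names build_ssh_argv, stream_lines, and RemoteCommandError are hypothetical. BatchMode=yes and IdentitiesOnly=yes keep the connection key-only and non-interactive, so it fails instead of prompting; with BatchMode, the remote host key must already be present in known_hosts.

```python
import os
import subprocess
import tempfile
from collections.abc import Iterator


class RemoteCommandError(Exception):
    """Raised when ssh or the remote command exits non-zero."""

    def __init__(self, returncode: int, stderr: str) -> None:
        super().__init__(f"exit status {returncode}: {stderr.strip()}")
        self.returncode = returncode
        self.stderr = stderr


def build_ssh_argv(host: str, user: str, key_path: str, remote_command: str) -> list[str]:
    """Build an OpenSSH command line for key-only, non-interactive access."""
    return [
        "ssh",
        "-i", os.path.expanduser(key_path),  # e.g. ~/.ssh/id_ed25519
        "-o", "BatchMode=yes",               # never prompt (no password fallback)
        "-o", "IdentitiesOnly=yes",          # offer only the configured key
        "-o", "ConnectTimeout=10",           # fail fast on unreachable hosts
        "-l", user,
        host,
        remote_command,                      # executed by the remote user's shell
    ]


def stream_lines(argv: list[str]) -> Iterator[str]:
    """Run argv and yield stdout lines as they arrive.

    Output is never held in memory as a whole. stderr goes to a temporary
    file rather than a pipe, so a chatty stderr cannot fill a pipe buffer and
    stall the process while we are blocked reading stdout.
    """
    with tempfile.TemporaryFile(mode="w+", encoding="utf-8", errors="replace") as err:
        with subprocess.Popen(
            argv,
            stdout=subprocess.PIPE,
            stderr=err,
            encoding="utf-8",
            errors="replace",
        ) as proc:
            assert proc.stdout is not None
            for line in proc.stdout:
                yield line.rstrip("\n")
        # Popen.__exit__ has waited for the process, so returncode is set.
        if proc.returncode != 0:
            err.seek(0)
            raise RemoteCommandError(proc.returncode, err.read())
```

The generator returned by stream_lines(build_ssh_argv(...)) can be passed directly to any existing parser that accepts an iterable of lines. A paramiko version would likely follow the same shape: SSHClient.connect with key_filename, then iterate over the stdout file returned by exec_command.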

Source registration

Extend source config to support a transport field:

sources:
  - name: rack-server-01
    transport: ssh
    host: 192.168.1.10
    user: admin
    key_path: ~/.ssh/id_ed25519
    ingest:
      - type: journald
        args: ["--since", "2 hours ago", "--unit", "myservice"]
      - type: docker_log
        containers: ["myapp", "nginx"]
      - type: plaintext
        path: /var/log/app/error.log

Sources without a transport field (or with transport: local) continue to work as they do today, with no regression.
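One possible way to load the extended source entries while keeping transport optional is sketched below. The class name SourceConfig and the validation messages are assumptions, and YAML loading is left out: the dicts stand in for what a YAML loader would return for the example config. Defaulting transport to local is what keeps existing sources working unchanged, and rejecting a password field enforces the key-only rule. The same validation could plausibly back POST /api/sources.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

VALID_TRANSPORTS = ("local", "ssh")
SSH_REQUIRED_FIELDS = ("host", "user", "key_path")


@dataclass
class SourceConfig:
    name: str
    transport: str = "local"  # absent transport means local, as today
    host: Optional[str] = None
    user: Optional[str] = None
    key_path: Optional[str] = None
    ingest: list[dict[str, Any]] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "SourceConfig":
        name = raw["name"]
        transport = raw.get("transport", "local")
        if transport not in VALID_TRANSPORTS:
            raise ValueError(f"source {name!r}: unknown transport {transport!r}")
        if transport == "ssh":
            missing = [f for f in SSH_REQUIRED_FIELDS if not raw.get(f)]
            if missing:
                raise ValueError(f"source {name!r}: ssh transport requires {', '.join(missing)}")
            if "password" in raw:
                # Key-based auth only; password auth is out of scope.
                raise ValueError(f"source {name!r}: password auth is not supported, use key_path")
        return cls(
            name=name,
            transport=transport,
            host=raw.get("host"),
            user=raw.get("user"),
            key_path=raw.get("key_path"),
            ingest=list(raw.get("ingest", [])),
        )
```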

Acceptance Criteria

  • app/ingest/ssh.py — SSH transport; connects, runs command, returns stdout stream
  • journald, docker_log, syslog, and plaintext parsers usable over SSH transport
  • Source config extended with transport: ssh, host, user, key_path fields
  • POST /api/sources accepts SSH source registration
  • SSH sources appear in the sources list UI alongside local sources
  • On-demand pull: user triggers ingest for a specific remote source from the UI
  • Connection errors (host unreachable, auth failure) reported clearly — no silent failure
  • Local sources unaffected (no regression)
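For the connection-error criterion, one heuristic is to classify failures from the OpenSSH client's exit status and stderr. ssh exits with 255 when the error is its own (connection, authentication, host key) and otherwise passes through the remote command's exit status. The stderr substrings below are messages OpenSSH commonly prints, but matching on them is an assumption that may need adjusting across versions, platforms, and locales, and a remote command that itself exits with 255 would be misclassified. The function and category names are hypothetical.

```python
from dataclasses import dataclass

# OpenSSH exits with 255 when the failure is its own (connect, auth, host key);
# otherwise it passes through the remote command's exit status.
SSH_ERROR_EXIT = 255

_SSH_PATTERNS: list[tuple[str, tuple[str, ...]]] = [
    ("host_key_rejected", ("Host key verification failed",)),
    ("auth_failed", ("Permission denied",)),
    ("host_unreachable", (
        "Could not resolve hostname",
        "Connection refused",
        "Connection timed out",
        "No route to host",
        "Network is unreachable",
    )),
]


@dataclass
class IngestFailure:
    kind: str    # hypothetical category shown in the UI
    detail: str  # last non-empty stderr line, or the exit status if stderr was empty


def classify_ssh_failure(returncode: int, stderr: str) -> IngestFailure:
    lines = [ln for ln in stderr.splitlines() if ln.strip()]
    detail = lines[-1].strip() if lines else f"exit status {returncode}"
    if returncode == SSH_ERROR_EXIT:
        for kind, needles in _SSH_PATTERNS:
            if any(needle in stderr for needle in needles):
                return IngestFailure(kind, detail)
        return IngestFailure("ssh_error", detail)
    # Non-255: ssh connected and the remote command itself failed
    # (e.g. docker logs on a missing container, or an unreadable file).
    return IngestFailure("remote_command_failed", detail)
```

Checking the exit status before the text matters: a remote "cat: ...: Permission denied" exits with 1 and is a command failure, not an SSH authentication failure.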

Implementation notes

  • Prefer paramiko over subprocess ssh for programmatic control and better error handling
  • Key-based auth only for now; password auth is out of scope (security posture)
  • Streaming is preferred over buffering the full output, so that large log pulls do not run out of memory (OOM)
  • The remote command should be configurable (not hardcoded) so non-standard log setups can be accommodated
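A sketch of per-ingest-type default commands with an override, following the commands in the design diagram, is shown below. The command override key, the function name, and the default templates are assumptions. Config-supplied values are quoted with shlex.quote because the remote shell interprets the command string; an explicit override is passed through verbatim, so it should be treated as trusted configuration. docker logs writes a container's stderr stream to its own stderr, so the default appends 2>&1 to keep both streams in the parsed output.

```python
import shlex
from typing import Any

DEFAULT_SYSLOG_PATH = "/var/log/syslog"  # matches the design diagram


def build_remote_command(ingest: dict[str, Any]) -> list[str]:
    """Return the shell command(s) to run remotely for one ingest entry."""
    override = ingest.get("command")
    if override:
        # Trusted, user-configured command for non-standard log setups.
        return [override]

    kind = ingest["type"]
    if kind == "journald":
        extra = " ".join(shlex.quote(a) for a in ingest.get("args", []))
        return [f"journalctl -o json {extra}".rstrip()]
    if kind == "docker_log":
        # One command per container; 2>&1 keeps the container's stderr output too.
        return [f"docker logs {shlex.quote(c)} 2>&1" for c in ingest.get("containers", [])]
    if kind in ("syslog", "plaintext"):
        path = ingest.get("path", DEFAULT_SYSLOG_PATH if kind == "syslog" else None)
        if not path:
            raise ValueError("plaintext ingest requires a path")
        return [f"cat {shlex.quote(path)}"]
    raise ValueError(f"unsupported ingest type for SSH: {kind!r}")
```

For the example config, the journald entry becomes journalctl -o json --since '2 hours ago' --unit myservice, with the multi-word value quoted so the remote shell passes it as a single argument.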

Out of scope

  • Persistent agent on the remote host
  • Scheduled/automatic polling over SSH (can be added later)
  • SSH tunneling or jump hosts (post-launch backlog)
Reference: Circuit-Forge/turnstone#22