feat: log corpus receiver — accept Turnstone push batches, label for logreading fine-tune #61

Open
opened 2026-05-11 16:17:47 -07:00 by pyr0ball · 0 comments
Owner

Avocet receives push batches from consented Turnstone nodes, stores log entries with consent metadata, and provides a labeling UI to annotate failure type + plain-language explanation for logreading fine-tune training.

Design spec: circuitforge-plans/turnstone/superpowers/specs/2026-05-11-log-corpus-pipeline-design.md

New DB tables

  • corpus_sources — registered Turnstone nodes + consent tokens + owner metadata
  • corpus_batches — received push batches with source + watermark
  • corpus_entries — individual log entries, one per row, with label state
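
As a rough sketch of how the three tables could relate, the snippet below creates them in SQLite. Only the table names and the `corpus_sources.active` flag come from this issue; the choice of SQLite, every other column name, and the `init_corpus_db` helper are assumptions made for illustration, and the design spec remains authoritative.

```python
import sqlite3

# Hypothetical schema. Only the three table names and corpus_sources.active
# are taken from the issue; all other columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS corpus_sources (
    id            INTEGER PRIMARY KEY,
    node_name     TEXT NOT NULL UNIQUE,
    consent_token TEXT NOT NULL UNIQUE,
    owner         TEXT,
    active        INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IF NOT EXISTS corpus_batches (
    id          INTEGER PRIMARY KEY,
    source_id   INTEGER NOT NULL REFERENCES corpus_sources(id),
    watermark   TEXT NOT NULL,
    received_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS corpus_entries (
    id          INTEGER PRIMARY KEY,
    batch_id    INTEGER NOT NULL REFERENCES corpus_batches(id),
    service     TEXT,
    logged_at   TEXT,
    severity    TEXT,
    log_text    TEXT NOT NULL,
    label_state TEXT NOT NULL DEFAULT 'unlabeled'
);
"""


def init_corpus_db(conn: sqlite3.Connection) -> None:
    """Create the corpus tables if they do not exist yet."""
    conn.executescript(SCHEMA)
    conn.commit()
```

Linking entries to sources through `corpus_batches` keeps the watermark in one place; a direct `source_id` column on `corpus_entries` would be an equally reasonable variant.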

New API module: app/data/log_corpus.py

POST /api/corpus/log-batch           Receive push from Turnstone node (validates consent token)
GET  /api/corpus/entries             Queue of unlabeled entries
POST /api/corpus/entries/{id}/label  Submit failure_type + plain_explanation + known_pattern
POST /api/corpus/entries/{id}/skip   Skip an entry without labeling it
GET  /api/corpus/stats               Counts by source/severity/label_state
GET  /api/corpus/export              Download labeled JSONL for SFT harness
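
The consent-token check on `POST /api/corpus/log-batch` might look roughly like the following, written framework-free so it does not presume how `app/data/log_corpus.py` wires its routes. The `Source` dataclass, `ConsentError`, and `authorize_push` are hypothetical names, and how the token travels in the request (header or body) is left open here.

```python
import hmac
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical in-memory view of one corpus_sources row."""
    node_name: str
    consent_token: str
    active: bool = True


class ConsentError(Exception):
    """Raised when a push carries an unknown or revoked consent token."""


def authorize_push(sources: list[Source], presented_token: str) -> Source:
    """Return the source that owns the token, or raise ConsentError."""
    presented = presented_token.encode("utf-8")
    for source in sources:
        # compare_digest does a constant-time comparison, which avoids
        # leaking token contents through response timing.
        if hmac.compare_digest(source.consent_token.encode("utf-8"), presented):
            if not source.active:
                raise ConsentError("consent revoked for this node")
            return source
    raise ConsentError("unknown consent token")
```

A route handler would call `authorize_push` before storing anything and turn `ConsentError` into a 401 or 403 response.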

Labeling UI: new Log Corpus tab

Label fields per entry:

  • Failure type: hardware / software / network / security / application / none / other
  • Plain explanation: what a non-sysadmin should be told (1-3 sentences)
  • Known pattern: yes / no / unsure
  • Optional fix hint: action that resolves it

The tab should match the existing bucket-drag ASMR labeling pattern where possible. Each entry displays service name, timestamp, severity, and log text.
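
The label fields listed above could be validated server-side along these lines. The allowed values come from this issue; the `EntryLabel` class, the `fix_hint` field name, and the non-empty check on the explanation are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values as listed in the issue.
FAILURE_TYPES = {"hardware", "software", "network", "security",
                 "application", "none", "other"}
KNOWN_PATTERN = {"yes", "no", "unsure"}


@dataclass(frozen=True)
class EntryLabel:
    """Hypothetical container for one submitted label."""
    failure_type: str
    plain_explanation: str
    known_pattern: str
    fix_hint: Optional[str] = None  # optional per the issue

    def __post_init__(self) -> None:
        if self.failure_type not in FAILURE_TYPES:
            raise ValueError(f"unknown failure_type: {self.failure_type!r}")
        if self.known_pattern not in KNOWN_PATTERN:
            raise ValueError(f"unknown known_pattern: {self.known_pattern!r}")
        # Assumption: an empty explanation is never a useful training target.
        if not self.plain_explanation.strip():
            raise ValueError("plain_explanation must not be empty")
```

The 1-3 sentence guideline is deliberately not enforced in this sketch, since sentence counting is error-prone and probably better handled as a UI hint.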

Fine-tune integration

GET /api/corpus/export outputs JSONL compatible with existing SFT harness:

{"input": "<log text>", "output": "<plain explanation>", "metadata": {"failure_type": "...", "source": "..."}}

New job type: type = "logreading" in app/train/train.py.
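
A minimal sketch of producing the export records described in this section is shown below. The output shape follows the JSONL example in this issue; the input dictionary keys (`log_text`, `label_state`, `pii_flag`, and so on) and the `export_jsonl` name are assumptions about how labeled rows might be represented internally.

```python
import json
from typing import Iterable, Iterator


def export_jsonl(entries: Iterable[dict]) -> Iterator[str]:
    """Yield one JSON line per labeled entry that is not flagged as PII."""
    for entry in entries:
        # Skip unlabeled or skipped entries; key names are hypothetical.
        if entry.get("label_state") != "labeled":
            continue
        # The Privacy section asks for PII-flagged entries to be excluded.
        if entry.get("pii_flag"):
            continue
        record = {
            "input": entry["log_text"],
            "output": entry["plain_explanation"],
            "metadata": {
                "failure_type": entry["failure_type"],
                "source": entry["source"],
            },
        }
        yield json.dumps(record, ensure_ascii=False)
```

Joining the yielded strings with newlines gives the file body for `GET /api/corpus/export`.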

Seed data

On first run: populate corpus_sources with consent tokens for xanderland + orchard nodes.
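
First-run seeding could be made idempotent roughly as follows, so that restarting Avocet never rotates a token a node already holds. The node names come from this issue; the simplified table layout, the use of `secrets.token_urlsafe`, and the `seed_sources` name are assumptions.

```python
import secrets
import sqlite3

SEED_NODES = ("xanderland", "orchard")  # node names from the issue


def seed_sources(conn: sqlite3.Connection) -> None:
    """Insert seed nodes with fresh tokens; leave existing rows untouched."""
    # Simplified, hypothetical table layout for this sketch only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS corpus_sources ("
        " node_name TEXT PRIMARY KEY,"
        " consent_token TEXT NOT NULL,"
        " active INTEGER NOT NULL DEFAULT 1)"
    )
    for node in SEED_NODES:
        # INSERT OR IGNORE keeps the original token on later runs.
        conn.execute(
            "INSERT OR IGNORE INTO corpus_sources (node_name, consent_token)"
            " VALUES (?, ?)",
            (node, secrets.token_urlsafe(32)),
        )
    conn.commit()
```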

Privacy

  • Consent token per node; token issuance = consent confirmation
  • Revoke consent by setting corpus_sources.active = false and deleting that source's entries
  • Flag in labeling UI for entries that may contain PII (personally identifiable information); flagged entries are excluded from export
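
Revocation as described in the list above could be done in a single transaction, so that a node is never left inactive with its entries still present. This sketch assumes SQLite and a layout where entries reach their source through `corpus_batches.source_id`; those column names and the `revoke_source` helper are hypothetical.

```python
import sqlite3


def revoke_source(conn: sqlite3.Connection, node_name: str) -> int:
    """Deactivate a node and delete its entries; return the number deleted."""
    # "with conn" commits on success and rolls back if anything raises.
    with conn:
        row = conn.execute(
            "SELECT id FROM corpus_sources WHERE node_name = ?", (node_name,)
        ).fetchone()
        if row is None:
            raise KeyError(node_name)
        source_id = row[0]
        conn.execute(
            "UPDATE corpus_sources SET active = 0 WHERE id = ?", (source_id,)
        )
        cur = conn.execute(
            "DELETE FROM corpus_entries WHERE batch_id IN"
            " (SELECT id FROM corpus_batches WHERE source_id = ?)",
            (source_id,),
        )
        return cur.rowcount
```

Batch rows are kept here because the issue only mentions deleting entries; whether `corpus_batches` rows should also go is a question for the design spec.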

See also: turnstone#6 (push exporter)

Reference: Circuit-Forge/avocet#61