Ingest pipeline scrape logs from shared dir into log corpus #67
Summary
Avocet should be able to ingest structured pipeline log files from the shared log directory /Library/Assets/logs/pipeline/ into the log corpus for Turnstone logreading model training. This is the pull-side companion to kiwi#141.
Shared log directory
Files are JSONL, one per scrape run, named <script>_<ts>.jsonl. Each line is a structured log record.
Implementation options
1. POST /api/log-corpus endpoint
2. /Library/Assets/logs/pipeline/ for new files and auto-ingests (similar to how the email corpus works)
Option 1 is simpler for now; Option 2 is better long-term if scrape runs happen frequently.
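A minimal sketch of the file-reading step that either option would need, assuming nothing about Avocet's actual code. The directory path, the <script>_<ts>.jsonl naming, and the pipeline_scrape label come from this issue; the function names (split_run_name, iter_records, ingest_new_files), the output row shape, and the assumption that <ts> contains no underscore are all hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator

# Path taken from this issue; everything else below is a hypothetical sketch.
PIPELINE_LOG_DIR = Path("/Library/Assets/logs/pipeline/")
DEFAULT_SOURCE = "pipeline_scrape"  # example default label from this issue


def split_run_name(path: Path) -> tuple[str, str]:
    """Split '<script>_<ts>.jsonl' into (script, ts).

    Assumes the timestamp has no underscore, so the last '_' is the separator.
    """
    script, sep, ts = path.stem.rpartition("_")
    if not sep:
        return path.stem, ""
    return script, ts


def iter_records(path: Path, source: str = DEFAULT_SOURCE) -> Iterator[dict]:
    """Yield one row per valid JSONL line, tagged with source metadata."""
    script, ts = split_run_name(path)
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than abort the whole file
            if not isinstance(record, dict):
                continue  # only JSON objects count as structured records here
            yield {
                "source": source,
                "script": script,
                "run_ts": ts,
                "line_no": line_no,
                "record": record,
            }


def ingest_new_files(log_dir: Path, seen: set[str]) -> list[dict]:
    """Read every *.jsonl file in log_dir whose name is not yet in `seen`."""
    rows: list[dict] = []
    for path in sorted(log_dir.glob("*.jsonl")):
        if path.name in seen:
            continue
        rows.extend(iter_records(path))
        seen.add(path.name)
    return rows
```

Under these assumptions, Option 1 could call ingest_new_files(PIPELINE_LOG_DIR, seen) once per request, while Option 2 could call it on a timer with a persisted seen set so that each scrape run is ingested only once.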
Label assignment
Pipeline log lines should get a default label (e.g. pipeline_scrape) so they are kept separate from app/service logs in the Turnstone training split. The label schema in app/data/log_corpus.py may need a new source type.
Notes
/Library/Assets/