avocet/app
pyr0ball 9bb88b168f feat(corpus): pipeline log ingest from shared dir (closes #67)
Pull-side companion to kiwi#141. Ingests structured JSONL pipeline logs
from /Library/Assets/logs/pipeline/ into the log corpus for Turnstone
logreading model training.

- app/data/log_corpus.py: add ingested_pipeline_files tracking table,
  _pipeline_ingest_dir() config helper, _ingest_one_file() parser, and
  POST /api/corpus/pipeline-ingest endpoint
- source_host = "pipeline_scrape"; source_id from logger field; extra
  dict stored as matched_patterns; batch_type = "pipeline_log"
- Idempotent by filename: skips files already in ingested_pipeline_files
- config/label_tool.yaml.example: add corpus section with pipeline_ingest_dir
  and push sources comment block
- tests/test_log_corpus.py: 8 new tests covering ingest, idempotency,
  non-JSONL filtering, malformed-line resilience, and incremental runs
2026-05-17 11:28:33 -07:00
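The per-line mapping in the commit message can be sketched in Python. The sketch assumes each JSONL line is a JSON object with optional `logger` and `extra` keys. The function name `parse_pipeline_lines` and the `text` column are hypothetical. Only the `source_host`, `source_id`, `matched_patterns`, and `batch_type` values come from the commit, so treat this as an illustration rather than the actual `_ingest_one_file()` code.

```python
import json
from typing import Any, Iterable, Iterator

# Values taken from the commit message; the input keys "logger" and
# "extra" are assumed to match the structured pipeline log lines.
SOURCE_HOST = "pipeline_scrape"
BATCH_TYPE = "pipeline_log"


def parse_pipeline_lines(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield corpus rows from JSONL lines, skipping blank or malformed ones.

    Hypothetical helper: illustrates the field mapping, not avocet's code.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            # Malformed line: skip it instead of aborting the whole file.
            continue
        if not isinstance(record, dict):
            continue  # valid JSON but not an object, e.g. a bare list
        yield {
            "source_host": SOURCE_HOST,
            "source_id": record.get("logger", ""),
            "matched_patterns": record.get("extra", {}),
            "batch_type": BATCH_TYPE,
            "text": raw,  # hypothetical column holding the original line
        }
```

For example, feeding it one valid line, one non-JSON line, and a blank line yields a single row whose `source_id` is the value of that line's `logger` field.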
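The following is a rough sketch of filename-based idempotency with an `ingested_pipeline_files` tracking table. It assumes the corpus is stored in SQLite, which the commit does not state, and that the table keys on the bare filename. The schema and the helper names `files_to_ingest` and `mark_ingested` are illustrative assumptions.

```python
import sqlite3
from pathlib import Path


def ensure_tracking_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema for the ingested_pipeline_files tracking table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ingested_pipeline_files ("
        " filename TEXT PRIMARY KEY,"
        " ingested_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )


def files_to_ingest(conn: sqlite3.Connection, ingest_dir: Path) -> list[Path]:
    """Return *.jsonl files in ingest_dir that are not yet recorded."""
    ensure_tracking_table(conn)
    done = {row[0] for row in conn.execute(
        "SELECT filename FROM ingested_pipeline_files")}
    # Non-JSONL files are filtered out; already-seen filenames are skipped.
    return sorted(
        p for p in ingest_dir.iterdir()
        if p.is_file() and p.suffix == ".jsonl" and p.name not in done
    )


def mark_ingested(conn: sqlite3.Connection, path: Path) -> None:
    # INSERT OR IGNORE keeps repeated calls for the same file harmless.
    conn.execute(
        "INSERT OR IGNORE INTO ingested_pipeline_files (filename) VALUES (?)",
        (path.name,),
    )
    conn.commit()
```

In this sketch, call `mark_ingested` only after a file's rows are stored. A crash mid-file then causes that file to be retried on the next run rather than silently skipped. Later runs pick up only newly added files, matching the incremental behavior the tests describe.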
data feat(corpus): pipeline log ingest from shared dir (closes #67) 2026-05-17 11:28:33 -07:00
eval feat: add embed-bench rate and export endpoints 2026-05-11 08:07:17 -07:00
train fix: align train job/results API envelope, config_json key, progress SSE, dashboard model_key 2026-05-02 21:22:18 -07:00
api.py feat: log corpus receiver — accept Turnstone push batches and label for logreading fine-tune 2026-05-11 17:07:54 -07:00
cforch.py fix(tests): resolve 5 pre-existing test failures on main (closes #56) 2026-05-17 11:21:58 -07:00
cloud_session.py refactor: import detect_byok from cf-core, remove local copy 2026-04-25 16:45:47 -07:00
dashboard.py feat: multi-bench dashboard, API path migration, benchmark reliability fixes 2026-05-11 09:05:12 -07:00
imap_fetch.py feat: extract fetch routes and IMAP helpers into app/data/fetch.py 2026-05-01 21:57:31 -07:00
imitate.py feat: move imitate API into app/data/imitate.py 2026-05-01 22:12:19 -07:00
models.py fix(tests): resolve 5 pre-existing test failures on main (closes #56) 2026-05-17 11:21:58 -07:00
nodes.py feat(fleet): profile editor, assignments tab, node management polish 2026-05-17 11:23:47 -07:00
plans_bench.py chore(models): refresh model registries with current cluster catalog 2026-05-17 11:24:03 -07:00
sft.py feat: move SFT corrections API into app/data/corrections.py 2026-05-01 22:02:22 -07:00
style.py refactor(bench): extract benchmark tabs — classifier, compare, llm-eval, style, voice 2026-04-24 14:56:17 -07:00
utils.py fix: restore ensure_ascii=False in utils jsonl helpers; remove dead _last_action from api.py 2026-05-01 20:59:44 -07:00
voice.py refactor(bench): extract benchmark tabs — classifier, compare, llm-eval, style, voice 2026-04-24 14:56:17 -07:00