avocet/tests
pyr0ball 9633d9a535 feat: add failure_category field to SFT corrections (#16)
Adds optional failure_category to SubmitRequest and candidate records so
reviewers can classify why a model response was wrong, not just what to do
with it. Enables the fine-tune harness to filter training data by failure
type (e.g. exclude scoring artifacts, train only on genuine wrong answers).

Taxonomy: scoring_artifact | style_violation | partial_answer |
          wrong_answer | format_error | hallucination
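
To illustrate the filtering use case above, here is a minimal sketch of how a fine-tune harness might select records by failure category. It is not the harness's actual code: the function name `filter_by_failure_category`, its `include`/`exclude` parameters, and the dict-shaped records with an `id` key are assumptions made for illustration. Only the six category names and the optional `failure_category` field come from the commit message.

```python
from typing import Iterable, List, Optional

# Category names taken from the commit's taxonomy.
FAILURE_CATEGORIES = (
    "scoring_artifact",
    "style_violation",
    "partial_answer",
    "wrong_answer",
    "format_error",
    "hallucination",
)


def filter_by_failure_category(
    records: Iterable[dict],
    include: Optional[Iterable[str]] = None,
    exclude: Iterable[str] = (),
) -> List[dict]:
    """Hypothetical helper: keep records whose failure_category passes the filters.

    Records with no category (None or missing) are kept unless `include` is given.
    """
    include_set = set(include) if include is not None else None
    exclude_set = set(exclude)

    # Reject category names outside the taxonomy so typos fail loudly.
    unknown = ((include_set or set()) | exclude_set) - set(FAILURE_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown failure categories: {sorted(unknown)}")

    kept = []
    for record in records:
        category = record.get("failure_category")
        if category in exclude_set:
            continue
        if include_set is not None and category not in include_set:
            continue
        kept.append(record)
    return kept


# Made-up sample records; real candidate records carry more fields.
sample = [
    {"id": "a", "failure_category": "scoring_artifact"},
    {"id": "b", "failure_category": "wrong_answer"},
    {"id": "c", "failure_category": None},
]

without_artifacts = filter_by_failure_category(sample, exclude=["scoring_artifact"])
only_wrong = filter_by_failure_category(sample, include=["wrong_answer"])
```

Here `without_artifacts` keeps records `b` and `c`, and `only_wrong` keeps only `b`. Whether uncategorized records belong in a training set is a design choice; this sketch keeps them by default.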

- app/sft.py: FailureCategory Literal type; SubmitRequest.failure_category;
  stored on the candidate record in the "correct" branch of POST /submit
- tests/test_sft.py: 3 new tests (stores value, null round-trip, 422 on invalid)
- stores/sft.ts: SftFailureCategory type exported; SftQueueItem + SftLastAction
  updated; setLastAction accepts optional category param
- SftCard.vue: chip-group selector shown during correct/discard/flag flow;
  two-step confirm for discard/flag reveals chips before emitting; category
  forwarded in all emit payloads
- CorrectionsView.vue: handleCorrect/Discard/Flag accept and forward category
  to POST /api/sft/submit body and store.setLastAction
- SftCard.test.ts: 11 new tests covering chip visibility, selection,
  single-active enforcement, pending-action flow, emit payloads, cancel
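
The backend behaviour listed above (a Literal-typed optional field, a null round-trip, and rejection of invalid values) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the contents of app/sft.py: the class name `SubmitRequestSketch` and its `id` and `action` fields are hypothetical, and a `ValueError` stands in for the HTTP 422 response that the real endpoint returns on an invalid category.

```python
from dataclasses import asdict, dataclass
from typing import Literal, Optional, get_args

# Taxonomy from the commit message, expressed as a Literal type.
FailureCategory = Literal[
    "scoring_artifact",
    "style_violation",
    "partial_answer",
    "wrong_answer",
    "format_error",
    "hallucination",
]


@dataclass
class SubmitRequestSketch:
    """Hypothetical stand-in for SubmitRequest; `id` and `action` are assumed fields."""

    id: str
    action: str
    failure_category: Optional[str] = None  # optional, so older clients still work

    def __post_init__(self) -> None:
        # Dataclasses do not enforce Literal at runtime, so check explicitly.
        allowed = get_args(FailureCategory)
        if self.failure_category is not None and self.failure_category not in allowed:
            raise ValueError(
                f"failure_category must be one of {allowed} or None, "
                f"got {self.failure_category!r}"
            )


# Stores the value when a valid category is supplied.
with_category = asdict(SubmitRequestSketch("rec-1", "correct", "hallucination"))

# Null round-trip: omitting the field keeps it as None.
without_category = asdict(SubmitRequestSketch("rec-2", "correct"))
```

Constructing `SubmitRequestSketch("rec-3", "correct", "typo")` raises `ValueError`, mirroring the "422 on invalid" test case in spirit.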
2026-04-08 22:10:26 -07:00
__init__.py feat: initial avocet repo — email classifier training tool 2026-02-27 14:07:38 -08:00
test_api.py fix(avocet): narrow cancel except clause, clear stale cancel flags on new run 2026-03-15 18:13:01 -07:00
test_benchmark_classifier.py fix(avocet): guard discover_finetuned_models against malformed/incomplete training_info.json 2026-03-15 15:18:13 -07:00
test_classifier_adapters.py fix(avocet): FineTunedAdapter GPU device routing + precise body truncation test 2026-03-15 10:56:47 -07:00
test_finetune.py fix(avocet): move TorchDataset import to top; split sample_count into total+train 2026-03-15 16:02:43 -07:00
test_imap_fetch.py feat: extract IMAP logic to app/imap_fetch.py for reuse by API 2026-03-04 11:42:22 -08:00
test_label_tool.py refactor: consolidate HTML extraction into app/utils.py 2026-04-08 06:52:15 -07:00
test_sft.py feat: add failure_category field to SFT corrections (#16) 2026-04-08 22:10:26 -07:00
test_sft_import.py fix: log warning when sft record is missing id field 2026-04-08 07:30:46 -07:00