feat: SFT failure_category — classify why a model response was wrong #17
Closes #16
Summary
- Adds `failure_category` field (6-value `Literal` enum) to `SubmitRequest` and stored candidate records
- Taxonomy: `scoring_artifact`, `style_violation`, `partial_answer`, `wrong_answer`, `format_error`, `hallucination`

Test plan
- Verify `failure_category: null` in export when no category is selected
- Send `failure_category: "nonsense"` directly to `/api/sft/submit`; verify 422

Adds optional `failure_category` to `SubmitRequest` and candidate records so reviewers can classify *why* a model response was wrong, not just what to do with it. This lets the fine-tune harness filter training data by failure type (e.g. exclude scoring artifacts, or train only on genuine wrong answers).

Taxonomy: `scoring_artifact` | `style_violation` | `partial_answer` | `wrong_answer` | `format_error` | `hallucination`

- `app/sft.py`: `FailureCategory` `Literal` type; `SubmitRequest.failure_category`; stored on the candidate record in the `correct` branch of `POST /submit`
- `tests/test_sft.py`: 3 new tests (stores value, null round-trip, 422 on invalid)
- `stores/sft.ts`: `SftFailureCategory` type exported; `SftQueueItem` and `SftLastAction` updated; `setLastAction` accepts an optional category param
- `SftCard.vue`: chip-group selector shown during the correct/discard/flag flow; two-step confirm for discard/flag reveals the chips before emitting; category forwarded in all emit payloads
- `CorrectionsView.vue`: `handleCorrect`/`Discard`/`Flag` accept the category and forward it to the `POST /api/sft/submit` body and to `store.setLastAction`
- `SftCard.test.ts`: 11 new tests covering chip visibility, selection, single-active enforcement, pending-action flow, emit payloads, and cancel
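For reviewers, here is a rough, framework-free sketch of the idea: a six-value `Literal` type, a validator that accepts `None` or a known category and rejects anything else, and the kind of filter the fine-tune harness could apply. This is not the PR's code. It assumes candidate records are plain dicts with a `failure_category` key, and `parse_failure_category`, `filter_candidates`, and `ValidationError` are hypothetical names. In the real endpoint the request-model validation produces the 422; here a `ValueError` subclass stands in for it.

```python
from typing import AbstractSet, Iterable, Literal, Optional, get_args

# Six-value taxonomy from this PR, expressed as a Literal type.
FailureCategory = Literal[
    "scoring_artifact",
    "style_violation",
    "partial_answer",
    "wrong_answer",
    "format_error",
    "hallucination",
]

# get_args() recovers the allowed string values from the Literal.
VALID_CATEGORIES = frozenset(get_args(FailureCategory))


class ValidationError(ValueError):
    """Hypothetical stand-in for the framework's request validation error (HTTP 422)."""


def parse_failure_category(value: object) -> Optional[str]:
    """Accept None (field omitted / null) or one of the six literals; reject anything else."""
    if value is None:
        return None
    if not isinstance(value, str) or value not in VALID_CATEGORIES:
        raise ValidationError(f"invalid failure_category: {value!r}")
    return value


def filter_candidates(
    records: Iterable[dict],
    include: Optional[AbstractSet[str]] = None,
    exclude: AbstractSet[str] = frozenset(),
) -> list:
    """Hypothetical harness filter: optional allow-list, then a deny-list."""
    kept = []
    for record in records:
        category = record.get("failure_category")
        if include is not None and category not in include:
            continue  # an allow-list also drops uncategorized (None) records
        if category in exclude:
            continue
        kept.append(record)
    return kept
```

For example, `filter_candidates(records, exclude={"scoring_artifact"})` drops scoring artifacts but keeps uncategorized records, while `filter_candidates(records, include={"wrong_answer"})` trains only on genuine wrong answers.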