Adds optional failure_category to SubmitRequest and candidate records so
reviewers can classify why a model response was wrong, not just what to do
with it. Enables the fine-tune harness to filter training data by failure
type (e.g. exclude scoring artifacts, train only on genuine wrong answers).
Taxonomy: scoring_artifact | style_violation | partial_answer |
wrong_answer | format_error | hallucination
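
A minimal sketch of how the taxonomy and the harness-side filter could
look, using only the standard library. The six category strings come from
the taxonomy above. The helper names (check_failure_category,
filter_candidates), the dict-shaped records, and the ValueError are
assumptions for illustration; in the app, invalid values are rejected with
a 422 by request validation rather than by a helper like this.

```python
from typing import Literal, Optional, get_args

# The six categories listed in the taxonomy above.
FailureCategory = Literal[
    "scoring_artifact",
    "style_violation",
    "partial_answer",
    "wrong_answer",
    "format_error",
    "hallucination",
]
VALID_CATEGORIES = frozenset(get_args(FailureCategory))


def check_failure_category(value: Optional[str]) -> Optional[str]:
    """Hypothetical helper: accept None or a known category, else raise."""
    if value is not None and value not in VALID_CATEGORIES:
        raise ValueError(f"invalid failure_category: {value!r}")
    return value


def filter_candidates(records, exclude=frozenset({"scoring_artifact"}), only=None):
    """Hypothetical harness filter over dict-shaped candidate records.

    Drops records whose failure_category is in `exclude`. If `only` is given,
    keeps just the records whose category is in `only`. Records with no
    category (None) pass the `exclude` check but not an `only` check.
    """
    kept = [r for r in records if r.get("failure_category") not in exclude]
    if only is not None:
        kept = [r for r in kept if r.get("failure_category") in only]
    return kept
```

For example, filter_candidates(records, only={"wrong_answer"}) would keep
only genuine wrong answers, while the default call just drops scoring
artifacts and leaves uncategorised records in place.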
- app/sft.py: FailureCategory Literal type; SubmitRequest.failure_category;
stored on the candidate record in the "correct" branch of POST /submit
- tests/test_sft.py: 3 new tests (stores value, null round-trip, 422 on invalid)
- stores/sft.ts: SftFailureCategory type exported; SftQueueItem + SftLastAction
updated; setLastAction accepts optional category param
- SftCard.vue: chip-group selector shown during correct/discard/flag flow;
two-step confirm for discard/flag reveals chips before emitting; category
forwarded in all emit payloads
- CorrectionsView.vue: handleCorrect/Discard/Flag accept and forward category
to POST /api/sft/submit body and store.setLastAction
- SftCard.test.ts: 11 new tests covering chip visibility, selection,
single-active enforcement, pending-action flow, emit payloads, cancel
- Add v-if guard on failure-reason <p> so null renders no element (not literal "null")
- Clarify mid-quality test description: score is 0.4 to <0.7 (exclusive upper bound)
- Add test: renders nothing for failure_reason when null (+1 → 14 SftCard tests)
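
The mid-quality bound clarified above is a half-open interval. A small
sketch, assuming the neighbouring bands are called "low" and "high" and
abut the mid band directly (only the 0.4 and 0.7 bounds come from this
change; the function and band names are hypothetical):

```python
def quality_band(score: float) -> str:
    """Hypothetical banding: mid-quality is 0.4 <= score < 0.7."""
    if score < 0.4:
        return "low"
    if score < 0.7:  # upper bound is exclusive, so 0.7 itself is not mid
        return "mid"
    return "high"
```

With this reading, a score of exactly 0.4 falls in the mid band and a
score of exactly 0.7 does not.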