SFT corrections: add failure_category field for richer candidate classification #16


Closed

opened 2026-04-08 21:43:15 -07:00 by pyr0ball · 0 comments

pyr0ball (Owner) commented 2026-04-08 21:43:15 -07:00

**Context:** The current Corrections tab only captures *what to do* with a candidate (correct/discard/flag), not *why it failed*. During the first live benchmark run we identified at least two distinct failure modes that need to be distinguished for useful SFT training data: scoring artifacts (model was right, benchmark pattern was too strict) and style violations (correct answer but violated instruction constraints like verbosity or unsolicited advice). Without categorization, all corrections are mixed together and the fine-tune harness can't filter by failure type.

**Scope:**

- [ ] Add optional `failure_category` field to `SubmitRequest` Pydantic model (`app/sft.py`)
- [ ] Store `failure_category` on the candidate record when submitting
- [ ] Include `failure_category` in the SFT export JSONL
- [ ] Add category selector to `SftCard.vue` (dropdown or chip group, shown before submitting)
- [ ] Update `SftQueueItem` TypeScript interface to include the field
- [ ] Update `useSftStore` submit action to pass category
- [ ] Add tests for new field (backend + frontend)
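
For reference, a rough sketch of the backend half of the scope (request field, record update, export line). It uses a stdlib dataclass as a stand-in for the Pydantic model so it runs without dependencies; in Pydantic the equivalent field would be declared as `failure_category: Optional[str] = None`. Everything except `failure_category` (the class name `SubmitRequestSketch`, the `candidate_id` and `action` fields, the `status` key, and both helper functions) is hypothetical and not taken from `app/sft.py`.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubmitRequestSketch:
    # Hypothetical fields; only failure_category is named in this issue.
    candidate_id: str
    action: str  # "correct", "discard", or "flag"
    failure_category: Optional[str] = None  # new, optional


def apply_submit(record: dict, req: SubmitRequestSketch) -> dict:
    """Return a copy of the candidate record with the submission applied."""
    updated = dict(record)
    updated["status"] = req.action  # assumed key name
    updated["failure_category"] = req.failure_category
    return updated


def to_jsonl_line(record: dict) -> str:
    """Serialize one record as a JSONL line; a missing category becomes null."""
    out = dict(record)
    out.setdefault("failure_category", None)
    return json.dumps(out, sort_keys=True)
```

Using `setdefault` at export time is one way to keep older records (written before the field existed) exportable without a migration; whether the real export should emit an explicit `null` or omit the key is still an open choice.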

**Failure category taxonomy:**

- `scoring_artifact`: model was right, benchmark pattern too strict
- `style_violation`: correct answer but violated instruction constraints (verbosity, format, unsolicited advice)
- `partial_answer`: right direction, missing key elements
- `wrong_answer`: genuinely incorrect reasoning
- `format_error`: correct content but wrong output structure
- `hallucination`: invented facts
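
One possible way to pin the taxonomy down on the backend is a string-valued enum, so unknown values are rejected while `None` stays valid. This is only a sketch: the names `FailureCategory` and `parse_failure_category` are made up for illustration, and the real model may prefer a `Literal` type instead.

```python
from enum import Enum
from typing import Optional


class FailureCategory(str, Enum):
    """The six categories from the taxonomy in this issue."""

    SCORING_ARTIFACT = "scoring_artifact"
    STYLE_VIOLATION = "style_violation"
    PARTIAL_ANSWER = "partial_answer"
    WRONG_ANSWER = "wrong_answer"
    FORMAT_ERROR = "format_error"
    HALLUCINATION = "hallucination"


def parse_failure_category(raw: Optional[str]) -> Optional[FailureCategory]:
    """Return the matching category, or None when no category was given.

    Raises ValueError for a string that is not in the taxonomy.
    """
    if raw is None:
        return None
    return FailureCategory(raw)
```

Because the enum subclasses `str`, its members serialize as plain strings with `json.dumps`, which should keep the exported JSONL free of enum-specific encoding.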

**Out of scope:** Changing how the benchmark harness scores candidates; adding category-based filtering to the export endpoint (follow-on).

**Acceptance criteria:**

- Reviewer can optionally select a failure category before submitting a correction
- Category is stored on the candidate record and appears in exported JSONL
- Null/omitted category is valid (field is optional, existing records unaffected)
- Backend and frontend tests cover the new field
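
To illustrate the "existing records unaffected" criterion from the consumer side, here is a small sketch of how a script reading the exported JSONL might tally categories while tolerating records that predate the field. The function name `count_by_category` and the record shape are assumptions; the actual export format may differ.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_category(jsonl_lines: Iterable[str]) -> Counter:
    """Tally records per failure_category; uncategorized records count under None."""
    counts: Counter = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # .get() treats a missing key and an explicit null the same way.
        counts[record.get("failure_category")] += 1
    return counts
```

A backend test could feed this a mix of old and new records and check that nothing raises and that uncategorized records land under `None`.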

**Related:** Builds on `feat/sft-corrections` (PR #15); discovered during the first ollama benchmark run on 2026-04-08.
