avocet

Circuit-Forge/avocet

Commit graph

Author

SHA1

Message

Date

pyr0ball

9633d9a535

feat: add failure_category field to SFT corrections (#16)

Adds optional failure_category to SubmitRequest and candidate records so
reviewers can classify why a model response was wrong, not just what to do
with it. Enables the fine-tune harness to filter training data by failure
type (e.g. exclude scoring artifacts, train only on genuine wrong answers).

Taxonomy: scoring_artifact | style_violation | partial_answer |
          wrong_answer | format_error | hallucination

- app/sft.py: FailureCategory Literal type; SubmitRequest.failure_category;
  stored on candidate record in POST /submit correct branch
- tests/test_sft.py: 3 new tests (stores value, null round-trip, 422 on invalid)
- stores/sft.ts: SftFailureCategory type exported; SftQueueItem + SftLastAction
  updated; setLastAction accepts optional category param
- SftCard.vue: chip-group selector shown during correct/discard/flag flow;
  two-step confirm for discard/flag reveals chips before emitting; category
  forwarded in all emit payloads
- CorrectionsView.vue: handleCorrect/Discard/Flag accept and forward category
  to POST /api/sft/submit body and store.setLastAction
- SftCard.test.ts: 11 new tests covering chip visibility, selection,
  single-active enforcement, pending-action flow, emit payloads, cancel

2026-04-08 22:10:26 -07:00
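
For readers without the diff at hand, here is a minimal sketch of what commit 9633d9a535 describes: a `Literal` type for the taxonomy, an optional `failure_category` on the request, rejection of unknown values, and filtering of training data by failure type. It is not the code from `app/sft.py`. The dataclass, the `action` field, the `ValueError` (standing in for the HTTP 422 the tests mention), and the `filter_candidates` helper are all assumptions made for illustration; only the six category names and the `FailureCategory` / `SubmitRequest` names come from the commit message.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

# The six taxonomy values listed in the commit message.
FailureCategory = Literal[
    "scoring_artifact",
    "style_violation",
    "partial_answer",
    "wrong_answer",
    "format_error",
    "hallucination",
]
VALID_CATEGORIES = frozenset(get_args(FailureCategory))


@dataclass
class SubmitRequest:
    # Hypothetical minimal shape: the real request model is not shown
    # in the commit message and certainly carries more fields.
    action: str
    failure_category: Optional[str] = None

    def __post_init__(self) -> None:
        # None is allowed (the field is optional); anything else must be
        # one of the taxonomy values. The real endpoint reportedly
        # answers 422 here; this sketch raises ValueError instead.
        if (
            self.failure_category is not None
            and self.failure_category not in VALID_CATEGORIES
        ):
            raise ValueError(
                f"invalid failure_category: {self.failure_category!r}"
            )


def filter_candidates(records, exclude=("scoring_artifact",)):
    """Hypothetical harness-side filter: drop records whose category is
    in `exclude`, keeping uncategorised (None or missing) records."""
    return [r for r in records if r.get("failure_category") not in exclude]
```

Keeping uncategorised records in the filter is a design choice of this sketch, made because the field is optional; a stricter harness could just as well require a category before training on a record.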

pyr0ball

03e5f9f9b4

fix: guard null failure_reason render, fix mid-quality test description

- Add v-if guard on failure-reason <p> so null renders no element (not literal "null")
- Clarify mid-quality test description: score is 0.4 to <0.7 (exclusive upper bound)
- Add test: renders nothing for failure_reason when null (+1 → 14 SftCard tests)

2026-04-08 15:23:19 -07:00
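
The mid-quality range clarified in commit 03e5f9f9b4 (0.4 inclusive to 0.7 exclusive) can be sketched as a small bucketing function. Only the mid range comes from the commit message. The function name, the tier labels, and the assumption that scores below 0.4 are "low" and scores of 0.7 or more are "high" are hypothetical.

```python
def quality_tier(score: float) -> str:
    """Map a quality score to a tier label (labels are assumed)."""
    if score < 0.4:
        return "low"   # assumed: everything below the mid range
    if score < 0.7:
        return "mid"   # from the commit: 0.4 <= score < 0.7
    return "high"      # assumed: 0.7 and above
```

With an exclusive upper bound, a score of exactly 0.7 falls into the higher tier, which is the boundary case the test description was corrected to state.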

pyr0ball

8873920b83

feat: SftCard — quality chip, prompt collapsible, action buttons, correction area slot

2026-04-08 15:19:37 -07:00

3 commits