feat: domain-stratified metrics in benchmark reports #26
Context: Aggregate benchmark metrics across a mixed-domain dataset mask domain-specific failure modes. A model that scores 0.82 overall might score 0.91 on acted speech and 0.43 on naturalistic British speech — the aggregate hides the real problem. This was surfaced during SER evaluation against British comedy panel show audio.
Scope:
- When `audio_domain` tags are present on samples, break out per-domain precision/recall/F1 alongside the aggregate in benchmark reports.
Out of scope: Domain tagging itself (see audio domain tagging issue). UI visualization of per-domain metrics (can follow in a separate issue).
Acceptance criteria:
- `benchmark run` produces a per-domain metric breakdown when domain tags are present
- Per-domain metrics are additive (`domain_breakdown` key, not a replacement)
Related: Depends on audio domain tagging issue.
circuitforge-plans/avocet/: audio model evaluation extension.
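A minimal sketch of how the additive report could look, under stated assumptions. The `audio_domain` tag and the `domain_breakdown` key come from this issue; the sample fields `label` and `prediction`, the `aggregate` key, the function names, and the choice of macro-averaging are hypothetical and only illustrate the shape, not the existing benchmark code.

```python
from collections import defaultdict


def prf1(y_true, y_pred):
    """Macro-averaged precision/recall/F1 over labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0, "n_samples": 0}
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
        "n_samples": len(pairs),
    }


def build_report(samples):
    """Build a report dict; `samples` are dicts with hypothetical keys
    'label' and 'prediction', plus an optional 'audio_domain' tag."""
    report = {
        "aggregate": prf1(
            [s["label"] for s in samples],
            [s["prediction"] for s in samples],
        )
    }
    # Group only the samples that actually carry a domain tag.
    by_domain = defaultdict(list)
    for s in samples:
        domain = s.get("audio_domain")
        if domain is not None:
            by_domain[domain].append(s)
    # Additive: the key appears only when tags exist; the aggregate is untouched.
    if by_domain:
        report["domain_breakdown"] = {
            domain: prf1(
                [s["label"] for s in group],
                [s["prediction"] for s in group],
            )
            for domain, group in sorted(by_domain.items())
        }
    return report
```

Untagged datasets yield a report with only the aggregate, so existing consumers of the report would not need to change. Including `n_samples` per domain is a suggestion, since a low score on a domain with very few samples deserves less weight than one on a large slice.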