{{ stats.total.toLocaleString() }} emails labeled in total
| Model | {{ m.label }} |
|---|---|
| {{ row.name }} | {{ formatMetric(row.result[m.key]) }} |
Highlighted cells mark the best-scoring model for each metric.
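A minimal sketch of how the highlighted cell could be chosen for each metric. The `ModelRow` shape and the `bestPerMetric` helper are hypothetical names used only for illustration. The sketch assumes a higher score is better, skips missing scores, and keeps the first row on a tie; the actual view may decide differently.

```typescript
// Hypothetical row shape: a model name plus a map of metric key -> score.
interface ModelRow {
  name: string;
  result: Record<string, number | null>;
}

// For each metric key, return the name of the row with the highest score.
// Assumption: higher is better. Null or missing scores are skipped, and
// a tie keeps the row that appears first.
function bestPerMetric(
  rows: ModelRow[],
  metricKeys: string[],
): Record<string, string | null> {
  const best: Record<string, string | null> = {};
  for (const key of metricKeys) {
    let bestName: string | null = null;
    let bestScore = -Infinity;
    for (const row of rows) {
      const score = row.result[key];
      if (score != null && score > bestScore) {
        bestScore = score;
        bestName = row.name;
      }
    }
    // Stays null when no row has a score for this metric.
    best[key] = bestName;
  }
  return best;
}

// Made-up sample data, for illustration only.
const sampleRows: ModelRow[] = [
  { name: "model-a", result: { f1: 0.5, accuracy: 0.9 } },
  { name: "model-b", result: { f1: 0.7, accuracy: null } },
];
const winners = bestPerMetric(sampleRows, ["f1", "accuracy"]);
```

A cell would then be highlighted when `winners[m.key] === row.name`.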
| Model | overall | {{ col }} | tok/s |
|---|---|---|---|
| {{ row.model_name }} | {{ llmPct(row.avg_quality_score) }} | {{ row.quality_by_task_type[col] != null ? llmPct(row.quality_by_task_type[col]) : '-' }} | {{ row.avg_tokens_per_sec.toFixed(1) }} |
Run LLM Eval on the Benchmark tab to refresh. Highlighted cells are the best per column.
data/email_score.jsonl
{{ fileSizeLabel }}