Compare commits

...

No commits in common. "main" and "feat/vue-label-tab" have entirely different histories.

34 changed files with 4382 additions and 4723 deletions

.gitignore vendored

@@ -11,12 +11,6 @@ config/label_tool.yaml
data/email_score.jsonl
data/email_label_queue.jsonl
data/email_compare_sample.jsonl
data/sft_candidates.jsonl
data/sft_approved.jsonl
# Conda/pip artifacts
.env
# Claude context — BSL 1.1, keep out of version control
CLAUDE.md
docs/superpowers/

CLAUDE.md Normal file

@@ -0,0 +1,173 @@
# Avocet — Email Classifier Training Tool
## What it is
Shared infrastructure for building and benchmarking email classifiers across the CircuitForge menagerie.
Named for the avocet's sweeping-bill technique — it sweeps through email streams and filters out categories.
**Pipeline:**
```
Scrape (IMAP, wide search, multi-account) → data/email_label_queue.jsonl
Label (card-stack UI) → data/email_score.jsonl
Benchmark (HuggingFace NLI/reranker) → per-model macro-F1 + latency
```
## Environment
- Python env: `conda run -n job-seeker <cmd>` for basic use (streamlit, yaml, stdlib only)
- Classifier env: `conda run -n job-seeker-classifiers <cmd>` for benchmark (transformers, FlagEmbedding, gliclass)
- Run tests: `/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v`
(direct binary — `conda run pytest` can spawn runaway processes)
- Create classifier env: `conda env create -f environment.yml`
## Label Tool (app/label_tool.py)
Card-stack Streamlit UI for manually labeling recruitment emails.
```
conda run -n job-seeker streamlit run app/label_tool.py --server.port 8503
```
- Config: `config/label_tool.yaml` (gitignored — copy from `.example`, or use ⚙️ Settings tab)
- Queue: `data/email_label_queue.jsonl` (gitignored)
- Output: `data/email_score.jsonl` (gitignored)
- Four tabs: 🃏 Label, 📥 Fetch, 📊 Stats, ⚙️ Settings
- Keyboard shortcuts: 1–9 = label, 0 = Other (wildcard, prompts free-text input), S = skip, U = undo
- Dedup: MD5 of `(subject + body[:100])` — cross-account safe
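That dedup key can be sketched as follows (the same computation `entry_key` in the fetch module performs; the sample subject/body values are made up):

```python
import hashlib

def entry_key(e: dict) -> str:
    # MD5 over subject + first 100 chars of body. No per-message id is stored,
    # so the same email fetched from two accounts dedups to one entry.
    key = e.get("subject", "") + (e.get("body", "") or "")[:100]
    return hashlib.md5(key.encode("utf-8")).hexdigest()

k = entry_key({"subject": "Re: phone screen", "body": "Hi, are you free Tuesday?"})
```

Because only the first 100 body characters participate, trailing signatures and footers don't break deduplication.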
### Settings Tab (⚙️)
- Add / edit / remove IMAP accounts via form UI — no manual YAML editing required
- Per-account fields: display name, host, port, SSL toggle, username, password (masked), folder, days back
- **🔌 Test connection** button per account — connects, logs in, selects folder, reports message count
- Global: max emails per account per fetch
- **💾 Save** writes `config/label_tool.yaml`; **↩ Reload** discards unsaved changes
- `_sync_settings_to_state()` collects widget values before any add/remove to avoid index-key drift
## Benchmark (scripts/benchmark_classifier.py)
```
# List available models
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --list-models
# Score against labeled JSONL
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --score
# Visual comparison on live IMAP emails
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --compare --limit 20
# Include slow/large models
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --score --include-slow
# Export DB-labeled emails (⚠️ LLM-generated labels — review first)
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --export-db --db /path/to/staging.db
```
## Labels (peregrine defaults — configurable per product)
| Label | Key | Meaning |
|-------|-----|---------|
| `interview_scheduled` | 1 | Phone screen, video call, or on-site invitation |
| `offer_received` | 2 | Formal job offer or offer letter |
| `rejected` | 3 | Application declined or not moving forward |
| `positive_response` | 4 | Recruiter interest or request to connect |
| `survey_received` | 5 | Culture-fit survey or assessment invitation |
| `neutral` | 6 | ATS confirmation (application received, etc.) |
| `event_rescheduled` | 7 | Interview or event moved to a new time |
| `digest` | 8 | Job digest or multi-listing email (scrapeable) |
| `new_lead` | 9 | Unsolicited recruiter outreach or cold contact |
| `hired` | h | Offer accepted, onboarding, welcome email, start date |
## Model Registry (13 models, 7 defaults)
See `scripts/benchmark_classifier.py:MODEL_REGISTRY`.
Default models run without `--include-slow`.
Add `--models deberta-small deberta-small-2pass` to test a specific subset.
## Config Files
- `config/label_tool.yaml` — gitignored; multi-account IMAP config
- `config/label_tool.yaml.example` — committed template
## Data Files
- `data/email_score.jsonl` — gitignored; manually-labeled ground truth
- `data/email_score.jsonl.example` — committed sample for CI
- `data/email_label_queue.jsonl` — gitignored; IMAP fetch queue
## Key Design Notes
- `ZeroShotAdapter.load()` instantiates the pipeline object; `classify()` calls the object.
Tests patch `scripts.classifier_adapters.pipeline` (the module-level factory) with a
two-level mock: `mock_factory.return_value = MagicMock(return_value={...})`.
- `two_pass=True` on ZeroShotAdapter: first pass ranks all 6 labels; second pass re-runs
with only top-2, forcing a binary choice. 2× cost, better confidence.
- `--compare` uses the first account in `label_tool.yaml` for live IMAP emails.
- DB export labels are llama3.1:8b-generated — treat as noisy, not gold truth.
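The two-level mock from the first note, as a self-contained sketch (the return payload shape is illustrative, not the exact pipeline output):

```python
from unittest.mock import MagicMock

# In real tests, scripts.classifier_adapters.pipeline is patched with this factory.
# Level 1: the factory call that load() makes; level 2: the callable pipeline
# object that classify() invokes.
mock_factory = MagicMock()
mock_factory.return_value = MagicMock(
    return_value={"labels": ["rejected", "neutral"], "scores": [0.93, 0.07]}
)

pipe = mock_factory("zero-shot-classification")        # what load() does
result = pipe("We have decided not to move forward.")  # what classify() does
```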
## Vue Label UI (app/api.py + web/)
FastAPI on port 8503 serves both the REST API and the built Vue SPA (`web/dist/`).
```
./manage.sh start-api # build Vue SPA + start FastAPI (binds 0.0.0.0:8503 — LAN accessible)
./manage.sh stop-api
./manage.sh open-api # xdg-open http://localhost:8503
```
Logs: `log/api.log`
## Email Field Schema — IMPORTANT
Two schemas exist. The normalization layer in `app/api.py` bridges them automatically.
### JSONL on-disk schema (written by `label_tool.py` and its IMAP fetch)
| Field | Type | Notes |
|-------|------|-------|
| `subject` | str | Email subject line |
| `body` | str | Plain-text body, truncated at 800 chars; HTML stripped by `_strip_html()` |
| `from_addr` | str | Sender address string (`"Name <addr>"`) |
| `date` | str | Raw RFC 2822 date string |
| `account` | str | Display name of the IMAP account that fetched it |
| *(no `id`)* | — | Dedup key is MD5 of `(subject + body[:100])` — never stored on disk |
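One on-disk queue line matching the table above (all values are made up):

```python
import json

record = {
    "subject": "Invitation to interview",
    "body": "Hi, we'd like to set up a 30-minute phone screen next week.",
    "from_addr": "Jane Doe <jane@example.com>",
    "date": "Mon, 03 Jun 2024 09:15:00 -0700",
    "account": "personal",
}
line = json.dumps(record)  # one JSON object per line in email_label_queue.jsonl
```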
### Vue API schema (returned by `GET /api/queue`, required by POST endpoints)
| Field | Type | Notes |
|-------|------|-------|
| `id` | str | MD5 content hash, or stored `id` if item has one |
| `subject` | str | Unchanged |
| `body` | str | Unchanged |
| `from` | str | Mapped from `from_addr` (or `from` if already present) |
| `date` | str | Unchanged |
| `source` | str | Mapped from `account` (or `source` if already present) |
### Normalization layer (`_normalize()` in `app/api.py`)
`_normalize(item)` handles the mapping and ID generation. All `GET /api/queue` responses
pass through it. Mutating endpoints (`/api/label`, `/api/skip`, `/api/discard`) look up
items via `_normalize(x)["id"]`, so both real data (no `id`, uses content hash) and test
fixtures (explicit `id` field) work transparently.
### Peregrine integration
Peregrine's `staging.db` uses different field names again:
| staging.db column | Maps to avocet JSONL field |
|-------------------|---------------------------|
| `subject` | `subject` |
| `body` | `body` (may contain HTML — run through `_strip_html()` before queuing) |
| `from_address` | `from_addr` |
| `received_date` | `date` |
| `account` or source context | `account` |
When exporting from Peregrine's DB for avocet labeling, transform to the JSONL schema above
(not the Vue API schema). The `--export-db` flag in `benchmark_classifier.py` does this.
Any new export path should also call `_strip_html()` on the body before writing.
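A sketch of that transform (the tag-stripping stand-in is simplified; the real `_strip_html` lives in the fetch module, and the default `account` value here is illustrative):

```python
import re

def _strip_html(html_str: str) -> str:
    # Simplified stand-in: drop tags, collapse whitespace.
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", html_str)).strip()

def db_row_to_jsonl(row: dict) -> dict:
    # staging.db column names -> avocet JSONL fields, per the table above.
    return {
        "subject": row["subject"],
        "body": _strip_html(row["body"])[:800],  # body truncated at 800 chars
        "from_addr": row["from_address"],
        "date": row["received_date"],
        "account": row.get("account", "peregrine"),  # fallback name is an assumption
    }
```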
## Relationship to Peregrine
Avocet started as `peregrine/tools/label_tool.py` + `peregrine/scripts/classifier_adapters.py`.
Peregrine retains copies during stabilization; once avocet is proven, peregrine will import from here.


@@ -142,13 +142,6 @@ def _normalize(item: dict) -> dict:
app = FastAPI(title="Avocet API")
from app.sft import router as sft_router
app.include_router(sft_router, prefix="/api/sft")
from app.models import router as models_router
import app.models as _models_module
app.include_router(models_router, prefix="/api/models")
# In-memory last-action store (single user, local tool — in-memory is fine)
_last_action: dict | None = None
@@ -302,18 +295,10 @@ def get_stats():
lbl = r.get("label", "")
if lbl:
counts[lbl] = counts.get(lbl, 0) + 1
benchmark_results: dict = {}
benchmark_path = _DATA_DIR / "benchmark_results.json"
if benchmark_path.exists():
try:
benchmark_results = json.loads(benchmark_path.read_text(encoding="utf-8"))
except Exception:
pass
return {
"total": len(records),
"counts": counts,
"score_file_bytes": _score_file().stat().st_size if _score_file().exists() else 0,
"benchmark_results": benchmark_results,
}
@@ -348,36 +333,6 @@ from fastapi.responses import StreamingResponse
# Benchmark endpoints
# ---------------------------------------------------------------------------
@app.get("/api/benchmark/models")
def get_benchmark_models() -> dict:
"""Return installed models grouped by adapter_type category."""
models_dir: Path = _models_module._MODELS_DIR
categories: dict[str, list[dict]] = {
"ZeroShotAdapter": [],
"RerankerAdapter": [],
"GenerationAdapter": [],
"Unknown": [],
}
if models_dir.exists():
for sub in models_dir.iterdir():
if not sub.is_dir():
continue
info_path = sub / "model_info.json"
adapter_type = "Unknown"
repo_id: str | None = None
if info_path.exists():
try:
info = json.loads(info_path.read_text(encoding="utf-8"))
adapter_type = info.get("adapter_type") or info.get("adapter_recommendation") or "Unknown"
repo_id = info.get("repo_id")
except Exception:
pass
bucket = adapter_type if adapter_type in categories else "Unknown"
entry: dict = {"name": sub.name, "repo_id": repo_id, "adapter_type": adapter_type}
categories[bucket].append(entry)
return {"categories": categories}
@app.get("/api/benchmark/results")
def get_benchmark_results():
"""Return the most recently saved benchmark results, or an empty envelope."""
@@ -388,17 +343,13 @@ def get_benchmark_results():
@app.get("/api/benchmark/run")
def run_benchmark(include_slow: bool = False, model_names: str = ""):
def run_benchmark(include_slow: bool = False):
"""Spawn the benchmark script and stream stdout as SSE progress events."""
python_bin = "/devl/miniconda3/envs/job-seeker-classifiers/bin/python"
script = str(_ROOT / "scripts" / "benchmark_classifier.py")
cmd = [python_bin, script, "--score", "--save"]
if include_slow:
cmd.append("--include-slow")
if model_names:
names = [n.strip() for n in model_names.split(",") if n.strip()]
if names:
cmd.extend(["--models"] + names)
def generate():
try:


@@ -1,6 +1,6 @@
"""Avocet — IMAP fetch utilities.
Shared between app/api.py (FastAPI SSE endpoint) and the label UI.
Shared between app/api.py (FastAPI SSE endpoint) and app/label_tool.py (Streamlit).
No Streamlit imports here: stdlib + imaplib only.
"""
from __future__ import annotations
@@ -8,11 +8,36 @@ from __future__ import annotations
import email as _email_lib
import hashlib
import imaplib
import re
from datetime import datetime, timedelta
from email.header import decode_header as _raw_decode
from html.parser import HTMLParser
from typing import Any, Iterator
from app.utils import extract_body, strip_html # noqa: F401 (strip_html re-exported for callers)
# ── HTML → plain text ────────────────────────────────────────────────────────
class _TextExtractor(HTMLParser):
def __init__(self):
super().__init__()
self._parts: list[str] = []
def handle_data(self, data: str) -> None:
stripped = data.strip()
if stripped:
self._parts.append(stripped)
def get_text(self) -> str:
return " ".join(self._parts)
def strip_html(html_str: str) -> str:
try:
ex = _TextExtractor()
ex.feed(html_str)
return ex.get_text()
except Exception:
return re.sub(r"<[^>]+>", " ", html_str).strip()
# ── IMAP decode helpers ───────────────────────────────────────────────────────
@@ -30,6 +55,37 @@ def _decode_str(value: str | None) -> str:
return " ".join(out).strip()
def _extract_body(msg: Any) -> str:
if msg.is_multipart():
html_fallback: str | None = None
for part in msg.walk():
ct = part.get_content_type()
if ct == "text/plain":
try:
charset = part.get_content_charset() or "utf-8"
return part.get_payload(decode=True).decode(charset, errors="replace")
except Exception:
pass
elif ct == "text/html" and html_fallback is None:
try:
charset = part.get_content_charset() or "utf-8"
raw = part.get_payload(decode=True).decode(charset, errors="replace")
html_fallback = strip_html(raw)
except Exception:
pass
return html_fallback or ""
else:
try:
charset = msg.get_content_charset() or "utf-8"
raw = msg.get_payload(decode=True).decode(charset, errors="replace")
if msg.get_content_type() == "text/html":
return strip_html(raw)
return raw
except Exception:
pass
return ""
def entry_key(e: dict) -> str:
"""Stable MD5 content-hash for dedup — matches label_tool.py _entry_key."""
key = (e.get("subject", "") + (e.get("body", "") or "")[:100])
@@ -137,7 +193,7 @@ def fetch_account_stream(
subj = _decode_str(msg.get("Subject", ""))
from_addr = _decode_str(msg.get("From", ""))
date = _decode_str(msg.get("Date", ""))
body = extract_body(msg)[:800]
body = _extract_body(msg)[:800]
entry = {"subject": subj, "body": body, "from_addr": from_addr,
"date": date, "account": name}
k = entry_key(entry)

app/label_tool.py Normal file

File diff suppressed because it is too large.


@@ -1,428 +0,0 @@
"""Avocet — HF model lifecycle API.
Handles model metadata lookup, approval queue, download with progress,
and installed model management.
All endpoints are registered on `router` (a FastAPI APIRouter).
api.py includes this router with prefix="/api/models".
Module-level globals (_MODELS_DIR, _QUEUE_DIR) follow the same
testability pattern as sft.py: override them via set_models_dir() and
set_queue_dir() in test fixtures.
"""
from __future__ import annotations
import json
import logging
import shutil
import threading
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
from uuid import uuid4
import httpx
from fastapi import APIRouter, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from app.utils import read_jsonl, write_jsonl
try:
from huggingface_hub import snapshot_download
except ImportError: # pragma: no cover
snapshot_download = None # type: ignore[assignment]
logger = logging.getLogger(__name__)
_ROOT = Path(__file__).parent.parent
_MODELS_DIR: Path = _ROOT / "models"
_QUEUE_DIR: Path = _ROOT / "data"
router = APIRouter()
# ── Download progress shared state ────────────────────────────────────────────
# Updated by the background download thread; read by GET /download/stream.
_download_progress: dict[str, Any] = {}
# ── HF pipeline_tag → adapter recommendation ──────────────────────────────────
_TAG_TO_ADAPTER: dict[str, str] = {
"zero-shot-classification": "ZeroShotAdapter",
"text-classification": "ZeroShotAdapter",
"natural-language-inference": "ZeroShotAdapter",
"sentence-similarity": "RerankerAdapter",
"text-ranking": "RerankerAdapter",
"text-generation": "GenerationAdapter",
"text2text-generation": "GenerationAdapter",
}
# ── Testability seams ──────────────────────────────────────────────────────────
def set_models_dir(path: Path) -> None:
global _MODELS_DIR
_MODELS_DIR = path
def set_queue_dir(path: Path) -> None:
global _QUEUE_DIR
_QUEUE_DIR = path
# ── Internal helpers ───────────────────────────────────────────────────────────
def _queue_file() -> Path:
return _QUEUE_DIR / "model_queue.jsonl"
def _read_queue() -> list[dict]:
return read_jsonl(_queue_file())
def _write_queue(records: list[dict]) -> None:
write_jsonl(_queue_file(), records)
def _safe_model_name(repo_id: str) -> str:
"""Convert repo_id to a filesystem-safe directory name (HF convention)."""
return repo_id.replace("/", "--")
def _is_installed(repo_id: str) -> bool:
"""Check if a model is already downloaded in _MODELS_DIR."""
safe_name = _safe_model_name(repo_id)
model_dir = _MODELS_DIR / safe_name
return model_dir.exists() and (
(model_dir / "config.json").exists()
or (model_dir / "training_info.json").exists()
or (model_dir / "model_info.json").exists()
)
def _is_queued(repo_id: str) -> bool:
"""Check if repo_id is already in the queue (non-dismissed)."""
for entry in _read_queue():
if entry.get("repo_id") == repo_id and entry.get("status") != "dismissed":
return True
return False
def _update_queue_entry(entry_id: str, updates: dict) -> dict | None:
"""Update a queue entry by id. Returns updated entry or None if not found."""
records = _read_queue()
for i, r in enumerate(records):
if r.get("id") == entry_id:
records[i] = {**r, **updates}
_write_queue(records)
return records[i]
return None
def _get_queue_entry(entry_id: str) -> dict | None:
for r in _read_queue():
if r.get("id") == entry_id:
return r
return None
# ── Background download ────────────────────────────────────────────────────────
def _run_download(entry_id: str, repo_id: str, pipeline_tag: str | None, adapter_recommendation: str | None) -> None:
"""Background thread: download model via huggingface_hub.snapshot_download."""
global _download_progress
safe_name = _safe_model_name(repo_id)
local_dir = _MODELS_DIR / safe_name
_download_progress = {
"active": True,
"repo_id": repo_id,
"downloaded_bytes": 0,
"total_bytes": 0,
"pct": 0.0,
"done": False,
"error": None,
}
try:
if snapshot_download is None:
raise RuntimeError("huggingface_hub is not installed")
snapshot_download(
repo_id=repo_id,
local_dir=str(local_dir),
)
# Write model_info.json alongside downloaded files
model_info = {
"repo_id": repo_id,
"pipeline_tag": pipeline_tag,
"adapter_recommendation": adapter_recommendation,
"downloaded_at": datetime.now(timezone.utc).isoformat(),
}
local_dir.mkdir(parents=True, exist_ok=True)
(local_dir / "model_info.json").write_text(
json.dumps(model_info, indent=2), encoding="utf-8"
)
_download_progress["done"] = True
_download_progress["pct"] = 100.0
_update_queue_entry(entry_id, {"status": "ready"})
except Exception as exc:
logger.exception("Download failed for %s: %s", repo_id, exc)
_download_progress["error"] = str(exc)
_download_progress["done"] = True
_update_queue_entry(entry_id, {"status": "failed", "error": str(exc)})
finally:
_download_progress["active"] = False
# ── GET /lookup ────────────────────────────────────────────────────────────────
@router.get("/lookup")
def lookup_model(repo_id: str) -> dict:
"""Validate repo_id and fetch metadata from the HF API."""
# Validate: must contain exactly one '/', no whitespace
if "/" not in repo_id or any(c.isspace() for c in repo_id):
raise HTTPException(422, f"Invalid repo_id {repo_id!r}: must be 'owner/model-name' with no whitespace")
hf_url = f"https://huggingface.co/api/models/{repo_id}"
try:
resp = httpx.get(hf_url, timeout=10.0)
except httpx.RequestError as exc:
raise HTTPException(502, f"Network error reaching HuggingFace API: {exc}") from exc
if resp.status_code == 404:
raise HTTPException(404, f"Model {repo_id!r} not found on HuggingFace")
if resp.status_code != 200:
raise HTTPException(502, f"HuggingFace API returned status {resp.status_code}")
data = resp.json()
pipeline_tag = data.get("pipeline_tag")
adapter_recommendation = _TAG_TO_ADAPTER.get(pipeline_tag) if pipeline_tag else None
if pipeline_tag and adapter_recommendation is None:
logger.warning("Unknown pipeline_tag %r for %s — no adapter recommendation", pipeline_tag, repo_id)
# Estimate model size from siblings list
siblings = data.get("siblings") or []
model_size_bytes: int = sum(s.get("size", 0) for s in siblings if isinstance(s, dict))
# Description: first 300 chars of card data (modelId field used as fallback)
card_data = data.get("cardData") or {}
description_raw = card_data.get("description") or data.get("modelId") or ""
description = description_raw[:300] if description_raw else ""
return {
"repo_id": repo_id,
"pipeline_tag": pipeline_tag,
"adapter_recommendation": adapter_recommendation,
"model_size_bytes": model_size_bytes,
"description": description,
"tags": data.get("tags") or [],
"downloads": data.get("downloads") or 0,
"already_installed": _is_installed(repo_id),
"already_queued": _is_queued(repo_id),
}
# ── GET /queue ─────────────────────────────────────────────────────────────────
@router.get("/queue")
def get_queue() -> list[dict]:
"""Return all non-dismissed queue entries sorted newest-first."""
records = _read_queue()
active = [r for r in records if r.get("status") != "dismissed"]
return sorted(active, key=lambda r: r.get("queued_at", ""), reverse=True)
# ── POST /queue ────────────────────────────────────────────────────────────────
class QueueAddRequest(BaseModel):
repo_id: str
pipeline_tag: str | None = None
adapter_recommendation: str | None = None
@router.post("/queue", status_code=201)
def add_to_queue(req: QueueAddRequest) -> dict:
"""Add a model to the approval queue with status 'pending'."""
if _is_installed(req.repo_id):
raise HTTPException(409, f"{req.repo_id!r} is already installed")
if _is_queued(req.repo_id):
raise HTTPException(409, f"{req.repo_id!r} is already in the queue")
entry = {
"id": str(uuid4()),
"repo_id": req.repo_id,
"pipeline_tag": req.pipeline_tag,
"adapter_recommendation": req.adapter_recommendation,
"status": "pending",
"queued_at": datetime.now(timezone.utc).isoformat(),
}
records = _read_queue()
records.append(entry)
_write_queue(records)
return entry
# ── POST /queue/{id}/approve ───────────────────────────────────────────────────
@router.post("/queue/{entry_id}/approve")
def approve_queue_entry(entry_id: str) -> dict:
"""Approve a pending queue entry and start background download."""
entry = _get_queue_entry(entry_id)
if entry is None:
raise HTTPException(404, f"Queue entry {entry_id!r} not found")
if entry.get("status") != "pending":
raise HTTPException(409, f"Entry is not in pending state (current: {entry.get('status')!r})")
updated = _update_queue_entry(entry_id, {"status": "downloading"})
thread = threading.Thread(
target=_run_download,
args=(entry_id, entry["repo_id"], entry.get("pipeline_tag"), entry.get("adapter_recommendation")),
daemon=True,
name=f"model-download-{entry_id}",
)
thread.start()
return {"ok": True}
# ── DELETE /queue/{id} ─────────────────────────────────────────────────────────
@router.delete("/queue/{entry_id}")
def dismiss_queue_entry(entry_id: str) -> dict:
"""Dismiss (soft-delete) a queue entry."""
entry = _get_queue_entry(entry_id)
if entry is None:
raise HTTPException(404, f"Queue entry {entry_id!r} not found")
_update_queue_entry(entry_id, {"status": "dismissed"})
return {"ok": True}
# ── GET /download/stream ───────────────────────────────────────────────────────
@router.get("/download/stream")
def download_stream() -> StreamingResponse:
"""SSE stream of download progress. Yields one idle event if no download active."""
def generate():
prog = _download_progress
if not prog.get("active") and not (prog.get("done") and not prog.get("error")):
yield f"data: {json.dumps({'type': 'idle'})}\n\n"
return
if prog.get("done"):
if prog.get("error"):
yield f"data: {json.dumps({'type': 'error', 'error': prog['error']})}\n\n"
else:
yield f"data: {json.dumps({'type': 'done', 'repo_id': prog.get('repo_id')})}\n\n"
return
# Stream live progress
import time
while True:
p = dict(_download_progress)
if p.get("done"):
if p.get("error"):
yield f"data: {json.dumps({'type': 'error', 'error': p['error']})}\n\n"
else:
yield f"data: {json.dumps({'type': 'done', 'repo_id': p.get('repo_id')})}\n\n"
break
event = json.dumps({
"type": "progress",
"repo_id": p.get("repo_id"),
"downloaded_bytes": p.get("downloaded_bytes", 0),
"total_bytes": p.get("total_bytes", 0),
"pct": p.get("pct", 0.0),
})
yield f"data: {event}\n\n"
time.sleep(0.5)
return StreamingResponse(
generate(),
media_type="text/event-stream",
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
)
# ── GET /installed ─────────────────────────────────────────────────────────────
@router.get("/installed")
def list_installed() -> list[dict]:
"""Scan _MODELS_DIR and return info on each installed model."""
if not _MODELS_DIR.exists():
return []
results: list[dict] = []
for sub in _MODELS_DIR.iterdir():
if not sub.is_dir():
continue
has_training_info = (sub / "training_info.json").exists()
has_config = (sub / "config.json").exists()
has_model_info = (sub / "model_info.json").exists()
if not (has_training_info or has_config or has_model_info):
continue
model_type = "finetuned" if has_training_info else "downloaded"
# Compute directory size
size_bytes = sum(f.stat().st_size for f in sub.rglob("*") if f.is_file())
# Load adapter/model_id from model_info.json or training_info.json
adapter: str | None = None
model_id: str | None = None
if has_model_info:
try:
info = json.loads((sub / "model_info.json").read_text(encoding="utf-8"))
adapter = info.get("adapter_recommendation")
model_id = info.get("repo_id")
except Exception:
pass
elif has_training_info:
try:
info = json.loads((sub / "training_info.json").read_text(encoding="utf-8"))
adapter = info.get("adapter")
model_id = info.get("base_model") or info.get("model_id")
except Exception:
pass
results.append({
"name": sub.name,
"path": str(sub),
"type": model_type,
"adapter": adapter,
"size_bytes": size_bytes,
"model_id": model_id,
})
return results
# ── DELETE /installed/{name} ───────────────────────────────────────────────────
@router.delete("/installed/{name}")
def delete_installed(name: str) -> dict:
"""Remove an installed model directory by name. Blocks path traversal."""
# Validate: single path component, no slashes or '..'
if "/" in name or "\\" in name or ".." in name or not name or name.startswith("."):
raise HTTPException(400, f"Invalid model name {name!r}: must be a single directory name with no path separators or '..'")
model_path = _MODELS_DIR / name
# Extra safety: confirm resolved path is inside _MODELS_DIR
try:
model_path.resolve().relative_to(_MODELS_DIR.resolve())
except ValueError:
raise HTTPException(400, f"Path traversal detected for name {name!r}")
if not model_path.exists():
raise HTTPException(404, f"Installed model {name!r} not found")
shutil.rmtree(model_path)
return {"ok": True}


@@ -1,326 +0,0 @@
"""Avocet — SFT candidate import and correction API.
All endpoints are registered on `router` (a FastAPI APIRouter).
api.py includes this router with prefix="/api/sft".
Module-level globals (_SFT_DATA_DIR, _SFT_CONFIG_DIR) follow the same
testability pattern as api.py: override them via set_sft_data_dir() and
set_sft_config_dir() in test fixtures.
"""
from __future__ import annotations
import json
import logging
from datetime import datetime, timezone
from pathlib import Path
from typing import Literal
import yaml
from fastapi import APIRouter, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from app.utils import append_jsonl, read_jsonl, write_jsonl
logger = logging.getLogger(__name__)
_ROOT = Path(__file__).parent.parent
_SFT_DATA_DIR: Path = _ROOT / "data"
_SFT_CONFIG_DIR: Path | None = None
router = APIRouter()
# ── Testability seams ──────────────────────────────────────────────────────
def set_sft_data_dir(path: Path) -> None:
global _SFT_DATA_DIR
_SFT_DATA_DIR = path
def set_sft_config_dir(path: Path | None) -> None:
global _SFT_CONFIG_DIR
_SFT_CONFIG_DIR = path
# ── Internal helpers ───────────────────────────────────────────────────────
def _config_file() -> Path:
if _SFT_CONFIG_DIR is not None:
return _SFT_CONFIG_DIR / "label_tool.yaml"
return _ROOT / "config" / "label_tool.yaml"
def _get_bench_results_dir() -> Path:
f = _config_file()
if not f.exists():
return Path("/nonexistent-bench-results")
try:
raw = yaml.safe_load(f.read_text(encoding="utf-8")) or {}
except yaml.YAMLError as exc:
logger.warning("Failed to parse SFT config %s: %s", f, exc)
return Path("/nonexistent-bench-results")
d = raw.get("sft", {}).get("bench_results_dir", "")
return Path(d) if d else Path("/nonexistent-bench-results")
def _candidates_file() -> Path:
return _SFT_DATA_DIR / "sft_candidates.jsonl"
def _approved_file() -> Path:
return _SFT_DATA_DIR / "sft_approved.jsonl"
def _read_candidates() -> list[dict]:
return read_jsonl(_candidates_file())
def _write_candidates(records: list[dict]) -> None:
write_jsonl(_candidates_file(), records)
def _is_exportable(r: dict) -> bool:
"""Return True if an approved record is ready to include in SFT export."""
return (
r.get("status") == "approved"
and bool(r.get("corrected_response"))
and str(r["corrected_response"]).strip() != ""
)
# ── GET /runs ──────────────────────────────────────────────────────────────
@router.get("/runs")
def get_runs():
"""List available benchmark runs in the configured bench_results_dir."""
from scripts.sft_import import discover_runs
bench_dir = _get_bench_results_dir()
existing = _read_candidates()
# benchmark_run_id in each record equals the run's directory name by cf-orch convention
imported_run_ids = {
r["benchmark_run_id"]
for r in existing
if r.get("benchmark_run_id") is not None
}
runs = discover_runs(bench_dir)
return [
{
"run_id": r["run_id"],
"timestamp": r["timestamp"],
"candidate_count": r["candidate_count"],
"already_imported": r["run_id"] in imported_run_ids,
}
for r in runs
]
# ── POST /import ───────────────────────────────────────────────────────────
class ImportRequest(BaseModel):
run_id: str
@router.post("/import")
def post_import(req: ImportRequest):
"""Import one benchmark run's sft_candidates.jsonl into the local data dir."""
from scripts.sft_import import discover_runs, import_run
bench_dir = _get_bench_results_dir()
runs = discover_runs(bench_dir)
run = next((r for r in runs if r["run_id"] == req.run_id), None)
if run is None:
raise HTTPException(404, f"Run {req.run_id!r} not found in bench_results_dir")
return import_run(run["sft_path"], _SFT_DATA_DIR)
# ── GET /queue ─────────────────────────────────────────────────────────────
@router.get("/queue")
def get_queue(page: int = 1, per_page: int = 20):
"""Return paginated needs_review candidates."""
records = _read_candidates()
pending = [r for r in records if r.get("status") == "needs_review"]
start = (page - 1) * per_page
return {
"items": pending[start:start + per_page],
"total": len(pending),
"page": page,
"per_page": per_page,
}
# ── POST /submit ───────────────────────────────────────────────────────────
FailureCategory = Literal[
"scoring_artifact",
"style_violation",
"partial_answer",
"wrong_answer",
"format_error",
"hallucination",
]
class SubmitRequest(BaseModel):
id: str
action: Literal["correct", "discard", "flag"]
corrected_response: str | None = None
failure_category: FailureCategory | None = None
@router.post("/submit")
def post_submit(req: SubmitRequest):
"""Record a reviewer decision for one SFT candidate."""
if req.action == "correct":
if not req.corrected_response or not req.corrected_response.strip():
raise HTTPException(422, "corrected_response must be non-empty when action is 'correct'")
records = _read_candidates()
idx = next((i for i, r in enumerate(records) if r.get("id") == req.id), None)
if idx is None:
raise HTTPException(404, f"Record {req.id!r} not found")
record = records[idx]
if record.get("status") != "needs_review":
raise HTTPException(409, f"Record is not in needs_review state (current: {record.get('status')})")
if req.action == "correct":
records[idx] = {
**record,
"status": "approved",
"corrected_response": req.corrected_response,
"failure_category": req.failure_category,
}
_write_candidates(records)
append_jsonl(_approved_file(), records[idx])
elif req.action == "discard":
records[idx] = {**record, "status": "discarded"}
_write_candidates(records)
else: # flag
records[idx] = {**record, "status": "model_rejected"}
_write_candidates(records)
return {"ok": True}
# ── POST /undo ─────────────────────────────────────────────────────────────
class UndoRequest(BaseModel):
id: str
@router.post("/undo")
def post_undo(req: UndoRequest):
"""Restore a previously actioned candidate back to needs_review."""
records = _read_candidates()
idx = next((i for i, r in enumerate(records) if r.get("id") == req.id), None)
if idx is None:
raise HTTPException(404, f"Record {req.id!r} not found")
record = records[idx]
old_status = record.get("status")
if old_status == "needs_review":
raise HTTPException(409, "Record is already in needs_review state")
records[idx] = {**record, "status": "needs_review", "corrected_response": None}
_write_candidates(records)
# If it was approved, remove from the approved file too
if old_status == "approved":
approved = read_jsonl(_approved_file())
write_jsonl(_approved_file(), [r for r in approved if r.get("id") != req.id])
return {"ok": True}
# ── GET /export ─────────────────────────────────────────────────────────────
@router.get("/export")
def get_export() -> StreamingResponse:
"""Stream approved records as SFT-ready JSONL for download."""
exportable = [r for r in read_jsonl(_approved_file()) if _is_exportable(r)]
def generate():
for r in exportable:
record = {
"messages": r.get("prompt_messages", []) + [
{"role": "assistant", "content": r["corrected_response"]}
]
}
yield json.dumps(record) + "\n"
timestamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
return StreamingResponse(
generate(),
media_type="application/x-ndjson",
headers={
"Content-Disposition": f'attachment; filename="sft_export_{timestamp}.jsonl"'
},
)
# ── GET /stats ──────────────────────────────────────────────────────────────
@router.get("/stats")
def get_stats() -> dict[str, object]:
"""Return counts by status, model, and task type."""
records = _read_candidates()
by_status: dict[str, int] = {}
by_model: dict[str, int] = {}
by_task_type: dict[str, int] = {}
for r in records:
status = r.get("status", "unknown")
by_status[status] = by_status.get(status, 0) + 1
model = r.get("model_name", "unknown")
by_model[model] = by_model.get(model, 0) + 1
task_type = r.get("task_type", "unknown")
by_task_type[task_type] = by_task_type.get(task_type, 0) + 1
approved = read_jsonl(_approved_file())
export_ready = sum(1 for r in approved if _is_exportable(r))
return {
"total": len(records),
"by_status": by_status,
"by_model": by_model,
"by_task_type": by_task_type,
"export_ready": export_ready,
}
# ── GET /config ─────────────────────────────────────────────────────────────
@router.get("/config")
def get_sft_config() -> dict:
"""Return the current SFT configuration (bench_results_dir)."""
f = _config_file()
if not f.exists():
return {"bench_results_dir": ""}
try:
raw = yaml.safe_load(f.read_text(encoding="utf-8")) or {}
except yaml.YAMLError:
return {"bench_results_dir": ""}
sft_section = raw.get("sft") or {}
return {"bench_results_dir": sft_section.get("bench_results_dir", "")}
class SftConfigPayload(BaseModel):
bench_results_dir: str
@router.post("/config")
def post_sft_config(payload: SftConfigPayload) -> dict:
"""Write the bench_results_dir setting to the config file."""
f = _config_file()
f.parent.mkdir(parents=True, exist_ok=True)
try:
raw = yaml.safe_load(f.read_text(encoding="utf-8")) if f.exists() else {}
raw = raw or {}
except yaml.YAMLError:
raw = {}
raw["sft"] = {"bench_results_dir": payload.bench_results_dir}
tmp = f.with_suffix(".tmp")
tmp.write_text(yaml.dump(raw, allow_unicode=True, sort_keys=False), encoding="utf-8")
tmp.rename(f)
return {"ok": True}


@@ -1,117 +0,0 @@
"""Shared email utility functions for Avocet.
Pure-stdlib helpers extracted from the retired label_tool.py Streamlit app.
These are reused by the FastAPI backend and the test suite.
"""
from __future__ import annotations
import json
import re
from html.parser import HTMLParser
from pathlib import Path
from typing import Any
# ── HTML → plain-text extractor ──────────────────────────────────────────────
class _TextExtractor(HTMLParser):
"""Extract visible text from an HTML email body, preserving line breaks."""
_BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6", "blockquote"}
_SKIP = {"script", "style", "head", "noscript"}
def __init__(self):
super().__init__(convert_charrefs=True)
self._parts: list[str] = []
self._depth_skip = 0
def handle_starttag(self, tag, attrs):
tag = tag.lower()
if tag in self._SKIP:
self._depth_skip += 1
elif tag in self._BLOCK:
self._parts.append("\n")
def handle_endtag(self, tag):
if tag.lower() in self._SKIP:
self._depth_skip = max(0, self._depth_skip - 1)
def handle_data(self, data):
if not self._depth_skip:
self._parts.append(data)
def get_text(self) -> str:
text = "".join(self._parts)
lines = [ln.strip() for ln in text.splitlines()]
return "\n".join(ln for ln in lines if ln)
def strip_html(html_str: str) -> str:
"""Convert HTML email body to plain text. Pure stdlib, no dependencies."""
try:
extractor = _TextExtractor()
extractor.feed(html_str)
return extractor.get_text()
except Exception:
return re.sub(r"<[^>]+>", " ", html_str).strip()
def extract_body(msg: Any) -> str:
"""Return plain-text body. Strips HTML when no text/plain part exists."""
if msg.is_multipart():
html_fallback: str | None = None
for part in msg.walk():
ct = part.get_content_type()
if ct == "text/plain":
try:
charset = part.get_content_charset() or "utf-8"
return part.get_payload(decode=True).decode(charset, errors="replace")
except Exception:
pass
elif ct == "text/html" and html_fallback is None:
try:
charset = part.get_content_charset() or "utf-8"
raw = part.get_payload(decode=True).decode(charset, errors="replace")
html_fallback = strip_html(raw)
except Exception:
pass
return html_fallback or ""
else:
try:
charset = msg.get_content_charset() or "utf-8"
raw = msg.get_payload(decode=True).decode(charset, errors="replace")
if msg.get_content_type() == "text/html":
return strip_html(raw)
return raw
except Exception:
pass
return ""
def read_jsonl(path: Path) -> list[dict]:
"""Read a JSONL file, returning valid records. Skips blank lines and malformed JSON."""
if not path.exists():
return []
records: list[dict] = []
for line in path.read_text(encoding="utf-8").splitlines():
line = line.strip()
if not line:
continue
try:
records.append(json.loads(line))
except json.JSONDecodeError:
pass
return records
def write_jsonl(path: Path, records: list[dict]) -> None:
"""Write records to a JSONL file, overwriting any existing content."""
path.parent.mkdir(parents=True, exist_ok=True)
content = "\n".join(json.dumps(r) for r in records)
path.write_text(content + ("\n" if records else ""), encoding="utf-8")
def append_jsonl(path: Path, record: dict) -> None:
"""Append a single record to a JSONL file."""
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, "a", encoding="utf-8") as fh:
fh.write(json.dumps(record) + "\n")


@@ -21,8 +21,3 @@ accounts:
# Optional: limit emails fetched per account per run (0 = unlimited)
max_per_account: 500
# cf-orch SFT candidate import — path to the bench_results/ directory
# produced by circuitforge-orch's benchmark harness.
sft:
bench_results_dir: /path/to/circuitforge-orch/scripts/bench_results


@@ -0,0 +1,95 @@
# Anime.js Animation Integration — Design
**Date:** 2026-03-08
**Status:** Approved
**Branch:** feat/vue-label-tab
## Problem
The current animation system mixes CSS keyframes, CSS transitions, and imperative inline-style bindings across three files. The seams between systems produce:
- Abrupt ball pickup (instant scale/borderRadius jump)
- No spring snap-back on release to no target
- Rigid CSS dismissals with no timing control
- Bucket grid and badge pop on basic `@keyframes`
## Decision
Integrate **Anime.js v4** as a single animation layer. Vue reactive state is unchanged; Anime.js owns all DOM motion imperatively.
## Architecture
One new composable, minimal changes to two existing files, CSS cleanup in two files.
```
web/src/composables/useCardAnimation.ts ← NEW
web/src/components/EmailCardStack.vue ← modify
web/src/views/LabelView.vue ← modify
```
**Data flow:**
```
pointer events → Vue refs (isHeld, deltaX, deltaY, dismissType)
↓ watched by
useCardAnimation(cardEl, stackEl, isHeld, ...)
↓ imperatively drives
Anime.js → DOM transforms
```
`useCardAnimation` is a pure side-effect composable — returns nothing to the template. The `cardStyle` computed in `EmailCardStack.vue` is removed; Anime.js owns the element's transform directly.
## Animation Surfaces
### Pickup morph
```
animate(cardEl, { scale: 0.55, borderRadius: '50%', y: -80 }, { duration: 200, ease: spring(1, 80, 10) })
```
Replaces the instant CSS transform jump on `onPointerDown`.
### Drag tracking
Raw `cardEl.style.translate` update on `onPointerMove` — no animation, just position. Easing only at boundaries (pickup / release), not during active drag.
### Snap-back
```
animate(cardEl, { x: 0, y: 0, scale: 1, borderRadius: '1rem' }, { ease: spring(1, 80, 10) })
```
Fires on `onPointerUp` when no zone/bucket target was hit.
### Dismissals (replace CSS `@keyframes`)
- **fileAway**`animate(cardEl, { y: '-120%', scale: 0.85, opacity: 0 }, { duration: 280, ease: 'out(3)' })`
- **crumple** — 2-step timeline: shrink + redden → `scale(0)` + rotate
- **slideUnder**`animate(cardEl, { x: '110%', rotate: 5, opacity: 0 }, { duration: 260 })`
### Bucket grid rise
`animate(gridEl, { y: -8, opacity: 0.45 })` on `isHeld` → true; reversed on false. Spring easing.
### Badge pop
`animate(badgeEl, { scale: [0.6, 1], opacity: [0, 1] }, { ease: spring(1.5, 80, 8), duration: 300 })` triggered on badge mount via Vue's `onMounted` lifecycle hook in a `BadgePop` wrapper component or `v-enter-active` transition hook.
## Constraints
### Reduced motion
`useCardAnimation` checks `motion.rich.value` before firing any Anime.js call. If false, all animations are skipped — instant state changes only. Consistent with existing `useMotion` pattern.
### Bundle size
Anime.js v4 core ~17KB gzipped. Only `animate`, `spring`, and `createTimeline` are imported — Vite ESM tree-shaking keeps footprint minimal. The `draggable` module is not used.
### Tests
Existing `EmailCardStack.test.ts` tests assert emitted events, not animation, so they keep passing. Anime.js is mocked at module level in Vitest via `vi.mock('animejs')` where needed.
### CSS cleanup
Remove from `EmailCardStack.vue` and `LabelView.vue`:
- `@keyframes fileAway`, `crumple`, `slideUnder`
- `@keyframes badge-pop`
- `.dismiss-label`, `.dismiss-skip`, `.dismiss-discard` classes (Anime.js fires on element refs directly)
- The `dismissClass` computed in `EmailCardStack.vue`
## Files Changed
| File | Change |
|------|--------|
| `web/package.json` | Add `animejs` dependency |
| `web/src/composables/useCardAnimation.ts` | New — all Anime.js animation logic |
| `web/src/components/EmailCardStack.vue` | Remove `cardStyle` computed + dismiss classes; call `useCardAnimation` |
| `web/src/views/LabelView.vue` | Badge pop + bucket grid rise via Anime.js |
| `web/src/assets/avocet.css` | Remove any global animation keyframes if present |


@@ -0,0 +1,573 @@
# Anime.js Animation Integration — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Replace the current mixed CSS keyframes / inline-style animation system with Anime.js v4 for all card motion — pickup morph, drag tracking, spring snap-back, dismissals, bucket grid rise, and badge pop.
**Architecture:** A new `useCardAnimation` composable owns all Anime.js calls imperatively against DOM refs. Vue reactive state (`isHeld`, `deltaX`, `deltaY`, `dismissType`) is unchanged. `cardStyle` computed and `dismissClass` computed are deleted; Anime.js writes to the element directly.
**Tech Stack:** Anime.js v4 (`animejs`), Vue 3 Composition API, `@vue/test-utils` + Vitest for tests.
---
## Task 1: Install Anime.js
**Files:**
- Modify: `web/package.json`
**Step 1: Install the package**
```bash
cd /Library/Development/CircuitForge/avocet/web
npm install animejs
```
**Step 2: Verify the import resolves**
Create a throwaway check — open `web/src/main.ts` briefly and confirm:
```ts
import { animate, spring } from 'animejs'
```
resolves without error in the editor (TypeScript types ship with animejs v4).
Remove the import immediately after verifying — do not commit it.
**Step 3: Commit**
```bash
cd /Library/Development/CircuitForge/avocet/web
git add package.json package-lock.json
git commit -m "feat(avocet): add animejs v4 dependency"
```
---
## Task 2: Create `useCardAnimation` composable
**Files:**
- Create: `web/src/composables/useCardAnimation.ts`
- Create: `web/src/composables/useCardAnimation.test.ts`
**Background — Anime.js v4 transform model:**
Anime.js v4 tracks `x`, `y`, `scale`, `rotate`, etc. as separate transform components internally.
Use `utils.set(el, props)` for instant (no-animation) property updates — this keeps the internal cache consistent.
Never mix direct `el.style.transform = "..."` with Anime.js on the same element, or the cache desyncs.
**Step 1: Write the failing tests**
`web/src/composables/useCardAnimation.test.ts`:
```ts
import { ref, nextTick } from 'vue'
import { describe, it, expect, vi, beforeEach } from 'vitest'
// Mock animejs before importing the composable
vi.mock('animejs', () => ({
animate: vi.fn(),
spring: vi.fn(() => 'mock-spring'),
utils: { set: vi.fn() },
}))
import { useCardAnimation } from './useCardAnimation'
import { animate, utils } from 'animejs'
const mockAnimate = animate as ReturnType<typeof vi.fn>
const mockSet = utils.set as ReturnType<typeof vi.fn>
function makeEl() {
return document.createElement('div')
}
describe('useCardAnimation', () => {
beforeEach(() => {
vi.clearAllMocks()
})
it('pickup() calls animate with ball shape', () => {
const el = makeEl()
const cardEl = ref<HTMLElement | null>(el)
const motion = { rich: ref(true) }
const { pickup } = useCardAnimation(cardEl, motion)
pickup()
expect(mockAnimate).toHaveBeenCalledWith(
el,
expect.objectContaining({ scale: 0.55, borderRadius: '50%' }),
expect.anything(),
)
})
it('pickup() is a no-op when motion.rich is false', () => {
const el = makeEl()
const cardEl = ref<HTMLElement | null>(el)
const motion = { rich: ref(false) }
const { pickup } = useCardAnimation(cardEl, motion)
pickup()
expect(mockAnimate).not.toHaveBeenCalled()
})
it('setDragPosition() calls utils.set with translated coords', () => {
const el = makeEl()
const cardEl = ref<HTMLElement | null>(el)
const motion = { rich: ref(true) }
const { setDragPosition } = useCardAnimation(cardEl, motion)
setDragPosition(50, 30)
expect(mockSet).toHaveBeenCalledWith(el, expect.objectContaining({ x: 50, y: -50 }))
// y = deltaY - 80 = 30 - 80 = -50
})
it('snapBack() calls animate returning to card shape', () => {
const el = makeEl()
const cardEl = ref<HTMLElement | null>(el)
const motion = { rich: ref(true) }
const { snapBack } = useCardAnimation(cardEl, motion)
snapBack()
expect(mockAnimate).toHaveBeenCalledWith(
el,
expect.objectContaining({ x: 0, y: 0, scale: 1 }),
expect.anything(),
)
})
it('animateDismiss("label") calls animate', () => {
const el = makeEl()
const cardEl = ref<HTMLElement | null>(el)
const motion = { rich: ref(true) }
const { animateDismiss } = useCardAnimation(cardEl, motion)
animateDismiss('label')
expect(mockAnimate).toHaveBeenCalled()
})
it('animateDismiss("discard") calls animate', () => {
const el = makeEl()
const cardEl = ref<HTMLElement | null>(el)
const motion = { rich: ref(true) }
const { animateDismiss } = useCardAnimation(cardEl, motion)
animateDismiss('discard')
expect(mockAnimate).toHaveBeenCalled()
})
it('animateDismiss("skip") calls animate', () => {
const el = makeEl()
const cardEl = ref<HTMLElement | null>(el)
const motion = { rich: ref(true) }
const { animateDismiss } = useCardAnimation(cardEl, motion)
animateDismiss('skip')
expect(mockAnimate).toHaveBeenCalled()
})
it('animateDismiss is a no-op when motion.rich is false', () => {
const el = makeEl()
const cardEl = ref<HTMLElement | null>(el)
const motion = { rich: ref(false) }
const { animateDismiss } = useCardAnimation(cardEl, motion)
animateDismiss('label')
expect(mockAnimate).not.toHaveBeenCalled()
})
})
```
**Step 2: Run tests to confirm they fail**
```bash
cd /Library/Development/CircuitForge/avocet/web
npm test -- useCardAnimation
```
Expected: FAIL — "Cannot find module './useCardAnimation'"
**Step 3: Implement the composable**
`web/src/composables/useCardAnimation.ts`:
```ts
import { type Ref } from 'vue'
import { animate, spring, utils } from 'animejs'
const BALL_SCALE = 0.55
const BALL_RADIUS = '50%'
const CARD_RADIUS = '1rem'
const PICKUP_Y_OFFSET = 80 // px above finger
const PICKUP_DURATION = 200
// NOTE: animejs v4 — spring() takes an object, not positional args
const SNAP_SPRING = spring({ mass: 1, stiffness: 80, damping: 10 })
interface Motion { rich: Ref<boolean> }
export function useCardAnimation(
cardEl: Ref<HTMLElement | null>,
motion: Motion,
) {
function pickup() {
if (!motion.rich.value || !cardEl.value) return
// NOTE: animejs v4 — animate() is 2-arg; timing options merge into the params object
animate(cardEl.value, {
scale: BALL_SCALE,
borderRadius: BALL_RADIUS,
y: -PICKUP_Y_OFFSET,
duration: PICKUP_DURATION,
ease: SNAP_SPRING,
})
}
function setDragPosition(dx: number, dy: number) {
if (!cardEl.value) return
utils.set(cardEl.value, { x: dx, y: dy - PICKUP_Y_OFFSET })
}
function snapBack() {
if (!motion.rich.value || !cardEl.value) return
// No duration — spring physics determines settling time
animate(cardEl.value, {
x: 0,
y: 0,
scale: 1,
borderRadius: CARD_RADIUS,
ease: SNAP_SPRING,
})
}
function animateDismiss(type: 'label' | 'skip' | 'discard') {
if (!motion.rich.value || !cardEl.value) return
const el = cardEl.value
if (type === 'label') {
animate(el, { y: '-120%', scale: 0.85, opacity: 0, duration: 280, ease: 'out(3)' })
} else if (type === 'discard') {
// Two-step: crumple then shrink (keyframes array in params object)
animate(el, { keyframes: [
{ scale: 0.95, rotate: 2, filter: 'brightness(0.6) sepia(1) hue-rotate(-20deg)', duration: 140 },
{ scale: 0, rotate: 8, opacity: 0, duration: 210 },
      ] })
    } else if (type === 'skip') {
      animate(el, { x: '110%', rotate: 5, opacity: 0, duration: 260, ease: 'out(2)' })
}
}
return { pickup, setDragPosition, snapBack, animateDismiss }
}
```
**Step 4: Run tests — expect pass**
```bash
npm test -- useCardAnimation
```
Expected: All 8 tests PASS.
**Step 5: Commit**
```bash
git add web/src/composables/useCardAnimation.ts web/src/composables/useCardAnimation.test.ts
git commit -m "feat(avocet): add useCardAnimation composable with Anime.js"
```
---
## Task 3: Wire `useCardAnimation` into `EmailCardStack.vue`
**Files:**
- Modify: `web/src/components/EmailCardStack.vue`
- Modify: `web/src/components/EmailCardStack.test.ts`
**What changes:**
- Remove `cardStyle` computed and `:style="cardStyle"` binding
- Remove `dismissClass` computed and `:class="[dismissClass, ...]"` binding (keep `is-held`)
- Remove `deltaX`, `deltaY` reactive refs (position now owned by Anime.js)
- Call `pickup()` in `onPointerDown`, `setDragPosition()` in `onPointerMove`, `snapBack()` in `onPointerUp` (no-target path)
- Watch `props.dismissType` and call `animateDismiss()`
- Remove CSS `@keyframes fileAway`, `crumple`, `slideUnder` and their `.dismiss-*` rule blocks from `<style>`
**Step 1: Update the tests that check dismiss classes**
In `EmailCardStack.test.ts`, the 5 tests checking `.dismiss-label`, `.dismiss-discard`, `.dismiss-skip` classes are testing implementation (CSS class name), not behavior. Replace them with a single test that verifies `animateDismiss` is called:
```ts
// Add at the top of the file (after existing imports):
vi.mock('../composables/useCardAnimation', () => ({
useCardAnimation: vi.fn(() => ({
pickup: vi.fn(),
setDragPosition: vi.fn(),
snapBack: vi.fn(),
animateDismiss: vi.fn(),
})),
}))
import { useCardAnimation } from '../composables/useCardAnimation'
```
Replace the five `dismissType` class tests with:
```ts
it('calls animateDismiss with type when dismissType prop changes', async () => {
const w = mount(EmailCardStack, { props: { item, isBucketMode: false, dismissType: null } })
const { animateDismiss } = (useCardAnimation as ReturnType<typeof vi.fn>).mock.results[0].value
await w.setProps({ dismissType: 'label' })
await nextTick()
expect(animateDismiss).toHaveBeenCalledWith('label')
})
```
Add `nextTick` import to the test file header if not already present:
```ts
import { nextTick } from 'vue'
```
**Step 2: Run tests to confirm the replaced tests fail**
```bash
npm test -- EmailCardStack
```
Expected: FAIL — `animateDismiss` not called (not yet wired in component)
**Step 3: Modify `EmailCardStack.vue`**
Script section changes:
```ts
// Remove:
// import { ref, computed } from 'vue' → change to:
import { ref, watch } from 'vue'
// Add import:
import { useCardAnimation } from '../composables/useCardAnimation'
// Remove these refs:
// const deltaX = ref(0)
// const deltaY = ref(0)
// Add after const motion = useMotion():
const { pickup, setDragPosition, snapBack, animateDismiss } = useCardAnimation(cardEl, motion)
// Add watcher:
watch(() => props.dismissType, (type) => {
if (type) animateDismiss(type)
})
// Remove dismissClass computed entirely.
// In onPointerDown — add after isHeld.value = true:
pickup()
// In onPointerMove — replace deltaX/deltaY assignments with:
const dx = e.clientX - pickupX.value
const dy = e.clientY - pickupY.value
setDragPosition(dx, dy)
// (keep the zone/bucket detection that uses e.clientX/e.clientY — those stay the same)
// In onPointerUp — in the snap-back else branch, replace:
// deltaX.value = 0
// deltaY.value = 0
// with:
snapBack()
```
Template changes — on the `.card-wrapper` div:
```html
<!-- Remove: :class="[dismissClass, { 'is-held': isHeld }]" -->
<!-- Replace with: -->
:class="{ 'is-held': isHeld }"
<!-- Remove: :style="cardStyle" -->
```
CSS changes in `<style scoped>` — delete these entire blocks:
```
@keyframes fileAway { ... }
@keyframes crumple { ... }
@keyframes slideUnder { ... }
.card-wrapper.dismiss-label { ... }
.card-wrapper.dismiss-discard { ... }
.card-wrapper.dismiss-skip { ... }
```
Also delete `--card-dismiss` and `--card-skip` CSS var usages if present.
**Step 4: Run all tests**
```bash
npm test
```
Expected: All pass (both `useCardAnimation.test.ts` and `EmailCardStack.test.ts`).
**Step 5: Commit**
```bash
git add web/src/components/EmailCardStack.vue web/src/components/EmailCardStack.test.ts
git commit -m "feat(avocet): wire Anime.js card animation into EmailCardStack"
```
---
## Task 4: Bucket grid rise animation
**Files:**
- Modify: `web/src/views/LabelView.vue`
**What changes:**
Replace the CSS class-toggle animation on `.bucket-grid-footer.grid-active` with an Anime.js watch in `LabelView.vue`. The `position: sticky → fixed` switch stays as a CSS class (can't animate position), but `translateY` and `opacity` move to Anime.js.
**Step 1: Add gridEl ref and import animate**
In `LabelView.vue` `<script setup>`:
```ts
// Add to imports:
import { ref, onMounted, onUnmounted, watch } from 'vue'
import { animate, spring } from 'animejs'
// Add ref:
const gridEl = ref<HTMLElement | null>(null)
```
**Step 2: Add watcher for isHeld**
```ts
watch(isHeld, (held) => {
if (!motion.rich.value || !gridEl.value) return
// animejs v4: 2-arg animate, spring() takes object
animate(gridEl.value,
held
? { y: -8, opacity: 0.45, ease: spring({ mass: 1, stiffness: 80, damping: 10 }), duration: 250 }
: { y: 0, opacity: 1, ease: spring({ mass: 1, stiffness: 80, damping: 10 }), duration: 250 }
)
})
```
**Step 3: Wire ref in template**
On the `.bucket-grid-footer` div:
```html
<div ref="gridEl" class="bucket-grid-footer" :class="{ 'grid-active': isHeld }">
```
**Step 4: Remove CSS transition from `.bucket-grid-footer`**
In `LabelView.vue <style scoped>`, delete the `transition:` line from `.bucket-grid-footer`:
```css
/* DELETE this line: */
transition: transform 250ms cubic-bezier(0.34, 1.56, 0.64, 1),
opacity 200ms ease,
background 200ms ease;
```
Keep the `transform: translateY(-8px)` and `opacity: 0.45` rules on `.bucket-grid-footer.grid-active` as-is: they are the fallback for reduced-motion (and no-JS) users. The `if (!motion.rich.value)` guard in the watcher means those users never hit Anime.js; the CSS class handles them.
**Step 5: Run tests**
```bash
npm test
```
Expected: All pass (LabelView has no dedicated tests, but full suite should be green).
**Step 6: Commit**
```bash
git add web/src/views/LabelView.vue
git commit -m "feat(avocet): animate bucket grid rise with Anime.js spring"
```
---
## Task 5: Badge pop animation
**Files:**
- Modify: `web/src/views/LabelView.vue`
**What changes:**
Replace `@keyframes badge-pop` (scale + opacity keyframe) with a Vue `<Transition>` `@enter` hook that calls `animate()`. Badges already appear/disappear via `v-if`, so they have natural mount/unmount lifecycle.
**Step 1: Wrap each badge in a `<Transition>`**
In `LabelView.vue` template, each badge `<span v-if="...">` gets wrapped:
```html
<Transition @enter="onBadgeEnter" :css="false">
<span v-if="onRoll" class="badge badge-roll">🔥 On a roll!</span>
</Transition>
<Transition @enter="onBadgeEnter" :css="false">
<span v-if="speedRound" class="badge badge-speed">⚡ Speed round!</span>
</Transition>
<!-- repeat for all 6 badges -->
```
`:css="false"` tells Vue not to apply any CSS transition classes — Anime.js owns the enter animation entirely.
**Step 2: Add `onBadgeEnter` hook**
```ts
function onBadgeEnter(el: Element, done: () => void) {
if (!motion.rich.value) { done(); return }
  animate(el as HTMLElement, {
    scale: [0.6, 1],
    opacity: [0, 1],
    ease: spring({ mass: 1.5, stiffness: 80, damping: 8 }),
    duration: 300,
    onComplete: done,
  })
}
```
**Step 3: Remove `@keyframes badge-pop` from CSS**
In `LabelView.vue <style scoped>`:
```css
/* DELETE: */
@keyframes badge-pop {
from { transform: scale(0.6); opacity: 0; }
to { transform: scale(1); opacity: 1; }
}
/* DELETE animation line from .badge: */
animation: badge-pop 0.3s cubic-bezier(0.34, 1.56, 0.64, 1);
```
**Step 4: Run tests**
```bash
npm test
```
Expected: All pass.
**Step 5: Commit**
```bash
git add web/src/views/LabelView.vue
git commit -m "feat(avocet): badge pop via Anime.js spring transition hook"
```
---
## Task 6: Build and smoke test
**Step 1: Build the SPA**
```bash
cd /Library/Development/CircuitForge/avocet
./manage.sh start-api
```
(This builds Vue + starts FastAPI on port 8503.)
**Step 2: Open the app**
```bash
./manage.sh open-api
```
**Step 3: Manual smoke test checklist**
- [ ] Pick up a card — ball morph is smooth (not instant jump)
- [ ] Drag ball around — follows finger with no lag
- [ ] Release in center — springs back to card with bounce
- [ ] Release in left zone — discard fires (card crumples)
- [ ] Release in right zone — skip fires (card slides right)
- [ ] Release on a bucket — label fires (card files up)
- [ ] Fling left fast — discard fires
- [ ] Bucket grid rises smoothly on pickup, falls on release
- [ ] Badge (label 10 in a row for 🔥) pops in with spring
- [ ] Reduced motion: toggle in system settings → no animations, instant behavior
- [ ] Keyboard labels (1–9) still work (pointer events unchanged)
**Step 4: Final commit if all green**
```bash
git add -A
git commit -m "feat(avocet): complete Anime.js animation integration"
```

File diff suppressed because it is too large.

@@ -0,0 +1,254 @@
# Fine-tune Email Classifier — Design Spec
**Date:** 2026-03-15
**Status:** Approved
**Scope:** Avocet — `scripts/`, `app/api.py`, `web/src/views/BenchmarkView.vue`, `environment.yml`
---
## Problem
The benchmark baseline shows zero-shot macro-F1 of 0.366 for the best models (`deberta-zeroshot`, `deberta-base-anli`). Zero-shot inference cannot improve with more labeled data. Fine-tuning the fastest models (`deberta-small` at 111ms, `bge-m3` at 123ms) on the growing labeled dataset is the path to meaningful accuracy gains.
---
## Constraints
- 501 labeled samples after dropping 2 non-canonical `profile_alert` rows
- Heavy class imbalance: `digest` 29%, `neutral` 26%, `new_lead` 2.6%, `survey_received` 3%
- 8.2 GB VRAM (shared with Peregrine vLLM during dev)
- Target models: `cross-encoder/nli-deberta-v3-small` (100M params), `MoritzLaurer/bge-m3-zeroshot-v2.0` (600M params)
- Output: local `models/avocet-{name}/` directory
- UI-triggerable via web interface (SSE streaming log)
- Stack: transformers 4.57.3, torch 2.10.0, accelerate 1.12.0, sklearn, CUDA 8.2GB
---
## Environment changes
`environment.yml` must add:
- `scikit-learn` — required for `train_test_split(stratify=...)` and `f1_score`
- `peft` is NOT used by this spec; it is available in the env but not required here
---
## Architecture
### New file: `scripts/finetune_classifier.py`
CLI entry point for fine-tuning. All prints use `flush=True` so stdout is SSE-streamable.
```
python scripts/finetune_classifier.py --model deberta-small [--epochs 5]
```
Supported `--model` values: `deberta-small`, `bge-m3`
**Model registry** (internal to this script):
| Key | Base model ID | Max tokens | fp16 | Batch size | Grad accum steps | Gradient checkpointing |
|-----|--------------|------------|------|------------|-----------------|----------------------|
| `deberta-small` | `cross-encoder/nli-deberta-v3-small` | 512 | No | 16 | 1 | No |
| `bge-m3` | `MoritzLaurer/bge-m3-zeroshot-v2.0` | 512 | Yes | 4 | 4 | Yes |
`bge-m3` uses `fp16=True` (halves optimizer state from ~4.8GB to ~2.4GB) with batch size 4 + gradient accumulation 4 = effective batch 16, matching `deberta-small`. These settings are required to fit within 8.2GB VRAM. Still stop Peregrine vLLM before running bge-m3 fine-tuning.
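A sketch of the corresponding `TrainingArguments` for the `bge-m3` row (argument names are the standard Transformers ones; the output path follows the `models/avocet-{name}/` convention, exact values are the registry's, not a tested configuration):

```python
from transformers import TrainingArguments

# Memory-constrained settings for bge-m3 (600M params, 8.2 GB VRAM budget)
bge_m3_args = TrainingArguments(
    output_dir="models/avocet-bge-m3",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # 4 x 4 = effective batch 16
    fp16=True,                       # halves optimizer and activation memory
    gradient_checkpointing=True,     # recompute activations to save VRAM
    num_train_epochs=5,
)
```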
### Modified: `scripts/classifier_adapters.py`
Add `FineTunedAdapter(ClassifierAdapter)`:
- Takes `model_dir: str` (path to a `models/avocet-*/` checkpoint)
- Loads via `pipeline("text-classification", model=model_dir)`
- `classify()` input format: **`f"{subject} [SEP] {body[:400]}"`** — must match the training format exactly. Do NOT use the zero-shot adapters' `f"Subject: {subject}\n\n{body[:600]}"` format; distribution shift will degrade accuracy.
- Returns the top predicted label directly (single forward pass — no per-label NLI scoring loop)
- Expected inference speed: ~10–20ms/email vs 111–338ms for zero-shot
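The shared input format is small enough to live in one helper used by both the training script and the adapter; `build_input` is an illustrative name, not an existing function:

```python
def build_input(subject: str, body: str) -> str:
    # Identical format at train and inference time; any drift is distribution shift.
    return f"{subject} [SEP] {body[:400]}"
```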
### Modified: `scripts/benchmark_classifier.py`
At startup, scan `models/` for subdirectories containing `training_info.json` and register each as a dynamic entry in the model registry using `FineTunedAdapter`. Skip silently if `models/` does not exist. Existing CLI behaviour is unchanged.
### Modified: `app/api.py`
Two new GET endpoints (GET required for `EventSource` compatibility):
**`GET /api/finetune/status`**
Scans `models/` for `training_info.json` files. Returns:
```json
[
{
"name": "avocet-deberta-small",
"base_model": "cross-encoder/nli-deberta-v3-small",
"val_macro_f1": 0.712,
"timestamp": "2026-03-15T12:00:00Z",
"sample_count": 401
}
]
```
Returns `[]` if no fine-tuned models exist.
**`GET /api/finetune/run?model=deberta-small&epochs=5`**
Spawns `finetune_classifier.py` via the `job-seeker-classifiers` Python binary. Streams stdout as SSE `{"type":"progress","message":"..."}` events. Emits `{"type":"complete"}` on clean exit, `{"type":"error","message":"..."}` on non-zero exit. Same implementation pattern as `/api/benchmark/run`.
### Modified: `web/src/views/BenchmarkView.vue`
**Trained models badge row** (top of view, conditional on fine-tuned models existing):
Shows each fine-tuned model name + val macro-F1 chip. Fetches from `/api/finetune/status` on mount.
**Fine-tune section** (collapsible, below benchmark charts):
- Dropdown: `deberta-small` | `bge-m3`
- Number input: epochs (default 5, range 1–20)
- Run button → streams into existing log component
- On `complete`: auto-triggers `/api/benchmark/run` (with `--save`) so charts update immediately
---
## Training Pipeline
### Data preparation
1. Load `data/email_score.jsonl`
2. Drop rows where `label` not in canonical `LABELS` (removes `profile_alert` etc.)
3. Check for classes with < 2 **total** samples (before any split). Drop those classes and warn. Additionally, warn (but do not drop) classes with < 5 training samples, noting that eval F1 for those classes will be unreliable.
4. Input text: `f"{subject} [SEP] {body[:400]}"` — fits within 512 tokens for both target models
5. Stratified 80/20 train/val split via `sklearn.model_selection.train_test_split(stratify=labels)`
### Class weighting
Compute per-class weights: `total_samples / (n_classes × class_count)`. Pass to a `WeightedTrainer` subclass:
```python
import torch.nn.functional as F
from transformers import Trainer


class WeightedTrainer(Trainer):
    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.class_weights = class_weights  # 1-D per-class weight tensor (CPU)

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # **kwargs is required — absorbs num_items_in_batch added in Transformers 4.46.
        # Do not remove it; removing it causes TypeError on the first training step.
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        # Move class_weights to the same device as logits — required for GPU training.
        # class_weights is created on CPU; logits are on cuda:0 during training.
        weight = self.class_weights.to(outputs.logits.device)
        loss = F.cross_entropy(outputs.logits, labels, weight=weight)
        return (loss, outputs) if return_outputs else loss
```
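The weight formula can be computed directly from the training-split labels; the helper name is illustrative:

```python
from collections import Counter

import torch


def make_class_weights(train_labels: list[str], label2id: dict[str, int]) -> torch.Tensor:
    """total / (n_classes * count): rarer classes get proportionally larger weight."""
    counts = Counter(train_labels)
    total = len(train_labels)
    n_classes = len(label2id)
    weights = torch.empty(n_classes)
    for label, idx in label2id.items():
        weights[idx] = total / (n_classes * counts[label])
    return weights  # CPU tensor; moved to the logits device inside compute_loss
```

Classes with fewer than 2 total samples were dropped earlier, so no count here can be zero.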
### Model setup
```python
AutoModelForSequenceClassification.from_pretrained(
base_model_id,
num_labels=10,
ignore_mismatched_sizes=True, # see note below
id2label=id2label,
label2id=label2id,
)
```
**Note on `ignore_mismatched_sizes=True`:** The pretrained NLI head is a 3-class linear projection. It mismatches the 10-class head constructed by `num_labels=10`, so its weights are skipped during loading. PyTorch initializes the new head from scratch using the model's default init scheme. The backbone weights load normally. Do not set this to `False` — it will raise a shape error.
### Training config and `compute_metrics`
The Trainer requires a `compute_metrics` callback that takes an `EvalPrediction` (logits + label_ids) and returns a dict with a `macro_f1` key. This is distinct from the existing `compute_metrics` in `classifier_adapters.py` (which operates on string predictions):
```python
from sklearn.metrics import accuracy_score, f1_score
from transformers import EvalPrediction


def compute_metrics_for_trainer(eval_pred: EvalPrediction) -> dict:
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    return {
        "macro_f1": f1_score(labels, preds, average="macro", zero_division=0),
        "accuracy": accuracy_score(labels, preds),
    }
```
`TrainingArguments` must include:
- `load_best_model_at_end=True`
- `metric_for_best_model="macro_f1"`
- `greater_is_better=True`
These are required for `EarlyStoppingCallback` to work correctly. Without `load_best_model_at_end=True`, `EarlyStoppingCallback` raises an `AssertionError` when training starts.
| Hyperparameter | deberta-small | bge-m3 |
|---------------|--------------|--------|
| Epochs | 5 (default, CLI-overridable) | 5 |
| Batch size | 16 | 4 |
| Gradient accumulation | 1 | 4 (effective batch = 16) |
| Learning rate | 2e-5 | 2e-5 |
| LR schedule | Linear with 10% warmup | same |
| Optimizer | AdamW | AdamW |
| fp16 | No | Yes |
| Gradient checkpointing | No | Yes |
| Eval strategy | Every epoch | Every epoch |
| Best checkpoint | By `macro_f1` | same |
| Early stopping patience | 3 epochs | 3 epochs |
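The deberta-small column of the table, plus the three required flags, translates into roughly this `TrainingArguments`; the output path is a placeholder and exact argument names may need adjusting to the installed Transformers version:

```python
from transformers import EarlyStoppingCallback, TrainingArguments

args = TrainingArguments(
    output_dir="models/avocet-deberta-small",
    num_train_epochs=5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,               # 10% warmup
    eval_strategy="epoch",
    save_strategy="epoch",          # must match eval_strategy for best-model loading
    load_best_model_at_end=True,    # required by EarlyStoppingCallback
    metric_for_best_model="macro_f1",
    greater_is_better=True,
)
early_stop = EarlyStoppingCallback(early_stopping_patience=3)
```

Pass `args`, the tokenized datasets, `compute_metrics_for_trainer`, and `callbacks=[early_stop]` into the `WeightedTrainer` constructor. The bge-m3 run additionally sets `fp16=True`, `gradient_checkpointing=True`, `per_device_train_batch_size=4`, and `gradient_accumulation_steps=4`.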
### Output
Saved to `models/avocet-{name}/`:
- Model weights + tokenizer (standard HuggingFace format)
- `training_info.json`:
```json
{
"name": "avocet-deberta-small",
"base_model_id": "cross-encoder/nli-deberta-v3-small",
"timestamp": "2026-03-15T12:00:00Z",
"epochs_run": 5,
"val_macro_f1": 0.712,
"val_accuracy": 0.798,
"sample_count": 401,
"label_counts": { "digest": 116, "neutral": 104, ... }
}
```
---
## Data Flow
```
email_score.jsonl
        ↓
finetune_classifier.py
  ├── drop non-canonical labels
  ├── check for < 2 total samples per class (drop + warn)
  ├── stratified 80/20 split
  ├── tokenize (subject [SEP] body[:400])
  ├── compute class weights
  ├── WeightedTrainer + EarlyStoppingCallback
  └── save → models/avocet-{name}/
        ├── FineTunedAdapter (classifier_adapters.py)
        │     ├── pipeline("text-classification")
        │     ├── input: subject [SEP] body[:400] ← must match training format
        │     └── ~10–20ms/email inference
        └── training_info.json
              └── /api/finetune/status
                    └── BenchmarkView badge row
```
---
## Error Handling
- **Insufficient data (< 2 total samples in a class):** Drop class before split, print warning with class name and count.
- **Low data warning (< 5 training samples in a class):** Warn but continue; note eval F1 for that class will be unreliable.
- **VRAM OOM on bge-m3:** Surface as clear SSE error message. Suggest stopping Peregrine vLLM first (it holds ~5.7GB).
- **Missing score file:** Raise `FileNotFoundError` with actionable message (same pattern as `load_scoring_jsonl`).
- **Model dir already exists:** Overwrite with a warning log line. Re-running always produces a fresh checkpoint.
---
## Testing
- Unit test `WeightedTrainer.compute_loss` with a mock model and known label distribution — verify weighted loss differs from unweighted; verify `**kwargs` does not raise `TypeError`
- Unit test `compute_metrics_for_trainer` — verify `macro_f1` key in output, correct value on known inputs
- Unit test `FineTunedAdapter.classify` with a mock pipeline — verify it returns a string from `LABELS` using `subject [SEP] body[:400]` format
- Unit test auto-discovery in `benchmark_classifier.py` — mock `models/` dir with two `training_info.json` files, verify both appear in the active registry
- Integration test: fine-tune on `data/email_score.jsonl.example` (8 samples, 5 of 10 labels represented, 1 epoch, `--model deberta-small`). The 5 missing labels trigger the `< 2 total samples` drop path — the test must verify the drop warning is emitted for each missing label rather than treating it as a failure. Verify `models/avocet-deberta-small/training_info.json` is written with correct keys.
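The `compute_metrics_for_trainer` check from the list above might look like this; the function body is repeated from the Training Pipeline section so the snippet stands alone:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score


def compute_metrics_for_trainer(eval_pred) -> dict:
    # repeated from the Training Pipeline section for a self-contained test file
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    return {
        "macro_f1": f1_score(labels, preds, average="macro", zero_division=0),
        "accuracy": accuracy_score(labels, preds),
    }


def test_macro_f1_on_known_inputs():
    logits = np.array([[2.0, 0.1], [0.1, 2.0], [3.0, 0.0], [0.0, 1.0]])
    labels = np.array([0, 1, 0, 0])  # last row is misclassified as class 1
    metrics = compute_metrics_for_trainer((logits, labels))
    # per-class F1: class 0 -> 0.8, class 1 -> 2/3; macro F1 is their mean
    assert abs(metrics["macro_f1"] - (0.8 + 2 / 3) / 2) < 1e-9
    assert metrics["accuracy"] == 0.75
```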
---
## Out of Scope
- Pushing fine-tuned weights to HuggingFace Hub (future)
- Cross-validation or k-fold evaluation (future — dataset too small to be meaningful now)
- Hyperparameter search (future)
- LoRA/PEFT adapter fine-tuning (future — relevant if model sizes grow beyond available VRAM)
- Fine-tuning models other than `deberta-small` and `bge-m3`

manage.sh

@@ -19,8 +19,9 @@ LOG_FILE="${LOG_DIR}/label_tool.log"
DEFAULT_PORT=8503
CONDA_BASE="${CONDA_BASE:-/devl/miniconda3}"
ENV_UI="${AVOCET_ENV:-cf}"
ENV_UI="job-seeker"
ENV_BM="job-seeker-classifiers"
STREAMLIT="${CONDA_BASE}/envs/${ENV_UI}/bin/streamlit"
PYTHON_BM="${CONDA_BASE}/envs/${ENV_BM}/bin/python"
PYTHON_UI="${CONDA_BASE}/envs/${ENV_UI}/bin/python"
@@ -78,11 +79,13 @@ usage() {
echo ""
echo " Usage: ./manage.sh <command> [args]"
echo ""
echo " Vue UI + FastAPI:"
echo -e " ${GREEN}start${NC} Build Vue SPA + start FastAPI on port 8503"
echo -e " ${GREEN}stop${NC} Stop FastAPI server"
echo -e " ${GREEN}restart${NC} Stop + rebuild + restart FastAPI server"
echo -e " ${GREEN}open${NC} Open Vue UI in browser (http://localhost:8503)"
echo " Label tool:"
echo -e " ${GREEN}start${NC} Start label tool UI (port collision-safe)"
echo -e " ${GREEN}stop${NC} Stop label tool UI"
echo -e " ${GREEN}restart${NC} Restart label tool UI"
echo -e " ${GREEN}status${NC} Show running state and port"
echo -e " ${GREEN}logs${NC} Tail label tool log output"
echo -e " ${GREEN}open${NC} Open label tool in browser"
echo ""
echo " Benchmark:"
echo -e " ${GREEN}benchmark [args]${NC} Run benchmark_classifier.py (args passed through)"
@@ -90,8 +93,13 @@ usage() {
echo -e " ${GREEN}score [args]${NC} Shortcut: --score [args]"
echo -e " ${GREEN}compare [args]${NC} Shortcut: --compare [args]"
echo ""
echo " Vue API:"
echo -e " ${GREEN}start-api${NC} Build Vue SPA + start FastAPI on port 8503"
echo -e " ${GREEN}stop-api${NC} Stop FastAPI server"
echo -e " ${GREEN}restart-api${NC} Stop + rebuild + restart FastAPI server"
echo -e " ${GREEN}open-api${NC} Open Vue UI in browser (http://localhost:8503)"
echo ""
echo " Dev:"
echo -e " ${GREEN}dev${NC} Hot-reload: uvicorn --reload (:8503) + Vite HMR (:5173)"
echo -e " ${GREEN}test${NC} Run pytest suite"
echo ""
echo " Port defaults to ${DEFAULT_PORT}; auto-increments if occupied."
@@ -113,102 +121,102 @@ shift || true
case "$CMD" in
start)
API_PID_FILE=".avocet-api.pid"
API_PORT=8503
if [[ -f "$API_PID_FILE" ]] && kill -0 "$(<"$API_PID_FILE")" 2>/dev/null; then
warn "API already running (PID $(<"$API_PID_FILE")) → http://localhost:${API_PORT}"
pid=$(_running_pid)
if [[ -n "$pid" ]]; then
port=$(_running_port)
warn "Already running (PID ${pid}) on port ${port} → http://localhost:${port}"
exit 0
fi
if [[ ! -x "$STREAMLIT" ]]; then
error "Streamlit not found at ${STREAMLIT}\nActivate env: conda run -n ${ENV_UI} ..."
fi
port=$(_find_free_port "$DEFAULT_PORT")
mkdir -p "$LOG_DIR"
API_LOG="${LOG_DIR}/api.log"
info "Building Vue SPA…"
(cd web && npm run build) >> "$API_LOG" 2>&1
info "Starting FastAPI on port ${API_PORT}"
nohup "$PYTHON_UI" -m uvicorn app.api:app \
--host 0.0.0.0 --port "$API_PORT" \
>> "$API_LOG" 2>&1 &
echo $! > "$API_PID_FILE"
# Poll until port is actually bound (up to 10 s), not just process alive
for _i in $(seq 1 20); do
sleep 0.5
if (echo "" >/dev/tcp/127.0.0.1/"$API_PORT") 2>/dev/null; then
success "Avocet started → http://localhost:${API_PORT} (PID $(<"$API_PID_FILE"))"
break
fi
if ! kill -0 "$(<"$API_PID_FILE")" 2>/dev/null; then
rm -f "$API_PID_FILE"
error "Server died during startup. Check ${API_LOG}"
fi
done
if ! (echo "" >/dev/tcp/127.0.0.1/"$API_PORT") 2>/dev/null; then
error "Server did not bind to port ${API_PORT} within 10 s. Check ${API_LOG}"
info "Starting label tool on port ${port}"
nohup "$STREAMLIT" run app/label_tool.py \
--server.port "$port" \
--server.headless true \
--server.fileWatcherType none \
>"$LOG_FILE" 2>&1 &
pid=$!
echo "$pid" > "$PID_FILE"
echo "$port" > "$PORT_FILE"
# Wait briefly and confirm the process survived
sleep 1
if kill -0 "$pid" 2>/dev/null; then
success "Avocet label tool started → http://localhost:${port} (PID ${pid})"
success "Logs: ${LOG_FILE}"
else
rm -f "$PID_FILE" "$PORT_FILE"
error "Process died immediately. Check ${LOG_FILE} for details."
fi
;;
stop)
API_PID_FILE=".avocet-api.pid"
if [[ ! -f "$API_PID_FILE" ]]; then
pid=$(_running_pid)
if [[ -z "$pid" ]]; then
warn "Not running."
exit 0
fi
PID="$(<"$API_PID_FILE")"
if kill -0 "$PID" 2>/dev/null; then
kill "$PID" && rm -f "$API_PID_FILE"
success "Stopped (PID ${PID})."
else
warn "Stale PID file (process ${PID} not running). Cleaning up."
rm -f "$API_PID_FILE"
info "Stopping label tool (PID ${pid})…"
kill "$pid"
# Wait up to 5 s for clean exit
for _ in $(seq 1 10); do
kill -0 "$pid" 2>/dev/null || break
sleep 0.5
done
if kill -0 "$pid" 2>/dev/null; then
warn "Process did not exit cleanly; sending SIGKILL…"
kill -9 "$pid" 2>/dev/null || true
fi
rm -f "$PID_FILE" "$PORT_FILE"
success "Stopped."
;;
restart)
bash "$0" stop
pid=$(_running_pid)
if [[ -n "$pid" ]]; then
info "Stopping existing process (PID ${pid})…"
kill "$pid"
for _ in $(seq 1 10); do
kill -0 "$pid" 2>/dev/null || break
sleep 0.5
done
kill -0 "$pid" 2>/dev/null && kill -9 "$pid" 2>/dev/null || true
rm -f "$PID_FILE" "$PORT_FILE"
fi
exec bash "$0" start
;;
dev)
API_PORT=8503
VITE_PORT=5173
DEV_API_PID_FILE=".avocet-dev-api.pid"
mkdir -p "$LOG_DIR"
DEV_API_LOG="${LOG_DIR}/dev-api.log"
if [[ -f "$DEV_API_PID_FILE" ]] && kill -0 "$(<"$DEV_API_PID_FILE")" 2>/dev/null; then
warn "Dev API already running (PID $(<"$DEV_API_PID_FILE"))"
status)
pid=$(_running_pid)
if [[ -n "$pid" ]]; then
port=$(_running_port)
success "Running — PID ${pid} port ${port} → http://localhost:${port}"
else
info "Starting uvicorn with --reload on port ${API_PORT}"
nohup "$PYTHON_UI" -m uvicorn app.api:app \
--host 0.0.0.0 --port "$API_PORT" --reload \
>> "$DEV_API_LOG" 2>&1 &
echo $! > "$DEV_API_PID_FILE"
# Wait for API to bind
for _i in $(seq 1 20); do
sleep 0.5
(echo "" >/dev/tcp/127.0.0.1/"$API_PORT") 2>/dev/null && break
if ! kill -0 "$(<"$DEV_API_PID_FILE")" 2>/dev/null; then
rm -f "$DEV_API_PID_FILE"
error "Dev API died during startup. Check ${DEV_API_LOG}"
warn "Not running."
fi
done
success "API (hot-reload) → http://localhost:${API_PORT}"
;;
logs)
if [[ ! -f "$LOG_FILE" ]]; then
warn "No log file found at ${LOG_FILE}. Has the tool been started?"
exit 0
fi
# Kill API on exit (Ctrl+C or Vite exits)
_cleanup_dev() {
local pid
pid=$(<"$DEV_API_PID_FILE" 2>/dev/null || true)
[[ -n "$pid" ]] && kill "$pid" 2>/dev/null && rm -f "$DEV_API_PID_FILE"
info "Dev servers stopped."
}
trap _cleanup_dev EXIT INT TERM
info "Starting Vite HMR on port ${VITE_PORT} (proxy /api → :${API_PORT})…"
success "Frontend (HMR) → http://localhost:${VITE_PORT}"
(cd web && npm run dev -- --host 0.0.0.0 --port "$VITE_PORT")
info "Tailing ${LOG_FILE} (Ctrl-C to stop)"
tail -f "$LOG_FILE"
;;
open)
URL="http://localhost:8503"
port=$(_running_port)
pid=$(_running_pid)
[[ -z "$pid" ]] && warn "Label tool does not appear to be running. Start with: ./manage.sh start"
URL="http://localhost:${port}"
info "Opening ${URL}"
if command -v xdg-open &>/dev/null; then
xdg-open "$URL"
@@ -249,6 +257,72 @@ case "$CMD" in
exec "$0" benchmark --compare "$@"
;;
start-api)
API_PID_FILE=".avocet-api.pid"
API_PORT=8503
if [[ -f "$API_PID_FILE" ]] && kill -0 "$(<"$API_PID_FILE")" 2>/dev/null; then
warn "API already running (PID $(<"$API_PID_FILE")) → http://localhost:${API_PORT}"
exit 0
fi
mkdir -p "$LOG_DIR"
API_LOG="${LOG_DIR}/api.log"
info "Building Vue SPA…"
(cd web && npm run build) >> "$API_LOG" 2>&1
info "Starting FastAPI on port ${API_PORT}"
nohup "$PYTHON_UI" -m uvicorn app.api:app \
--host 0.0.0.0 --port "$API_PORT" \
>> "$API_LOG" 2>&1 &
echo $! > "$API_PID_FILE"
# Poll until port is actually bound (up to 10 s), not just process alive
for _i in $(seq 1 20); do
sleep 0.5
if (echo "" >/dev/tcp/127.0.0.1/"$API_PORT") 2>/dev/null; then
success "Avocet API started → http://localhost:${API_PORT} (PID $(<"$API_PID_FILE"))"
break
fi
if ! kill -0 "$(<"$API_PID_FILE")" 2>/dev/null; then
rm -f "$API_PID_FILE"
error "API died during startup. Check ${API_LOG}"
fi
done
if ! (echo "" >/dev/tcp/127.0.0.1/"$API_PORT") 2>/dev/null; then
error "API did not bind to port ${API_PORT} within 10 s. Check ${API_LOG}"
fi
;;
stop-api)
API_PID_FILE=".avocet-api.pid"
if [[ ! -f "$API_PID_FILE" ]]; then
warn "API not running."
exit 0
fi
PID="$(<"$API_PID_FILE")"
if kill -0 "$PID" 2>/dev/null; then
kill "$PID" && rm -f "$API_PID_FILE"
success "API stopped (PID ${PID})."
else
warn "Stale PID file (process ${PID} not running). Cleaning up."
rm -f "$API_PID_FILE"
fi
;;
restart-api)
bash "$0" stop-api
exec bash "$0" start-api
;;
open-api)
URL="http://localhost:8503"
info "Opening ${URL}"
if command -v xdg-open &>/dev/null; then
xdg-open "$URL"
elif command -v open &>/dev/null; then
open "$URL"
else
echo "$URL"
fi
;;
help|--help|-h)
usage
;;


@@ -1,110 +0,0 @@
"""Avocet — SFT candidate run discovery and JSONL import.
No FastAPI dependency — pure Python file operations.
Used by app/sft.py endpoints and can be run standalone.
"""
from __future__ import annotations
import json
import logging
from pathlib import Path
logger = logging.getLogger(__name__)
_CANDIDATES_FILENAME = "sft_candidates.jsonl"
def discover_runs(bench_results_dir: Path) -> list[dict]:
"""Return one entry per run subdirectory that contains sft_candidates.jsonl.
Sorted newest-first by directory name (directories are named YYYY-MM-DD-HHMMSS
by the cf-orch benchmark harness, so lexicographic order is chronological).
Each entry: {run_id, timestamp, candidate_count, sft_path}
"""
if not bench_results_dir.exists() or not bench_results_dir.is_dir():
return []
runs = []
for subdir in bench_results_dir.iterdir():
if not subdir.is_dir():
continue
sft_path = subdir / _CANDIDATES_FILENAME
if not sft_path.exists():
continue
records = _read_jsonl(sft_path)
runs.append({
"run_id": subdir.name,
"timestamp": subdir.name,
"candidate_count": len(records),
"sft_path": sft_path,
})
runs.sort(key=lambda r: r["run_id"], reverse=True)
return runs
def import_run(sft_path: Path, data_dir: Path) -> dict[str, int]:
"""Append records from sft_path into data_dir/sft_candidates.jsonl.
Deduplicates on the `id` field — records whose id already exists in the
destination file are skipped silently. Records missing an `id` field are
also skipped (malformed input from a partial benchmark write).
Returns {imported: N, skipped: M}.
"""
dest = data_dir / _CANDIDATES_FILENAME
existing_ids = _read_existing_ids(dest)
new_records: list[dict] = []
skipped = 0
for record in _read_jsonl(sft_path):
if "id" not in record:
logger.warning("Skipping record missing 'id' field in %s", sft_path)
continue # malformed — skip without crashing
if record["id"] in existing_ids:
skipped += 1
continue
new_records.append(record)
existing_ids.add(record["id"])
if new_records:
dest.parent.mkdir(parents=True, exist_ok=True)
with open(dest, "a", encoding="utf-8") as fh:
for r in new_records:
fh.write(json.dumps(r) + "\n")
return {"imported": len(new_records), "skipped": skipped}
def _read_jsonl(path: Path) -> list[dict]:
"""Read a JSONL file, returning valid records. Skips blank lines and malformed JSON."""
if not path.exists():
return []
records: list[dict] = []
for line in path.read_text(encoding="utf-8").splitlines():
line = line.strip()
if not line:
continue
try:
records.append(json.loads(line))
except json.JSONDecodeError as exc:
logger.warning("Skipping malformed JSON line in %s: %s", path, exc)
return records
def _read_existing_ids(path: Path) -> set[str]:
"""Read only the id field from each line of a JSONL file."""
if not path.exists():
return set()
ids: set[str] = set()
with path.open() as f:
for line in f:
line = line.strip()
if not line:
continue
try:
record = json.loads(line)
if "id" in record:
ids.add(record["id"])
except json.JSONDecodeError:
pass # corrupt line, skip silently (ids file is our own output)
return ids


@@ -5,83 +5,83 @@ These functions are stdlib-only and safe to test without an IMAP connection.
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from app.utils import extract_body, strip_html
from app.label_tool import _extract_body, _strip_html
# ── strip_html ──────────────────────────────────────────────────────────────
# ── _strip_html ──────────────────────────────────────────────────────────────
def test_strip_html_removes_tags():
assert strip_html("<p>Hello <b>world</b></p>") == "Hello world"
assert _strip_html("<p>Hello <b>world</b></p>") == "Hello world"
def test_strip_html_skips_script_content():
result = strip_html("<script>doEvil()</script><p>real</p>")
result = _strip_html("<script>doEvil()</script><p>real</p>")
assert "doEvil" not in result
assert "real" in result
def test_strip_html_skips_style_content():
result = strip_html("<style>.foo{color:red}</style><p>visible</p>")
result = _strip_html("<style>.foo{color:red}</style><p>visible</p>")
assert ".foo" not in result
assert "visible" in result
def test_strip_html_handles_br_as_newline():
result = strip_html("line1<br>line2")
result = _strip_html("line1<br>line2")
assert "line1" in result
assert "line2" in result
def test_strip_html_decodes_entities():
# convert_charrefs=True on HTMLParser handles &amp; etc.
result = strip_html("<p>Hello &amp; welcome</p>")
result = _strip_html("<p>Hello &amp; welcome</p>")
assert "&amp;" not in result
assert "Hello" in result
assert "welcome" in result
def test_strip_html_empty_string():
assert strip_html("") == ""
assert _strip_html("") == ""
def test_strip_html_plain_text_passthrough():
assert strip_html("no tags here") == "no tags here"
assert _strip_html("no tags here") == "no tags here"
# ── extract_body ────────────────────────────────────────────────────────────
# ── _extract_body ────────────────────────────────────────────────────────────
def test_extract_body_prefers_plain_over_html():
msg = MIMEMultipart("alternative")
msg.attach(MIMEText("plain body", "plain"))
msg.attach(MIMEText("<html><body>html body</body></html>", "html"))
assert extract_body(msg) == "plain body"
assert _extract_body(msg) == "plain body"
def test_extract_body_falls_back_to_html_when_no_plain():
msg = MIMEMultipart("alternative")
msg.attach(MIMEText("<html><body><p>HTML only email</p></body></html>", "html"))
result = extract_body(msg)
result = _extract_body(msg)
assert "HTML only email" in result
assert "<" not in result # no raw HTML tags leaked through
def test_extract_body_non_multipart_html_stripped():
msg = MIMEText("<html><body><p>Solo HTML</p></body></html>", "html")
result = extract_body(msg)
result = _extract_body(msg)
assert "Solo HTML" in result
assert "<html>" not in result
def test_extract_body_non_multipart_plain_unchanged():
msg = MIMEText("just plain text", "plain")
assert extract_body(msg) == "just plain text"
assert _extract_body(msg) == "just plain text"
def test_extract_body_empty_message():
msg = MIMEText("", "plain")
assert extract_body(msg) == ""
assert _extract_body(msg) == ""
def test_extract_body_multipart_empty_returns_empty():
msg = MIMEMultipart("alternative")
assert extract_body(msg) == ""
assert _extract_body(msg) == ""


@@ -1,399 +0,0 @@
"""Tests for app/models.py — /api/models/* endpoints."""
from __future__ import annotations
import json
from pathlib import Path
from unittest.mock import MagicMock, patch
import pytest
from fastapi.testclient import TestClient
# ── Fixtures ───────────────────────────────────────────────────────────────────
@pytest.fixture(autouse=True)
def reset_models_globals(tmp_path):
"""Redirect module-level dirs to tmp_path and reset download progress."""
from app import models as models_module
prev_models = models_module._MODELS_DIR
prev_queue = models_module._QUEUE_DIR
prev_progress = dict(models_module._download_progress)
models_dir = tmp_path / "models"
queue_dir = tmp_path / "data"
models_dir.mkdir()
queue_dir.mkdir()
models_module.set_models_dir(models_dir)
models_module.set_queue_dir(queue_dir)
models_module._download_progress = {}
yield
models_module.set_models_dir(prev_models)
models_module.set_queue_dir(prev_queue)
models_module._download_progress = prev_progress
@pytest.fixture
def client():
from app.api import app
return TestClient(app)
def _make_hf_response(repo_id: str = "org/model", pipeline_tag: str = "text-classification") -> dict:
"""Minimal HF API response payload."""
return {
"modelId": repo_id,
"pipeline_tag": pipeline_tag,
"tags": ["pytorch", pipeline_tag],
"downloads": 42000,
"siblings": [
{"rfilename": "pytorch_model.bin", "size": 500_000_000},
],
"cardData": {"description": "A test model description."},
}
def _queue_one(client, repo_id: str = "org/model") -> dict:
"""Helper: POST to /queue and return the created entry."""
r = client.post("/api/models/queue", json={
"repo_id": repo_id,
"pipeline_tag": "text-classification",
"adapter_recommendation": "ZeroShotAdapter",
})
assert r.status_code == 201, r.text
return r.json()
# ── GET /lookup ────────────────────────────────────────────────────────────────
def test_lookup_invalid_repo_id_returns_422_no_slash(client):
"""repo_id without a '/' should be rejected with 422."""
r = client.get("/api/models/lookup", params={"repo_id": "noslash"})
assert r.status_code == 422
def test_lookup_invalid_repo_id_returns_422_whitespace(client):
"""repo_id containing whitespace should be rejected with 422."""
r = client.get("/api/models/lookup", params={"repo_id": "org/model name"})
assert r.status_code == 422
def test_lookup_hf_404_returns_404(client):
"""HF API returning 404 should surface as HTTP 404."""
mock_resp = MagicMock()
mock_resp.status_code = 404
with patch("app.models.httpx.get", return_value=mock_resp):
r = client.get("/api/models/lookup", params={"repo_id": "org/nonexistent"})
assert r.status_code == 404
def test_lookup_hf_network_error_returns_502(client):
"""Network error reaching HF API should return 502."""
import httpx as _httpx
with patch("app.models.httpx.get", side_effect=_httpx.RequestError("timeout")):
r = client.get("/api/models/lookup", params={"repo_id": "org/model"})
assert r.status_code == 502
def test_lookup_returns_correct_shape(client):
"""Successful lookup returns all required fields."""
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.json.return_value = _make_hf_response("org/mymodel", "text-classification")
with patch("app.models.httpx.get", return_value=mock_resp):
r = client.get("/api/models/lookup", params={"repo_id": "org/mymodel"})
assert r.status_code == 200
data = r.json()
assert data["repo_id"] == "org/mymodel"
assert data["pipeline_tag"] == "text-classification"
assert data["adapter_recommendation"] == "ZeroShotAdapter"
assert data["model_size_bytes"] == 500_000_000
assert data["downloads"] == 42000
assert data["already_installed"] is False
assert data["already_queued"] is False
def test_lookup_unknown_pipeline_tag_returns_null_adapter(client):
"""An unrecognised pipeline_tag yields adapter_recommendation=null."""
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.json.return_value = _make_hf_response("org/m", "audio-classification")
with patch("app.models.httpx.get", return_value=mock_resp):
r = client.get("/api/models/lookup", params={"repo_id": "org/m"})
assert r.status_code == 200
assert r.json()["adapter_recommendation"] is None
def test_lookup_already_queued_flag(client):
"""already_queued is True when repo_id is in the pending queue."""
_queue_one(client, "org/queued-model")
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.json.return_value = _make_hf_response("org/queued-model")
with patch("app.models.httpx.get", return_value=mock_resp):
r = client.get("/api/models/lookup", params={"repo_id": "org/queued-model"})
assert r.status_code == 200
assert r.json()["already_queued"] is True
# ── GET /queue ─────────────────────────────────────────────────────────────────
def test_queue_empty_initially(client):
r = client.get("/api/models/queue")
assert r.status_code == 200
assert r.json() == []
def test_queue_add_and_list(client):
"""POST then GET /queue should return the entry."""
entry = _queue_one(client, "org/my-model")
r = client.get("/api/models/queue")
assert r.status_code == 200
items = r.json()
assert len(items) == 1
assert items[0]["repo_id"] == "org/my-model"
assert items[0]["status"] == "pending"
assert items[0]["id"] == entry["id"]
def test_queue_add_returns_entry_fields(client):
"""POST /queue returns an entry with all expected fields."""
entry = _queue_one(client)
assert "id" in entry
assert "queued_at" in entry
assert entry["status"] == "pending"
assert entry["pipeline_tag"] == "text-classification"
assert entry["adapter_recommendation"] == "ZeroShotAdapter"
# ── POST /queue — 409 duplicate ────────────────────────────────────────────────
def test_queue_duplicate_returns_409(client):
"""Posting the same repo_id twice should return 409."""
_queue_one(client, "org/dup-model")
r = client.post("/api/models/queue", json={
"repo_id": "org/dup-model",
"pipeline_tag": "text-classification",
"adapter_recommendation": "ZeroShotAdapter",
})
assert r.status_code == 409
def test_queue_multiple_different_models(client):
"""Multiple distinct repo_ids should all be accepted."""
_queue_one(client, "org/model-a")
_queue_one(client, "org/model-b")
_queue_one(client, "org/model-c")
r = client.get("/api/models/queue")
assert r.status_code == 200
assert len(r.json()) == 3
# ── DELETE /queue/{id} — dismiss ──────────────────────────────────────────────
def test_queue_dismiss(client):
"""DELETE /queue/{id} sets status=dismissed; entry not returned by GET /queue."""
entry = _queue_one(client)
entry_id = entry["id"]
r = client.delete(f"/api/models/queue/{entry_id}")
assert r.status_code == 200
assert r.json() == {"ok": True}
r2 = client.get("/api/models/queue")
assert r2.status_code == 200
assert r2.json() == []
def test_queue_dismiss_nonexistent_returns_404(client):
"""DELETE /queue/{id} with unknown id returns 404."""
r = client.delete("/api/models/queue/does-not-exist")
assert r.status_code == 404
def test_queue_dismiss_allows_re_queue(client):
"""After dismissal the same repo_id can be queued again."""
entry = _queue_one(client, "org/requeue-model")
client.delete(f"/api/models/queue/{entry['id']}")
r = client.post("/api/models/queue", json={
"repo_id": "org/requeue-model",
"pipeline_tag": None,
"adapter_recommendation": None,
})
assert r.status_code == 201
# ── POST /queue/{id}/approve ───────────────────────────────────────────────────
def test_approve_nonexistent_returns_404(client):
"""Approving an unknown id returns 404."""
r = client.post("/api/models/queue/ghost-id/approve")
assert r.status_code == 404
def test_approve_non_pending_returns_409(client):
"""Approving an entry that is not in 'pending' state returns 409."""
from app import models as models_module
entry = _queue_one(client)
# Manually flip status to 'failed'
models_module._update_queue_entry(entry["id"], {"status": "failed"})
r = client.post(f"/api/models/queue/{entry['id']}/approve")
assert r.status_code == 409
def test_approve_starts_download_and_returns_ok(client):
"""Approving a pending entry returns {ok: true} and starts a background thread."""
import time
import threading
entry = _queue_one(client)
# Patch snapshot_download so the thread doesn't actually hit the network.
# Use an Event so we can wait for the thread to finish before asserting.
thread_done = threading.Event()
original_run = None
def _fake_snapshot_download(**kwargs):
pass
with patch("app.models.snapshot_download", side_effect=_fake_snapshot_download):
r = client.post(f"/api/models/queue/{entry['id']}/approve")
assert r.status_code == 200
assert r.json() == {"ok": True}
# Give the background thread a moment to complete while snapshot_download is patched
time.sleep(0.3)
# Queue entry status should have moved to 'downloading' (or 'ready' if fast)
from app import models as models_module
updated = models_module._get_queue_entry(entry["id"])
assert updated is not None, "Queue entry not found — thread may have run after fixture teardown"
assert updated["status"] in ("downloading", "ready", "failed")
# ── GET /download/stream ───────────────────────────────────────────────────────
def test_download_stream_idle_when_no_download(client):
"""GET /download/stream returns a single idle event when nothing is downloading."""
r = client.get("/api/models/download/stream")
assert r.status_code == 200
# SSE body should contain the idle event
assert "idle" in r.text
# ── GET /installed ─────────────────────────────────────────────────────────────
def test_installed_empty(client):
"""GET /installed returns [] when models dir is empty."""
r = client.get("/api/models/installed")
assert r.status_code == 200
assert r.json() == []
def test_installed_detects_downloaded_model(client, tmp_path):
"""A subdir with config.json is surfaced as type='downloaded'."""
from app import models as models_module
model_dir = models_module._MODELS_DIR / "org--mymodel"
model_dir.mkdir()
(model_dir / "config.json").write_text(json.dumps({"model_type": "bert"}), encoding="utf-8")
(model_dir / "model_info.json").write_text(
json.dumps({"repo_id": "org/mymodel", "adapter_recommendation": "ZeroShotAdapter"}),
encoding="utf-8",
)
r = client.get("/api/models/installed")
assert r.status_code == 200
items = r.json()
assert len(items) == 1
assert items[0]["type"] == "downloaded"
assert items[0]["name"] == "org--mymodel"
assert items[0]["adapter"] == "ZeroShotAdapter"
assert items[0]["model_id"] == "org/mymodel"
def test_installed_detects_finetuned_model(client):
"""A subdir with training_info.json is surfaced as type='finetuned'."""
from app import models as models_module
model_dir = models_module._MODELS_DIR / "my-finetuned"
model_dir.mkdir()
(model_dir / "training_info.json").write_text(
json.dumps({"base_model": "org/base", "epochs": 5}), encoding="utf-8"
)
r = client.get("/api/models/installed")
assert r.status_code == 200
items = r.json()
assert len(items) == 1
assert items[0]["type"] == "finetuned"
assert items[0]["name"] == "my-finetuned"
# ── DELETE /installed/{name} ───────────────────────────────────────────────────
def test_delete_installed_removes_directory(client):
"""DELETE /installed/{name} removes the directory and returns {ok: true}."""
from app import models as models_module
model_dir = models_module._MODELS_DIR / "org--removeme"
model_dir.mkdir()
(model_dir / "config.json").write_text("{}", encoding="utf-8")
r = client.delete("/api/models/installed/org--removeme")
assert r.status_code == 200
assert r.json() == {"ok": True}
assert not model_dir.exists()
def test_delete_installed_not_found_returns_404(client):
r = client.delete("/api/models/installed/does-not-exist")
assert r.status_code == 404
def test_delete_installed_path_traversal_blocked(client):
    """DELETE /installed/../../etc must be blocked (400, 404, or 422)."""
    r = client.delete("/api/models/installed/../../etc")
    assert r.status_code in (400, 404, 422)
def test_delete_installed_dotdot_name_blocked(client):
"""A name containing '..' in any form must be rejected."""
r = client.delete("/api/models/installed/..%2F..%2Fetc")
assert r.status_code in (400, 404, 422)
def test_delete_installed_name_with_slash_blocked(client):
    """A name containing a literal '/' after URL decoding must be rejected."""
    # A second '/' in the URL would be parsed as a new path segment, so this
    # case can't be exercised over HTTP; call the handler directly instead.
    from fastapi import HTTPException
    from app.models import delete_installed
    with pytest.raises(HTTPException) as exc_info:
        delete_installed("org/traversal")
    assert exc_info.value.status_code in (400, 404)
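The traversal tests above pin down rejection behaviour without showing the check itself. A minimal sketch of the kind of name validation they imply (the helper name, `MODELS_DIR` constant, and `ValueError` are assumptions for illustration; the real handler raises `HTTPException`):

```python
from pathlib import Path

# Stand-in for app.models._MODELS_DIR (assumed name).
MODELS_DIR = Path("models")

def validate_model_name(name: str) -> Path:
    """Reject any name that could escape MODELS_DIR before deletion."""
    if not name or "/" in name or "\\" in name or ".." in name:
        raise ValueError(f"invalid model name: {name!r}")
    return MODELS_DIR / name
```

Checking the raw name before ever joining it to a path is what makes the `..%2F..%2Fetc` case safe: the decoded segment still contains `..`, so it never reaches the filesystem.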


@ -1,377 +0,0 @@
"""API integration tests for app/sft.py — /api/sft/* endpoints."""
import json
import pytest
from fastapi.testclient import TestClient
from pathlib import Path
@pytest.fixture(autouse=True)
def reset_sft_globals(tmp_path):
from app import sft as sft_module
_prev_data = sft_module._SFT_DATA_DIR
_prev_cfg = sft_module._SFT_CONFIG_DIR
sft_module.set_sft_data_dir(tmp_path)
sft_module.set_sft_config_dir(tmp_path)
yield
sft_module.set_sft_data_dir(_prev_data)
sft_module.set_sft_config_dir(_prev_cfg)
@pytest.fixture
def client():
from app.api import app
return TestClient(app)
def _make_record(id: str, run_id: str = "2026-04-07-143022") -> dict:
return {
"id": id, "source": "cf-orch-benchmark",
"benchmark_run_id": run_id, "timestamp": "2026-04-07T10:00:00Z",
"status": "needs_review",
"prompt_messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Write a Python function that adds two numbers."},
],
"model_response": "def add(a, b): return a - b",
"corrected_response": None,
"quality_score": 0.2, "failure_reason": "pattern_match: 0/2 matched",
"task_id": "code-fn", "task_type": "code",
"task_name": "Code: Write a Python function",
"model_id": "Qwen/Qwen2.5-3B", "model_name": "Qwen2.5-3B",
"node_id": "heimdall", "gpu_id": 0, "tokens_per_sec": 38.4,
}
def _write_run(tmp_path, run_id: str, records: list[dict]) -> Path:
run_dir = tmp_path / "bench_results" / run_id
run_dir.mkdir(parents=True)
sft_path = run_dir / "sft_candidates.jsonl"
sft_path.write_text(
"\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8"
)
return sft_path
def _write_config(tmp_path, bench_results_dir: Path) -> None:
import yaml
cfg = {"sft": {"bench_results_dir": str(bench_results_dir)}}
(tmp_path / "label_tool.yaml").write_text(
yaml.dump(cfg, allow_unicode=True), encoding="utf-8"
)
# ── /api/sft/runs ──────────────────────────────────────────────────────────
def test_runs_returns_empty_when_no_config(client):
r = client.get("/api/sft/runs")
assert r.status_code == 200
assert r.json() == []
def test_runs_returns_available_runs(client, tmp_path):
_write_run(tmp_path, "2026-04-07-143022", [_make_record("a"), _make_record("b")])
_write_config(tmp_path, tmp_path / "bench_results")
r = client.get("/api/sft/runs")
assert r.status_code == 200
data = r.json()
assert len(data) == 1
assert data[0]["run_id"] == "2026-04-07-143022"
assert data[0]["candidate_count"] == 2
assert data[0]["already_imported"] is False
def test_runs_marks_already_imported(client, tmp_path):
_write_run(tmp_path, "2026-04-07-143022", [_make_record("a")])
_write_config(tmp_path, tmp_path / "bench_results")
from app import sft as sft_module
candidates = sft_module._candidates_file()
candidates.parent.mkdir(parents=True, exist_ok=True)
candidates.write_text(
json.dumps(_make_record("a", run_id="2026-04-07-143022")) + "\n",
encoding="utf-8"
)
r = client.get("/api/sft/runs")
assert r.json()[0]["already_imported"] is True
# ── /api/sft/import ─────────────────────────────────────────────────────────
def test_import_adds_records(client, tmp_path):
_write_run(tmp_path, "2026-04-07-143022", [_make_record("a"), _make_record("b")])
_write_config(tmp_path, tmp_path / "bench_results")
r = client.post("/api/sft/import", json={"run_id": "2026-04-07-143022"})
assert r.status_code == 200
assert r.json() == {"imported": 2, "skipped": 0}
def test_import_is_idempotent(client, tmp_path):
_write_run(tmp_path, "2026-04-07-143022", [_make_record("a")])
_write_config(tmp_path, tmp_path / "bench_results")
client.post("/api/sft/import", json={"run_id": "2026-04-07-143022"})
r = client.post("/api/sft/import", json={"run_id": "2026-04-07-143022"})
assert r.json() == {"imported": 0, "skipped": 1}
def test_import_unknown_run_returns_404(client, tmp_path):
_write_config(tmp_path, tmp_path / "bench_results")
r = client.post("/api/sft/import", json={"run_id": "nonexistent"})
assert r.status_code == 404
# ── /api/sft/queue ──────────────────────────────────────────────────────────
def _populate_candidates(tmp_path, records: list[dict]) -> None:
from app import sft as sft_module
path = sft_module._candidates_file()
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(
"\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8"
)
def test_queue_returns_needs_review_only(client, tmp_path):
records = [
_make_record("a"), # needs_review
{**_make_record("b"), "status": "approved"}, # should not appear
{**_make_record("c"), "status": "discarded"}, # should not appear
]
_populate_candidates(tmp_path, records)
r = client.get("/api/sft/queue")
assert r.status_code == 200
data = r.json()
assert data["total"] == 1
assert len(data["items"]) == 1
assert data["items"][0]["id"] == "a"
def test_queue_pagination(client, tmp_path):
records = [_make_record(str(i)) for i in range(25)]
_populate_candidates(tmp_path, records)
r = client.get("/api/sft/queue?page=1&per_page=10")
data = r.json()
assert data["total"] == 25
assert len(data["items"]) == 10
r2 = client.get("/api/sft/queue?page=3&per_page=10")
assert len(r2.json()["items"]) == 5
def test_queue_empty_when_no_file(client):
r = client.get("/api/sft/queue")
assert r.status_code == 200
assert r.json() == {"items": [], "total": 0, "page": 1, "per_page": 20}
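The pagination behaviour pinned by these tests amounts to simple list slicing over the filtered queue. A sketch of the response shape (helper name assumed, not from the diff):

```python
def paginate(items: list, page: int = 1, per_page: int = 20) -> dict:
    """Slice a filtered queue into the /api/sft/queue response shape."""
    start = (page - 1) * per_page
    return {
        "items": items[start:start + per_page],
        "total": len(items),
        "page": page,
        "per_page": per_page,
    }
```

Note that `total` is the count after status filtering, not the page length, which is why page 3 of 25 records at `per_page=10` returns 5 items.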
# ── /api/sft/submit ─────────────────────────────────────────────────────────
def test_submit_correct_sets_approved(client, tmp_path):
_populate_candidates(tmp_path, [_make_record("a")])
r = client.post("/api/sft/submit", json={
"id": "a", "action": "correct",
"corrected_response": "def add(a, b): return a + b",
})
assert r.status_code == 200
from app import sft as sft_module
records = sft_module._read_candidates()
assert records[0]["status"] == "approved"
assert records[0]["corrected_response"] == "def add(a, b): return a + b"
def test_submit_correct_also_appends_to_approved_file(client, tmp_path):
_populate_candidates(tmp_path, [_make_record("a")])
client.post("/api/sft/submit", json={
"id": "a", "action": "correct",
"corrected_response": "def add(a, b): return a + b",
})
from app import sft as sft_module
from app.utils import read_jsonl
approved = read_jsonl(sft_module._approved_file())
assert len(approved) == 1
assert approved[0]["id"] == "a"
def test_submit_discard_sets_discarded(client, tmp_path):
_populate_candidates(tmp_path, [_make_record("a")])
r = client.post("/api/sft/submit", json={"id": "a", "action": "discard"})
assert r.status_code == 200
from app import sft as sft_module
assert sft_module._read_candidates()[0]["status"] == "discarded"
def test_submit_flag_sets_model_rejected(client, tmp_path):
_populate_candidates(tmp_path, [_make_record("a")])
r = client.post("/api/sft/submit", json={"id": "a", "action": "flag"})
assert r.status_code == 200
from app import sft as sft_module
assert sft_module._read_candidates()[0]["status"] == "model_rejected"
def test_submit_correct_empty_response_returns_422(client, tmp_path):
_populate_candidates(tmp_path, [_make_record("a")])
r = client.post("/api/sft/submit", json={
"id": "a", "action": "correct", "corrected_response": " ",
})
assert r.status_code == 422
def test_submit_correct_null_response_returns_422(client, tmp_path):
_populate_candidates(tmp_path, [_make_record("a")])
r = client.post("/api/sft/submit", json={
"id": "a", "action": "correct", "corrected_response": None,
})
assert r.status_code == 422
def test_submit_unknown_id_returns_404(client, tmp_path):
r = client.post("/api/sft/submit", json={"id": "nope", "action": "discard"})
assert r.status_code == 404
def test_submit_already_approved_returns_409(client, tmp_path):
_populate_candidates(tmp_path, [{**_make_record("a"), "status": "approved"}])
r = client.post("/api/sft/submit", json={"id": "a", "action": "discard"})
assert r.status_code == 409
def test_submit_correct_stores_failure_category(client, tmp_path):
_populate_candidates(tmp_path, [_make_record("a")])
r = client.post("/api/sft/submit", json={
"id": "a", "action": "correct",
"corrected_response": "def add(a, b): return a + b",
"failure_category": "style_violation",
})
assert r.status_code == 200
from app import sft as sft_module
records = sft_module._read_candidates()
assert records[0]["failure_category"] == "style_violation"
def test_submit_correct_null_failure_category(client, tmp_path):
_populate_candidates(tmp_path, [_make_record("a")])
r = client.post("/api/sft/submit", json={
"id": "a", "action": "correct",
"corrected_response": "def add(a, b): return a + b",
})
assert r.status_code == 200
from app import sft as sft_module
records = sft_module._read_candidates()
assert records[0]["failure_category"] is None
def test_submit_invalid_failure_category_returns_422(client, tmp_path):
_populate_candidates(tmp_path, [_make_record("a")])
r = client.post("/api/sft/submit", json={
"id": "a", "action": "correct",
"corrected_response": "def add(a, b): return a + b",
"failure_category": "nonsense",
})
assert r.status_code == 422
# ── /api/sft/undo ────────────────────────────────────────────────────────────
def test_undo_restores_discarded_to_needs_review(client, tmp_path):
_populate_candidates(tmp_path, [_make_record("a")])
client.post("/api/sft/submit", json={"id": "a", "action": "discard"})
r = client.post("/api/sft/undo", json={"id": "a"})
assert r.status_code == 200
from app import sft as sft_module
assert sft_module._read_candidates()[0]["status"] == "needs_review"
def test_undo_removes_approved_from_approved_file(client, tmp_path):
_populate_candidates(tmp_path, [_make_record("a")])
client.post("/api/sft/submit", json={
"id": "a", "action": "correct",
"corrected_response": "def add(a, b): return a + b",
})
client.post("/api/sft/undo", json={"id": "a"})
from app import sft as sft_module
from app.utils import read_jsonl
approved = read_jsonl(sft_module._approved_file())
assert not any(r["id"] == "a" for r in approved)
def test_undo_already_needs_review_returns_409(client, tmp_path):
_populate_candidates(tmp_path, [_make_record("a")])
r = client.post("/api/sft/undo", json={"id": "a"})
assert r.status_code == 409
# ── /api/sft/export ──────────────────────────────────────────────────────────
def test_export_returns_approved_as_sft_jsonl(client, tmp_path):
from app import sft as sft_module
from app.utils import write_jsonl
approved = {
**_make_record("a"),
"status": "approved",
"corrected_response": "def add(a, b): return a + b",
"prompt_messages": [
{"role": "system", "content": "You are a coding assistant."},
{"role": "user", "content": "Write a Python add function."},
],
}
write_jsonl(sft_module._approved_file(), [approved])
_populate_candidates(tmp_path, [approved])
r = client.get("/api/sft/export")
assert r.status_code == 200
assert "application/x-ndjson" in r.headers["content-type"]
lines = [l for l in r.text.splitlines() if l.strip()]
assert len(lines) == 1
record = json.loads(lines[0])
assert record["messages"][-1] == {
"role": "assistant", "content": "def add(a, b): return a + b"
}
assert record["messages"][0]["role"] == "system"
assert record["messages"][1]["role"] == "user"
def test_export_excludes_non_approved(client, tmp_path):
from app import sft as sft_module
from app.utils import write_jsonl
records = [
{**_make_record("a"), "status": "discarded", "corrected_response": None},
{**_make_record("b"), "status": "needs_review", "corrected_response": None},
]
write_jsonl(sft_module._approved_file(), records)
r = client.get("/api/sft/export")
assert r.text.strip() == ""
def test_export_empty_when_no_approved_file(client):
r = client.get("/api/sft/export")
assert r.status_code == 200
assert r.text.strip() == ""
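The export tests above fix the NDJSON contract: only approved records with a correction are emitted, and the human correction becomes the final assistant turn. A sketch of that transformation (function name assumed; the real endpoint streams this as `application/x-ndjson`):

```python
import json

def export_sft_lines(records: list[dict]):
    """Yield one NDJSON line per approved record, appending the
    corrected response as the final assistant message."""
    for rec in records:
        if rec.get("status") != "approved" or not rec.get("corrected_response"):
            continue
        messages = list(rec.get("prompt_messages", [])) + [
            {"role": "assistant", "content": rec["corrected_response"]}
        ]
        yield json.dumps({"messages": messages})
```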
# ── /api/sft/stats ───────────────────────────────────────────────────────────
def test_stats_counts_by_status(client, tmp_path):
from app import sft as sft_module
from app.utils import write_jsonl
records = [
_make_record("a"),
{**_make_record("b"), "status": "approved", "corrected_response": "ok"},
{**_make_record("c"), "status": "discarded"},
{**_make_record("d"), "status": "model_rejected"},
]
_populate_candidates(tmp_path, records)
write_jsonl(sft_module._approved_file(), [records[1]])
r = client.get("/api/sft/stats")
assert r.status_code == 200
data = r.json()
assert data["total"] == 4
assert data["by_status"]["needs_review"] == 1
assert data["by_status"]["approved"] == 1
assert data["by_status"]["discarded"] == 1
assert data["by_status"]["model_rejected"] == 1
assert data["export_ready"] == 1
def test_stats_empty_when_no_data(client):
r = client.get("/api/sft/stats")
assert r.status_code == 200
data = r.json()
assert data["total"] == 0
assert data["export_ready"] == 0


@ -1,95 +0,0 @@
"""Unit tests for scripts/sft_import.py — run discovery and JSONL deduplication."""
import json
import pytest
from pathlib import Path
def _write_candidates(path: Path, records: list[dict]) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text("\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8")
def _make_record(id: str, run_id: str = "run1") -> dict:
return {
"id": id, "source": "cf-orch-benchmark",
"benchmark_run_id": run_id, "timestamp": "2026-04-07T10:00:00Z",
"status": "needs_review", "prompt_messages": [],
"model_response": "bad", "corrected_response": None,
"quality_score": 0.3, "failure_reason": "missing patterns",
"task_id": "code-fn", "task_type": "code", "task_name": "Code: fn",
"model_id": "Qwen/Qwen2.5-3B", "model_name": "Qwen2.5-3B",
"node_id": "heimdall", "gpu_id": 0, "tokens_per_sec": 38.4,
}
def test_discover_runs_empty_when_dir_missing(tmp_path):
from scripts.sft_import import discover_runs
result = discover_runs(tmp_path / "nonexistent")
assert result == []
def test_discover_runs_returns_runs(tmp_path):
from scripts.sft_import import discover_runs
run_dir = tmp_path / "2026-04-07-143022"
_write_candidates(run_dir / "sft_candidates.jsonl", [_make_record("a"), _make_record("b")])
result = discover_runs(tmp_path)
assert len(result) == 1
assert result[0]["run_id"] == "2026-04-07-143022"
assert result[0]["candidate_count"] == 2
assert "sft_path" in result[0]
def test_discover_runs_skips_dirs_without_sft_file(tmp_path):
from scripts.sft_import import discover_runs
(tmp_path / "2026-04-07-no-sft").mkdir()
result = discover_runs(tmp_path)
assert result == []
def test_discover_runs_sorted_newest_first(tmp_path):
from scripts.sft_import import discover_runs
for name in ["2026-04-05-120000", "2026-04-07-143022", "2026-04-06-090000"]:
run_dir = tmp_path / name
_write_candidates(run_dir / "sft_candidates.jsonl", [_make_record("x")])
result = discover_runs(tmp_path)
assert [r["run_id"] for r in result] == [
"2026-04-07-143022", "2026-04-06-090000", "2026-04-05-120000"
]
def test_import_run_imports_new_records(tmp_path):
from scripts.sft_import import import_run
sft_path = tmp_path / "run1" / "sft_candidates.jsonl"
_write_candidates(sft_path, [_make_record("a"), _make_record("b")])
result = import_run(sft_path, tmp_path)
assert result == {"imported": 2, "skipped": 0}
dest = tmp_path / "sft_candidates.jsonl"
lines = [json.loads(l) for l in dest.read_text().splitlines() if l.strip()]
assert len(lines) == 2
def test_import_run_deduplicates_on_id(tmp_path):
from scripts.sft_import import import_run
sft_path = tmp_path / "run1" / "sft_candidates.jsonl"
_write_candidates(sft_path, [_make_record("a"), _make_record("b")])
import_run(sft_path, tmp_path)
result = import_run(sft_path, tmp_path) # second import
assert result == {"imported": 0, "skipped": 2}
dest = tmp_path / "sft_candidates.jsonl"
lines = [l for l in dest.read_text().splitlines() if l.strip()]
assert len(lines) == 2 # no duplicates
def test_import_run_skips_records_missing_id(tmp_path, caplog):
import logging
from scripts.sft_import import import_run
sft_path = tmp_path / "run1" / "sft_candidates.jsonl"
sft_path.parent.mkdir()
sft_path.write_text(
json.dumps({"model_response": "bad", "status": "needs_review"}) + "\n"
+ json.dumps({"id": "abc123", "model_response": "good", "status": "needs_review"}) + "\n"
)
with caplog.at_level(logging.WARNING, logger="scripts.sft_import"):
result = import_run(sft_path, tmp_path)
assert result == {"imported": 1, "skipped": 0}
assert "missing 'id'" in caplog.text
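Taken together, these tests describe `import_run` as append-with-dedup on `id`, silently dropping (with a warning) records that have no `id` at all. A behaviour sketch consistent with the assertions above (not the actual `scripts/sft_import.py` source):

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("scripts.sft_import")

def import_run(sft_path: Path, data_dir: Path) -> dict:
    """Append a run's candidates into data_dir/sft_candidates.jsonl,
    deduplicating on 'id'; id-less records are warned about and dropped."""
    dest = data_dir / "sft_candidates.jsonl"
    seen = set()
    if dest.exists():
        seen = {
            json.loads(line)["id"]
            for line in dest.read_text(encoding="utf-8").splitlines()
            if line.strip()
        }
    imported = skipped = 0
    with dest.open("a", encoding="utf-8") as fh:
        for line in sft_path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            rec = json.loads(line)
            if "id" not in rec:
                log.warning("record missing 'id' — dropped")
                continue  # neither imported nor skipped, per the tests
            if rec["id"] in seen:
                skipped += 1
                continue
            fh.write(json.dumps(rec) + "\n")
            seen.add(rec["id"])
            imported += 1
    return {"imported": imported, "skipped": skipped}
```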


@ -66,8 +66,6 @@ const navItems = [
{ path: '/fetch', icon: '📥', label: 'Fetch' },
{ path: '/stats', icon: '📊', label: 'Stats' },
{ path: '/benchmark', icon: '🏁', label: 'Benchmark' },
{ path: '/models', icon: '🤗', label: 'Models' },
{ path: '/corrections', icon: '✍️', label: 'Corrections' },
{ path: '/settings', icon: '⚙️', label: 'Settings' },
]


@ -1,179 +0,0 @@
import { mount } from '@vue/test-utils'
import SftCard from './SftCard.vue'
import type { SftQueueItem } from '../stores/sft'
import { describe, it, expect } from 'vitest'
const LOW_QUALITY_ITEM: SftQueueItem = {
id: 'abc', source: 'cf-orch-benchmark', benchmark_run_id: 'run1',
timestamp: '2026-04-07T10:00:00Z', status: 'needs_review',
prompt_messages: [
{ role: 'system', content: 'You are a coding assistant.' },
{ role: 'user', content: 'Write a Python add function.' },
],
model_response: 'def add(a, b): return a - b',
corrected_response: null, quality_score: 0.2,
failure_reason: 'pattern_match: 0/2 matched',
failure_category: null,
task_id: 'code-fn', task_type: 'code', task_name: 'Code: Write a function',
model_id: 'Qwen/Qwen2.5-3B', model_name: 'Qwen2.5-3B',
node_id: 'heimdall', gpu_id: 0, tokens_per_sec: 38.4,
}
const MID_QUALITY_ITEM: SftQueueItem = { ...LOW_QUALITY_ITEM, id: 'mid', quality_score: 0.55 }
const HIGH_QUALITY_ITEM: SftQueueItem = { ...LOW_QUALITY_ITEM, id: 'hi', quality_score: 0.72 }
describe('SftCard', () => {
it('renders model name chip', () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
expect(w.text()).toContain('Qwen2.5-3B')
})
it('renders task type chip', () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
expect(w.text()).toContain('code')
})
it('renders failure reason', () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
expect(w.text()).toContain('pattern_match: 0/2 matched')
})
it('renders model response', () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
expect(w.text()).toContain('def add(a, b): return a - b')
})
it('quality chip shows numeric value for low quality', () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
expect(w.text()).toContain('0.20')
})
it('quality chip has low-quality class when score < 0.4', () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
expect(w.find('[data-testid="quality-chip"]').classes()).toContain('quality-low')
})
it('quality chip has mid-quality class when score is 0.4 to <0.7', () => {
const w = mount(SftCard, { props: { item: MID_QUALITY_ITEM } })
expect(w.find('[data-testid="quality-chip"]').classes()).toContain('quality-mid')
})
it('quality chip has acceptable class when score >= 0.7', () => {
const w = mount(SftCard, { props: { item: HIGH_QUALITY_ITEM } })
expect(w.find('[data-testid="quality-chip"]').classes()).toContain('quality-ok')
})
it('clicking Correct button emits correct', async () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
await w.find('[data-testid="correct-btn"]').trigger('click')
expect(w.emitted('correct')).toBeTruthy()
})
it('clicking Discard button then confirming emits discard', async () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
await w.find('[data-testid="discard-btn"]').trigger('click')
await w.find('[data-testid="confirm-pending-btn"]').trigger('click')
expect(w.emitted('discard')).toBeTruthy()
})
it('clicking Flag Model button then confirming emits flag', async () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
await w.find('[data-testid="flag-btn"]').trigger('click')
await w.find('[data-testid="confirm-pending-btn"]').trigger('click')
expect(w.emitted('flag')).toBeTruthy()
})
it('correction area hidden initially', () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
expect(w.find('[data-testid="correction-area"]').exists()).toBe(false)
})
it('correction area shown when correcting prop is true', () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM, correcting: true } })
expect(w.find('[data-testid="correction-area"]').exists()).toBe(true)
})
it('renders nothing for failure reason when null', () => {
const item = { ...LOW_QUALITY_ITEM, failure_reason: null }
const w = mount(SftCard, { props: { item } })
expect(w.find('.failure-reason').exists()).toBe(false)
})
// ── Failure category chip-group ───────────────────────────────────
it('failure category section hidden when not correcting and no pending action', () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
expect(w.find('[data-testid="failure-category-section"]').exists()).toBe(false)
})
it('failure category section shown when correcting prop is true', () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM, correcting: true } })
expect(w.find('[data-testid="failure-category-section"]').exists()).toBe(true)
})
it('renders all six category chips when correcting', () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM, correcting: true } })
const chips = w.findAll('.category-chip')
expect(chips).toHaveLength(6)
})
it('clicking a category chip selects it (adds active class)', async () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM, correcting: true } })
const chip = w.find('[data-testid="category-chip-wrong_answer"]')
await chip.trigger('click')
expect(chip.classes()).toContain('category-chip--active')
})
it('clicking the active chip again deselects it', async () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM, correcting: true } })
const chip = w.find('[data-testid="category-chip-hallucination"]')
await chip.trigger('click')
expect(chip.classes()).toContain('category-chip--active')
await chip.trigger('click')
expect(chip.classes()).not.toContain('category-chip--active')
})
it('only one chip can be active at a time', async () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM, correcting: true } })
await w.find('[data-testid="category-chip-wrong_answer"]').trigger('click')
await w.find('[data-testid="category-chip-hallucination"]').trigger('click')
const active = w.findAll('.category-chip--active')
expect(active).toHaveLength(1)
expect(active[0].attributes('data-testid')).toBe('category-chip-hallucination')
})
it('clicking Discard shows pending action row with category section', async () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
await w.find('[data-testid="discard-btn"]').trigger('click')
expect(w.find('[data-testid="failure-category-section"]').exists()).toBe(true)
expect(w.find('[data-testid="pending-action-row"]').exists()).toBe(true)
})
it('clicking Flag shows pending action row', async () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
await w.find('[data-testid="flag-btn"]').trigger('click')
expect(w.find('[data-testid="pending-action-row"]').exists()).toBe(true)
})
it('confirming discard emits discard with null when no category selected', async () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
await w.find('[data-testid="discard-btn"]').trigger('click')
await w.find('[data-testid="confirm-pending-btn"]').trigger('click')
expect(w.emitted('discard')).toBeTruthy()
expect(w.emitted('discard')![0]).toEqual([null])
})
it('confirming discard emits discard with selected category', async () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
await w.find('[data-testid="discard-btn"]').trigger('click')
await w.find('[data-testid="category-chip-scoring_artifact"]').trigger('click')
await w.find('[data-testid="confirm-pending-btn"]').trigger('click')
expect(w.emitted('discard')![0]).toEqual(['scoring_artifact'])
})
it('cancelling pending action hides the pending row', async () => {
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
await w.find('[data-testid="discard-btn"]').trigger('click')
await w.find('[data-testid="cancel-pending-btn"]').trigger('click')
expect(w.find('[data-testid="pending-action-row"]').exists()).toBe(false)
})
})


@ -1,393 +0,0 @@
<template>
<article class="sft-card">
<!-- Chips row -->
<div class="chips-row">
<span class="chip chip-model">{{ item.model_name }}</span>
<span class="chip chip-task">{{ item.task_type }}</span>
<span class="chip chip-node">{{ item.node_id }} · GPU {{ item.gpu_id }}</span>
<span class="chip chip-speed">{{ item.tokens_per_sec.toFixed(1) }} tok/s</span>
<span
class="chip quality-chip"
:class="qualityClass"
data-testid="quality-chip"
:title="qualityLabel"
>
{{ item.quality_score.toFixed(2) }} · {{ qualityLabel }}
</span>
</div>
<!-- Failure reason -->
<p v-if="item.failure_reason" class="failure-reason">{{ item.failure_reason }}</p>
<!-- Prompt (collapsible) -->
<div class="prompt-section">
<button
class="prompt-toggle"
:aria-expanded="promptExpanded"
@click="promptExpanded = !promptExpanded"
>
{{ promptExpanded ? 'Hide prompt ↑' : 'Show full prompt ↓' }}
</button>
<div v-if="promptExpanded" class="prompt-messages">
<div
v-for="(msg, i) in item.prompt_messages"
:key="i"
class="prompt-message"
:class="`role-${msg.role}`"
>
<span class="role-label">{{ msg.role }}</span>
<pre class="message-content">{{ msg.content }}</pre>
</div>
</div>
</div>
<!-- Model response -->
<div class="model-response-section">
<p class="section-label">Model output (incorrect)</p>
<pre class="model-response">{{ item.model_response }}</pre>
</div>
<!-- Action bar -->
<div class="action-bar">
<button
data-testid="correct-btn"
class="btn-correct"
@click="$emit('correct')"
> Correct</button>
<button
data-testid="discard-btn"
class="btn-discard"
@click="emitWithCategory('discard')"
> Discard</button>
<button
data-testid="flag-btn"
class="btn-flag"
@click="emitWithCategory('flag')"
> Flag Model</button>
</div>
<!-- Failure category selector (shown when correcting or acting) -->
<div
v-if="correcting || pendingAction"
class="failure-category-section"
data-testid="failure-category-section"
>
<p class="section-label">Failure category <span class="optional-label">(optional)</span></p>
<div class="category-chips" role="group" aria-label="Failure category">
<button
v-for="cat in FAILURE_CATEGORIES"
:key="cat.value"
type="button"
class="category-chip"
:class="{ 'category-chip--active': selectedCategory === cat.value }"
:aria-pressed="selectedCategory === cat.value || undefined"
:data-testid="'category-chip-' + cat.value"
@click="toggleCategory(cat.value)"
>{{ cat.label }}</button>
</div>
<!-- Pending discard/flag confirm row -->
<div v-if="pendingAction" class="pending-action-row" data-testid="pending-action-row">
<button class="btn-confirm" @click="confirmPendingAction" data-testid="confirm-pending-btn">
Confirm {{ pendingAction }}
</button>
<button class="btn-cancel-pending" @click="cancelPendingAction" data-testid="cancel-pending-btn">
Cancel
</button>
</div>
</div>
<!-- Correction area (shown when correcting = true) -->
<div v-if="correcting" data-testid="correction-area">
<SftCorrectionArea
ref="correctionAreaEl"
:described-by="'sft-failure-' + item.id"
@submit="handleSubmitCorrection"
@cancel="$emit('cancel-correction')"
/>
</div>
</article>
</template>
<script setup lang="ts">
import { ref, computed } from 'vue'
import type { SftQueueItem, SftFailureCategory } from '../stores/sft'
import SftCorrectionArea from './SftCorrectionArea.vue'
const props = defineProps<{ item: SftQueueItem; correcting?: boolean }>()
const emit = defineEmits<{
correct: []
discard: [category: SftFailureCategory | null]
flag: [category: SftFailureCategory | null]
'submit-correction': [text: string, category: SftFailureCategory | null]
'cancel-correction': []
}>()
const FAILURE_CATEGORIES: { value: SftFailureCategory; label: string }[] = [
{ value: 'scoring_artifact', label: 'Scoring artifact' },
{ value: 'style_violation', label: 'Style violation' },
{ value: 'partial_answer', label: 'Partial answer' },
{ value: 'wrong_answer', label: 'Wrong answer' },
{ value: 'format_error', label: 'Format error' },
{ value: 'hallucination', label: 'Hallucination' },
]
const promptExpanded = ref(false)
const correctionAreaEl = ref<InstanceType<typeof SftCorrectionArea> | null>(null)
const selectedCategory = ref<SftFailureCategory | null>(null)
const pendingAction = ref<'discard' | 'flag' | null>(null)
const qualityClass = computed(() => {
const s = props.item.quality_score
if (s < 0.4) return 'quality-low'
if (s < 0.7) return 'quality-mid'
return 'quality-ok'
})
const qualityLabel = computed(() => {
const s = props.item.quality_score
if (s < 0.4) return 'low quality'
if (s < 0.7) return 'fair'
return 'acceptable'
})
function toggleCategory(cat: SftFailureCategory) {
selectedCategory.value = selectedCategory.value === cat ? null : cat
}
function emitWithCategory(action: 'discard' | 'flag') {
pendingAction.value = action
}
function confirmPendingAction() {
if (!pendingAction.value) return
emit(pendingAction.value, selectedCategory.value)
pendingAction.value = null
selectedCategory.value = null
}
function cancelPendingAction() {
pendingAction.value = null
}
function handleSubmitCorrection(text: string) {
emit('submit-correction', text, selectedCategory.value)
selectedCategory.value = null
}
function resetCorrection() {
correctionAreaEl.value?.reset()
selectedCategory.value = null
pendingAction.value = null
}
defineExpose({ resetCorrection })
</script>
<style scoped>
.sft-card {
background: var(--color-surface-raised);
border: 1px solid var(--color-border);
border-radius: var(--radius-lg);
padding: var(--space-4);
display: flex;
flex-direction: column;
gap: var(--space-3);
}
.chips-row {
display: flex;
flex-wrap: wrap;
gap: var(--space-2);
}
.chip {
padding: var(--space-1) var(--space-2);
border-radius: var(--radius-full);
font-size: 0.78rem;
font-weight: 600;
white-space: nowrap;
}
.chip-model { background: var(--color-primary-light, #e8f2e7); color: var(--color-primary); }
.chip-task { background: var(--color-surface-alt); color: var(--color-text-muted); }
.chip-node { background: var(--color-surface-alt); color: var(--color-text-muted); }
.chip-speed { background: var(--color-surface-alt); color: var(--color-text-muted); }
.quality-chip { color: #fff; }
.quality-low { background: var(--color-error, #c0392b); }
.quality-mid { background: var(--color-warning, #d4891a); }
.quality-ok { background: var(--color-success, #3a7a32); }
.failure-reason {
font-size: 0.82rem;
color: var(--color-text-muted);
font-style: italic;
}
.prompt-toggle {
background: none;
border: none;
color: var(--color-accent);
font-size: 0.85rem;
cursor: pointer;
padding: 0;
text-decoration: underline;
}
.prompt-messages {
margin-top: var(--space-2);
display: flex;
flex-direction: column;
gap: var(--space-2);
}
.prompt-message {
display: flex;
flex-direction: column;
gap: var(--space-1);
}
.role-label {
font-size: 0.75rem;
font-weight: 700;
text-transform: uppercase;
letter-spacing: 0.05em;
color: var(--color-text-muted);
}
.message-content {
font-family: var(--font-mono);
font-size: 0.82rem;
white-space: pre-wrap;
background: var(--color-surface-alt);
padding: var(--space-2) var(--space-3);
border-radius: var(--radius-md);
max-height: 200px;
overflow-y: auto;
}
.section-label {
font-size: 0.82rem;
font-weight: 600;
color: var(--color-text-muted);
margin-bottom: var(--space-1);
}
.model-response {
font-family: var(--font-mono);
font-size: 0.88rem;
white-space: pre-wrap;
background: color-mix(in srgb, var(--color-error, #c0392b) 8%, var(--color-surface-alt));
border-left: 3px solid var(--color-error, #c0392b);
padding: var(--space-3);
border-radius: var(--radius-md);
max-height: 300px;
overflow-y: auto;
}
.action-bar {
display: flex;
gap: var(--space-3);
flex-wrap: wrap;
}
.action-bar button {
padding: var(--space-2) var(--space-4);
border-radius: var(--radius-md);
border: 1px solid var(--color-border);
font-size: 0.9rem;
cursor: pointer;
background: var(--color-surface-raised);
color: var(--color-text);
}
.btn-correct { border-color: var(--color-success); color: var(--color-success); }
.btn-correct:hover { background: color-mix(in srgb, var(--color-success) 10%, transparent); }
.btn-discard { border-color: var(--color-error); color: var(--color-error); }
.btn-discard:hover { background: color-mix(in srgb, var(--color-error) 10%, transparent); }
.btn-flag { border-color: var(--color-warning); color: var(--color-warning); }
.btn-flag:hover { background: color-mix(in srgb, var(--color-warning) 10%, transparent); }
/* ── Failure category selector ─────────────────── */
.failure-category-section {
display: flex;
flex-direction: column;
gap: var(--space-2);
}
.optional-label {
font-size: 0.75rem;
font-weight: 400;
color: var(--color-text-muted);
}
.category-chips {
display: flex;
flex-wrap: wrap;
gap: var(--space-2);
}
.category-chip {
padding: var(--space-1) var(--space-3);
border-radius: var(--radius-full);
border: 1px solid var(--color-border);
background: var(--color-surface-alt);
color: var(--color-text-muted);
font-size: 0.78rem;
font-weight: 500;
cursor: pointer;
transition: background var(--transition), color var(--transition), border-color var(--transition);
}
.category-chip:hover {
border-color: var(--color-accent);
color: var(--color-accent);
background: var(--color-accent-light);
}
.category-chip--active {
background: var(--color-accent-light);
border-color: var(--color-accent);
color: var(--color-accent);
font-weight: 700;
}
.pending-action-row {
display: flex;
gap: var(--space-2);
margin-top: var(--space-1);
}
.btn-confirm {
padding: var(--space-1) var(--space-3);
border-radius: var(--radius-md);
border: 1px solid var(--color-accent);
background: var(--color-accent-light);
color: var(--color-accent);
font-size: 0.85rem;
font-weight: 600;
cursor: pointer;
}
.btn-confirm:hover {
background: color-mix(in srgb, var(--color-accent) 15%, transparent);
}
.btn-cancel-pending {
padding: var(--space-1) var(--space-3);
border-radius: var(--radius-md);
border: 1px solid var(--color-border);
background: none;
color: var(--color-text-muted);
font-size: 0.85rem;
cursor: pointer;
}
.btn-cancel-pending:hover {
background: var(--color-surface-alt);
}
</style>
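The `qualityClass` and `qualityLabel` computeds above share the same 0.4/0.7 thresholds; keeping them in one bucketing function avoids the two drifting apart. A framework-free sketch (`qualityBucket` is illustrative, not part of the component):

```typescript
// Illustrative: one bucketing for both the CSS class and the human label.
type QualityBucket = { cls: string; label: string }

function qualityBucket(score: number): QualityBucket {
  if (score < 0.4) return { cls: 'quality-low', label: 'low quality' }
  if (score < 0.7) return { cls: 'quality-mid', label: 'fair' }
  return { cls: 'quality-ok', label: 'acceptable' }
}
```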


@ -1,68 +0,0 @@
import { mount } from '@vue/test-utils'
import SftCorrectionArea from './SftCorrectionArea.vue'
import { describe, it, expect } from 'vitest'
describe('SftCorrectionArea', () => {
it('renders a textarea', () => {
const w = mount(SftCorrectionArea)
expect(w.find('textarea').exists()).toBe(true)
})
it('submit button is disabled when textarea is empty', () => {
const w = mount(SftCorrectionArea)
const btn = w.find('[data-testid="submit-btn"]')
expect((btn.element as HTMLButtonElement).disabled).toBe(true)
})
it('submit button is disabled when textarea is whitespace only', async () => {
const w = mount(SftCorrectionArea)
await w.find('textarea').setValue(' ')
const btn = w.find('[data-testid="submit-btn"]')
expect((btn.element as HTMLButtonElement).disabled).toBe(true)
})
it('submit button is enabled when textarea has content', async () => {
const w = mount(SftCorrectionArea)
await w.find('textarea').setValue('def add(a, b): return a + b')
const btn = w.find('[data-testid="submit-btn"]')
expect((btn.element as HTMLButtonElement).disabled).toBe(false)
})
it('clicking submit emits submit with trimmed text', async () => {
const w = mount(SftCorrectionArea)
await w.find('textarea').setValue(' def add(a, b): return a + b ')
await w.find('[data-testid="submit-btn"]').trigger('click')
expect(w.emitted('submit')?.[0]).toEqual(['def add(a, b): return a + b'])
})
it('clicking cancel emits cancel', async () => {
const w = mount(SftCorrectionArea)
await w.find('[data-testid="cancel-btn"]').trigger('click')
expect(w.emitted('cancel')).toBeTruthy()
})
it('Escape key emits cancel', async () => {
const w = mount(SftCorrectionArea)
await w.find('textarea').trigger('keydown', { key: 'Escape' })
expect(w.emitted('cancel')).toBeTruthy()
})
it('Ctrl+Enter emits submit when text is non-empty', async () => {
const w = mount(SftCorrectionArea)
await w.find('textarea').setValue('correct answer')
await w.find('textarea').trigger('keydown', { key: 'Enter', ctrlKey: true })
expect(w.emitted('submit')?.[0]).toEqual(['correct answer'])
})
it('Ctrl+Enter does not emit submit when text is empty', async () => {
const w = mount(SftCorrectionArea)
await w.find('textarea').trigger('keydown', { key: 'Enter', ctrlKey: true })
expect(w.emitted('submit')).toBeFalsy()
})
it('omits aria-describedby when describedBy prop is not provided', () => {
const w = mount(SftCorrectionArea)
const textarea = w.find('textarea')
expect(textarea.attributes('aria-describedby')).toBeUndefined()
})
})


@ -1,130 +0,0 @@
<template>
<div class="correction-area">
<label class="correction-label" for="correction-textarea">
Write the corrected response:
</label>
<textarea
id="correction-textarea"
ref="textareaEl"
v-model="text"
class="correction-textarea"
aria-label="Write corrected response"
aria-required="true"
:aria-describedby="describedBy || undefined"
placeholder="Write the response this model should have given..."
rows="4"
@keydown.escape="$emit('cancel')"
@keydown.enter.ctrl.prevent="submitIfValid"
@keydown.enter.meta.prevent="submitIfValid"
/>
<div class="correction-actions">
<button
data-testid="submit-btn"
class="btn-submit"
:disabled="!isValid"
@click="submitIfValid"
>
Submit correction
</button>
<button data-testid="cancel-btn" class="btn-cancel" @click="$emit('cancel')">
Cancel
</button>
</div>
</div>
</template>
<script setup lang="ts">
import { ref, computed, onMounted } from 'vue'
const props = withDefaults(defineProps<{ describedBy?: string }>(), { describedBy: undefined })
const emit = defineEmits<{ submit: [text: string]; cancel: [] }>()
const text = ref('')
const textareaEl = ref<HTMLTextAreaElement | null>(null)
const isValid = computed(() => text.value.trim().length > 0)
onMounted(() => textareaEl.value?.focus())
function submitIfValid() {
if (isValid.value) emit('submit', text.value.trim())
}
function reset() {
text.value = ''
}
defineExpose({ reset })
</script>
<style scoped>
.correction-area {
display: flex;
flex-direction: column;
gap: var(--space-3);
padding: var(--space-4);
border-top: 1px solid var(--color-border);
background: var(--color-surface-alt, var(--color-surface));
border-radius: 0 0 var(--radius-lg) var(--radius-lg);
}
.correction-label {
font-size: 0.85rem;
font-weight: 600;
color: var(--color-text-muted);
}
.correction-textarea {
width: 100%;
min-height: 7rem;
padding: var(--space-3);
border: 1px solid var(--color-border);
border-radius: var(--radius-md);
background: var(--color-surface-raised);
color: var(--color-text);
font-family: var(--font-mono);
font-size: 0.88rem;
line-height: 1.5;
resize: vertical;
}
.correction-textarea:focus {
outline: 2px solid var(--color-primary);
outline-offset: 1px;
}
.correction-actions {
display: flex;
gap: var(--space-3);
align-items: center;
}
.btn-submit {
padding: var(--space-2) var(--space-4);
background: var(--color-primary);
color: var(--color-text-inverse, #fff);
border: none;
border-radius: var(--radius-md);
font-size: 0.9rem;
cursor: pointer;
}
.btn-submit:disabled {
opacity: 0.45;
cursor: not-allowed;
}
.btn-submit:not(:disabled):hover {
background: var(--color-primary-hover, var(--color-primary));
}
.btn-cancel {
background: none;
border: none;
color: var(--color-text-muted);
font-size: 0.9rem;
cursor: pointer;
text-decoration: underline;
padding: 0;
}
</style>


@ -1,42 +0,0 @@
// src/composables/useSftKeyboard.ts
import { onUnmounted, getCurrentInstance } from 'vue'
interface Options {
onCorrect: () => void
onDiscard: () => void
onFlag: () => void
onEscape: () => void
onSubmit: () => void
isEditing: () => boolean // returns true when correction area is open
}
export function useSftKeyboard(opts: Options) {
function handler(e: KeyboardEvent) {
// Never intercept keys when focus is in a text input
if (e.target instanceof HTMLInputElement) return
// The correction textarea handles its own keys (Escape, Ctrl+Enter) via component bindings
if (e.target instanceof HTMLTextAreaElement) return
const k = e.key.toLowerCase()
if (opts.isEditing()) {
if (k === 'escape') opts.onEscape()
return
}
if (k === 'c') { opts.onCorrect(); return }
if (k === 'd') { opts.onDiscard(); return }
if (k === 'f') { opts.onFlag(); return }
if (k === 'escape') { opts.onEscape(); return }
}
window.addEventListener('keydown', handler)
const cleanup = () => window.removeEventListener('keydown', handler)
if (getCurrentInstance()) {
onUnmounted(cleanup)
}
return { cleanup }
}
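The branching inside `handler` is easy to unit-test once extracted from the DOM listener. A framework-free sketch of the same dispatch (`dispatchSftKey` is illustrative, not part of the composable's API):

```typescript
// Pure dispatch mirroring useSftKeyboard's branching, for illustration.
// Returns the name of the callback that would fire, or null.
type SftKey = 'correct' | 'discard' | 'flag' | 'escape' | null

function dispatchSftKey(key: string, isEditing: boolean, inTextField: boolean): SftKey {
  if (inTextField) return null // inputs/textareas keep their own keys
  const k = key.toLowerCase()
  if (isEditing) return k === 'escape' ? 'escape' : null
  if (k === 'c') return 'correct'
  if (k === 'd') return 'discard'
  if (k === 'f') return 'flag'
  if (k === 'escape') return 'escape'
  return null
}
```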


@ -6,8 +6,6 @@ const FetchView = () => import('../views/FetchView.vue')
const StatsView = () => import('../views/StatsView.vue')
const BenchmarkView = () => import('../views/BenchmarkView.vue')
const SettingsView = () => import('../views/SettingsView.vue')
const CorrectionsView = () => import('../views/CorrectionsView.vue')
const ModelsView = () => import('../views/ModelsView.vue')
export const router = createRouter({
history: createWebHashHistory(),
@ -16,8 +14,6 @@ export const router = createRouter({
{ path: '/fetch', component: FetchView, meta: { title: 'Fetch' } },
{ path: '/stats', component: StatsView, meta: { title: 'Stats' } },
{ path: '/benchmark', component: BenchmarkView, meta: { title: 'Benchmark' } },
{ path: '/models', component: ModelsView, meta: { title: 'Models' } },
{ path: '/corrections', component: CorrectionsView, meta: { title: 'Corrections' } },
{ path: '/settings', component: SettingsView, meta: { title: 'Settings' } },
],
})


@ -1,78 +0,0 @@
import { setActivePinia, createPinia } from 'pinia'
import { useSftStore } from './sft'
import type { SftQueueItem } from './sft'
import { beforeEach, describe, it, expect } from 'vitest'
function makeMockItem(overrides: Partial<SftQueueItem> = {}): SftQueueItem {
return {
id: 'abc',
source: 'cf-orch-benchmark',
benchmark_run_id: 'run1',
timestamp: '2026-04-07T10:00:00Z',
status: 'needs_review',
prompt_messages: [
{ role: 'system', content: 'You are a coding assistant.' },
{ role: 'user', content: 'Write a Python add function.' },
],
model_response: 'def add(a, b): return a - b',
corrected_response: null,
quality_score: 0.2,
failure_reason: 'pattern_match: 0/2 matched',
failure_category: null,
task_id: 'code-fn',
task_type: 'code',
task_name: 'Code: Write a Python function',
model_id: 'Qwen/Qwen2.5-3B',
model_name: 'Qwen2.5-3B',
node_id: 'heimdall',
gpu_id: 0,
tokens_per_sec: 38.4,
...overrides,
}
}
describe('useSftStore', () => {
beforeEach(() => setActivePinia(createPinia()))
it('starts with empty queue', () => {
const store = useSftStore()
expect(store.queue).toEqual([])
expect(store.current).toBeNull()
})
it('current returns first item', () => {
const store = useSftStore()
store.queue = [makeMockItem()]
expect(store.current?.id).toBe('abc')
})
it('removeCurrentFromQueue removes first item', () => {
const store = useSftStore()
const second = makeMockItem({ id: 'def' })
store.queue = [makeMockItem(), second]
store.removeCurrentFromQueue()
expect(store.queue[0].id).toBe('def')
})
it('restoreItem adds to front of queue', () => {
const store = useSftStore()
const second = makeMockItem({ id: 'def' })
store.queue = [second]
store.restoreItem(makeMockItem())
expect(store.queue[0].id).toBe('abc')
expect(store.queue[1].id).toBe('def')
})
it('setLastAction records the action', () => {
const store = useSftStore()
store.setLastAction('discard', makeMockItem())
expect(store.lastAction?.type).toBe('discard')
expect(store.lastAction?.item.id).toBe('abc')
})
it('clearLastAction nulls lastAction', () => {
const store = useSftStore()
store.setLastAction('flag', makeMockItem())
store.clearLastAction()
expect(store.lastAction).toBeNull()
})
})

View file

@ -1,72 +0,0 @@
// src/stores/sft.ts
import { defineStore } from 'pinia'
import { computed, ref } from 'vue'
export type SftFailureCategory =
| 'scoring_artifact'
| 'style_violation'
| 'partial_answer'
| 'wrong_answer'
| 'format_error'
| 'hallucination'
export interface SftQueueItem {
id: string
source: 'cf-orch-benchmark'
benchmark_run_id: string
timestamp: string
status: 'needs_review' | 'approved' | 'discarded' | 'model_rejected'
prompt_messages: { role: string; content: string }[]
model_response: string
corrected_response: string | null
quality_score: number // 0.0 to 1.0
failure_reason: string | null
failure_category: SftFailureCategory | null
task_id: string
task_type: string
task_name: string
model_id: string
model_name: string
node_id: string
gpu_id: number
tokens_per_sec: number
}
export interface SftLastAction {
type: 'correct' | 'discard' | 'flag'
item: SftQueueItem
failure_category?: SftFailureCategory | null
}
export const useSftStore = defineStore('sft', () => {
const queue = ref<SftQueueItem[]>([])
const totalRemaining = ref(0)
const lastAction = ref<SftLastAction | null>(null)
const current = computed(() => queue.value[0] ?? null)
function removeCurrentFromQueue() {
queue.value.shift()
}
function setLastAction(
type: SftLastAction['type'],
item: SftQueueItem,
failure_category?: SftFailureCategory | null,
) {
lastAction.value = { type, item, failure_category }
}
function clearLastAction() {
lastAction.value = null
}
function restoreItem(item: SftQueueItem) {
queue.value.unshift(item)
}
return {
queue, totalRemaining, lastAction, current,
removeCurrentFromQueue, setLastAction, clearLastAction, restoreItem,
}
})

View file

@ -24,54 +24,6 @@
</div>
</header>
<!-- Model Picker -->
<details class="model-picker" ref="pickerEl">
<summary class="picker-summary">
<span class="picker-title">🎯 Model Selection</span>
<span class="picker-badge">{{ pickerSummaryText }}</span>
</summary>
<div class="picker-body">
<div v-if="modelsLoading" class="picker-loading">Loading models…</div>
<div v-else-if="Object.keys(modelCategories).length === 0" class="picker-empty">
No models found; check API connection.
</div>
<template v-else>
<div
v-for="(models, category) in modelCategories"
:key="category"
class="picker-category"
>
<label class="picker-cat-header">
<input
type="checkbox"
:checked="isCategoryAllSelected(models)"
:indeterminate="isCategoryIndeterminate(models)"
@change="toggleCategory(models, ($event.target as HTMLInputElement).checked)"
/>
<span class="picker-cat-name">{{ category }}</span>
<span class="picker-cat-count">({{ models.length }})</span>
</label>
<div v-if="models.length === 0" class="picker-no-models">No models installed</div>
<div v-else class="picker-model-list">
<label
v-for="m in models"
:key="m.name"
class="picker-model-row"
>
<input
type="checkbox"
:checked="selectedModels.has(m.name)"
@change="toggleModel(m.name, ($event.target as HTMLInputElement).checked)"
/>
<span class="picker-model-name" :title="m.repo_id ?? m.name">{{ m.name }}</span>
<span class="picker-adapter-type">{{ m.adapter_type }}</span>
</label>
</div>
</div>
</template>
</div>
</details>
<!-- Trained models badge row -->
<div v-if="fineTunedModels.length > 0" class="trained-models-row">
<span class="trained-label">Trained:</span>
@ -272,16 +224,6 @@ const LABEL_META: Record<string, { emoji: string }> = {
}
// Types
interface AvailableModel {
name: string
repo_id?: string
adapter_type: string
}
interface ModelCategoriesResponse {
categories: Record<string, AvailableModel[]>
}
interface FineTunedModel {
name: string
base_model_id?: string
@ -312,13 +254,6 @@ const runError = ref('')
const includeSlow = ref(false)
const logEl = ref<HTMLElement | null>(null)
// Model picker state
const modelCategories = ref<Record<string, AvailableModel[]>>({})
const selectedModels = ref<Set<string>>(new Set())
const allModels = ref<string[]>([])
const modelsLoading = ref(false)
const pickerEl = ref<HTMLDetailsElement | null>(null)
// Fine-tune state
const fineTunedModels = ref<FineTunedModel[]>([])
const ftModel = ref('deberta-small')
@ -339,52 +274,6 @@ async function cancelFinetune() {
await fetch('/api/finetune/cancel', { method: 'POST' }).catch(() => {})
}
// Model picker computed
const pickerSummaryText = computed(() => {
const total = allModels.value.length
if (total === 0) return 'No models available'
const selected = selectedModels.value.size
if (selected === total) return `All models (${total})`
return `${selected} of ${total} selected`
})
function isCategoryAllSelected(models: AvailableModel[]): boolean {
return models.length > 0 && models.every(m => selectedModels.value.has(m.name))
}
function isCategoryIndeterminate(models: AvailableModel[]): boolean {
const someSelected = models.some(m => selectedModels.value.has(m.name))
return someSelected && !isCategoryAllSelected(models)
}
function toggleModel(name: string, checked: boolean) {
const next = new Set(selectedModels.value)
if (checked) next.add(name)
else next.delete(name)
selectedModels.value = next
}
function toggleCategory(models: AvailableModel[], checked: boolean) {
const next = new Set(selectedModels.value)
for (const m of models) {
if (checked) next.add(m.name)
else next.delete(m.name)
}
selectedModels.value = next
}
async function loadModelCategories() {
modelsLoading.value = true
const { data } = await useApiFetch<ModelCategoriesResponse>('/api/benchmark/models')
modelsLoading.value = false
if (data?.categories) {
modelCategories.value = data.categories
const flat = Object.values(data.categories).flat().map(m => m.name)
allModels.value = flat
selectedModels.value = new Set(flat)
}
}
// Derived
const modelNames = computed(() => Object.keys(results.value?.models ?? {}))
const modelCount = computed(() => modelNames.value.length)
@ -466,16 +355,7 @@ function startBenchmark() {
runError.value = ''
runCancelled.value = false
const params = new URLSearchParams()
if (includeSlow.value) params.set('include_slow', 'true')
// Only send model_names when a subset is selected (not all, not none)
const total = allModels.value.length
const selected = selectedModels.value.size
if (total > 0 && selected > 0 && selected < total) {
params.set('model_names', [...selectedModels.value].join(','))
}
const qs = params.toString()
const url = `/api/benchmark/run${qs ? `?${qs}` : ''}`
const url = `/api/benchmark/run${includeSlow.value ? '?include_slow=true' : ''}`
useApiSSE(
url,
async (event) => {
@ -547,7 +427,6 @@ function startFinetune() {
onMounted(() => {
loadResults()
loadFineTunedModels()
loadModelCategories()
})
</script>
@ -883,134 +762,6 @@ onMounted(() => {
font-weight: 700;
}
/* ── Model Picker ───────────────────────────────────────── */
.model-picker {
border: 1px solid var(--color-border, #d0d7e8);
border-radius: 0.5rem;
overflow: hidden;
}
.picker-summary {
display: flex;
align-items: center;
gap: 0.6rem;
padding: 0.65rem 0.9rem;
cursor: pointer;
user-select: none;
list-style: none;
background: var(--color-surface-raised, #e4ebf5);
}
.picker-summary::-webkit-details-marker { display: none; }
.picker-summary::before { content: '▶ '; font-size: 0.65rem; color: var(--color-text-secondary, #6b7a99); }
details[open] .picker-summary::before { content: '▼ '; }
.picker-title {
font-size: 0.9rem;
font-weight: 600;
color: var(--color-text, #1a2338);
}
.picker-badge {
font-size: 0.75rem;
color: var(--color-text-secondary, #6b7a99);
background: var(--color-surface, #fff);
border: 1px solid var(--color-border, #d0d7e8);
padding: 0.15rem 0.5rem;
border-radius: 1rem;
font-family: var(--font-mono, monospace);
margin-left: auto;
}
.picker-body {
padding: 0.75rem;
border-top: 1px solid var(--color-border, #d0d7e8);
display: flex;
flex-direction: column;
gap: 0.75rem;
}
.picker-loading,
.picker-empty {
font-size: 0.85rem;
color: var(--color-text-secondary, #6b7a99);
padding: 0.5rem 0;
}
.picker-category {
display: flex;
flex-direction: column;
gap: 0.3rem;
}
.picker-cat-header {
display: flex;
align-items: center;
gap: 0.45rem;
font-size: 0.82rem;
font-weight: 700;
color: var(--color-text, #1a2338);
text-transform: uppercase;
letter-spacing: 0.04em;
cursor: pointer;
}
.picker-cat-count {
font-weight: 400;
color: var(--color-text-secondary, #6b7a99);
font-family: var(--font-mono, monospace);
font-size: 0.75rem;
text-transform: none;
letter-spacing: 0;
}
.picker-no-models {
font-size: 0.78rem;
color: var(--color-text-secondary, #6b7a99);
opacity: 0.65;
padding-left: 1.4rem;
font-style: italic;
}
.picker-model-list {
display: flex;
flex-wrap: wrap;
gap: 0.35rem 0.75rem;
padding-left: 1.4rem;
}
.picker-model-row {
display: flex;
align-items: center;
gap: 0.35rem;
font-size: 0.82rem;
cursor: pointer;
color: var(--color-text, #1a2338);
}
.picker-model-name {
font-family: var(--font-mono, monospace);
font-size: 0.78rem;
white-space: nowrap;
max-width: 18ch;
overflow: hidden;
text-overflow: ellipsis;
}
.picker-adapter-type {
font-size: 0.68rem;
color: var(--color-text-secondary, #6b7a99);
background: var(--color-surface-raised, #e4ebf5);
border: 1px solid var(--color-border, #d0d7e8);
border-radius: 0.25rem;
padding: 0.05rem 0.3rem;
font-family: var(--font-mono, monospace);
}
@media (max-width: 600px) {
.picker-model-list { padding-left: 0; }
.picker-model-name { max-width: 14ch; }
}
/* ── Fine-tune section ──────────────────────────────────── */
.ft-section {
border: 1px solid var(--color-border, #d0d7e8);


@ -1,328 +0,0 @@
<template>
<div class="corrections-view">
<header class="cv-header">
<span class="queue-count">
<template v-if="loading">Loading…</template>
<template v-else-if="store.totalRemaining > 0">
{{ store.totalRemaining }} remaining
</template>
<span v-else class="queue-empty-label">All caught up</span>
</span>
<div class="header-actions">
<button @click="handleUndo" :disabled="!store.lastAction" class="btn-action">Undo</button>
</div>
</header>
<!-- States -->
<div v-if="loading" class="skeleton-card" aria-label="Loading candidates" />
<div v-else-if="apiError" class="error-display" role="alert">
<p>Couldn't reach Avocet API.</p>
<button @click="fetchBatch" class="btn-action">Retry</button>
</div>
<div v-else-if="!store.current" class="empty-state">
<p>No candidates need review.</p>
<p class="empty-hint">Import a benchmark run from the Settings tab to get started.</p>
</div>
<template v-else>
<div class="card-wrapper">
<SftCard
:item="store.current"
:correcting="correcting"
@correct="startCorrection"
@discard="handleDiscard"
@flag="handleFlag"
@submit-correction="handleCorrect"
@cancel-correction="correcting = false"
ref="sftCardEl"
/>
</div>
</template>
<!-- Stats footer -->
<footer v-if="stats" class="stats-footer">
<span class="stat"> {{ stats.by_status?.approved ?? 0 }} approved</span>
<span class="stat"> {{ stats.by_status?.discarded ?? 0 }} discarded</span>
<span class="stat"> {{ stats.by_status?.model_rejected ?? 0 }} flagged</span>
<a
v-if="(stats.export_ready ?? 0) > 0"
:href="exportUrl"
download
class="btn-export"
>
Export {{ stats.export_ready }} corrections
</a>
</footer>
<!-- Undo toast (inlined here: UndoToast.vue expects the label store's LastAction shape, not the SFT store's) -->
<div v-if="store.lastAction" class="undo-toast">
<span>Last: {{ store.lastAction.type }}</span>
<button @click="handleUndo" class="btn-undo">Undo</button>
<button @click="store.clearLastAction()" class="btn-dismiss" aria-label="Dismiss">✕</button>
</div>
</div>
</template>
<script setup lang="ts">
import { ref, onMounted } from 'vue'
import { useSftStore } from '../stores/sft'
import type { SftFailureCategory } from '../stores/sft'
import { useSftKeyboard } from '../composables/useSftKeyboard'
import SftCard from '../components/SftCard.vue'
const store = useSftStore()
const loading = ref(false)
const apiError = ref(false)
const correcting = ref(false)
const stats = ref<Record<string, any> | null>(null)
const exportUrl = '/api/sft/export'
const sftCardEl = ref<InstanceType<typeof SftCard> | null>(null)
useSftKeyboard({
onCorrect: () => { if (store.current && !correcting.value) correcting.value = true },
onDiscard: () => { if (store.current && !correcting.value) handleDiscard() },
onFlag: () => { if (store.current && !correcting.value) handleFlag() },
onEscape: () => { correcting.value = false },
onSubmit: () => {},
isEditing: () => correcting.value,
})
async function fetchBatch() {
loading.value = true
apiError.value = false
try {
const res = await fetch('/api/sft/queue?per_page=20')
if (!res.ok) throw new Error('API error')
const data = await res.json()
store.queue = data.items
store.totalRemaining = data.total
} catch {
apiError.value = true
} finally {
loading.value = false
}
}
async function fetchStats() {
try {
const res = await fetch('/api/sft/stats')
if (res.ok) stats.value = await res.json()
} catch { /* ignore */ }
}
function startCorrection() {
correcting.value = true
}
async function handleCorrect(text: string, category: SftFailureCategory | null = null) {
if (!store.current) return
const item = store.current
correcting.value = false
try {
const body: Record<string, unknown> = { id: item.id, action: 'correct', corrected_response: text }
if (category != null) body.failure_category = category
const res = await fetch('/api/sft/submit', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(body),
})
if (!res.ok) throw new Error(`HTTP ${res.status}`)
store.removeCurrentFromQueue()
store.setLastAction('correct', item, category)
store.totalRemaining = Math.max(0, store.totalRemaining - 1)
fetchStats()
if (store.queue.length < 5) fetchBatch()
} catch (err) {
console.error('handleCorrect failed:', err)
}
}
async function handleDiscard(category: SftFailureCategory | null = null) {
if (!store.current) return
const item = store.current
try {
const body: Record<string, unknown> = { id: item.id, action: 'discard' }
if (category != null) body.failure_category = category
const res = await fetch('/api/sft/submit', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(body),
})
if (!res.ok) throw new Error(`HTTP ${res.status}`)
store.removeCurrentFromQueue()
store.setLastAction('discard', item, category)
store.totalRemaining = Math.max(0, store.totalRemaining - 1)
fetchStats()
if (store.queue.length < 5) fetchBatch()
} catch (err) {
console.error('handleDiscard failed:', err)
}
}
async function handleFlag(category: SftFailureCategory | null = null) {
if (!store.current) return
const item = store.current
try {
const body: Record<string, unknown> = { id: item.id, action: 'flag' }
if (category != null) body.failure_category = category
const res = await fetch('/api/sft/submit', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(body),
})
if (!res.ok) throw new Error(`HTTP ${res.status}`)
store.removeCurrentFromQueue()
store.setLastAction('flag', item, category)
store.totalRemaining = Math.max(0, store.totalRemaining - 1)
fetchStats()
if (store.queue.length < 5) fetchBatch()
} catch (err) {
console.error('handleFlag failed:', err)
}
}
async function handleUndo() {
if (!store.lastAction) return
const action = store.lastAction
const { item } = action
try {
const res = await fetch('/api/sft/undo', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ id: item.id }),
})
if (!res.ok) throw new Error(`HTTP ${res.status}`)
store.restoreItem(item)
store.totalRemaining++
store.clearLastAction()
fetchStats()
} catch (err) {
// Backend did not restore the item; clear the undo UI without restoring queue state
console.error('handleUndo failed:', err)
store.clearLastAction()
}
}
onMounted(() => {
fetchBatch()
fetchStats()
})
</script>
<style scoped>
.corrections-view {
display: flex;
flex-direction: column;
min-height: 100dvh;
padding: var(--space-4);
gap: var(--space-4);
max-width: 760px;
margin: 0 auto;
}
.cv-header {
display: flex;
justify-content: space-between;
align-items: center;
}
.queue-count {
font-size: 1rem;
font-weight: 600;
color: var(--color-text);
}
.queue-empty-label { color: var(--color-text-muted); }
.btn-action {
padding: var(--space-2) var(--space-3);
border: 1px solid var(--color-border);
border-radius: var(--radius-md);
background: var(--color-surface-raised);
cursor: pointer;
font-size: 0.88rem;
}
.btn-action:disabled { opacity: 0.4; cursor: not-allowed; }
.skeleton-card {
height: 320px;
background: var(--color-surface-alt);
border-radius: var(--radius-lg);
animation: pulse 1.5s ease-in-out infinite;
}
@keyframes pulse {
0%, 100% { opacity: 1; }
50% { opacity: 0.5; }
}
@media (prefers-reduced-motion: reduce) {
.skeleton-card { animation: none; }
}
.error-display, .empty-state {
text-align: center;
padding: var(--space-12);
color: var(--color-text-muted);
}
.empty-hint { font-size: 0.88rem; margin-top: var(--space-2); }
.stats-footer {
display: flex;
gap: var(--space-4);
align-items: center;
flex-wrap: wrap;
padding: var(--space-3) 0;
border-top: 1px solid var(--color-border-light);
font-size: 0.85rem;
color: var(--color-text-muted);
}
.btn-export {
margin-left: auto;
padding: var(--space-2) var(--space-3);
background: var(--color-primary);
color: var(--color-text-inverse, #fff);
border-radius: var(--radius-md);
text-decoration: none;
font-size: 0.88rem;
}
.undo-toast {
position: fixed;
bottom: var(--space-6);
left: 50%;
transform: translateX(-50%);
display: flex;
align-items: center;
gap: var(--space-3);
background: var(--color-surface-raised);
border: 1px solid var(--color-border);
border-radius: var(--radius-md);
padding: var(--space-3) var(--space-4);
box-shadow: 0 4px 12px rgba(0,0,0,0.15);
font-size: 0.9rem;
}
.btn-undo {
background: var(--color-primary);
color: var(--color-text-inverse, #fff);
border: none;
border-radius: var(--radius-sm);
padding: var(--space-1) var(--space-3);
cursor: pointer;
font-size: 0.88rem;
}
.btn-dismiss {
background: none;
border: none;
color: var(--color-text-muted);
cursor: pointer;
font-size: 1rem;
}
</style>
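`handleCorrect`, `handleDiscard`, and `handleFlag` differ only in the action name and the optional corrected text; the request body they build could be factored into one helper. A hedged sketch (`buildSubmitBody` is illustrative, not an existing export):

```typescript
// Illustrative helper: builds the /api/sft/submit payload the three handlers share.
type SftAction = 'correct' | 'discard' | 'flag'

function buildSubmitBody(
  id: string,
  action: SftAction,
  category: string | null = null,
  correctedResponse?: string,
): Record<string, unknown> {
  const body: Record<string, unknown> = { id, action }
  if (action === 'correct' && correctedResponse !== undefined) {
    body.corrected_response = correctedResponse
  }
  // failure_category is only sent when the reviewer actually picked one
  if (category != null) body.failure_category = category
  return body
}
```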


@ -1,826 +0,0 @@
<template>
<div class="models-view">
<h1 class="page-title">🤗 Models</h1>
<!-- 1. HF Lookup -->
<section class="section">
<h2 class="section-title">HuggingFace Lookup</h2>
<div class="lookup-row">
<input
v-model="lookupInput"
type="text"
class="lookup-input"
placeholder="org/model or huggingface.co/org/model"
:disabled="lookupLoading"
@keydown.enter="doLookup"
aria-label="HuggingFace model ID"
/>
<button
class="btn-primary"
:disabled="lookupLoading || !lookupInput.trim()"
@click="doLookup"
>
{{ lookupLoading ? 'Looking up…' : 'Lookup' }}
</button>
</div>
<div v-if="lookupError" class="error-notice" role="alert">
{{ lookupError }}
</div>
<div v-if="lookupResult" class="preview-card">
<div class="preview-header">
<span class="preview-repo-id">{{ lookupResult.repo_id }}</span>
<div class="badge-group">
<span v-if="lookupResult.already_installed" class="badge badge-success">Installed</span>
<span v-if="lookupResult.already_queued" class="badge badge-info">In queue</span>
</div>
</div>
<div class="preview-meta">
<span v-if="lookupResult.pipeline_tag" class="chip chip-pipeline">
{{ lookupResult.pipeline_tag }}
</span>
<span v-if="lookupResult.adapter_recommendation" class="chip chip-adapter">
{{ lookupResult.adapter_recommendation }}
</span>
<span v-if="lookupResult.size != null" class="preview-size">
{{ humanBytes(lookupResult.size) }}
</span>
</div>
<p v-if="lookupResult.description" class="preview-desc">
{{ lookupResult.description }}
</p>
<button
class="btn-primary btn-add-queue"
:disabled="lookupResult.already_installed || lookupResult.already_queued || addingToQueue"
@click="addToQueue"
>
{{ addingToQueue ? 'Adding…' : 'Add to queue' }}
</button>
</div>
</section>
<!-- 2. Approval Queue -->
<section class="section">
<h2 class="section-title">Approval Queue</h2>
<div v-if="pendingModels.length === 0" class="empty-notice">
No models waiting for approval.
</div>
<div v-for="model in pendingModels" :key="model.id" class="model-card">
<div class="model-card-header">
<span class="model-repo-id">{{ model.repo_id }}</span>
<button
class="btn-dismiss"
:aria-label="`Dismiss ${model.repo_id}`"
@click="dismissModel(model.id)"
>
✕
</button>
</div>
<div class="model-meta">
<span v-if="model.pipeline_tag" class="chip chip-pipeline">{{ model.pipeline_tag }}</span>
<span v-if="model.adapter_recommendation" class="chip chip-adapter">{{ model.adapter_recommendation }}</span>
</div>
<div class="model-card-actions">
<button class="btn-primary btn-sm" @click="approveModel(model.id)">
Approve download
</button>
</div>
</div>
</section>
<!-- 3. Active Downloads -->
<section class="section">
<h2 class="section-title">Active Downloads</h2>
<div v-if="downloadingModels.length === 0" class="empty-notice">
No active downloads.
</div>
<div v-for="model in downloadingModels" :key="model.id" class="model-card">
<div class="model-card-header">
<span class="model-repo-id">{{ model.repo_id }}</span>
<span v-if="downloadErrors[model.id]" class="badge badge-error">Error</span>
</div>
<div class="model-meta">
<span v-if="model.pipeline_tag" class="chip chip-pipeline">{{ model.pipeline_tag }}</span>
</div>
<div v-if="downloadErrors[model.id]" class="download-error" role="alert">
{{ downloadErrors[model.id] }}
</div>
<div v-else class="progress-wrap" :aria-label="`Download progress for ${model.repo_id}`">
<div
class="progress-bar"
:style="{ width: `${downloadProgress[model.id] ?? 0}%` }"
role="progressbar"
:aria-valuenow="downloadProgress[model.id] ?? 0"
aria-valuemin="0"
aria-valuemax="100"
/>
<span class="progress-label">
{{ downloadProgress[model.id] == null ? 'Preparing…' : `${downloadProgress[model.id]}%` }}
</span>
</div>
</div>
</section>
<!-- 4. Installed Models -->
<section class="section">
<h2 class="section-title">Installed Models</h2>
<div v-if="installedModels.length === 0" class="empty-notice">
No models installed yet.
</div>
<div v-else class="installed-table-wrap">
<table class="installed-table">
<thead>
<tr>
<th>Name</th>
<th>Type</th>
<th>Adapter</th>
<th>Size</th>
<th></th>
</tr>
</thead>
<tbody>
<tr v-for="model in installedModels" :key="model.name">
<td class="td-name">{{ model.name }}</td>
<td>
<span
class="badge"
:class="model.type === 'finetuned' ? 'badge-accent' : 'badge-info'"
>
{{ model.type }}
</span>
</td>
<td>{{ model.adapter ?? '—' }}</td>
<td>{{ humanBytes(model.size) }}</td>
<td>
<button
class="btn-danger btn-sm"
@click="deleteInstalled(model.name)"
>
Delete
</button>
</td>
</tr>
</tbody>
</table>
</div>
</section>
</div>
</template>
<script setup lang="ts">
import { ref, computed, onMounted, onUnmounted } from 'vue'
// Type definitions
interface LookupResult {
repo_id: string
pipeline_tag: string | null
adapter_recommendation: string | null
size: number | null
description: string | null
already_installed: boolean
already_queued: boolean
}
interface QueuedModel {
id: string
repo_id: string
status: 'pending' | 'downloading' | 'done' | 'error'
pipeline_tag: string | null
adapter_recommendation: string | null
}
interface InstalledModel {
name: string
type: 'finetuned' | 'downloaded'
adapter: string | null
size: number
}
interface SseProgressEvent {
model_id: string
pct: number | null
status: 'progress' | 'done' | 'error'
message?: string
}
// State
const lookupInput = ref('')
const lookupLoading = ref(false)
const lookupError = ref<string | null>(null)
const lookupResult = ref<LookupResult | null>(null)
const addingToQueue = ref(false)
const queuedModels = ref<QueuedModel[]>([])
const installedModels = ref<InstalledModel[]>([])
const downloadProgress = ref<Record<string, number>>({})
const downloadErrors = ref<Record<string, string>>({})
let pollInterval: ReturnType<typeof setInterval> | null = null
let sseSource: EventSource | null = null
// Derived
const pendingModels = computed(() =>
queuedModels.value.filter(m => m.status === 'pending')
)
const downloadingModels = computed(() =>
queuedModels.value.filter(m => m.status === 'downloading')
)
// Helpers
function humanBytes(bytes: number | null): string {
if (bytes == null) return '—'
const units = ['B', 'KB', 'MB', 'GB', 'TB']
let value = bytes
let unitIndex = 0
while (value >= 1024 && unitIndex < units.length - 1) {
value /= 1024
unitIndex++
}
return `${value.toFixed(unitIndex === 0 ? 0 : 1)} ${units[unitIndex]}`
}
function normalizeRepoId(raw: string): string {
return raw.trim().replace(/^https?:\/\/huggingface\.co\//, '')
}
// API calls
async function doLookup() {
const repoId = normalizeRepoId(lookupInput.value)
if (!repoId) return
lookupLoading.value = true
lookupError.value = null
lookupResult.value = null
try {
const res = await fetch(`/api/models/lookup?repo_id=${encodeURIComponent(repoId)}`)
if (res.status === 404) {
lookupError.value = 'Model not found on HuggingFace.'
return
}
if (res.status === 502) {
lookupError.value = 'HuggingFace unreachable. Check your connection and try again.'
return
}
if (!res.ok) {
lookupError.value = `Lookup failed (HTTP ${res.status}).`
return
}
lookupResult.value = await res.json() as LookupResult
} catch {
lookupError.value = 'Network error. Is the Avocet API running?'
} finally {
lookupLoading.value = false
}
}
async function addToQueue() {
if (!lookupResult.value) return
addingToQueue.value = true
try {
const res = await fetch('/api/models/queue', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ repo_id: lookupResult.value.repo_id }),
})
if (res.ok) {
lookupResult.value = { ...lookupResult.value, already_queued: true }
await loadQueue()
}
} catch { /* ignore — already_queued badge won't flip, user can retry */ }
finally {
addingToQueue.value = false
}
}
async function approveModel(id: string) {
try {
const res = await fetch(`/api/models/queue/${encodeURIComponent(id)}/approve`, { method: 'POST' })
if (res.ok) {
await loadQueue()
startSse()
}
} catch { /* ignore */ }
}
async function dismissModel(id: string) {
try {
const res = await fetch(`/api/models/queue/${encodeURIComponent(id)}`, { method: 'DELETE' })
if (res.ok) {
queuedModels.value = queuedModels.value.filter(m => m.id !== id)
}
} catch { /* ignore */ }
}
async function deleteInstalled(name: string) {
if (!window.confirm(`Delete installed model "${name}"? This cannot be undone.`)) return
try {
const res = await fetch(`/api/models/installed/${encodeURIComponent(name)}`, { method: 'DELETE' })
if (res.ok) {
installedModels.value = installedModels.value.filter(m => m.name !== name)
}
} catch { /* ignore */ }
}
async function loadQueue() {
try {
const res = await fetch('/api/models/queue')
if (res.ok) queuedModels.value = await res.json() as QueuedModel[]
} catch { /* non-fatal */ }
}
async function loadInstalled() {
try {
const res = await fetch('/api/models/installed')
if (res.ok) installedModels.value = await res.json() as InstalledModel[]
} catch { /* non-fatal */ }
}
// SSE for download progress
function startSse() {
if (sseSource) return // already connected
sseSource = new EventSource('/api/models/download/stream')
sseSource.addEventListener('message', (e: MessageEvent) => {
let event: SseProgressEvent
try {
event = JSON.parse(e.data as string) as SseProgressEvent
} catch {
return
}
const { model_id, pct, status, message } = event
if (status === 'progress' && pct != null) {
downloadProgress.value = { ...downloadProgress.value, [model_id]: pct }
} else if (status === 'done') {
const updated = { ...downloadProgress.value }
delete updated[model_id]
downloadProgress.value = updated
queuedModels.value = queuedModels.value.filter(m => m.id !== model_id)
loadInstalled()
} else if (status === 'error') {
downloadErrors.value = {
...downloadErrors.value,
[model_id]: message ?? 'Download failed.',
}
}
})
sseSource.onerror = () => {
sseSource?.close()
sseSource = null
}
}
function stopSse() {
sseSource?.close()
sseSource = null
}
// Polling
function startPollingIfDownloading() {
if (pollInterval) return
pollInterval = setInterval(async () => {
await loadQueue()
if (downloadingModels.value.length === 0) {
stopPolling()
}
}, 5000)
}
function stopPolling() {
if (pollInterval) {
clearInterval(pollInterval)
pollInterval = null
}
}
// Lifecycle
onMounted(async () => {
await Promise.all([loadQueue(), loadInstalled()])
if (downloadingModels.value.length > 0) {
startSse()
startPollingIfDownloading()
}
})
onUnmounted(() => {
stopPolling()
stopSse()
})
</script>
<style scoped>
.models-view {
max-width: 760px;
margin: 0 auto;
padding: 1.5rem 1rem 4rem;
display: flex;
flex-direction: column;
gap: 2rem;
}
.page-title {
font-family: var(--font-display, var(--font-body, sans-serif));
font-size: 1.4rem;
font-weight: 700;
color: var(--color-primary, #2d5a27);
}
/* ── Sections ── */
.section {
display: flex;
flex-direction: column;
gap: 0.75rem;
}
.section-title {
font-size: 1rem;
font-weight: 600;
color: var(--color-text, #1a2338);
padding-bottom: 0.4rem;
border-bottom: 1px solid var(--color-border, #a8b8d0);
}
/* ── Lookup row ── */
.lookup-row {
display: flex;
gap: 0.5rem;
flex-wrap: wrap;
}
.lookup-input {
flex: 1;
min-width: 0;
padding: 0.45rem 0.7rem;
border: 1px solid var(--color-border, #a8b8d0);
border-radius: var(--radius-md, 0.5rem);
background: var(--color-surface-raised, #f5f7fc);
color: var(--color-text, #1a2338);
font-size: 0.9rem;
font-family: var(--font-body, sans-serif);
}
.lookup-input:disabled {
opacity: 0.6;
}
.lookup-input::placeholder {
color: var(--color-text-muted, #4a5c7a);
}
/* ── Notices ── */
.error-notice {
padding: 0.6rem 0.8rem;
background: color-mix(in srgb, var(--color-error, #c0392b) 12%, transparent);
border: 1px solid color-mix(in srgb, var(--color-error, #c0392b) 30%, transparent);
border-radius: var(--radius-md, 0.5rem);
color: var(--color-error, #c0392b);
font-size: 0.88rem;
}
.empty-notice {
color: var(--color-text-muted, #4a5c7a);
font-size: 0.9rem;
padding: 0.75rem;
border: 1px dashed var(--color-border, #a8b8d0);
border-radius: var(--radius-md, 0.5rem);
}
/* ── Preview card ── */
.preview-card {
border: 1px solid var(--color-border, #a8b8d0);
border-radius: var(--radius-lg, 1rem);
background: var(--color-surface-raised, #f5f7fc);
padding: 1rem;
display: flex;
flex-direction: column;
gap: 0.6rem;
box-shadow: var(--shadow-sm);
}
.preview-header {
display: flex;
align-items: flex-start;
justify-content: space-between;
gap: 0.5rem;
flex-wrap: wrap;
}
.preview-repo-id {
font-family: var(--font-mono, monospace);
font-size: 0.95rem;
font-weight: 600;
color: var(--color-text, #1a2338);
word-break: break-all;
}
.preview-meta {
display: flex;
gap: 0.4rem;
flex-wrap: wrap;
align-items: center;
}
.preview-size {
font-size: 0.8rem;
color: var(--color-text-muted, #4a5c7a);
margin-left: 0.25rem;
}
.preview-desc {
font-size: 0.875rem;
color: var(--color-text-muted, #4a5c7a);
line-height: 1.5;
margin: 0;
display: -webkit-box;
-webkit-line-clamp: 3;
-webkit-box-orient: vertical;
overflow: hidden;
}
.btn-add-queue {
align-self: flex-start;
}
/* ── Model cards (queue + downloads) ── */
.model-card {
border: 1px solid var(--color-border, #a8b8d0);
border-radius: var(--radius-md, 0.5rem);
background: var(--color-surface-raised, #f5f7fc);
padding: 0.75rem 1rem;
display: flex;
flex-direction: column;
gap: 0.5rem;
box-shadow: var(--shadow-sm);
}
.model-card-header {
display: flex;
align-items: center;
justify-content: space-between;
gap: 0.5rem;
}
.model-repo-id {
font-family: var(--font-mono, monospace);
font-size: 0.9rem;
font-weight: 600;
color: var(--color-text, #1a2338);
word-break: break-all;
}
.model-meta {
display: flex;
gap: 0.4rem;
flex-wrap: wrap;
}
.model-card-actions {
display: flex;
gap: 0.5rem;
flex-wrap: wrap;
padding-top: 0.25rem;
}
/* ── Progress bar ── */
.progress-wrap {
position: relative;
height: 1.5rem;
background: var(--color-surface-alt, #dde4f0);
border-radius: var(--radius-full, 9999px);
overflow: hidden;
}
.progress-bar {
position: absolute;
top: 0;
left: 0;
height: 100%;
background: var(--color-accent, #c4732a);
border-radius: var(--radius-full, 9999px);
transition: width 300ms ease;
}
.progress-label {
position: absolute;
inset: 0;
display: flex;
align-items: center;
justify-content: center;
font-size: 0.75rem;
font-weight: 600;
color: var(--color-text, #1a2338);
pointer-events: none;
}
.download-error {
font-size: 0.85rem;
color: var(--color-error, #c0392b);
padding: 0.4rem 0.5rem;
background: color-mix(in srgb, var(--color-error, #c0392b) 10%, transparent);
border-radius: var(--radius-sm, 0.25rem);
}
/* ── Installed table ── */
.installed-table-wrap {
overflow-x: auto;
}
.installed-table {
width: 100%;
border-collapse: collapse;
font-size: 0.875rem;
}
.installed-table th {
text-align: left;
padding: 0.4rem 0.6rem;
color: var(--color-text-muted, #4a5c7a);
font-size: 0.78rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.03em;
border-bottom: 1px solid var(--color-border, #a8b8d0);
white-space: nowrap;
}
.installed-table td {
padding: 0.55rem 0.6rem;
border-bottom: 1px solid var(--color-border-light, #ccd5e6);
vertical-align: middle;
}
.td-name {
font-family: var(--font-mono, monospace);
font-size: 0.85rem;
word-break: break-all;
}
/* ── Badges ── */
.badge-group {
display: flex;
gap: 0.35rem;
flex-wrap: wrap;
align-items: center;
}
.badge {
display: inline-flex;
align-items: center;
padding: 0.15rem 0.55rem;
border-radius: var(--radius-full, 9999px);
font-size: 0.72rem;
font-weight: 700;
letter-spacing: 0.02em;
text-transform: uppercase;
white-space: nowrap;
}
.badge-success {
background: color-mix(in srgb, var(--color-success, #3a7a32) 15%, transparent);
color: var(--color-success, #3a7a32);
}
.badge-info {
background: color-mix(in srgb, var(--color-info, #1e6091) 15%, transparent);
color: var(--color-info, #1e6091);
}
.badge-accent {
background: color-mix(in srgb, var(--color-accent, #c4732a) 15%, transparent);
color: var(--color-accent, #c4732a);
}
.badge-error {
background: color-mix(in srgb, var(--color-error, #c0392b) 15%, transparent);
color: var(--color-error, #c0392b);
}
/* ── Chips ── */
.chip {
display: inline-flex;
align-items: center;
padding: 0.15rem 0.5rem;
border-radius: var(--radius-full, 9999px);
font-size: 0.75rem;
font-weight: 600;
background: var(--color-surface-alt, #dde4f0);
white-space: nowrap;
}
.chip-pipeline {
color: var(--color-primary, #2d5a27);
background: color-mix(in srgb, var(--color-primary, #2d5a27) 12%, var(--color-surface-alt, #dde4f0));
}
.chip-adapter {
color: var(--color-accent, #c4732a);
background: color-mix(in srgb, var(--color-accent, #c4732a) 12%, var(--color-surface-alt, #dde4f0));
}
/* ── Buttons ── */
.btn-primary, .btn-danger {
padding: 0.4rem 0.9rem;
border-radius: var(--radius-md, 0.5rem);
font-size: 0.85rem;
cursor: pointer;
border: 1px solid;
font-family: var(--font-body, sans-serif);
transition: background var(--transition, 200ms ease), color var(--transition, 200ms ease);
}
.btn-sm {
padding: 0.25rem 0.65rem;
font-size: 0.8rem;
}
.btn-primary {
border-color: var(--color-primary, #2d5a27);
background: var(--color-primary, #2d5a27);
color: var(--color-text-inverse, #eaeff8);
}
.btn-primary:hover:not(:disabled) {
background: var(--color-primary-hover, #234820);
border-color: var(--color-primary-hover, #234820);
}
.btn-primary:disabled {
opacity: 0.5;
cursor: not-allowed;
}
.btn-danger {
border-color: var(--color-error, #c0392b);
background: transparent;
color: var(--color-error, #c0392b);
}
.btn-danger:hover {
background: color-mix(in srgb, var(--color-error, #c0392b) 10%, transparent);
}
.btn-dismiss {
border: none;
background: transparent;
color: var(--color-text-muted, #4a5c7a);
cursor: pointer;
font-size: 0.9rem;
padding: 0.15rem 0.4rem;
border-radius: var(--radius-sm, 0.25rem);
flex-shrink: 0;
transition: color var(--transition, 200ms ease), background var(--transition, 200ms ease);
}
.btn-dismiss:hover {
color: var(--color-error, #c0392b);
background: color-mix(in srgb, var(--color-error, #c0392b) 10%, transparent);
}
/* ── Responsive ── */
@media (max-width: 480px) {
.lookup-row {
flex-direction: column;
}
.lookup-input {
width: 100%;
}
.btn-primary:not(.btn-sm) {
width: 100%;
}
.installed-table th:nth-child(3),
.installed-table td:nth-child(3) {
display: none; /* hide Adapter column on very narrow screens */
}
}
</style>
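The `humanBytes` and `normalizeRepoId` helpers in the deleted ModelsView above are pure functions and can be checked standalone. A minimal sketch (logic copied from the component as written, not a new API):

```typescript
// Pure helpers lifted verbatim from the deleted ModelsView component.
function humanBytes(bytes: number | null): string {
  if (bytes == null) return '—'
  const units = ['B', 'KB', 'MB', 'GB', 'TB']
  let value = bytes
  let unitIndex = 0
  while (value >= 1024 && unitIndex < units.length - 1) {
    value /= 1024
    unitIndex++
  }
  // Whole bytes get no decimals; larger units keep one.
  return `${value.toFixed(unitIndex === 0 ? 0 : 1)} ${units[unitIndex]}`
}

function normalizeRepoId(raw: string): string {
  // Accepts either "org/model" or a full huggingface.co URL.
  return raw.trim().replace(/^https?:\/\/huggingface\.co\//, '')
}

console.assert(humanBytes(512) === '512 B')
console.assert(humanBytes(1536) === '1.5 KB')
console.assert(humanBytes(null) === '—')
console.assert(normalizeRepoId('https://huggingface.co/org/model') === 'org/model')
```

Note the `unitIndex < units.length - 1` guard: anything at or above 1024 TB stays in TB rather than overflowing the units array.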


@@ -110,63 +110,6 @@
</label>
</section>
<!-- cf-orch SFT Integration section -->
<section class="section">
<h2 class="section-title">cf-orch Integration</h2>
<p class="section-desc">
Import SFT (supervised fine-tuning) candidates from cf-orch benchmark runs.
</p>
<div class="field-row">
<label class="field field-grow">
<span>bench_results_dir</span>
<input
id="bench-results-dir"
v-model="benchResultsDir"
type="text"
placeholder="/path/to/circuitforge-orch/scripts/bench_results"
/>
</label>
</div>
<div class="account-actions">
<button class="btn-primary" @click="saveSftConfig">Save</button>
<button class="btn-secondary" @click="scanRuns">Scan for runs</button>
<span v-if="saveStatus" class="save-status">{{ saveStatus }}</span>
</div>
<table v-if="runs.length > 0" class="runs-table">
<thead>
<tr>
<th>Timestamp</th>
<th>Candidates</th>
<th>Imported</th>
<th></th>
</tr>
</thead>
<tbody>
<tr v-for="run in runs" :key="run.run_id">
<td>{{ run.timestamp }}</td>
<td>{{ run.candidate_count }}</td>
<td>{{ run.already_imported ? '✓' : '—' }}</td>
<td>
<button
class="btn-import"
:disabled="run.already_imported || importingRunId === run.run_id"
@click="importRun(run.run_id)"
>
{{ importingRunId === run.run_id ? 'Importing…' : 'Import' }}
</button>
</td>
</tr>
</tbody>
</table>
<div v-if="importResult" class="import-result">
Imported {{ importResult.imported }}, skipped {{ importResult.skipped }}.
</div>
</section>
<!-- Save / Reload -->
<div class="save-bar">
<button class="btn-primary" :disabled="saving" @click="save">
@@ -199,64 +142,6 @@ const saveOk = ref(true)
const richMotion = ref(localStorage.getItem('cf-avocet-rich-motion') !== 'false')
const keyHints = ref(localStorage.getItem('cf-avocet-key-hints') !== 'false')
// SFT integration state
const benchResultsDir = ref('')
const runs = ref<Array<{ run_id: string; timestamp: string; candidate_count: number; already_imported: boolean }>>([])
const importingRunId = ref<string | null>(null)
const importResult = ref<{ imported: number; skipped: number } | null>(null)
const saveStatus = ref('')
async function loadSftConfig() {
try {
const res = await fetch('/api/sft/config')
if (res.ok) {
const data = await res.json()
benchResultsDir.value = data.bench_results_dir ?? ''
}
} catch {
// non-fatal: leave field empty
}
}
async function saveSftConfig() {
saveStatus.value = 'Saving…'
try {
const res = await fetch('/api/sft/config', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ bench_results_dir: benchResultsDir.value }),
})
saveStatus.value = res.ok ? 'Saved.' : 'Error saving.'
} catch {
saveStatus.value = 'Error saving.'
}
setTimeout(() => { saveStatus.value = '' }, 2000)
}
async function scanRuns() {
try {
const res = await fetch('/api/sft/runs')
if (res.ok) runs.value = await res.json()
} catch { /* ignore */ }
}
async function importRun(runId: string) {
importingRunId.value = runId
importResult.value = null
try {
const res = await fetch('/api/sft/import', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ run_id: runId }),
})
if (res.ok) {
importResult.value = await res.json()
scanRuns() // refresh already_imported flags
}
} catch { /* ignore */ }
importingRunId.value = null
}
async function reload() {
const { data } = await useApiFetch<{ accounts: Account[]; max_per_account: number }>('/api/config')
if (data) {
@@ -334,10 +219,7 @@ function onKeyHintsChange() {
document.documentElement.classList.toggle('hide-key-hints', !keyHints.value)
}
onMounted(() => {
reload()
loadSftConfig()
})
onMounted(reload)
</script>
<style scoped>
@@ -546,61 +428,4 @@ onMounted(() => {
border: 1px dashed var(--color-border, #d0d7e8);
border-radius: 0.5rem;
}
.section-desc {
color: var(--color-text-secondary, #6b7a99);
font-size: 0.88rem;
line-height: 1.5;
}
.field-input {
padding: 0.4rem 0.6rem;
border: 1px solid var(--color-border, #d0d7e8);
border-radius: 0.375rem;
background: var(--color-surface, #fff);
color: var(--color-text, #1a2338);
font-size: 0.9rem;
font-family: var(--font-body, sans-serif);
width: 100%;
}
.runs-table {
width: 100%;
border-collapse: collapse;
margin-top: var(--space-3, 0.75rem);
font-size: 0.88rem;
}
.runs-table th,
.runs-table td {
padding: var(--space-2, 0.5rem) var(--space-3, 0.75rem);
text-align: left;
border-bottom: 1px solid var(--color-border, #d0d7e8);
}
.btn-import {
padding: var(--space-1, 0.25rem) var(--space-3, 0.75rem);
border: 1px solid var(--app-primary, #2A6080);
border-radius: var(--radius-sm, 0.25rem);
background: none;
color: var(--app-primary, #2A6080);
cursor: pointer;
font-size: 0.85rem;
}
.btn-import:disabled {
opacity: 0.45;
cursor: not-allowed;
}
.import-result {
margin-top: var(--space-2, 0.5rem);
font-size: 0.88rem;
color: var(--color-text-secondary, #6b7a99);
}
.save-status {
font-size: 0.85rem;
color: var(--color-text-secondary, #6b7a99);
}
</style>


@@ -35,39 +35,6 @@
</div>
</div>
<!-- Benchmark Results -->
<template v-if="benchRows.length > 0">
<h2 class="section-title">🏁 Benchmark Results</h2>
<div class="bench-table-wrap">
<table class="bench-table">
<thead>
<tr>
<th class="bt-model-col">Model</th>
<th
v-for="m in BENCH_METRICS"
:key="m.key as string"
class="bt-metric-col"
>{{ m.label }}</th>
</tr>
</thead>
<tbody>
<tr v-for="row in benchRows" :key="row.name">
<td class="bt-model-cell" :title="row.name">{{ row.name }}</td>
<td
v-for="m in BENCH_METRICS"
:key="m.key as string"
class="bt-metric-cell"
:class="{ 'bt-best': bestByMetric[m.key as string] === row.name }"
>
{{ formatMetric(row.result[m.key]) }}
</td>
</tr>
</tbody>
</table>
</div>
<p class="bench-hint">Highlighted cells are the best-scoring model per metric.</p>
</template>
<div class="file-info">
<span class="file-path">Score file: <code>data/email_score.jsonl</code></span>
<span class="file-size">{{ fileSizeLabel }}</span>
@@ -87,18 +54,10 @@
import { ref, computed, onMounted } from 'vue'
import { useApiFetch } from '../composables/useApi'
interface BenchmarkModelResult {
accuracy?: number
macro_f1?: number
weighted_f1?: number
[key: string]: number | undefined
}
interface StatsResponse {
total: number
counts: Record<string, number>
score_file_bytes: number
benchmark_results?: Record<string, BenchmarkModelResult>
}
// Canonical label order + metadata
@@ -149,42 +108,6 @@ const fileSizeLabel = computed(() => {
return `${(b / 1024 / 1024).toFixed(2)} MB`
})
// Benchmark results helpers
const BENCH_METRICS: Array<{ key: keyof BenchmarkModelResult; label: string }> = [
{ key: 'accuracy', label: 'Accuracy' },
{ key: 'macro_f1', label: 'Macro F1' },
{ key: 'weighted_f1', label: 'Weighted F1' },
]
const benchRows = computed(() => {
const br = stats.value.benchmark_results
if (!br || Object.keys(br).length === 0) return []
return Object.entries(br).map(([name, result]) => ({ name, result }))
})
// Find the best model name for each metric
const bestByMetric = computed((): Record<string, string> => {
const result: Record<string, string> = {}
for (const { key } of BENCH_METRICS) {
let bestName = ''
let bestVal = -Infinity
for (const { name, result: r } of benchRows.value) {
const v = r[key]
if (v != null && v > bestVal) { bestVal = v; bestName = name }
}
result[key as string] = bestName
}
return result
})
function formatMetric(v: number | undefined): string {
if (v == null) return '—'
// Values in 0-1 range: format as percentage
if (v <= 1) return `${(v * 100).toFixed(1)}%`
// Already a percentage
return `${v.toFixed(1)}%`
}
async function load() {
loading.value = true
error.value = ''
@@ -311,79 +234,6 @@ onMounted(load)
padding: 1rem;
}
/* ── Benchmark Results ──────────────────────────── */
.section-title {
font-family: var(--font-display, var(--font-body, sans-serif));
font-size: 1.05rem;
font-weight: 700;
color: var(--app-primary, #2A6080);
margin: 0;
}
.bench-table-wrap {
overflow-x: auto;
border: 1px solid var(--color-border, #d0d7e8);
border-radius: 0.5rem;
}
.bench-table {
border-collapse: collapse;
width: 100%;
font-size: 0.82rem;
}
.bt-model-col {
text-align: left;
padding: 0.45rem 0.75rem;
background: var(--color-surface-raised, #e4ebf5);
border-bottom: 1px solid var(--color-border, #d0d7e8);
font-weight: 600;
min-width: 12rem;
}
.bt-metric-col {
text-align: right;
padding: 0.45rem 0.75rem;
background: var(--color-surface-raised, #e4ebf5);
border-bottom: 1px solid var(--color-border, #d0d7e8);
font-weight: 600;
white-space: nowrap;
min-width: 6rem;
}
.bt-model-cell {
padding: 0.4rem 0.75rem;
border-top: 1px solid var(--color-border, #d0d7e8);
font-family: var(--font-mono, monospace);
font-size: 0.76rem;
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
max-width: 16rem;
color: var(--color-text, #1a2338);
}
.bt-metric-cell {
padding: 0.4rem 0.75rem;
border-top: 1px solid var(--color-border, #d0d7e8);
text-align: right;
font-family: var(--font-mono, monospace);
font-variant-numeric: tabular-nums;
color: var(--color-text, #1a2338);
}
.bt-metric-cell.bt-best {
color: var(--color-success, #3a7a32);
font-weight: 700;
background: color-mix(in srgb, var(--color-success, #3a7a32) 8%, transparent);
}
.bench-hint {
font-size: 0.75rem;
color: var(--color-text-secondary, #6b7a99);
margin: 0;
}
@media (max-width: 480px) {
.bar-row {
grid-template-columns: 1.5rem 1fr 1fr 3rem;
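The removed `formatMetric` helper treats values at or below 1 as fractions and anything larger as an already-scaled percentage. A standalone sketch of that convention (logic copied from the deleted Stats view):

```typescript
// Mirrors the deleted Stats-view helper: fractions become percentages,
// values above 1 are assumed to already be percentages.
function formatMetric(v: number | undefined): string {
  if (v == null) return '—'
  // Values in 0-1 range: format as a percentage
  if (v <= 1) return `${(v * 100).toFixed(1)}%`
  // Already a percentage
  return `${v.toFixed(1)}%`
}

console.assert(formatMetric(0.5) === '50.0%')
console.assert(formatMetric(92.3) === '92.3%')
console.assert(formatMetric(undefined) === '—')
```

One edge worth noting: a metric of exactly 1 is read as 100%, so a model genuinely scoring 1.0% would be misrendered; the benchmark emits fractions, so this never arises in practice.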


@@ -4,11 +4,6 @@ import UnoCSS from 'unocss/vite'
export default defineConfig({
plugins: [vue(), UnoCSS()],
server: {
proxy: {
'/api': 'http://localhost:8503',
},
},
test: {
environment: 'jsdom',
globals: true,