feat: Vue 3 label tab — complete card-stack UI with ASMR bucket UX #1
34 changed files with 4382 additions and 4723 deletions
.gitignore (vendored) — 6 changed lines

```
@@ -11,12 +11,6 @@ config/label_tool.yaml
 data/email_score.jsonl
 data/email_label_queue.jsonl
 data/email_compare_sample.jsonl
-data/sft_candidates.jsonl
-data/sft_approved.jsonl
 
 # Conda/pip artifacts
 .env
-
-# Claude context — BSL 1.1, keep out of version control
-CLAUDE.md
-docs/superpowers/
```
CLAUDE.md (new file, +173 lines):
# Avocet — Email Classifier Training Tool

## What it is

Shared infrastructure for building and benchmarking email classifiers across the CircuitForge menagerie.
Named for the avocet's sweeping-bill technique — it sweeps through email streams and filters out categories.

**Pipeline:**

```
Scrape (IMAP, wide search, multi-account) → data/email_label_queue.jsonl
                 ↓
Label (card-stack UI) → data/email_score.jsonl
                 ↓
Benchmark (HuggingFace NLI/reranker) → per-model macro-F1 + latency
```
## Environment

- Python env: `conda run -n job-seeker <cmd>` for basic use (streamlit, yaml, stdlib only)
- Classifier env: `conda run -n job-seeker-classifiers <cmd>` for benchmarks (transformers, FlagEmbedding, gliclass)
- Run tests: `/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v`
  (call the binary directly — `conda run pytest` can spawn runaway processes)
- Create the classifier env: `conda env create -f environment.yml`
## Label Tool (app/label_tool.py)

Card-stack Streamlit UI for manually labeling recruitment emails.

```
conda run -n job-seeker streamlit run app/label_tool.py --server.port 8503
```

- Config: `config/label_tool.yaml` (gitignored — copy from `.example`, or use the ⚙️ Settings tab)
- Queue: `data/email_label_queue.jsonl` (gitignored)
- Output: `data/email_score.jsonl` (gitignored)
- Four tabs: 🃏 Label, 📥 Fetch, 📊 Stats, ⚙️ Settings
- Keyboard shortcuts: 1–9 = label, 0 = Other (wildcard, prompts free-text input), S = skip, U = undo
- Dedup: MD5 of `(subject + body[:100])` — cross-account safe
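A minimal sketch of that dedup key (mirroring `entry_key` in the IMAP fetch module later in this diff — only content fields feed the hash, never the account):

```python
import hashlib

def entry_key(e: dict) -> str:
    # Subject plus the first 100 chars of the body; account and date are
    # deliberately excluded, so the same email fetched from two accounts
    # hashes identically.
    key = e.get("subject", "") + (e.get("body", "") or "")[:100]
    return hashlib.md5(key.encode("utf-8")).hexdigest()

a = {"subject": "Phone screen?", "body": "Hi there", "account": "work"}
b = {"subject": "Phone screen?", "body": "Hi there", "account": "personal"}
print(entry_key(a) == entry_key(b))  # True
```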
### Settings Tab (⚙️)

- Add / edit / remove IMAP accounts via a form UI — no manual YAML editing required
- Per-account fields: display name, host, port, SSL toggle, username, password (masked), folder, days back
- **🔌 Test connection** button per account — connects, logs in, selects the folder, reports the message count
- Global: max emails per account per fetch
- **💾 Save** writes `config/label_tool.yaml`; **↩ Reload** discards unsaved changes
- `_sync_settings_to_state()` collects widget values before any add/remove to avoid index-key drift
## Benchmark (scripts/benchmark_classifier.py)

```
# List available models
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --list-models

# Score against labeled JSONL
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --score

# Visual comparison on live IMAP emails
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --compare --limit 20

# Include slow/large models
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --score --include-slow

# Export DB-labeled emails (⚠️ LLM-generated labels — review first)
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --export-db --db /path/to/staging.db
```
## Labels (peregrine defaults — configurable per product)

| Label | Key | Meaning |
|-------|-----|---------|
| `interview_scheduled` | 1 | Phone screen, video call, or on-site invitation |
| `offer_received` | 2 | Formal job offer or offer letter |
| `rejected` | 3 | Application declined or not moving forward |
| `positive_response` | 4 | Recruiter interest or request to connect |
| `survey_received` | 5 | Culture-fit survey or assessment invitation |
| `neutral` | 6 | ATS confirmation (application received, etc.) |
| `event_rescheduled` | 7 | Interview or event moved to a new time |
| `digest` | 8 | Job digest or multi-listing email (scrapeable) |
| `new_lead` | 9 | Unsolicited recruiter outreach or cold contact |
| `hired` | h | Offer accepted, onboarding, welcome email, start date |
## Model Registry (13 models, 7 defaults)

See `scripts/benchmark_classifier.py:MODEL_REGISTRY`.
Default models run without `--include-slow`.
Add `--models deberta-small deberta-small-2pass` to test a specific subset.
## Config Files

- `config/label_tool.yaml` — gitignored; multi-account IMAP config
- `config/label_tool.yaml.example` — committed template

## Data Files

- `data/email_score.jsonl` — gitignored; manually-labeled ground truth
- `data/email_score.jsonl.example` — committed sample for CI
- `data/email_label_queue.jsonl` — gitignored; IMAP fetch queue
## Key Design Notes

- `ZeroShotAdapter.load()` instantiates the pipeline object; `classify()` calls the object.
  Tests patch `scripts.classifier_adapters.pipeline` (the module-level factory) with a
  two-level mock: `mock_factory.return_value = MagicMock(return_value={...})`.
- `two_pass=True` on ZeroShotAdapter: the first pass ranks all 6 labels; the second pass re-runs
  with only the top 2, forcing a binary choice. 2× cost, better confidence.
- `--compare` uses the first account in `label_tool.yaml` for live IMAP emails.
- DB export labels are llama3.1:8b-generated — treat them as noisy, not gold truth.
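The two-pass idea above can be sketched in isolation (a toy scorer stands in for the real NLI pipeline; `classify_once` is a hypothetical stand-in, not the ZeroShotAdapter API):

```python
def classify_two_pass(text: str, labels: list[str], classify_once) -> tuple[str, dict]:
    # Pass 1: score every label, keep the two highest.
    scores = classify_once(text, labels)
    top2 = sorted(scores, key=scores.get, reverse=True)[:2]
    # Pass 2: re-run on just the top two, forcing a binary choice.
    final = classify_once(text, top2)
    return max(final, key=final.get), final

# Toy scorer: earlier labels score higher (purely illustrative).
def toy_scorer(text, labels):
    return {lab: 1.0 / (i + 1) for i, lab in enumerate(labels)}

best, final = classify_two_pass(
    "re: phone screen", ["interview_scheduled", "rejected", "digest"], toy_scorer
)
print(best)  # interview_scheduled
```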
## Vue Label UI (app/api.py + web/)

FastAPI on port 8503 serves both the REST API and the built Vue SPA (`web/dist/`).

```
./manage.sh start-api   # build Vue SPA + start FastAPI (binds 0.0.0.0:8503 — LAN accessible)
./manage.sh stop-api
./manage.sh open-api    # xdg-open http://localhost:8503
```

Logs: `log/api.log`
## Email Field Schema — IMPORTANT

Two schemas exist. The normalization layer in `app/api.py` bridges them automatically.

### JSONL on-disk schema (written by `label_tool.py` and its IMAP fetch)

| Field | Type | Notes |
|-------|------|-------|
| `subject` | str | Email subject line |
| `body` | str | Plain-text body, truncated at 800 chars; HTML stripped by `_strip_html()` |
| `from_addr` | str | Sender address string (`"Name <addr>"`) |
| `date` | str | Raw RFC 2822 date string |
| `account` | str | Display name of the IMAP account that fetched it |
| *(no `id`)* | — | Dedup key is MD5 of `(subject + body[:100])` — never stored on disk |
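A representative queue line under this schema (values illustrative; the dedup hash is derived on the fly, never written):

```python
import hashlib
import json

record = {
    "subject": "Invitation to interview",
    "body": "Hi, we'd like to schedule a phone screen next week.",  # ≤800 chars, HTML already stripped
    "from_addr": "Jane Recruiter <jane@example.com>",
    "date": "Mon, 03 Feb 2025 09:15:00 -0500",
    "account": "work-imap",
}
line = json.dumps(record)  # one JSON object per line in the .jsonl file

# No `id` on disk — the dedup key is computed when needed:
dedup = hashlib.md5((record["subject"] + record["body"][:100]).encode("utf-8")).hexdigest()
print(len(dedup))  # 32
```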
### Vue API schema (returned by `GET /api/queue`, required by POST endpoints)

| Field | Type | Notes |
|-------|------|-------|
| `id` | str | MD5 content hash, or the stored `id` if the item has one |
| `subject` | str | Unchanged |
| `body` | str | Unchanged |
| `from` | str | Mapped from `from_addr` (or `from` if already present) |
| `date` | str | Unchanged |
| `source` | str | Mapped from `account` (or `source` if already present) |
### Normalization layer (`_normalize()` in `app/api.py`)

`_normalize(item)` handles the mapping and ID generation. All `GET /api/queue` responses
pass through it. Mutating endpoints (`/api/label`, `/api/skip`, `/api/discard`) look up
items via `_normalize(x)["id"]`, so both real data (no `id`, uses the content hash) and test
fixtures (explicit `id` field) work transparently.
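A sketch consistent with that description (illustrative only — the real `_normalize()` lives in `app/api.py`):

```python
import hashlib

def normalize(item: dict) -> dict:
    # Map on-disk names to the Vue API names, preserving any explicit values.
    out = dict(item)
    out["from"] = item.get("from", item.get("from_addr", ""))
    out["source"] = item.get("source", item.get("account", ""))
    if "id" not in out:
        # Real data carries no id → derive the MD5 content hash.
        key = item.get("subject", "") + (item.get("body", "") or "")[:100]
        out["id"] = hashlib.md5(key.encode("utf-8")).hexdigest()
    return out

disk_item = {"subject": "s", "body": "b", "from_addr": "a@x", "date": "d", "account": "work"}
fixture = {"id": "t1", "subject": "s", "body": "b"}
print(normalize(disk_item)["source"], normalize(fixture)["id"])  # work t1
```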
### Peregrine integration

Peregrine's `staging.db` uses different field names again:

| staging.db column | Maps to avocet JSONL field |
|-------------------|---------------------------|
| `subject` | `subject` |
| `body` | `body` (may contain HTML — run through `_strip_html()` before queuing) |
| `from_address` | `from_addr` |
| `received_date` | `date` |
| `account` or source context | `account` |

When exporting from Peregrine's DB for avocet labeling, transform to the JSONL schema above
(not the Vue API schema). The `--export-db` flag in `benchmark_classifier.py` does this.
Any new export path should also call `_strip_html()` on the body before writing.
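The row transform can be sketched as follows (a hypothetical helper named for illustration — the real logic is behind `--export-db`, and `basic_strip_html` is a crude stand-in for `_strip_html()`):

```python
import re

def basic_strip_html(html: str) -> str:
    # Crude tag-stripping stand-in for _strip_html(), for illustration only.
    return re.sub(r"<[^>]+>", " ", html).strip()

def staging_row_to_jsonl(row: dict) -> dict:
    # Column mapping from the table above; target is the JSONL on-disk schema.
    return {
        "subject": row["subject"],
        "body": basic_strip_html(row["body"])[:800],
        "from_addr": row["from_address"],
        "date": row["received_date"],
        "account": row.get("account", "peregrine"),
    }

row = {"subject": "Offer", "body": "<p>Congrats!</p>",
       "from_address": "hr@x", "received_date": "Tue, 04 Feb 2025"}
print(staging_row_to_jsonl(row)["body"])  # Congrats!
```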
## Relationship to Peregrine

Avocet started as `peregrine/tools/label_tool.py` + `peregrine/scripts/classifier_adapters.py`.
Peregrine retains copies during stabilization; once avocet is proven, peregrine will import from here.
app/api.py — 51 changed lines

```
@@ -142,13 +142,6 @@ def _normalize(item: dict) -> dict:
 app = FastAPI(title="Avocet API")
 
-from app.sft import router as sft_router
-app.include_router(sft_router, prefix="/api/sft")
-
-from app.models import router as models_router
-import app.models as _models_module
-app.include_router(models_router, prefix="/api/models")
-
 # In-memory last-action store (single user, local tool — in-memory is fine)
 _last_action: dict | None = None
```
```
@@ -302,18 +295,10 @@ def get_stats():
         lbl = r.get("label", "")
         if lbl:
             counts[lbl] = counts.get(lbl, 0) + 1
-    benchmark_results: dict = {}
-    benchmark_path = _DATA_DIR / "benchmark_results.json"
-    if benchmark_path.exists():
-        try:
-            benchmark_results = json.loads(benchmark_path.read_text(encoding="utf-8"))
-        except Exception:
-            pass
     return {
         "total": len(records),
         "counts": counts,
         "score_file_bytes": _score_file().stat().st_size if _score_file().exists() else 0,
-        "benchmark_results": benchmark_results,
     }
```
```
@@ -348,36 +333,6 @@ from fastapi.responses import StreamingResponse
 # Benchmark endpoints
 # ---------------------------------------------------------------------------
 
-@app.get("/api/benchmark/models")
-def get_benchmark_models() -> dict:
-    """Return installed models grouped by adapter_type category."""
-    models_dir: Path = _models_module._MODELS_DIR
-    categories: dict[str, list[dict]] = {
-        "ZeroShotAdapter": [],
-        "RerankerAdapter": [],
-        "GenerationAdapter": [],
-        "Unknown": [],
-    }
-    if models_dir.exists():
-        for sub in models_dir.iterdir():
-            if not sub.is_dir():
-                continue
-            info_path = sub / "model_info.json"
-            adapter_type = "Unknown"
-            repo_id: str | None = None
-            if info_path.exists():
-                try:
-                    info = json.loads(info_path.read_text(encoding="utf-8"))
-                    adapter_type = info.get("adapter_type") or info.get("adapter_recommendation") or "Unknown"
-                    repo_id = info.get("repo_id")
-                except Exception:
-                    pass
-            bucket = adapter_type if adapter_type in categories else "Unknown"
-            entry: dict = {"name": sub.name, "repo_id": repo_id, "adapter_type": adapter_type}
-            categories[bucket].append(entry)
-    return {"categories": categories}
-
-
 @app.get("/api/benchmark/results")
 def get_benchmark_results():
     """Return the most recently saved benchmark results, or an empty envelope."""
```
```
@@ -388,17 +343,13 @@ def get_benchmark_results():
 
 @app.get("/api/benchmark/run")
-def run_benchmark(include_slow: bool = False, model_names: str = ""):
+def run_benchmark(include_slow: bool = False):
     """Spawn the benchmark script and stream stdout as SSE progress events."""
     python_bin = "/devl/miniconda3/envs/job-seeker-classifiers/bin/python"
     script = str(_ROOT / "scripts" / "benchmark_classifier.py")
     cmd = [python_bin, script, "--score", "--save"]
     if include_slow:
         cmd.append("--include-slow")
-    if model_names:
-        names = [n.strip() for n in model_names.split(",") if n.strip()]
-        if names:
-            cmd.extend(["--models"] + names)
 
     def generate():
         try:
```
```
@@ -1,6 +1,6 @@
 """Avocet — IMAP fetch utilities.
 
-Shared between app/api.py (FastAPI SSE endpoint) and the label UI.
+Shared between app/api.py (FastAPI SSE endpoint) and app/label_tool.py (Streamlit).
 No Streamlit imports here — stdlib + imaplib only.
 """
 from __future__ import annotations

@@ -8,11 +8,36 @@ from __future__ import annotations
 import email as _email_lib
 import hashlib
 import imaplib
+import re
 from datetime import datetime, timedelta
 from email.header import decode_header as _raw_decode
+from html.parser import HTMLParser
 from typing import Any, Iterator
 
-from app.utils import extract_body, strip_html  # noqa: F401 (strip_html re-exported for callers)
+
+# ── HTML → plain text ────────────────────────────────────────────────────────
+
+class _TextExtractor(HTMLParser):
+    def __init__(self):
+        super().__init__()
+        self._parts: list[str] = []
+
+    def handle_data(self, data: str) -> None:
+        stripped = data.strip()
+        if stripped:
+            self._parts.append(stripped)
+
+    def get_text(self) -> str:
+        return " ".join(self._parts)
+
+
+def strip_html(html_str: str) -> str:
+    try:
+        ex = _TextExtractor()
+        ex.feed(html_str)
+        return ex.get_text()
+    except Exception:
+        return re.sub(r"<[^>]+>", " ", html_str).strip()
+
 
 # ── IMAP decode helpers ───────────────────────────────────────────────────────

@@ -30,6 +55,37 @@ def _decode_str(value: str | None) -> str:
     return " ".join(out).strip()
 
 
+def _extract_body(msg: Any) -> str:
+    if msg.is_multipart():
+        html_fallback: str | None = None
+        for part in msg.walk():
+            ct = part.get_content_type()
+            if ct == "text/plain":
+                try:
+                    charset = part.get_content_charset() or "utf-8"
+                    return part.get_payload(decode=True).decode(charset, errors="replace")
+                except Exception:
+                    pass
+            elif ct == "text/html" and html_fallback is None:
+                try:
+                    charset = part.get_content_charset() or "utf-8"
+                    raw = part.get_payload(decode=True).decode(charset, errors="replace")
+                    html_fallback = strip_html(raw)
+                except Exception:
+                    pass
+        return html_fallback or ""
+    else:
+        try:
+            charset = msg.get_content_charset() or "utf-8"
+            raw = msg.get_payload(decode=True).decode(charset, errors="replace")
+            if msg.get_content_type() == "text/html":
+                return strip_html(raw)
+            return raw
+        except Exception:
+            pass
+    return ""
+
+
 def entry_key(e: dict) -> str:
     """Stable MD5 content-hash for dedup — matches label_tool.py _entry_key."""
     key = (e.get("subject", "") + (e.get("body", "") or "")[:100])

@@ -137,7 +193,7 @@ def fetch_account_stream(
         subj = _decode_str(msg.get("Subject", ""))
         from_addr = _decode_str(msg.get("From", ""))
         date = _decode_str(msg.get("Date", ""))
-        body = extract_body(msg)[:800]
+        body = _extract_body(msg)[:800]
         entry = {"subject": subj, "body": body, "from_addr": from_addr,
                  "date": date, "account": name}
         k = entry_key(entry)
```
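The HTMLParser approach added above can be sanity-checked standalone (a mirror of `_TextExtractor`, not an import of the module itself):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Collects stripped text nodes, mirroring the _TextExtractor added above.
    def __init__(self):
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.parts.append(data.strip())

ex = TextExtractor()
ex.feed("<div><h1>Offer</h1><p>We are pleased to confirm.</p></div>")
print(" ".join(ex.parts))  # Offer We are pleased to confirm.
```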
app/label_tool.py (new file, +1186 lines) — file diff suppressed because it is too large.
app/models.py (deleted — the entire 428-line file was removed)

```
@@ -1,428 +0,0 @@
"""Avocet — HF model lifecycle API.

Handles model metadata lookup, approval queue, download with progress,
and installed model management.

All endpoints are registered on `router` (a FastAPI APIRouter).
api.py includes this router with prefix="/api/models".

Module-level globals (_MODELS_DIR, _QUEUE_DIR) follow the same
testability pattern as sft.py — override them via set_models_dir() and
set_queue_dir() in test fixtures.
"""
from __future__ import annotations

import json
import logging
import shutil
import threading
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
from uuid import uuid4

import httpx
from fastapi import APIRouter, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

from app.utils import read_jsonl, write_jsonl

try:
    from huggingface_hub import snapshot_download
except ImportError:  # pragma: no cover
    snapshot_download = None  # type: ignore[assignment]

logger = logging.getLogger(__name__)

_ROOT = Path(__file__).parent.parent
_MODELS_DIR: Path = _ROOT / "models"
_QUEUE_DIR: Path = _ROOT / "data"

router = APIRouter()

# ── Download progress shared state ────────────────────────────────────────────
# Updated by the background download thread; read by GET /download/stream.
_download_progress: dict[str, Any] = {}

# ── HF pipeline_tag → adapter recommendation ──────────────────────────────────
_TAG_TO_ADAPTER: dict[str, str] = {
    "zero-shot-classification": "ZeroShotAdapter",
    "text-classification": "ZeroShotAdapter",
    "natural-language-inference": "ZeroShotAdapter",
    "sentence-similarity": "RerankerAdapter",
    "text-ranking": "RerankerAdapter",
    "text-generation": "GenerationAdapter",
    "text2text-generation": "GenerationAdapter",
}


# ── Testability seams ──────────────────────────────────────────────────────────

def set_models_dir(path: Path) -> None:
    global _MODELS_DIR
    _MODELS_DIR = path


def set_queue_dir(path: Path) -> None:
    global _QUEUE_DIR
    _QUEUE_DIR = path


# ── Internal helpers ───────────────────────────────────────────────────────────

def _queue_file() -> Path:
    return _QUEUE_DIR / "model_queue.jsonl"


def _read_queue() -> list[dict]:
    return read_jsonl(_queue_file())


def _write_queue(records: list[dict]) -> None:
    write_jsonl(_queue_file(), records)


def _safe_model_name(repo_id: str) -> str:
    """Convert repo_id to a filesystem-safe directory name (HF convention)."""
    return repo_id.replace("/", "--")
```
```
def _is_installed(repo_id: str) -> bool:
    """Check if a model is already downloaded in _MODELS_DIR."""
    safe_name = _safe_model_name(repo_id)
    model_dir = _MODELS_DIR / safe_name
    return model_dir.exists() and (
        (model_dir / "config.json").exists()
        or (model_dir / "training_info.json").exists()
        or (model_dir / "model_info.json").exists()
    )


def _is_queued(repo_id: str) -> bool:
    """Check if repo_id is already in the queue (non-dismissed)."""
    for entry in _read_queue():
        if entry.get("repo_id") == repo_id and entry.get("status") != "dismissed":
            return True
    return False


def _update_queue_entry(entry_id: str, updates: dict) -> dict | None:
    """Update a queue entry by id. Returns updated entry or None if not found."""
    records = _read_queue()
    for i, r in enumerate(records):
        if r.get("id") == entry_id:
            records[i] = {**r, **updates}
            _write_queue(records)
            return records[i]
    return None


def _get_queue_entry(entry_id: str) -> dict | None:
    for r in _read_queue():
        if r.get("id") == entry_id:
            return r
    return None


# ── Background download ────────────────────────────────────────────────────────

def _run_download(entry_id: str, repo_id: str, pipeline_tag: str | None, adapter_recommendation: str | None) -> None:
    """Background thread: download model via huggingface_hub.snapshot_download."""
    global _download_progress
    safe_name = _safe_model_name(repo_id)
    local_dir = _MODELS_DIR / safe_name

    _download_progress = {
        "active": True,
        "repo_id": repo_id,
        "downloaded_bytes": 0,
        "total_bytes": 0,
        "pct": 0.0,
        "done": False,
        "error": None,
    }

    try:
        if snapshot_download is None:
            raise RuntimeError("huggingface_hub is not installed")

        snapshot_download(
            repo_id=repo_id,
            local_dir=str(local_dir),
        )

        # Write model_info.json alongside downloaded files
        model_info = {
            "repo_id": repo_id,
            "pipeline_tag": pipeline_tag,
            "adapter_recommendation": adapter_recommendation,
            "downloaded_at": datetime.now(timezone.utc).isoformat(),
        }
        local_dir.mkdir(parents=True, exist_ok=True)
        (local_dir / "model_info.json").write_text(
            json.dumps(model_info, indent=2), encoding="utf-8"
        )

        _download_progress["done"] = True
        _download_progress["pct"] = 100.0
        _update_queue_entry(entry_id, {"status": "ready"})

    except Exception as exc:
        logger.exception("Download failed for %s: %s", repo_id, exc)
        _download_progress["error"] = str(exc)
        _download_progress["done"] = True
        _update_queue_entry(entry_id, {"status": "failed", "error": str(exc)})
    finally:
        _download_progress["active"] = False
```
```
# ── GET /lookup ────────────────────────────────────────────────────────────────

@router.get("/lookup")
def lookup_model(repo_id: str) -> dict:
    """Validate repo_id and fetch metadata from the HF API."""
    # Validate: must contain exactly one '/', no whitespace
    if "/" not in repo_id or any(c.isspace() for c in repo_id):
        raise HTTPException(422, f"Invalid repo_id {repo_id!r}: must be 'owner/model-name' with no whitespace")

    hf_url = f"https://huggingface.co/api/models/{repo_id}"
    try:
        resp = httpx.get(hf_url, timeout=10.0)
    except httpx.RequestError as exc:
        raise HTTPException(502, f"Network error reaching HuggingFace API: {exc}") from exc

    if resp.status_code == 404:
        raise HTTPException(404, f"Model {repo_id!r} not found on HuggingFace")
    if resp.status_code != 200:
        raise HTTPException(502, f"HuggingFace API returned status {resp.status_code}")

    data = resp.json()
    pipeline_tag = data.get("pipeline_tag")
    adapter_recommendation = _TAG_TO_ADAPTER.get(pipeline_tag) if pipeline_tag else None
    if pipeline_tag and adapter_recommendation is None:
        logger.warning("Unknown pipeline_tag %r for %s — no adapter recommendation", pipeline_tag, repo_id)

    # Estimate model size from siblings list
    siblings = data.get("siblings") or []
    model_size_bytes: int = sum(s.get("size", 0) for s in siblings if isinstance(s, dict))

    # Description: first 300 chars of card data (modelId field used as fallback)
    card_data = data.get("cardData") or {}
    description_raw = card_data.get("description") or data.get("modelId") or ""
    description = description_raw[:300] if description_raw else ""

    return {
        "repo_id": repo_id,
        "pipeline_tag": pipeline_tag,
        "adapter_recommendation": adapter_recommendation,
        "model_size_bytes": model_size_bytes,
        "description": description,
        "tags": data.get("tags") or [],
        "downloads": data.get("downloads") or 0,
        "already_installed": _is_installed(repo_id),
        "already_queued": _is_queued(repo_id),
    }


# ── GET /queue ─────────────────────────────────────────────────────────────────

@router.get("/queue")
def get_queue() -> list[dict]:
    """Return all non-dismissed queue entries sorted newest-first."""
    records = _read_queue()
    active = [r for r in records if r.get("status") != "dismissed"]
    return sorted(active, key=lambda r: r.get("queued_at", ""), reverse=True)


# ── POST /queue ────────────────────────────────────────────────────────────────

class QueueAddRequest(BaseModel):
    repo_id: str
    pipeline_tag: str | None = None
    adapter_recommendation: str | None = None


@router.post("/queue", status_code=201)
def add_to_queue(req: QueueAddRequest) -> dict:
    """Add a model to the approval queue with status 'pending'."""
    if _is_installed(req.repo_id):
        raise HTTPException(409, f"{req.repo_id!r} is already installed")
    if _is_queued(req.repo_id):
        raise HTTPException(409, f"{req.repo_id!r} is already in the queue")

    entry = {
        "id": str(uuid4()),
        "repo_id": req.repo_id,
        "pipeline_tag": req.pipeline_tag,
        "adapter_recommendation": req.adapter_recommendation,
        "status": "pending",
        "queued_at": datetime.now(timezone.utc).isoformat(),
    }
    records = _read_queue()
    records.append(entry)
    _write_queue(records)
    return entry
```
# ── POST /queue/{id}/approve ───────────────────────────────────────────────────
|
|
||||||
|
|
||||||
@router.post("/queue/{entry_id}/approve")
|
|
||||||
def approve_queue_entry(entry_id: str) -> dict:
|
|
||||||
"""Approve a pending queue entry and start background download."""
|
|
||||||
entry = _get_queue_entry(entry_id)
|
|
||||||
if entry is None:
|
|
||||||
raise HTTPException(404, f"Queue entry {entry_id!r} not found")
|
|
||||||
if entry.get("status") != "pending":
|
|
||||||
raise HTTPException(409, f"Entry is not in pending state (current: {entry.get('status')!r})")
|
|
||||||
|
|
||||||
updated = _update_queue_entry(entry_id, {"status": "downloading"})
|
|
||||||
|
|
||||||
thread = threading.Thread(
|
|
||||||
target=_run_download,
|
|
||||||
args=(entry_id, entry["repo_id"], entry.get("pipeline_tag"), entry.get("adapter_recommendation")),
|
|
||||||
daemon=True,
|
|
||||||
name=f"model-download-{entry_id}",
|
|
||||||
)
|
|
||||||
thread.start()
|
|
||||||
|
|
||||||
return {"ok": True}
|
|
||||||
|
|
||||||
|
|
||||||
# ── DELETE /queue/{id} ─────────────────────────────────────────────────────────


@router.delete("/queue/{entry_id}")
def dismiss_queue_entry(entry_id: str) -> dict:
    """Dismiss (soft-delete) a queue entry."""
    entry = _get_queue_entry(entry_id)
    if entry is None:
        raise HTTPException(404, f"Queue entry {entry_id!r} not found")

    _update_queue_entry(entry_id, {"status": "dismissed"})
    return {"ok": True}

# ── GET /download/stream ───────────────────────────────────────────────────────


@router.get("/download/stream")
def download_stream() -> StreamingResponse:
    """SSE stream of download progress. Yields one idle event if no download active."""

    def generate():
        prog = _download_progress
        if not prog.get("active") and not prog.get("done"):
            yield f"data: {json.dumps({'type': 'idle'})}\n\n"
            return

        if prog.get("done"):
            if prog.get("error"):
                yield f"data: {json.dumps({'type': 'error', 'error': prog['error']})}\n\n"
            else:
                yield f"data: {json.dumps({'type': 'done', 'repo_id': prog.get('repo_id')})}\n\n"
            return

        # Stream live progress
        import time
        while True:
            p = dict(_download_progress)
            if p.get("done"):
                if p.get("error"):
                    yield f"data: {json.dumps({'type': 'error', 'error': p['error']})}\n\n"
                else:
                    yield f"data: {json.dumps({'type': 'done', 'repo_id': p.get('repo_id')})}\n\n"
                break
            event = json.dumps({
                "type": "progress",
                "repo_id": p.get("repo_id"),
                "downloaded_bytes": p.get("downloaded_bytes", 0),
                "total_bytes": p.get("total_bytes", 0),
                "pct": p.get("pct", 0.0),
            })
            yield f"data: {event}\n\n"
            time.sleep(0.5)

    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
    )

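Each message in the stream above is a standard Server-Sent Events frame: a `data:` line carrying one JSON object, terminated by a blank line. A minimal framing helper showing the wire format:

```python
import json

def sse_frame(payload: dict) -> str:
    # SSE wire format: "data: <payload>\n\n"; the blank line ends the event.
    return f"data: {json.dumps(payload)}\n\n"

frame = sse_frame({"type": "progress", "pct": 42.0})
```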
# ── GET /installed ─────────────────────────────────────────────────────────────


@router.get("/installed")
def list_installed() -> list[dict]:
    """Scan _MODELS_DIR and return info on each installed model."""
    if not _MODELS_DIR.exists():
        return []

    results: list[dict] = []
    for sub in _MODELS_DIR.iterdir():
        if not sub.is_dir():
            continue

        has_training_info = (sub / "training_info.json").exists()
        has_config = (sub / "config.json").exists()
        has_model_info = (sub / "model_info.json").exists()

        if not (has_training_info or has_config or has_model_info):
            continue

        model_type = "finetuned" if has_training_info else "downloaded"

        # Compute directory size
        size_bytes = sum(f.stat().st_size for f in sub.rglob("*") if f.is_file())

        # Load adapter/model_id from model_info.json or training_info.json
        adapter: str | None = None
        model_id: str | None = None

        if has_model_info:
            try:
                info = json.loads((sub / "model_info.json").read_text(encoding="utf-8"))
                adapter = info.get("adapter_recommendation")
                model_id = info.get("repo_id")
            except Exception:
                pass
        elif has_training_info:
            try:
                info = json.loads((sub / "training_info.json").read_text(encoding="utf-8"))
                adapter = info.get("adapter")
                model_id = info.get("base_model") or info.get("model_id")
            except Exception:
                pass

        results.append({
            "name": sub.name,
            "path": str(sub),
            "type": model_type,
            "adapter": adapter,
            "size_bytes": size_bytes,
            "model_id": model_id,
        })

    return results

# ── DELETE /installed/{name} ───────────────────────────────────────────────────


@router.delete("/installed/{name}")
def delete_installed(name: str) -> dict:
    """Remove an installed model directory by name. Blocks path traversal."""
    # Validate: single path component, no slashes or '..'
    if "/" in name or "\\" in name or ".." in name or not name or name.startswith("."):
        raise HTTPException(400, f"Invalid model name {name!r}: must be a single directory name with no path separators or '..'")

    model_path = _MODELS_DIR / name

    # Extra safety: confirm resolved path is inside _MODELS_DIR
    try:
        model_path.resolve().relative_to(_MODELS_DIR.resolve())
    except ValueError:
        raise HTTPException(400, f"Path traversal detected for name {name!r}")

    if not model_path.exists():
        raise HTTPException(404, f"Installed model {name!r} not found")

    shutil.rmtree(model_path)
    return {"ok": True}

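The two-layer guard above, rejecting suspicious names and then confirming the resolved path stays inside the models directory, can be sketched in isolation (a sketch: `is_safe_model_name` is an illustrative name, not a function in the module):

```python
from pathlib import Path

def is_safe_model_name(base: Path, name: str) -> bool:
    # Layer 1: the name must be a single, non-hidden path component.
    if not name or name.startswith(".") or "/" in name or "\\" in name or ".." in name:
        return False
    # Layer 2: the resolved path must remain inside base (catches symlink tricks).
    try:
        (base / name).resolve().relative_to(base.resolve())
    except ValueError:
        return False
    return True

base = Path("/tmp/models")
```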
326  app/sft.py
@@ -1,326 +0,0 @@
"""Avocet — SFT candidate import and correction API.
|
|
||||||
|
|
||||||
All endpoints are registered on `router` (a FastAPI APIRouter).
|
|
||||||
api.py includes this router with prefix="/api/sft".
|
|
||||||
|
|
||||||
Module-level globals (_SFT_DATA_DIR, _SFT_CONFIG_DIR) follow the same
|
|
||||||
testability pattern as api.py — override them via set_sft_data_dir() and
|
|
||||||
set_sft_config_dir() in test fixtures.
|
|
||||||
"""
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import json
|
|
||||||
import logging
|
|
||||||
from datetime import datetime, timezone
|
|
||||||
from pathlib import Path
|
|
||||||
from typing import Literal
|
|
||||||
|
|
||||||
import yaml
|
|
||||||
from fastapi import APIRouter, HTTPException
|
|
||||||
from fastapi.responses import StreamingResponse
|
|
||||||
from pydantic import BaseModel
|
|
||||||
|
|
||||||
from app.utils import append_jsonl, read_jsonl, write_jsonl
|
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
|
|
||||||
_ROOT = Path(__file__).parent.parent
|
|
||||||
_SFT_DATA_DIR: Path = _ROOT / "data"
|
|
||||||
_SFT_CONFIG_DIR: Path | None = None
|
|
||||||
|
|
||||||
router = APIRouter()
|
|
||||||
|
|
||||||
|
|
||||||
# ── Testability seams ──────────────────────────────────────────────────────

def set_sft_data_dir(path: Path) -> None:
    global _SFT_DATA_DIR
    _SFT_DATA_DIR = path


def set_sft_config_dir(path: Path | None) -> None:
    global _SFT_CONFIG_DIR
    _SFT_CONFIG_DIR = path


# ── Internal helpers ───────────────────────────────────────────────────────

def _config_file() -> Path:
    if _SFT_CONFIG_DIR is not None:
        return _SFT_CONFIG_DIR / "label_tool.yaml"
    return _ROOT / "config" / "label_tool.yaml"


def _get_bench_results_dir() -> Path:
    f = _config_file()
    if not f.exists():
        return Path("/nonexistent-bench-results")
    try:
        raw = yaml.safe_load(f.read_text(encoding="utf-8")) or {}
    except yaml.YAMLError as exc:
        logger.warning("Failed to parse SFT config %s: %s", f, exc)
        return Path("/nonexistent-bench-results")
    d = raw.get("sft", {}).get("bench_results_dir", "")
    return Path(d) if d else Path("/nonexistent-bench-results")


def _candidates_file() -> Path:
    return _SFT_DATA_DIR / "sft_candidates.jsonl"


def _approved_file() -> Path:
    return _SFT_DATA_DIR / "sft_approved.jsonl"


def _read_candidates() -> list[dict]:
    return read_jsonl(_candidates_file())


def _write_candidates(records: list[dict]) -> None:
    write_jsonl(_candidates_file(), records)


def _is_exportable(r: dict) -> bool:
    """Return True if an approved record is ready to include in SFT export."""
    return (
        r.get("status") == "approved"
        and bool(r.get("corrected_response"))
        and str(r["corrected_response"]).strip() != ""
    )

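A record only reaches the export set when it is approved and carries a non-blank correction. The predicate's behavior, restated with sample records (a sketch mirroring `_is_exportable` above):

```python
def is_exportable(r: dict) -> bool:
    # Approved AND a correction that survives whitespace stripping.
    return (
        r.get("status") == "approved"
        and bool(r.get("corrected_response"))
        and str(r["corrected_response"]).strip() != ""
    )

ok = is_exportable({"status": "approved", "corrected_response": "Fixed answer."})
blank = is_exportable({"status": "approved", "corrected_response": "   "})
wrong_status = is_exportable({"status": "needs_review", "corrected_response": "x"})
```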
# ── GET /runs ──────────────────────────────────────────────────────────────

@router.get("/runs")
def get_runs():
    """List available benchmark runs in the configured bench_results_dir."""
    from scripts.sft_import import discover_runs
    bench_dir = _get_bench_results_dir()
    existing = _read_candidates()
    # benchmark_run_id in each record equals the run's directory name by cf-orch convention
    imported_run_ids = {
        r["benchmark_run_id"]
        for r in existing
        if r.get("benchmark_run_id") is not None
    }
    runs = discover_runs(bench_dir)
    return [
        {
            "run_id": r["run_id"],
            "timestamp": r["timestamp"],
            "candidate_count": r["candidate_count"],
            "already_imported": r["run_id"] in imported_run_ids,
        }
        for r in runs
    ]


# ── POST /import ───────────────────────────────────────────────────────────

class ImportRequest(BaseModel):
    run_id: str


@router.post("/import")
def post_import(req: ImportRequest):
    """Import one benchmark run's sft_candidates.jsonl into the local data dir."""
    from scripts.sft_import import discover_runs, import_run
    bench_dir = _get_bench_results_dir()
    runs = discover_runs(bench_dir)
    run = next((r for r in runs if r["run_id"] == req.run_id), None)
    if run is None:
        raise HTTPException(404, f"Run {req.run_id!r} not found in bench_results_dir")
    return import_run(run["sft_path"], _SFT_DATA_DIR)

# ── GET /queue ─────────────────────────────────────────────────────────────

@router.get("/queue")
def get_queue(page: int = 1, per_page: int = 20):
    """Return paginated needs_review candidates."""
    records = _read_candidates()
    pending = [r for r in records if r.get("status") == "needs_review"]
    start = (page - 1) * per_page
    return {
        "items": pending[start:start + per_page],
        "total": len(pending),
        "page": page,
        "per_page": per_page,
    }

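The queue endpoint uses plain offset pagination: page 1 starts at index 0, and Python slicing clamps out-of-range offsets to an empty page rather than raising. The same shape in isolation:

```python
def paginate(items: list, page: int = 1, per_page: int = 20) -> dict:
    # (page - 1) * per_page is the zero-based offset; slicing never raises,
    # even when the offset runs past the end of the list.
    start = (page - 1) * per_page
    return {
        "items": items[start:start + per_page],
        "total": len(items),
        "page": page,
        "per_page": per_page,
    }

first = paginate(list(range(45)), page=1, per_page=20)
last = paginate(list(range(45)), page=3, per_page=20)
empty = paginate(list(range(45)), page=9, per_page=20)
```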
# ── POST /submit ───────────────────────────────────────────────────────────

FailureCategory = Literal[
    "scoring_artifact",
    "style_violation",
    "partial_answer",
    "wrong_answer",
    "format_error",
    "hallucination",
]


class SubmitRequest(BaseModel):
    id: str
    action: Literal["correct", "discard", "flag"]
    corrected_response: str | None = None
    failure_category: FailureCategory | None = None


@router.post("/submit")
def post_submit(req: SubmitRequest):
    """Record a reviewer decision for one SFT candidate."""
    if req.action == "correct":
        if not req.corrected_response or not req.corrected_response.strip():
            raise HTTPException(422, "corrected_response must be non-empty when action is 'correct'")

    records = _read_candidates()
    idx = next((i for i, r in enumerate(records) if r.get("id") == req.id), None)
    if idx is None:
        raise HTTPException(404, f"Record {req.id!r} not found")

    record = records[idx]
    if record.get("status") != "needs_review":
        raise HTTPException(409, f"Record is not in needs_review state (current: {record.get('status')})")

    if req.action == "correct":
        records[idx] = {
            **record,
            "status": "approved",
            "corrected_response": req.corrected_response,
            "failure_category": req.failure_category,
        }
        _write_candidates(records)
        append_jsonl(_approved_file(), records[idx])
    elif req.action == "discard":
        records[idx] = {**record, "status": "discarded"}
        _write_candidates(records)
    else:  # flag
        records[idx] = {**record, "status": "model_rejected"}
        _write_candidates(records)

    return {"ok": True}

# ── POST /undo ─────────────────────────────────────────────────────────────

class UndoRequest(BaseModel):
    id: str


@router.post("/undo")
def post_undo(req: UndoRequest):
    """Restore a previously actioned candidate back to needs_review."""
    records = _read_candidates()
    idx = next((i for i, r in enumerate(records) if r.get("id") == req.id), None)
    if idx is None:
        raise HTTPException(404, f"Record {req.id!r} not found")

    record = records[idx]
    old_status = record.get("status")
    if old_status == "needs_review":
        raise HTTPException(409, "Record is already in needs_review state")

    records[idx] = {**record, "status": "needs_review", "corrected_response": None}
    _write_candidates(records)

    # If it was approved, remove from the approved file too
    if old_status == "approved":
        approved = read_jsonl(_approved_file())
        write_jsonl(_approved_file(), [r for r in approved if r.get("id") != req.id])

    return {"ok": True}

# ── GET /export ─────────────────────────────────────────────────────────────

@router.get("/export")
def get_export() -> StreamingResponse:
    """Stream approved records as SFT-ready JSONL for download."""
    exportable = [r for r in read_jsonl(_approved_file()) if _is_exportable(r)]

    def generate():
        for r in exportable:
            record = {
                "messages": r.get("prompt_messages", []) + [
                    {"role": "assistant", "content": r["corrected_response"]}
                ]
            }
            yield json.dumps(record) + "\n"

    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return StreamingResponse(
        generate(),
        media_type="application/x-ndjson",
        headers={
            "Content-Disposition": f'attachment; filename="sft_export_{timestamp}.jsonl"'
        },
    )

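Each exported line is a chat-format training record: the original prompt messages plus the corrected response appended as the final assistant turn. A sketch of the per-record transformation (`to_sft_record` is an illustrative name):

```python
import json

def to_sft_record(r: dict) -> str:
    # prompt_messages + one assistant turn carrying the human correction.
    record = {
        "messages": r.get("prompt_messages", []) + [
            {"role": "assistant", "content": r["corrected_response"]}
        ]
    }
    return json.dumps(record)

line = to_sft_record({
    "prompt_messages": [{"role": "user", "content": "Classify this email."}],
    "corrected_response": "job_alert",
})
```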
# ── GET /stats ──────────────────────────────────────────────────────────────

@router.get("/stats")
def get_stats() -> dict[str, object]:
    """Return counts by status, model, and task type."""
    records = _read_candidates()
    by_status: dict[str, int] = {}
    by_model: dict[str, int] = {}
    by_task_type: dict[str, int] = {}

    for r in records:
        status = r.get("status", "unknown")
        by_status[status] = by_status.get(status, 0) + 1
        model = r.get("model_name", "unknown")
        by_model[model] = by_model.get(model, 0) + 1
        task_type = r.get("task_type", "unknown")
        by_task_type[task_type] = by_task_type.get(task_type, 0) + 1

    approved = read_jsonl(_approved_file())
    export_ready = sum(1 for r in approved if _is_exportable(r))

    return {
        "total": len(records),
        "by_status": by_status,
        "by_model": by_model,
        "by_task_type": by_task_type,
        "export_ready": export_ready,
    }

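The three tally dicts above are hand-rolled; the same counts can be expressed with `collections.Counter`, which does the `.get(key, 0) + 1` bookkeeping internally (a sketch, not a suggested refactor of the endpoint):

```python
from collections import Counter

records = [
    {"status": "approved", "model_name": "qwen"},
    {"status": "approved", "model_name": "llama"},
    {"status": "discarded", "model_name": "qwen"},
]

# One Counter per dimension; missing keys default to "unknown" as in the endpoint.
by_status = Counter(r.get("status", "unknown") for r in records)
by_model = Counter(r.get("model_name", "unknown") for r in records)
```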
# ── GET /config ─────────────────────────────────────────────────────────────

@router.get("/config")
def get_sft_config() -> dict:
    """Return the current SFT configuration (bench_results_dir)."""
    f = _config_file()
    if not f.exists():
        return {"bench_results_dir": ""}
    try:
        raw = yaml.safe_load(f.read_text(encoding="utf-8")) or {}
    except yaml.YAMLError:
        return {"bench_results_dir": ""}
    sft_section = raw.get("sft") or {}
    return {"bench_results_dir": sft_section.get("bench_results_dir", "")}


class SftConfigPayload(BaseModel):
    bench_results_dir: str


@router.post("/config")
def post_sft_config(payload: SftConfigPayload) -> dict:
    """Write the bench_results_dir setting to the config file."""
    f = _config_file()
    f.parent.mkdir(parents=True, exist_ok=True)
    try:
        raw = yaml.safe_load(f.read_text(encoding="utf-8")) if f.exists() else {}
        raw = raw or {}
    except yaml.YAMLError:
        raw = {}
    raw["sft"] = {"bench_results_dir": payload.bench_results_dir}
    tmp = f.with_suffix(".tmp")
    tmp.write_text(yaml.dump(raw, allow_unicode=True, sort_keys=False), encoding="utf-8")
    tmp.rename(f)
    return {"ok": True}

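The write path above uses the write-to-temp-then-rename idiom, so a crash mid-write never leaves a truncated config behind. The same pattern in isolation (`atomic_write_text` is an illustrative helper name):

```python
from pathlib import Path
import tempfile

def atomic_write_text(path: Path, text: str) -> None:
    # Write the full content to a sibling temp file, then rename it into
    # place; rename is atomic on POSIX filesystems within one volume.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(text, encoding="utf-8")
    tmp.rename(path)

target = Path(tempfile.mkdtemp()) / "label_tool.yaml"
atomic_write_text(target, "sft:\n  bench_results_dir: /bench\n")
```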
117  app/utils.py
@@ -1,117 +0,0 @@
"""Shared email utility functions for Avocet.
|
|
||||||
|
|
||||||
Pure-stdlib helpers extracted from the retired label_tool.py Streamlit app.
|
|
||||||
These are reused by the FastAPI backend and the test suite.
|
|
||||||
"""
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import json
|
|
||||||
import re
|
|
||||||
from html.parser import HTMLParser
|
|
||||||
from pathlib import Path
|
|
||||||
from typing import Any
|
|
||||||
|
|
||||||
|
|
||||||
# ── HTML → plain-text extractor ──────────────────────────────────────────────
|
|
||||||
|
|
||||||
class _TextExtractor(HTMLParser):
|
|
||||||
"""Extract visible text from an HTML email body, preserving line breaks."""
|
|
||||||
_BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6", "blockquote"}
|
|
||||||
_SKIP = {"script", "style", "head", "noscript"}
|
|
||||||
|
|
||||||
def __init__(self):
|
|
||||||
super().__init__(convert_charrefs=True)
|
|
||||||
self._parts: list[str] = []
|
|
||||||
self._depth_skip = 0
|
|
||||||
|
|
||||||
def handle_starttag(self, tag, attrs):
|
|
||||||
tag = tag.lower()
|
|
||||||
if tag in self._SKIP:
|
|
||||||
self._depth_skip += 1
|
|
||||||
elif tag in self._BLOCK:
|
|
||||||
self._parts.append("\n")
|
|
||||||
|
|
||||||
def handle_endtag(self, tag):
|
|
||||||
if tag.lower() in self._SKIP:
|
|
||||||
self._depth_skip = max(0, self._depth_skip - 1)
|
|
||||||
|
|
||||||
def handle_data(self, data):
|
|
||||||
if not self._depth_skip:
|
|
||||||
self._parts.append(data)
|
|
||||||
|
|
||||||
def get_text(self) -> str:
|
|
||||||
text = "".join(self._parts)
|
|
||||||
lines = [ln.strip() for ln in text.splitlines()]
|
|
||||||
return "\n".join(ln for ln in lines if ln)
|
|
||||||
|
|
||||||
|
|
||||||
def strip_html(html_str: str) -> str:
|
|
||||||
"""Convert HTML email body to plain text. Pure stdlib, no dependencies."""
|
|
||||||
try:
|
|
||||||
extractor = _TextExtractor()
|
|
||||||
extractor.feed(html_str)
|
|
||||||
return extractor.get_text()
|
|
||||||
except Exception:
|
|
||||||
return re.sub(r"<[^>]+>", " ", html_str).strip()
|
|
||||||
|
|
||||||
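The `except` branch above falls back to a crude regex strip: good enough to salvage the text when the parser chokes on malformed markup. What that fallback does, in isolation:

```python
import re

def regex_strip(html_str: str) -> str:
    # Replace every tag with a space, then trim; layout is lost but text survives.
    return re.sub(r"<[^>]+>", " ", html_str).strip()

out = regex_strip("<p>Hello <b>world</b></p>")
```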

def extract_body(msg: Any) -> str:
    """Return plain-text body. Strips HTML when no text/plain part exists."""
    if msg.is_multipart():
        html_fallback: str | None = None
        for part in msg.walk():
            ct = part.get_content_type()
            if ct == "text/plain":
                try:
                    charset = part.get_content_charset() or "utf-8"
                    return part.get_payload(decode=True).decode(charset, errors="replace")
                except Exception:
                    pass
            elif ct == "text/html" and html_fallback is None:
                try:
                    charset = part.get_content_charset() or "utf-8"
                    raw = part.get_payload(decode=True).decode(charset, errors="replace")
                    html_fallback = strip_html(raw)
                except Exception:
                    pass
        return html_fallback or ""
    else:
        try:
            charset = msg.get_content_charset() or "utf-8"
            raw = msg.get_payload(decode=True).decode(charset, errors="replace")
            if msg.get_content_type() == "text/html":
                return strip_html(raw)
            return raw
        except Exception:
            pass
        return ""

def read_jsonl(path: Path) -> list[dict]:
    """Read a JSONL file, returning valid records. Skips blank lines and malformed JSON."""
    if not path.exists():
        return []
    records: list[dict] = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            pass
    return records


def write_jsonl(path: Path, records: list[dict]) -> None:
    """Write records to a JSONL file, overwriting any existing content."""
    path.parent.mkdir(parents=True, exist_ok=True)
    content = "\n".join(json.dumps(r) for r in records)
    path.write_text(content + ("\n" if records else ""), encoding="utf-8")


def append_jsonl(path: Path, record: dict) -> None:
    """Append a single record to a JSONL file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

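The three helpers form a simple JSONL contract: write overwrites, append adds one line, and the reader skips blank and malformed lines. A round-trip sketch of the same contract (the tolerant-reader loop mirrors `read_jsonl` above):

```python
import json
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "demo.jsonl"

# Overwrite with two records, append a third, then inject one malformed line.
path.write_text("\n".join(json.dumps(r) for r in [{"id": 1}, {"id": 2}]) + "\n")
with open(path, "a", encoding="utf-8") as fh:
    fh.write(json.dumps({"id": 3}) + "\n")
    fh.write("{not json}\n")

# Tolerant reader: keep valid lines, drop blanks and broken ones.
records = []
for line in path.read_text(encoding="utf-8").splitlines():
    line = line.strip()
    if not line:
        continue
    try:
        records.append(json.loads(line))
    except json.JSONDecodeError:
        pass
```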
@@ -21,8 +21,3 @@ accounts:
  # Optional: limit emails fetched per account per run (0 = unlimited)
  max_per_account: 500

  # cf-orch SFT candidate import — path to the bench_results/ directory
  # produced by circuitforge-orch's benchmark harness.
  sft:
    bench_results_dir: /path/to/circuitforge-orch/scripts/bench_results
95  docs/plans/2026-03-08-anime-animation-design.md  Normal file
@@ -0,0 +1,95 @@
# Anime.js Animation Integration — Design

**Date:** 2026-03-08
**Status:** Approved
**Branch:** feat/vue-label-tab

## Problem

The current animation system mixes CSS keyframes, CSS transitions, and imperative inline-style bindings across three files. The seams between systems produce:

- Abrupt ball pickup (instant scale/borderRadius jump)
- No spring snap-back on release to no target
- Rigid CSS dismissals with no timing control
- Bucket grid and badge pop on basic `@keyframes`

## Decision

Integrate **Anime.js v4** as a single animation layer. Vue reactive state is unchanged; Anime.js owns all DOM motion imperatively.

## Architecture

One new composable, minimal changes to two existing files, CSS cleanup in two files.

```
web/src/composables/useCardAnimation.ts   ← NEW
web/src/components/EmailCardStack.vue     ← modify
web/src/views/LabelView.vue               ← modify
```

**Data flow:**
```
pointer events → Vue refs (isHeld, deltaX, deltaY, dismissType)
        ↓ watched by
useCardAnimation(cardEl, stackEl, isHeld, ...)
        ↓ imperatively drives
Anime.js → DOM transforms
```

`useCardAnimation` is a pure side-effect composable — it returns nothing to the template. The `cardStyle` computed in `EmailCardStack.vue` is removed; Anime.js owns the element's transform directly.

## Animation Surfaces

### Pickup morph
```
animate(cardEl, { scale: 0.55, borderRadius: '50%', y: -80 }, { duration: 200, ease: spring(1, 80, 10) })
```
Replaces the instant CSS transform jump on `onPointerDown`.

### Drag tracking
Raw `cardEl.style.translate` update on `onPointerMove` — no animation, just position. Easing only at boundaries (pickup / release), not during active drag.

### Snap-back
```
animate(cardEl, { x: 0, y: 0, scale: 1, borderRadius: '1rem' }, { ease: spring(1, 80, 10) })
```
Fires on `onPointerUp` when no zone/bucket target was hit.

### Dismissals (replace CSS `@keyframes`)
- **fileAway** — `animate(cardEl, { y: '-120%', scale: 0.85, opacity: 0 }, { duration: 280, ease: 'out(3)' })`
- **crumple** — 2-step timeline: shrink + redden → `scale(0)` + rotate
- **slideUnder** — `animate(cardEl, { x: '110%', rotate: 5, opacity: 0 }, { duration: 260 })`

### Bucket grid rise
`animate(gridEl, { y: -8, opacity: 0.45 })` when `isHeld` flips to true; reversed on false. Spring easing.

### Badge pop
`animate(badgeEl, { scale: [0.6, 1], opacity: [0, 1] }, { ease: spring(1.5, 80, 8), duration: 300 })` triggered on badge mount via Vue's `onMounted` lifecycle hook in a `BadgePop` wrapper component or a `v-enter-active` transition hook.

## Constraints

### Reduced motion
`useCardAnimation` checks `motion.rich.value` before firing any Anime.js call. If false, all animations are skipped — instant state changes only. Consistent with the existing `useMotion` pattern.

### Bundle size
Anime.js v4 core is ~17KB gzipped. Only `animate`, `spring`, and `createTimeline` are imported — Vite ESM tree-shaking keeps the footprint minimal. The `draggable` module is not used.

### Tests
The existing `EmailCardStack.test.ts` tests assert emit behavior, not animation — they remain passing. Anime.js is mocked at module level in Vitest via `vi.mock('animejs')` where needed.

### CSS cleanup
Remove from `EmailCardStack.vue` and `LabelView.vue`:
- `@keyframes fileAway`, `crumple`, `slideUnder`
- `@keyframes badge-pop`
- `.dismiss-label`, `.dismiss-skip`, `.dismiss-discard` classes (Anime.js fires on element refs directly)
- The `dismissClass` computed in `EmailCardStack.vue`

## Files Changed

| File | Change |
|------|--------|
| `web/package.json` | Add `animejs` dependency |
| `web/src/composables/useCardAnimation.ts` | New — all Anime.js animation logic |
| `web/src/components/EmailCardStack.vue` | Remove `cardStyle` computed + dismiss classes; call `useCardAnimation` |
| `web/src/views/LabelView.vue` | Badge pop + bucket grid rise via Anime.js |
| `web/src/assets/avocet.css` | Remove any global animation keyframes if present |
573  docs/plans/2026-03-08-anime-animation-plan.md  Normal file
@@ -0,0 +1,573 @@
|
# Anime.js Animation Integration — Implementation Plan
|
||||||
|
|
||||||
|
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
|
||||||
|
|
||||||
|
**Goal:** Replace the current mixed CSS keyframes / inline-style animation system with Anime.js v4 for all card motion — pickup morph, drag tracking, spring snap-back, dismissals, bucket grid rise, and badge pop.
|
||||||
|
|
||||||
|
**Architecture:** A new `useCardAnimation` composable owns all Anime.js calls imperatively against DOM refs. Vue reactive state (`isHeld`, `deltaX`, `deltaY`, `dismissType`) is unchanged. `cardStyle` computed and `dismissClass` computed are deleted; Anime.js writes to the element directly.
|
||||||
|
|
||||||
|
**Tech Stack:** Anime.js v4 (`animejs`), Vue 3 Composition API, `@vue/test-utils` + Vitest for tests.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 1: Install Anime.js
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `web/package.json`
|
||||||
|
|
||||||
|
**Step 1: Install the package**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd /Library/Development/CircuitForge/avocet/web
|
||||||
|
npm install animejs
|
||||||
|
```
|
||||||
|
|
||||||
|
**Step 2: Verify the import resolves**
|
||||||
|
|
||||||
|
Create a throwaway check — open `web/src/main.ts` briefly and confirm:
|
||||||
|
```ts
|
||||||
|
import { animate, spring } from 'animejs'
|
||||||
|
```
|
||||||
|
resolves without error in the editor (TypeScript types ship with animejs v4).
|
||||||
|
Remove the import immediately after verifying — do not commit it.
|
||||||
|
|
||||||
|
**Step 3: Commit**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd /Library/Development/CircuitForge/avocet/web
|
||||||
|
git add package.json package-lock.json
|
||||||
|
git commit -m "feat(avocet): add animejs v4 dependency"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 2: Create `useCardAnimation` composable
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Create: `web/src/composables/useCardAnimation.ts`
|
||||||
|
- Create: `web/src/composables/useCardAnimation.test.ts`
|
||||||
|
|
||||||
|
**Background — Anime.js v4 transform model:**
|
||||||
|
Anime.js v4 tracks `x`, `y`, `scale`, `rotate`, etc. as separate transform components internally.
|
||||||
|
Use `utils.set(el, props)` for instant (no-animation) property updates — this keeps the internal cache consistent.
|
||||||
|
Never mix direct `el.style.transform = "..."` with Anime.js on the same element, or the cache desyncs.
|
||||||
|
|
||||||
|
**Step 1: Write the failing tests**

`web/src/composables/useCardAnimation.test.ts`:

```ts
import { ref } from 'vue'
import { describe, it, expect, vi, beforeEach } from 'vitest'

// Mock animejs before importing the composable
vi.mock('animejs', () => ({
  animate: vi.fn(),
  spring: vi.fn(() => 'mock-spring'),
  utils: { set: vi.fn() },
}))

import { useCardAnimation } from './useCardAnimation'
import { animate, utils } from 'animejs'

const mockAnimate = animate as ReturnType<typeof vi.fn>
const mockSet = utils.set as ReturnType<typeof vi.fn>

function makeEl() {
  return document.createElement('div')
}

describe('useCardAnimation', () => {
  beforeEach(() => {
    vi.clearAllMocks()
  })

  it('pickup() calls animate with ball shape', () => {
    const el = makeEl()
    const cardEl = ref<HTMLElement | null>(el)
    const motion = { rich: ref(true) }
    const { pickup } = useCardAnimation(cardEl, motion)
    pickup()
    // animejs v4 animate() is 2-arg, so assert on exactly (el, params)
    expect(mockAnimate).toHaveBeenCalledWith(
      el,
      expect.objectContaining({ scale: 0.55, borderRadius: '50%' }),
    )
  })

  it('pickup() is a no-op when motion.rich is false', () => {
    const el = makeEl()
    const cardEl = ref<HTMLElement | null>(el)
    const motion = { rich: ref(false) }
    const { pickup } = useCardAnimation(cardEl, motion)
    pickup()
    expect(mockAnimate).not.toHaveBeenCalled()
  })

  it('setDragPosition() calls utils.set with translated coords', () => {
    const el = makeEl()
    const cardEl = ref<HTMLElement | null>(el)
    const motion = { rich: ref(true) }
    const { setDragPosition } = useCardAnimation(cardEl, motion)
    setDragPosition(50, 30)
    // y = deltaY - 80 = 30 - 80 = -50
    expect(mockSet).toHaveBeenCalledWith(el, expect.objectContaining({ x: 50, y: -50 }))
  })

  it('snapBack() calls animate returning to card shape', () => {
    const el = makeEl()
    const cardEl = ref<HTMLElement | null>(el)
    const motion = { rich: ref(true) }
    const { snapBack } = useCardAnimation(cardEl, motion)
    snapBack()
    expect(mockAnimate).toHaveBeenCalledWith(
      el,
      expect.objectContaining({ x: 0, y: 0, scale: 1 }),
    )
  })

  it('animateDismiss("label") calls animate', () => {
    const el = makeEl()
    const cardEl = ref<HTMLElement | null>(el)
    const motion = { rich: ref(true) }
    const { animateDismiss } = useCardAnimation(cardEl, motion)
    animateDismiss('label')
    expect(mockAnimate).toHaveBeenCalled()
  })

  it('animateDismiss("discard") calls animate', () => {
    const el = makeEl()
    const cardEl = ref<HTMLElement | null>(el)
    const motion = { rich: ref(true) }
    const { animateDismiss } = useCardAnimation(cardEl, motion)
    animateDismiss('discard')
    expect(mockAnimate).toHaveBeenCalled()
  })

  it('animateDismiss("skip") calls animate', () => {
    const el = makeEl()
    const cardEl = ref<HTMLElement | null>(el)
    const motion = { rich: ref(true) }
    const { animateDismiss } = useCardAnimation(cardEl, motion)
    animateDismiss('skip')
    expect(mockAnimate).toHaveBeenCalled()
  })

  it('animateDismiss is a no-op when motion.rich is false', () => {
    const el = makeEl()
    const cardEl = ref<HTMLElement | null>(el)
    const motion = { rich: ref(false) }
    const { animateDismiss } = useCardAnimation(cardEl, motion)
    animateDismiss('label')
    expect(mockAnimate).not.toHaveBeenCalled()
  })
})
```
**Step 2: Run tests to confirm they fail**

```bash
cd /Library/Development/CircuitForge/avocet/web
npm test -- useCardAnimation
```

Expected: FAIL — "Cannot find module './useCardAnimation'"

**Step 3: Implement the composable**

`web/src/composables/useCardAnimation.ts`:
```ts
import { type Ref } from 'vue'
import { animate, spring, utils } from 'animejs'

const BALL_SCALE = 0.55
const BALL_RADIUS = '50%'
const CARD_RADIUS = '1rem'
const PICKUP_Y_OFFSET = 80 // px above finger
const PICKUP_DURATION = 200
// NOTE: animejs v4 — spring() takes an object, not positional args
const SNAP_SPRING = spring({ mass: 1, stiffness: 80, damping: 10 })

interface Motion { rich: Ref<boolean> }

export function useCardAnimation(
  cardEl: Ref<HTMLElement | null>,
  motion: Motion,
) {
  function pickup() {
    if (!motion.rich.value || !cardEl.value) return
    // NOTE: animejs v4 — animate() is 2-arg; timing options merge into the params object
    animate(cardEl.value, {
      scale: BALL_SCALE,
      borderRadius: BALL_RADIUS,
      y: -PICKUP_Y_OFFSET,
      duration: PICKUP_DURATION,
      ease: SNAP_SPRING,
    })
  }

  function setDragPosition(dx: number, dy: number) {
    if (!cardEl.value) return
    utils.set(cardEl.value, { x: dx, y: dy - PICKUP_Y_OFFSET })
  }

  function snapBack() {
    if (!motion.rich.value || !cardEl.value) return
    // No duration — spring physics determines settling time
    animate(cardEl.value, {
      x: 0,
      y: 0,
      scale: 1,
      borderRadius: CARD_RADIUS,
      ease: SNAP_SPRING,
    })
  }

  function animateDismiss(type: 'label' | 'skip' | 'discard') {
    if (!motion.rich.value || !cardEl.value) return
    const el = cardEl.value
    if (type === 'label') {
      animate(el, { y: '-120%', scale: 0.85, opacity: 0, duration: 280, ease: 'out(3)' })
    } else if (type === 'discard') {
      // Two-step: crumple then shrink (keyframes array in params object)
      animate(el, {
        keyframes: [
          { scale: 0.95, rotate: 2, filter: 'brightness(0.6) sepia(1) hue-rotate(-20deg)', duration: 140 },
          { scale: 0, rotate: 8, opacity: 0, duration: 210 },
        ],
      })
    } else if (type === 'skip') {
      // 2-arg form here too — duration/ease live in the params object
      animate(el, { x: '110%', rotate: 5, opacity: 0, duration: 260, ease: 'out(2)' })
    }
  }

  return { pickup, setDragPosition, snapBack, animateDismiss }
}
```
**Step 4: Run tests — expect pass**

```bash
npm test -- useCardAnimation
```

Expected: All 8 tests PASS.

**Step 5: Commit**

```bash
git add web/src/composables/useCardAnimation.ts web/src/composables/useCardAnimation.test.ts
git commit -m "feat(avocet): add useCardAnimation composable with Anime.js"
```

---
## Task 3: Wire `useCardAnimation` into `EmailCardStack.vue`

**Files:**
- Modify: `web/src/components/EmailCardStack.vue`
- Modify: `web/src/components/EmailCardStack.test.ts`

**What changes:**
- Remove `cardStyle` computed and `:style="cardStyle"` binding
- Remove `dismissClass` computed and `:class="[dismissClass, ...]"` binding (keep `is-held`)
- Remove `deltaX`, `deltaY` reactive refs (position now owned by Anime.js)
- Call `pickup()` in `onPointerDown`, `setDragPosition()` in `onPointerMove`, `snapBack()` in `onPointerUp` (no-target path)
- Watch `props.dismissType` and call `animateDismiss()`
- Remove CSS `@keyframes fileAway`, `crumple`, `slideUnder` and their `.dismiss-*` rule blocks from `<style>`

**Step 1: Update the tests that check dismiss classes**

In `EmailCardStack.test.ts`, the 5 tests checking the `.dismiss-label`, `.dismiss-discard`, and `.dismiss-skip` classes are testing implementation (CSS class names), not behavior. Replace them with a single test that verifies `animateDismiss` is called:
```ts
// Add at the top of the file (after existing imports):
vi.mock('../composables/useCardAnimation', () => ({
  useCardAnimation: vi.fn(() => ({
    pickup: vi.fn(),
    setDragPosition: vi.fn(),
    snapBack: vi.fn(),
    animateDismiss: vi.fn(),
  })),
}))

import { useCardAnimation } from '../composables/useCardAnimation'
```

Replace the five `dismissType` class tests (lines 25–46) with:

```ts
it('calls animateDismiss with type when dismissType prop changes', async () => {
  const w = mount(EmailCardStack, { props: { item, isBucketMode: false, dismissType: null } })
  const { animateDismiss } = (useCardAnimation as ReturnType<typeof vi.fn>).mock.results[0].value
  await w.setProps({ dismissType: 'label' })
  await nextTick()
  expect(animateDismiss).toHaveBeenCalledWith('label')
})
```

Add the `nextTick` import to the test file header if not already present:

```ts
import { nextTick } from 'vue'
```
**Step 2: Run tests to confirm the replaced tests fail**

```bash
npm test -- EmailCardStack
```

Expected: FAIL — `animateDismiss` not called (not yet wired in the component)

**Step 3: Modify `EmailCardStack.vue`**

Script section changes:
```ts
// Change:
//   import { ref, computed } from 'vue'
// to:
import { ref, watch } from 'vue'

// Add import:
import { useCardAnimation } from '../composables/useCardAnimation'

// Remove these refs:
// const deltaX = ref(0)
// const deltaY = ref(0)

// Add after const motion = useMotion():
const { pickup, setDragPosition, snapBack, animateDismiss } = useCardAnimation(cardEl, motion)

// Add watcher:
watch(() => props.dismissType, (type) => {
  if (type) animateDismiss(type)
})

// Remove dismissClass computed entirely.

// In onPointerDown — add after isHeld.value = true:
pickup()

// In onPointerMove — replace deltaX/deltaY assignments with:
const dx = e.clientX - pickupX.value
const dy = e.clientY - pickupY.value
setDragPosition(dx, dy)
// (keep the zone/bucket detection that uses e.clientX/e.clientY — those stay the same)

// In onPointerUp — in the snap-back else branch, replace:
//   deltaX.value = 0
//   deltaY.value = 0
// with:
snapBack()
```
Template changes — on the `.card-wrapper` div:

```html
<!-- Remove: :class="[dismissClass, { 'is-held': isHeld }]" -->
<!-- Replace with: -->
:class="{ 'is-held': isHeld }"
<!-- Remove: :style="cardStyle" -->
```

CSS changes in `<style scoped>` — delete these entire blocks:

```css
@keyframes fileAway { ... }
@keyframes crumple { ... }
@keyframes slideUnder { ... }
.card-wrapper.dismiss-label { ... }
.card-wrapper.dismiss-discard { ... }
.card-wrapper.dismiss-skip { ... }
```

Also delete `--card-dismiss` and `--card-skip` CSS var usages if present.
**Step 4: Run all tests**

```bash
npm test
```

Expected: All pass (both `useCardAnimation.test.ts` and `EmailCardStack.test.ts`).

**Step 5: Commit**

```bash
git add web/src/components/EmailCardStack.vue web/src/components/EmailCardStack.test.ts
git commit -m "feat(avocet): wire Anime.js card animation into EmailCardStack"
```

---
## Task 4: Bucket grid rise animation

**Files:**
- Modify: `web/src/views/LabelView.vue`

**What changes:**
Replace the CSS class-toggle animation on `.bucket-grid-footer.grid-active` with an Anime.js watch in `LabelView.vue`. The `position: sticky → fixed` switch stays as a CSS class (`position` can't be animated), but `translateY` and `opacity` move to Anime.js.
**Step 1: Add gridEl ref and import animate**

In `LabelView.vue` `<script setup>`:

```ts
// Add to imports:
import { ref, onMounted, onUnmounted, watch } from 'vue'
import { animate, spring } from 'animejs'

// Add ref:
const gridEl = ref<HTMLElement | null>(null)
```
**Step 2: Add watcher for isHeld**

```ts
watch(isHeld, (held) => {
  if (!motion.rich.value || !gridEl.value) return
  // animejs v4: 2-arg animate, spring() takes an object
  const ease = spring({ mass: 1, stiffness: 80, damping: 10 })
  animate(gridEl.value,
    held
      ? { y: -8, opacity: 0.45, ease, duration: 250 }
      : { y: 0, opacity: 1, ease, duration: 250 },
  )
})
```
**Step 3: Wire ref in template**

On the `.bucket-grid-footer` div:

```html
<div ref="gridEl" class="bucket-grid-footer" :class="{ 'grid-active': isHeld }">
```

**Step 4: Remove CSS transition from `.bucket-grid-footer`**

In `LabelView.vue <style scoped>`, delete the `transition:` declaration from `.bucket-grid-footer`:

```css
/* DELETE this declaration: */
transition: transform 250ms cubic-bezier(0.34, 1.56, 0.64, 1),
            opacity 200ms ease,
            background 200ms ease;
```

Keep the `.bucket-grid-footer.grid-active` rules (`transform: translateY(-8px)`, `opacity: 0.45`) as-is: they are the fallback for reduced-motion (and no-JS) users. The watcher's guard (`if (!motion.rich.value)`) means those users never hit Anime.js, so the CSS class handles them instantly.
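For reference, the guard itself reduces to a tiny pure function (hypothetical sketch — the real `useMotion` composable presumably wraps `window.matchMedia` in a reactive ref; `richMotionEnabled` is an illustrative name):

```typescript
// Hypothetical model of the motion.rich guard (not the real useMotion composable).
type MediaMatcher = (query: string) => boolean

function richMotionEnabled(matches: MediaMatcher): boolean {
  // Rich animations only when the user has NOT requested reduced motion.
  return !matches('(prefers-reduced-motion: reduce)')
}

// Browser usage: richMotionEnabled(q => window.matchMedia(q).matches)
console.log(richMotionEnabled(() => true))  // reduced motion requested → false
console.log(richMotionEnabled(() => false)) // no preference → true
```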
**Step 5: Run tests**

```bash
npm test
```

Expected: All pass (LabelView has no dedicated tests, but the full suite should be green).

**Step 6: Commit**

```bash
git add web/src/views/LabelView.vue
git commit -m "feat(avocet): animate bucket grid rise with Anime.js spring"
```

---
## Task 5: Badge pop animation

**Files:**
- Modify: `web/src/views/LabelView.vue`

**What changes:**
Replace `@keyframes badge-pop` (scale + opacity keyframes) with a Vue `<Transition>` `@enter` hook that calls `animate()`. Badges already appear/disappear via `v-if`, so they have a natural mount/unmount lifecycle.

**Step 1: Wrap each badge in a `<Transition>`**

In the `LabelView.vue` template, each badge `<span v-if="...">` gets wrapped:

```html
<Transition @enter="onBadgeEnter" :css="false">
  <span v-if="onRoll" class="badge badge-roll">🔥 On a roll!</span>
</Transition>
<Transition @enter="onBadgeEnter" :css="false">
  <span v-if="speedRound" class="badge badge-speed">⚡ Speed round!</span>
</Transition>
<!-- repeat for all 6 badges -->
```

`:css="false"` tells Vue not to apply any CSS transition classes — Anime.js owns the enter animation entirely.
**Step 2: Add `onBadgeEnter` hook**

```ts
function onBadgeEnter(el: Element, done: () => void) {
  if (!motion.rich.value) { done(); return }
  // animejs v4: 2-arg animate; spring() takes an object, not positional args
  animate(el as HTMLElement, {
    scale: [0.6, 1],
    opacity: [0, 1],
    ease: spring({ mass: 1.5, stiffness: 80, damping: 8 }),
    duration: 300,
    onComplete: done,
  })
}
```
**Step 3: Remove `@keyframes badge-pop` from CSS**

In `LabelView.vue <style scoped>`:

```css
/* DELETE: */
@keyframes badge-pop {
  from { transform: scale(0.6); opacity: 0; }
  to { transform: scale(1); opacity: 1; }
}

/* DELETE the animation line from .badge: */
animation: badge-pop 0.3s cubic-bezier(0.34, 1.56, 0.64, 1);
```

**Step 4: Run tests**

```bash
npm test
```

Expected: All pass.

**Step 5: Commit**

```bash
git add web/src/views/LabelView.vue
git commit -m "feat(avocet): badge pop via Anime.js spring transition hook"
```

---
## Task 6: Build and smoke test

**Step 1: Build the SPA**

```bash
cd /Library/Development/CircuitForge/avocet
./manage.sh start-api
```

(This builds Vue + starts FastAPI on port 8503.)

**Step 2: Open the app**

```bash
./manage.sh open-api
```

**Step 3: Manual smoke test checklist**

- [ ] Pick up a card — ball morph is smooth (not an instant jump)
- [ ] Drag the ball around — follows the finger with no lag
- [ ] Release in center — springs back to card shape with a bounce
- [ ] Release in left zone — discard fires (card crumples)
- [ ] Release in right zone — skip fires (card slides right)
- [ ] Release on a bucket — label fires (card files up)
- [ ] Fling left fast — discard fires
- [ ] Bucket grid rises smoothly on pickup, falls on release
- [ ] Badge (label 10 in a row for 🔥) pops in with spring
- [ ] Reduced motion: toggle in system settings → no animations, instant behavior
- [ ] Keyboard labels (1–9) still work (pointer events unchanged)

**Step 4: Final commit if all green**

```bash
git add -A
git commit -m "feat(avocet): complete Anime.js animation integration"
```
docs/superpowers/plans/2026-03-15-finetune-classifier.md (new file, 1861 lines) — file diff suppressed because it is too large

docs/superpowers/specs/2026-03-15-finetune-classifier-design.md (new file, 254 lines)
@@ -0,0 +1,254 @@
# Fine-tune Email Classifier — Design Spec

**Date:** 2026-03-15
**Status:** Approved
**Scope:** Avocet — `scripts/`, `app/api.py`, `web/src/views/BenchmarkView.vue`, `environment.yml`

---

## Problem

The benchmark baseline shows zero-shot macro-F1 of 0.366 for the best models (`deberta-zeroshot`, `deberta-base-anli`). Zero-shot inference cannot improve with more labeled data. Fine-tuning the fastest models (`deberta-small` at 111ms, `bge-m3` at 123ms) on the growing labeled dataset is the path to meaningful accuracy gains.

---
## Constraints

- 501 labeled samples after dropping 2 non-canonical `profile_alert` rows
- Heavy class imbalance: `digest` 29%, `neutral` 26%, `new_lead` 2.6%, `survey_received` 3%
- 8.2 GB VRAM (shared with Peregrine vLLM during dev)
- Target models: `cross-encoder/nli-deberta-v3-small` (100M params), `MoritzLaurer/bge-m3-zeroshot-v2.0` (600M params)
- Output: local `models/avocet-{name}/` directory
- UI-triggerable via web interface (SSE streaming log)
- Stack: transformers 4.57.3, torch 2.10.0, accelerate 1.12.0, sklearn, CUDA 8.2GB

---

## Environment changes

`environment.yml` must add:
- `scikit-learn` — required for `train_test_split(stratify=...)` and `f1_score`
- `peft` is NOT used by this spec; it is available in the env but not required here

---
## Architecture

### New file: `scripts/finetune_classifier.py`

CLI entry point for fine-tuning. All prints use `flush=True` so stdout is SSE-streamable.

```
python scripts/finetune_classifier.py --model deberta-small [--epochs 5]
```

Supported `--model` values: `deberta-small`, `bge-m3`

**Model registry** (internal to this script):

| Key | Base model ID | Max tokens | fp16 | Batch size | Grad accum steps | Gradient checkpointing |
|-----|--------------|------------|------|------------|------------------|------------------------|
| `deberta-small` | `cross-encoder/nli-deberta-v3-small` | 512 | No | 16 | 1 | No |
| `bge-m3` | `MoritzLaurer/bge-m3-zeroshot-v2.0` | 512 | Yes | 4 | 4 | Yes |

`bge-m3` uses `fp16=True` (halves optimizer state from ~4.8GB to ~2.4GB) with batch size 4 + gradient accumulation 4 = effective batch 16, matching `deberta-small`. These settings are required to fit within 8.2GB VRAM. Still stop Peregrine vLLM before running bge-m3 fine-tuning.
### Modified: `scripts/classifier_adapters.py`

Add `FineTunedAdapter(ClassifierAdapter)`:
- Takes `model_dir: str` (path to a `models/avocet-*/` checkpoint)
- Loads via `pipeline("text-classification", model=model_dir)`
- `classify()` input format: **`f"{subject} [SEP] {body[:400]}"`** — must match the training format exactly. Do NOT use the zero-shot adapters' `f"Subject: {subject}\n\n{body[:600]}"` format; distribution shift will degrade accuracy.
- Returns the top predicted label directly (single forward pass — no per-label NLI scoring loop)
- Expected inference speed: ~10–20ms/email vs 111–338ms for zero-shot
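The format constraint is the part most worth pinning down; a minimal sketch of the input builder (the helper name `adapter_input` is illustrative, not from the codebase):

```python
def adapter_input(subject: str, body: str) -> str:
    # Must match the training format exactly: "{subject} [SEP] {body[:400]}".
    # Using the zero-shot adapters' "Subject: ..." format here would be a
    # train/inference distribution shift.
    return f"{subject} [SEP] {body[:400]}"

print(adapter_input("Interview invite", "x" * 500))  # body truncated to 400 chars
```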
### Modified: `scripts/benchmark_classifier.py`

At startup, scan `models/` for subdirectories containing `training_info.json`. Register each as a dynamic entry in the model registry using `FineTunedAdapter`. Silently skip if `models/` does not exist. Existing CLI behaviour is unchanged.
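A sketch of that discovery scan, assuming the `models/<name>/training_info.json` layout described below (the function name is illustrative):

```python
import json
from pathlib import Path

def discover_finetuned(models_dir: str = "models") -> list[dict]:
    # Collect training_info.json from each model subdirectory; sorted for
    # deterministic registry order. Silently skip when models/ is absent.
    root = Path(models_dir)
    if not root.exists():
        return []
    return [
        json.loads(info.read_text())
        for info in sorted(root.glob("*/training_info.json"))
    ]
```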
### Modified: `app/api.py`

Two new GET endpoints (GET required for `EventSource` compatibility):

**`GET /api/finetune/status`**
Scans `models/` for `training_info.json` files. Returns:
```json
[
  {
    "name": "avocet-deberta-small",
    "base_model": "cross-encoder/nli-deberta-v3-small",
    "val_macro_f1": 0.712,
    "timestamp": "2026-03-15T12:00:00Z",
    "sample_count": 401
  }
]
```
Returns `[]` if no fine-tuned models exist.
**`GET /api/finetune/run?model=deberta-small&epochs=5`**
Spawns `finetune_classifier.py` via the `job-seeker-classifiers` Python binary. Streams stdout as SSE `{"type":"progress","message":"..."}` events. Emits `{"type":"complete"}` on clean exit, `{"type":"error","message":"..."}` on non-zero exit. Same implementation pattern as `/api/benchmark/run`.
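The SSE wire framing those events ride on is simple enough to pin down (sketch; the `sse_event` helper is illustrative, and the FastAPI streaming-response plumbing is omitted):

```python
import json

def sse_event(payload: dict) -> str:
    # EventSource wire format: each event is "data: <json>" followed by a blank line.
    return f"data: {json.dumps(payload)}\n\n"

print(sse_event({"type": "progress", "message": "epoch 1/5"}), end="")
```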
### Modified: `web/src/views/BenchmarkView.vue`

**Trained models badge row** (top of view, shown only when fine-tuned models exist):
Shows each fine-tuned model name + val macro-F1 chip. Fetched from `/api/finetune/status` on mount.

**Fine-tune section** (collapsible, below the benchmark charts):
- Dropdown: `deberta-small` | `bge-m3`
- Number input: epochs (default 5, range 1–20)
- Run button → streams into the existing log component
- On `complete`: auto-triggers `/api/benchmark/run` (with `--save`) so the charts update immediately

---
## Training Pipeline

### Data preparation

1. Load `data/email_score.jsonl`
2. Drop rows where `label` is not in the canonical `LABELS` (removes `profile_alert` etc.)
3. Check for classes with < 2 **total** samples (before any split). Drop those classes and warn. Additionally warn — but do not skip — classes with < 5 training samples, noting eval F1 for those classes will be unreliable.
4. Input text: `f"{subject} [SEP] {body[:400]}"` — fits within 512 tokens for both target models
5. Stratified 80/20 train/val split via `sklearn.model_selection.train_test_split(stratify=labels)`
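Step 3's drop/warn logic can be sketched like this (assumed helper, not from the codebase):

```python
from collections import Counter

def partition_classes(labels: list[str]) -> tuple[list[str], set[str], set[str]]:
    counts = Counter(labels)
    dropped = {c for c, n in counts.items() if n < 2}        # can't stratify: drop + warn
    low_data = {c for c, n in counts.items() if 2 <= n < 5}  # keep, but warn: F1 unreliable
    kept = [l for l in labels if l not in dropped]
    return kept, dropped, low_data

kept, dropped, low = partition_classes(["digest"] * 6 + ["new_lead"] * 3 + ["survey_received"])
print(dropped, low)  # {'survey_received'} {'new_lead'}
```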
### Class weighting

Compute per-class weights: `total_samples / (n_classes × class_count)`. Pass them to a `WeightedTrainer` subclass:
```python
import torch.nn.functional as F
from transformers import Trainer

class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # **kwargs is required — it absorbs num_items_in_batch, added in Transformers 4.38.
        # Do not remove it; removing it causes a TypeError on the first training step.
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        # Move class_weights to the same device as the logits — required for GPU training.
        # class_weights is created on CPU; logits are on cuda:0 during training.
        weight = self.class_weights.to(outputs.logits.device)
        loss = F.cross_entropy(outputs.logits, labels, weight=weight)
        return (loss, outputs) if return_outputs else loss
```
### Model setup

```python
AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=10,
    ignore_mismatched_sizes=True,  # see note below
    id2label=id2label,
    label2id=label2id,
)
```
**Note on `ignore_mismatched_sizes=True`:** The pretrained NLI head is a 3-class linear projection. It mismatches the 10-class head constructed by `num_labels=10`, so its weights are skipped during loading. PyTorch initializes the new head from scratch using the model's default init scheme. The backbone weights load normally. Do not set this to `False` — it will raise a shape error.
### Training config and `compute_metrics`

The Trainer requires a `compute_metrics` callback that takes an `EvalPrediction` (logits + label_ids) and returns a dict with a `macro_f1` key. This is distinct from the existing `compute_metrics` in `classifier_adapters.py` (which operates on string predictions):

```python
from sklearn.metrics import accuracy_score, f1_score
from transformers import EvalPrediction

def compute_metrics_for_trainer(eval_pred: EvalPrediction) -> dict:
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    return {
        "macro_f1": f1_score(labels, preds, average="macro", zero_division=0),
        "accuracy": accuracy_score(labels, preds),
    }
```
`TrainingArguments` must include:
- `load_best_model_at_end=True`
- `metric_for_best_model="macro_f1"`
- `greater_is_better=True`

These are required for `EarlyStoppingCallback` to work correctly. Without `load_best_model_at_end=True`, `EarlyStoppingCallback` raises `AssertionError` on init.

| Hyperparameter | deberta-small | bge-m3 |
|----------------|---------------|--------|
| Epochs | 5 (default, CLI-overridable) | 5 |
| Batch size | 16 | 4 |
| Gradient accumulation | 1 | 4 (effective batch = 16) |
| Learning rate | 2e-5 | 2e-5 |
| LR schedule | Linear with 10% warmup | same |
| Optimizer | AdamW | AdamW |
| fp16 | No | Yes |
| Gradient checkpointing | No | Yes |
| Eval strategy | Every epoch | Every epoch |
| Best checkpoint | By `macro_f1` | same |
| Early stopping patience | 3 epochs | 3 epochs |
|
|
||||||
|
Saved to `models/avocet-{name}/`:
|
||||||
|
- Model weights + tokenizer (standard HuggingFace format)
|
||||||
|
- `training_info.json`:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"name": "avocet-deberta-small",
|
||||||
|
"base_model_id": "cross-encoder/nli-deberta-v3-small",
|
||||||
|
"timestamp": "2026-03-15T12:00:00Z",
|
||||||
|
"epochs_run": 5,
|
||||||
|
"val_macro_f1": 0.712,
|
||||||
|
"val_accuracy": 0.798,
|
||||||
|
"sample_count": 401,
|
||||||
|
"label_counts": { "digest": 116, "neutral": 104, ... }
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Data Flow

```
email_score.jsonl
      │
      ▼
finetune_classifier.py
      ├── drop non-canonical labels
      ├── check for < 2 total samples per class (drop + warn)
      ├── stratified 80/20 split
      ├── tokenize (subject [SEP] body[:400])
      ├── compute class weights
      ├── WeightedTrainer + EarlyStoppingCallback
      └── save → models/avocet-{name}/
              │
              ├── FineTunedAdapter (classifier_adapters.py)
              │     ├── pipeline("text-classification")
              │     ├── input: subject [SEP] body[:400]  ← must match training format
              │     └── ~10–20ms/email inference
              │
              └── training_info.json
                    └── /api/finetune/status
                          └── BenchmarkView badge row
```

---
## Error Handling

- **Insufficient data (< 2 total samples in a class):** Drop the class before the split; print a warning with the class name and count.
- **Low-data warning (< 5 training samples in a class):** Warn but continue; note that eval F1 for that class will be unreliable.
- **VRAM OOM on bge-m3:** Surface as a clear SSE error message. Suggest stopping Peregrine vLLM first (it holds ~5.7GB).
- **Missing score file:** Raise `FileNotFoundError` with an actionable message (same pattern as `load_scoring_jsonl`).
- **Model dir already exists:** Overwrite with a warning log line. Re-running always produces a fresh checkpoint.

---
## Testing
|
||||||
|
|
||||||
|
- Unit test `WeightedTrainer.compute_loss` with a mock model and known label distribution — verify weighted loss differs from unweighted; verify `**kwargs` does not raise `TypeError`
|
||||||
|
- Unit test `compute_metrics_for_trainer` — verify `macro_f1` key in output, correct value on known inputs
|
||||||
|
- Unit test `FineTunedAdapter.classify` with a mock pipeline — verify it returns a string from `LABELS` using `subject [SEP] body[:400]` format
|
||||||
|
- Unit test auto-discovery in `benchmark_classifier.py` — mock `models/` dir with two `training_info.json` files, verify both appear in the active registry
|
||||||
|
- Integration test: fine-tune on `data/email_score.jsonl.example` (8 samples, 5 of 10 labels represented, 1 epoch, `--model deberta-small`). The 5 missing labels trigger the `< 2 total samples` drop path — the test must verify the drop warning is emitted for each missing label rather than treating it as a failure. Verify `models/avocet-deberta-small/training_info.json` is written with correct keys.
---

## Out of Scope

- Pushing fine-tuned weights to HuggingFace Hub (future)
- Cross-validation or k-fold evaluation (future — dataset too small to be meaningful now)
- Hyperparameter search (future)
- LoRA/PEFT adapter fine-tuning (future — relevant if model sizes grow beyond available VRAM)
- Fine-tuning models other than `deberta-small` and `bge-m3`
234
manage.sh

@@ -19,8 +19,9 @@ LOG_FILE="${LOG_DIR}/label_tool.log"
 DEFAULT_PORT=8503
 
 CONDA_BASE="${CONDA_BASE:-/devl/miniconda3}"
-ENV_UI="${AVOCET_ENV:-cf}"
+ENV_UI="job-seeker"
 ENV_BM="job-seeker-classifiers"
+STREAMLIT="${CONDA_BASE}/envs/${ENV_UI}/bin/streamlit"
 PYTHON_BM="${CONDA_BASE}/envs/${ENV_BM}/bin/python"
 PYTHON_UI="${CONDA_BASE}/envs/${ENV_UI}/bin/python"

@@ -78,11 +79,13 @@ usage() {
     echo ""
     echo " Usage: ./manage.sh <command> [args]"
     echo ""
-    echo " Vue UI + FastAPI:"
-    echo -e " ${GREEN}start${NC} Build Vue SPA + start FastAPI on port 8503"
-    echo -e " ${GREEN}stop${NC} Stop FastAPI server"
-    echo -e " ${GREEN}restart${NC} Stop + rebuild + restart FastAPI server"
-    echo -e " ${GREEN}open${NC} Open Vue UI in browser (http://localhost:8503)"
+    echo " Label tool:"
+    echo -e " ${GREEN}start${NC} Start label tool UI (port collision-safe)"
+    echo -e " ${GREEN}stop${NC} Stop label tool UI"
+    echo -e " ${GREEN}restart${NC} Restart label tool UI"
+    echo -e " ${GREEN}status${NC} Show running state and port"
+    echo -e " ${GREEN}logs${NC} Tail label tool log output"
+    echo -e " ${GREEN}open${NC} Open label tool in browser"
     echo ""
     echo " Benchmark:"
     echo -e " ${GREEN}benchmark [args]${NC} Run benchmark_classifier.py (args passed through)"

@@ -90,8 +93,13 @@ usage() {
     echo -e " ${GREEN}score [args]${NC} Shortcut: --score [args]"
     echo -e " ${GREEN}compare [args]${NC} Shortcut: --compare [args]"
     echo ""
+    echo " Vue API:"
+    echo -e " ${GREEN}start-api${NC} Build Vue SPA + start FastAPI on port 8503"
+    echo -e " ${GREEN}stop-api${NC} Stop FastAPI server"
+    echo -e " ${GREEN}restart-api${NC} Stop + rebuild + restart FastAPI server"
+    echo -e " ${GREEN}open-api${NC} Open Vue UI in browser (http://localhost:8503)"
+    echo ""
     echo " Dev:"
-    echo -e " ${GREEN}dev${NC} Hot-reload: uvicorn --reload (:8503) + Vite HMR (:5173)"
     echo -e " ${GREEN}test${NC} Run pytest suite"
     echo ""
     echo " Port defaults to ${DEFAULT_PORT}; auto-increments if occupied."

@@ -113,102 +121,102 @@ shift || true
 case "$CMD" in
 
     start)
-        API_PID_FILE=".avocet-api.pid"
-        API_PORT=8503
-        if [[ -f "$API_PID_FILE" ]] && kill -0 "$(<"$API_PID_FILE")" 2>/dev/null; then
-            warn "API already running (PID $(<"$API_PID_FILE")) → http://localhost:${API_PORT}"
+        pid=$(_running_pid)
+        if [[ -n "$pid" ]]; then
+            port=$(_running_port)
+            warn "Already running (PID ${pid}) on port ${port} → http://localhost:${port}"
             exit 0
         fi
 
+        if [[ ! -x "$STREAMLIT" ]]; then
+            error "Streamlit not found at ${STREAMLIT}\nActivate env: conda run -n ${ENV_UI} ..."
+        fi
+
+        port=$(_find_free_port "$DEFAULT_PORT")
         mkdir -p "$LOG_DIR"
-        API_LOG="${LOG_DIR}/api.log"
-        info "Building Vue SPA…"
-        (cd web && npm run build) >> "$API_LOG" 2>&1
-        info "Starting FastAPI on port ${API_PORT}…"
-        nohup "$PYTHON_UI" -m uvicorn app.api:app \
-            --host 0.0.0.0 --port "$API_PORT" \
-            >> "$API_LOG" 2>&1 &
-        echo $! > "$API_PID_FILE"
-        # Poll until port is actually bound (up to 10 s), not just process alive
-        for _i in $(seq 1 20); do
-            sleep 0.5
-            if (echo "" >/dev/tcp/127.0.0.1/"$API_PORT") 2>/dev/null; then
-                success "Avocet started → http://localhost:${API_PORT} (PID $(<"$API_PID_FILE"))"
-                break
-            fi
-            if ! kill -0 "$(<"$API_PID_FILE")" 2>/dev/null; then
-                rm -f "$API_PID_FILE"
-                error "Server died during startup. Check ${API_LOG}"
-            fi
-        done
-        if ! (echo "" >/dev/tcp/127.0.0.1/"$API_PORT") 2>/dev/null; then
-            error "Server did not bind to port ${API_PORT} within 10 s. Check ${API_LOG}"
-        fi
+        info "Starting label tool on port ${port}…"
+        nohup "$STREAMLIT" run app/label_tool.py \
+            --server.port "$port" \
+            --server.headless true \
+            --server.fileWatcherType none \
+            >"$LOG_FILE" 2>&1 &
+
+        pid=$!
+        echo "$pid" > "$PID_FILE"
+        echo "$port" > "$PORT_FILE"
+
+        # Wait briefly and confirm the process survived
+        sleep 1
+        if kill -0 "$pid" 2>/dev/null; then
+            success "Avocet label tool started → http://localhost:${port} (PID ${pid})"
+            success "Logs: ${LOG_FILE}"
+        else
+            rm -f "$PID_FILE" "$PORT_FILE"
+            error "Process died immediately. Check ${LOG_FILE} for details."
+        fi
         ;;
 
     stop)
-        API_PID_FILE=".avocet-api.pid"
-        if [[ ! -f "$API_PID_FILE" ]]; then
+        pid=$(_running_pid)
+        if [[ -z "$pid" ]]; then
             warn "Not running."
             exit 0
         fi
-        PID="$(<"$API_PID_FILE")"
-        if kill -0 "$PID" 2>/dev/null; then
-            kill "$PID" && rm -f "$API_PID_FILE"
-            success "Stopped (PID ${PID})."
-        else
-            warn "Stale PID file (process ${PID} not running). Cleaning up."
-            rm -f "$API_PID_FILE"
-        fi
+        info "Stopping label tool (PID ${pid})…"
+        kill "$pid"
+        # Wait up to 5 s for clean exit
+        for _ in $(seq 1 10); do
+            kill -0 "$pid" 2>/dev/null || break
+            sleep 0.5
+        done
+        if kill -0 "$pid" 2>/dev/null; then
+            warn "Process did not exit cleanly; sending SIGKILL…"
+            kill -9 "$pid" 2>/dev/null || true
+        fi
+        rm -f "$PID_FILE" "$PORT_FILE"
+        success "Stopped."
         ;;
 
     restart)
-        bash "$0" stop
+        pid=$(_running_pid)
+        if [[ -n "$pid" ]]; then
+            info "Stopping existing process (PID ${pid})…"
+            kill "$pid"
+            for _ in $(seq 1 10); do
+                kill -0 "$pid" 2>/dev/null || break
+                sleep 0.5
+            done
+            kill -0 "$pid" 2>/dev/null && kill -9 "$pid" 2>/dev/null || true
+            rm -f "$PID_FILE" "$PORT_FILE"
+        fi
         exec bash "$0" start
         ;;
 
-    dev)
-        API_PORT=8503
-        VITE_PORT=5173
-        DEV_API_PID_FILE=".avocet-dev-api.pid"
-        mkdir -p "$LOG_DIR"
-        DEV_API_LOG="${LOG_DIR}/dev-api.log"
-
-        if [[ -f "$DEV_API_PID_FILE" ]] && kill -0 "$(<"$DEV_API_PID_FILE")" 2>/dev/null; then
-            warn "Dev API already running (PID $(<"$DEV_API_PID_FILE"))"
-        else
-            info "Starting uvicorn with --reload on port ${API_PORT}…"
-            nohup "$PYTHON_UI" -m uvicorn app.api:app \
-                --host 0.0.0.0 --port "$API_PORT" --reload \
-                >> "$DEV_API_LOG" 2>&1 &
-            echo $! > "$DEV_API_PID_FILE"
-            # Wait for API to bind
-            for _i in $(seq 1 20); do
-                sleep 0.5
-                (echo "" >/dev/tcp/127.0.0.1/"$API_PORT") 2>/dev/null && break
-                if ! kill -0 "$(<"$DEV_API_PID_FILE")" 2>/dev/null; then
-                    rm -f "$DEV_API_PID_FILE"
-                    error "Dev API died during startup. Check ${DEV_API_LOG}"
-                fi
-            done
-            success "API (hot-reload) → http://localhost:${API_PORT}"
-        fi
-
-        # Kill API on exit (Ctrl+C or Vite exits)
-        _cleanup_dev() {
-            local pid
-            pid=$(<"$DEV_API_PID_FILE" 2>/dev/null || true)
-            [[ -n "$pid" ]] && kill "$pid" 2>/dev/null && rm -f "$DEV_API_PID_FILE"
-            info "Dev servers stopped."
-        }
-        trap _cleanup_dev EXIT INT TERM
-
-        info "Starting Vite HMR on port ${VITE_PORT} (proxy /api → :${API_PORT})…"
-        success "Frontend (HMR) → http://localhost:${VITE_PORT}"
-        (cd web && npm run dev -- --host 0.0.0.0 --port "$VITE_PORT")
+    status)
+        pid=$(_running_pid)
+        if [[ -n "$pid" ]]; then
+            port=$(_running_port)
+            success "Running — PID ${pid} port ${port} → http://localhost:${port}"
+        else
+            warn "Not running."
+        fi
+        ;;
+
+    logs)
+        if [[ ! -f "$LOG_FILE" ]]; then
+            warn "No log file found at ${LOG_FILE}. Has the tool been started?"
+            exit 0
+        fi
+        info "Tailing ${LOG_FILE} (Ctrl-C to stop)"
+        tail -f "$LOG_FILE"
         ;;
 
     open)
-        URL="http://localhost:8503"
+        port=$(_running_port)
+        pid=$(_running_pid)
+        [[ -z "$pid" ]] && warn "Label tool does not appear to be running. Start with: ./manage.sh start"
+        URL="http://localhost:${port}"
         info "Opening ${URL}"
         if command -v xdg-open &>/dev/null; then
             xdg-open "$URL"

@@ -249,6 +257,72 @@ case "$CMD" in
         exec "$0" benchmark --compare "$@"
         ;;
 
+    start-api)
+        API_PID_FILE=".avocet-api.pid"
+        API_PORT=8503
+        if [[ -f "$API_PID_FILE" ]] && kill -0 "$(<"$API_PID_FILE")" 2>/dev/null; then
+            warn "API already running (PID $(<"$API_PID_FILE")) → http://localhost:${API_PORT}"
+            exit 0
+        fi
+        mkdir -p "$LOG_DIR"
+        API_LOG="${LOG_DIR}/api.log"
+        info "Building Vue SPA…"
+        (cd web && npm run build) >> "$API_LOG" 2>&1
+        info "Starting FastAPI on port ${API_PORT}…"
+        nohup "$PYTHON_UI" -m uvicorn app.api:app \
+            --host 0.0.0.0 --port "$API_PORT" \
+            >> "$API_LOG" 2>&1 &
+        echo $! > "$API_PID_FILE"
+        # Poll until port is actually bound (up to 10 s), not just process alive
+        for _i in $(seq 1 20); do
+            sleep 0.5
+            if (echo "" >/dev/tcp/127.0.0.1/"$API_PORT") 2>/dev/null; then
+                success "Avocet API started → http://localhost:${API_PORT} (PID $(<"$API_PID_FILE"))"
+                break
+            fi
+            if ! kill -0 "$(<"$API_PID_FILE")" 2>/dev/null; then
+                rm -f "$API_PID_FILE"
+                error "API died during startup. Check ${API_LOG}"
+            fi
+        done
+        if ! (echo "" >/dev/tcp/127.0.0.1/"$API_PORT") 2>/dev/null; then
+            error "API did not bind to port ${API_PORT} within 10 s. Check ${API_LOG}"
+        fi
+        ;;
+
+    stop-api)
+        API_PID_FILE=".avocet-api.pid"
+        if [[ ! -f "$API_PID_FILE" ]]; then
+            warn "API not running."
+            exit 0
+        fi
+        PID="$(<"$API_PID_FILE")"
+        if kill -0 "$PID" 2>/dev/null; then
+            kill "$PID" && rm -f "$API_PID_FILE"
+            success "API stopped (PID ${PID})."
+        else
+            warn "Stale PID file (process ${PID} not running). Cleaning up."
+            rm -f "$API_PID_FILE"
+        fi
+        ;;
+
+    restart-api)
+        bash "$0" stop-api
+        exec bash "$0" start-api
+        ;;
+
+    open-api)
+        URL="http://localhost:8503"
+        info "Opening ${URL}"
+        if command -v xdg-open &>/dev/null; then
+            xdg-open "$URL"
+        elif command -v open &>/dev/null; then
+            open "$URL"
+        else
+            echo "$URL"
+        fi
+        ;;
+
     help|--help|-h)
         usage
         ;;
@@ -1,110 +0,0 @@
-"""Avocet — SFT candidate run discovery and JSONL import.
-
-No FastAPI dependency — pure Python file operations.
-Used by app/sft.py endpoints and can be run standalone.
-"""
-from __future__ import annotations
-
-import json
-import logging
-from pathlib import Path
-
-logger = logging.getLogger(__name__)
-
-_CANDIDATES_FILENAME = "sft_candidates.jsonl"
-
-
-def discover_runs(bench_results_dir: Path) -> list[dict]:
-    """Return one entry per run subdirectory that contains sft_candidates.jsonl.
-
-    Sorted newest-first by directory name (directories are named YYYY-MM-DD-HHMMSS
-    by the cf-orch benchmark harness, so lexicographic order is chronological).
-
-    Each entry: {run_id, timestamp, candidate_count, sft_path}
-    """
-    if not bench_results_dir.exists() or not bench_results_dir.is_dir():
-        return []
-    runs = []
-    for subdir in bench_results_dir.iterdir():
-        if not subdir.is_dir():
-            continue
-        sft_path = subdir / _CANDIDATES_FILENAME
-        if not sft_path.exists():
-            continue
-        records = _read_jsonl(sft_path)
-        runs.append({
-            "run_id": subdir.name,
-            "timestamp": subdir.name,
-            "candidate_count": len(records),
-            "sft_path": sft_path,
-        })
-    runs.sort(key=lambda r: r["run_id"], reverse=True)
-    return runs
-
-
-def import_run(sft_path: Path, data_dir: Path) -> dict[str, int]:
-    """Append records from sft_path into data_dir/sft_candidates.jsonl.
-
-    Deduplicates on the `id` field — records whose id already exists in the
-    destination file are skipped silently. Records missing an `id` field are
-    also skipped (malformed input from a partial benchmark write).
-
-    Returns {imported: N, skipped: M}.
-    """
-    dest = data_dir / _CANDIDATES_FILENAME
-    existing_ids = _read_existing_ids(dest)
-
-    new_records: list[dict] = []
-    skipped = 0
-    for record in _read_jsonl(sft_path):
-        if "id" not in record:
-            logger.warning("Skipping record missing 'id' field in %s", sft_path)
-            continue  # malformed — skip without crashing
-        if record["id"] in existing_ids:
-            skipped += 1
-            continue
-        new_records.append(record)
-        existing_ids.add(record["id"])
-
-    if new_records:
-        dest.parent.mkdir(parents=True, exist_ok=True)
-        with open(dest, "a", encoding="utf-8") as fh:
-            for r in new_records:
-                fh.write(json.dumps(r) + "\n")
-
-    return {"imported": len(new_records), "skipped": skipped}
-
-
-def _read_jsonl(path: Path) -> list[dict]:
-    """Read a JSONL file, returning valid records. Skips blank lines and malformed JSON."""
-    if not path.exists():
-        return []
-    records: list[dict] = []
-    for line in path.read_text(encoding="utf-8").splitlines():
-        line = line.strip()
-        if not line:
-            continue
-        try:
-            records.append(json.loads(line))
-        except json.JSONDecodeError as exc:
-            logger.warning("Skipping malformed JSON line in %s: %s", path, exc)
-    return records
-
-
-def _read_existing_ids(path: Path) -> set[str]:
-    """Read only the id field from each line of a JSONL file."""
-    if not path.exists():
-        return set()
-    ids: set[str] = set()
-    with path.open() as f:
-        for line in f:
-            line = line.strip()
-            if not line:
-                continue
-            try:
-                record = json.loads(line)
-                if "id" in record:
-                    ids.add(record["id"])
-            except json.JSONDecodeError:
-                pass  # corrupt line, skip silently (ids file is our own output)
-    return ids
@@ -5,83 +5,83 @@ These functions are stdlib-only and safe to test without an IMAP connection.
 from email.mime.multipart import MIMEMultipart
 from email.mime.text import MIMEText
 
-from app.utils import extract_body, strip_html
+from app.label_tool import _extract_body, _strip_html
 
 
-# ── strip_html ──────────────────────────────────────────────────────────────
+# ── _strip_html ──────────────────────────────────────────────────────────────
 
 def test_strip_html_removes_tags():
-    assert strip_html("<p>Hello <b>world</b></p>") == "Hello world"
+    assert _strip_html("<p>Hello <b>world</b></p>") == "Hello world"
 
 
 def test_strip_html_skips_script_content():
-    result = strip_html("<script>doEvil()</script><p>real</p>")
+    result = _strip_html("<script>doEvil()</script><p>real</p>")
     assert "doEvil" not in result
     assert "real" in result
 
 
 def test_strip_html_skips_style_content():
-    result = strip_html("<style>.foo{color:red}</style><p>visible</p>")
+    result = _strip_html("<style>.foo{color:red}</style><p>visible</p>")
     assert ".foo" not in result
     assert "visible" in result
 
 
 def test_strip_html_handles_br_as_newline():
-    result = strip_html("line1<br>line2")
+    result = _strip_html("line1<br>line2")
     assert "line1" in result
     assert "line2" in result
 
 
 def test_strip_html_decodes_entities():
     # convert_charrefs=True on HTMLParser handles &amp; etc.
-    result = strip_html("<p>Hello &amp; welcome</p>")
+    result = _strip_html("<p>Hello &amp; welcome</p>")
     assert "&amp;" not in result
     assert "Hello" in result
     assert "welcome" in result
 
 
 def test_strip_html_empty_string():
-    assert strip_html("") == ""
+    assert _strip_html("") == ""
 
 
 def test_strip_html_plain_text_passthrough():
-    assert strip_html("no tags here") == "no tags here"
+    assert _strip_html("no tags here") == "no tags here"
 
 
-# ── extract_body ────────────────────────────────────────────────────────────
+# ── _extract_body ────────────────────────────────────────────────────────────
 
 def test_extract_body_prefers_plain_over_html():
     msg = MIMEMultipart("alternative")
     msg.attach(MIMEText("plain body", "plain"))
     msg.attach(MIMEText("<html><body>html body</body></html>", "html"))
-    assert extract_body(msg) == "plain body"
+    assert _extract_body(msg) == "plain body"
 
 
 def test_extract_body_falls_back_to_html_when_no_plain():
     msg = MIMEMultipart("alternative")
     msg.attach(MIMEText("<html><body><p>HTML only email</p></body></html>", "html"))
-    result = extract_body(msg)
+    result = _extract_body(msg)
     assert "HTML only email" in result
     assert "<" not in result  # no raw HTML tags leaked through
 
 
 def test_extract_body_non_multipart_html_stripped():
     msg = MIMEText("<html><body><p>Solo HTML</p></body></html>", "html")
-    result = extract_body(msg)
+    result = _extract_body(msg)
     assert "Solo HTML" in result
     assert "<html>" not in result
 
 
 def test_extract_body_non_multipart_plain_unchanged():
     msg = MIMEText("just plain text", "plain")
-    assert extract_body(msg) == "just plain text"
+    assert _extract_body(msg) == "just plain text"
 
 
 def test_extract_body_empty_message():
     msg = MIMEText("", "plain")
-    assert extract_body(msg) == ""
+    assert _extract_body(msg) == ""
 
 
 def test_extract_body_multipart_empty_returns_empty():
     msg = MIMEMultipart("alternative")
-    assert extract_body(msg) == ""
+    assert _extract_body(msg) == ""
@ -1,399 +0,0 @@
|
||||||
"""Tests for app/models.py — /api/models/* endpoints."""
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import json
|
|
||||||
from pathlib import Path
|
|
||||||
from unittest.mock import MagicMock, patch
|
|
||||||
|
|
||||||
import pytest
|
|
||||||
from fastapi.testclient import TestClient
|
|
||||||
|
|
||||||
|
|
||||||
# ── Fixtures ───────────────────────────────────────────────────────────────────
|
|
||||||
|
|
||||||
@pytest.fixture(autouse=True)
|
|
||||||
def reset_models_globals(tmp_path):
|
|
||||||
"""Redirect module-level dirs to tmp_path and reset download progress."""
|
|
||||||
from app import models as models_module
|
|
||||||
|
|
||||||
prev_models = models_module._MODELS_DIR
|
|
||||||
prev_queue = models_module._QUEUE_DIR
|
|
||||||
prev_progress = dict(models_module._download_progress)
|
|
||||||
|
|
||||||
models_dir = tmp_path / "models"
|
|
||||||
queue_dir = tmp_path / "data"
|
|
||||||
models_dir.mkdir()
|
|
||||||
queue_dir.mkdir()
|
|
||||||
|
|
||||||
models_module.set_models_dir(models_dir)
|
|
||||||
models_module.set_queue_dir(queue_dir)
|
|
||||||
models_module._download_progress = {}
|
|
||||||
|
|
||||||
yield
|
|
||||||
|
|
||||||
models_module.set_models_dir(prev_models)
|
|
||||||
models_module.set_queue_dir(prev_queue)
|
|
||||||
models_module._download_progress = prev_progress
|
|
||||||
|
|
||||||
|
|
||||||
@pytest.fixture
|
|
||||||
def client():
|
|
||||||
from app.api import app
|
|
||||||
return TestClient(app)
|
|
||||||
|
|
||||||
|
|
||||||
def _make_hf_response(repo_id: str = "org/model", pipeline_tag: str = "text-classification") -> dict:
|
|
||||||
"""Minimal HF API response payload."""
|
|
||||||
return {
|
|
||||||
"modelId": repo_id,
|
|
||||||
"pipeline_tag": pipeline_tag,
|
|
||||||
"tags": ["pytorch", pipeline_tag],
|
|
||||||
"downloads": 42000,
|
|
||||||
"siblings": [
|
|
||||||
{"rfilename": "pytorch_model.bin", "size": 500_000_000},
|
|
||||||
],
|
|
||||||
"cardData": {"description": "A test model description."},
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
def _queue_one(client, repo_id: str = "org/model") -> dict:
|
|
||||||
"""Helper: POST to /queue and return the created entry."""
|
|
||||||
r = client.post("/api/models/queue", json={
|
|
||||||
"repo_id": repo_id,
|
|
||||||
"pipeline_tag": "text-classification",
|
|
||||||
"adapter_recommendation": "ZeroShotAdapter",
|
|
||||||
})
|
|
||||||
assert r.status_code == 201, r.text
|
|
||||||
return r.json()
|
|
||||||
|
|
||||||
|
|
||||||
# ── GET /lookup ────────────────────────────────────────────────────────────────
|
|
||||||
|
|
||||||
def test_lookup_invalid_repo_id_returns_422_no_slash(client):
|
|
||||||
"""repo_id without a '/' should be rejected with 422."""
|
|
||||||
r = client.get("/api/models/lookup", params={"repo_id": "noslash"})
|
|
||||||
assert r.status_code == 422
|
|
||||||
|
|
||||||
|
|
||||||
def test_lookup_invalid_repo_id_returns_422_whitespace(client):
|
|
||||||
"""repo_id containing whitespace should be rejected with 422."""
|
|
||||||
r = client.get("/api/models/lookup", params={"repo_id": "org/model name"})
|
|
||||||
assert r.status_code == 422
|
|
||||||
|
|
||||||
|
|
||||||
def test_lookup_hf_404_returns_404(client):
|
|
||||||
"""HF API returning 404 should surface as HTTP 404."""
|
|
||||||
mock_resp = MagicMock()
|
|
||||||
mock_resp.status_code = 404
|
|
||||||
|
|
||||||
with patch("app.models.httpx.get", return_value=mock_resp):
|
|
||||||
r = client.get("/api/models/lookup", params={"repo_id": "org/nonexistent"})
|
|
||||||
|
|
||||||
assert r.status_code == 404
|
|
||||||
|
|
||||||
|
|
||||||
def test_lookup_hf_network_error_returns_502(client):
|
|
||||||
"""Network error reaching HF API should return 502."""
|
|
||||||
import httpx as _httpx
|
|
||||||
|
|
||||||
with patch("app.models.httpx.get", side_effect=_httpx.RequestError("timeout")):
|
|
||||||
r = client.get("/api/models/lookup", params={"repo_id": "org/model"})
|
|
||||||
|
|
||||||
assert r.status_code == 502
|
|
||||||
|
|
||||||
|
|
||||||
def test_lookup_returns_correct_shape(client):
|
|
||||||
"""Successful lookup returns all required fields."""
|
|
||||||
mock_resp = MagicMock()
|
|
||||||
mock_resp.status_code = 200
|
|
||||||
mock_resp.json.return_value = _make_hf_response("org/mymodel", "text-classification")
|
|
||||||
|
|
||||||
with patch("app.models.httpx.get", return_value=mock_resp):
|
|
||||||
r = client.get("/api/models/lookup", params={"repo_id": "org/mymodel"})
|
|
||||||
|
|
||||||
assert r.status_code == 200
|
|
||||||
data = r.json()
|
|
||||||
assert data["repo_id"] == "org/mymodel"
|
|
||||||
assert data["pipeline_tag"] == "text-classification"
|
|
||||||
assert data["adapter_recommendation"] == "ZeroShotAdapter"
|
|
||||||
assert data["model_size_bytes"] == 500_000_000
|
|
||||||
assert data["downloads"] == 42000
|
|
||||||
assert data["already_installed"] is False
|
|
||||||
assert data["already_queued"] is False
|
|
||||||
|
|
||||||
|
|
||||||
def test_lookup_unknown_pipeline_tag_returns_null_adapter(client):
|
|
||||||
"""An unrecognised pipeline_tag yields adapter_recommendation=null."""
|
|
||||||
mock_resp = MagicMock()
|
|
||||||
mock_resp.status_code = 200
|
|
||||||
mock_resp.json.return_value = _make_hf_response("org/m", "audio-classification")
|
|
||||||
|
|
||||||
with patch("app.models.httpx.get", return_value=mock_resp):
|
|
||||||
r = client.get("/api/models/lookup", params={"repo_id": "org/m"})
|
|
||||||
|
|
||||||
assert r.status_code == 200
|
|
||||||
assert r.json()["adapter_recommendation"] is None
|
|
||||||
|
|
||||||
|
|
||||||
def test_lookup_already_queued_flag(client):
|
|
||||||
"""already_queued is True when repo_id is in the pending queue."""
|
|
||||||
_queue_one(client, "org/queued-model")
|
|
||||||
|
|
||||||
mock_resp = MagicMock()
|
|
||||||
mock_resp.status_code = 200
|
|
||||||
mock_resp.json.return_value = _make_hf_response("org/queued-model")
|
|
||||||
|
|
||||||
with patch("app.models.httpx.get", return_value=mock_resp):
|
|
||||||
r = client.get("/api/models/lookup", params={"repo_id": "org/queued-model"})
|
|
||||||
|
|
||||||
assert r.status_code == 200
|
|
||||||
assert r.json()["already_queued"] is True
|
|
||||||
|
|
||||||
|
|
||||||
# ── GET /queue ─────────────────────────────────────────────────────────────────
|
|
||||||
|
|
||||||
def test_queue_empty_initially(client):
|
|
||||||
r = client.get("/api/models/queue")
|
|
||||||
assert r.status_code == 200
|
|
||||||
assert r.json() == []
|
|
||||||
|
|
||||||
|
|
||||||
def test_queue_add_and_list(client):
|
|
||||||
"""POST then GET /queue should return the entry."""
|
|
||||||
entry = _queue_one(client, "org/my-model")
|
|
||||||
|
|
||||||
r = client.get("/api/models/queue")
|
|
||||||
assert r.status_code == 200
|
|
||||||
items = r.json()
|
|
||||||
assert len(items) == 1
|
|
||||||
assert items[0]["repo_id"] == "org/my-model"
|
|
||||||
assert items[0]["status"] == "pending"
|
|
||||||
assert items[0]["id"] == entry["id"]
|
|
||||||
|
|
||||||
|
|
||||||
def test_queue_add_returns_entry_fields(client):
|
|
||||||
"""POST /queue returns an entry with all expected fields."""
|
|
||||||
entry = _queue_one(client)
|
|
||||||
assert "id" in entry
|
|
||||||
assert "queued_at" in entry
|
|
    assert entry["status"] == "pending"
    assert entry["pipeline_tag"] == "text-classification"
    assert entry["adapter_recommendation"] == "ZeroShotAdapter"


# ── POST /queue — 409 duplicate ────────────────────────────────────────────────

def test_queue_duplicate_returns_409(client):
    """Posting the same repo_id twice should return 409."""
    _queue_one(client, "org/dup-model")

    r = client.post("/api/models/queue", json={
        "repo_id": "org/dup-model",
        "pipeline_tag": "text-classification",
        "adapter_recommendation": "ZeroShotAdapter",
    })
    assert r.status_code == 409


def test_queue_multiple_different_models(client):
    """Multiple distinct repo_ids should all be accepted."""
    _queue_one(client, "org/model-a")
    _queue_one(client, "org/model-b")
    _queue_one(client, "org/model-c")

    r = client.get("/api/models/queue")
    assert r.status_code == 200
    assert len(r.json()) == 3


# ── DELETE /queue/{id} — dismiss ──────────────────────────────────────────────

def test_queue_dismiss(client):
    """DELETE /queue/{id} sets status=dismissed; entry not returned by GET /queue."""
    entry = _queue_one(client)
    entry_id = entry["id"]

    r = client.delete(f"/api/models/queue/{entry_id}")
    assert r.status_code == 200
    assert r.json() == {"ok": True}

    r2 = client.get("/api/models/queue")
    assert r2.status_code == 200
    assert r2.json() == []


def test_queue_dismiss_nonexistent_returns_404(client):
    """DELETE /queue/{id} with unknown id returns 404."""
    r = client.delete("/api/models/queue/does-not-exist")
    assert r.status_code == 404


def test_queue_dismiss_allows_re_queue(client):
    """After dismissal the same repo_id can be queued again."""
    entry = _queue_one(client, "org/requeue-model")
    client.delete(f"/api/models/queue/{entry['id']}")

    r = client.post("/api/models/queue", json={
        "repo_id": "org/requeue-model",
        "pipeline_tag": None,
        "adapter_recommendation": None,
    })
    assert r.status_code == 201


# ── POST /queue/{id}/approve ───────────────────────────────────────────────────

def test_approve_nonexistent_returns_404(client):
    """Approving an unknown id returns 404."""
    r = client.post("/api/models/queue/ghost-id/approve")
    assert r.status_code == 404


def test_approve_non_pending_returns_409(client):
    """Approving an entry that is not in 'pending' state returns 409."""
    from app import models as models_module

    entry = _queue_one(client)
    # Manually flip status to 'failed'
    models_module._update_queue_entry(entry["id"], {"status": "failed"})

    r = client.post(f"/api/models/queue/{entry['id']}/approve")
    assert r.status_code == 409


def test_approve_starts_download_and_returns_ok(client):
    """Approving a pending entry returns {ok: true} and starts a background thread."""
    import time

    entry = _queue_one(client)

    # Patch snapshot_download so the thread doesn't actually hit the network.
    def _fake_snapshot_download(**kwargs):
        pass

    with patch("app.models.snapshot_download", side_effect=_fake_snapshot_download):
        r = client.post(f"/api/models/queue/{entry['id']}/approve")
        assert r.status_code == 200
        assert r.json() == {"ok": True}
        # Give the background thread a moment to complete while snapshot_download is patched
        time.sleep(0.3)

    # Queue entry status should have moved to 'downloading' (or 'ready' if fast)
    from app import models as models_module
    updated = models_module._get_queue_entry(entry["id"])
    assert updated is not None, "Queue entry not found — thread may have run after fixture teardown"
    assert updated["status"] in ("downloading", "ready", "failed")

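The approve flow this test exercises (flip status, spawn a daemon thread, mark the entry ready or failed when the download finishes) can be sketched roughly as follows. The in-memory `QUEUE` and the no-op `snapshot_download` are stand-ins for the app's real store and Hugging Face downloader; everything beyond the status names the test asserts is an assumption:

```python
import threading

# Stand-ins for the app's queue store and downloader (assumptions for this sketch).
QUEUE = {"e1": {"repo_id": "org/m", "status": "pending"}}

def snapshot_download(repo_id):
    pass  # the real endpoint would fetch the HF repo here

def approve(entry_id):
    entry = QUEUE[entry_id]
    if entry["status"] != "pending":
        raise ValueError("409: only pending entries can be approved")
    entry["status"] = "downloading"

    def _worker():
        try:
            snapshot_download(entry["repo_id"])
            entry["status"] = "ready"
        except Exception:
            entry["status"] = "failed"

    t = threading.Thread(target=_worker, daemon=True)
    t.start()
    return t  # returned so a caller can join(); the endpoint itself replies {"ok": True}

approve("e1").join()
print(QUEUE["e1"]["status"])  # → ready
```

Returning the thread is what lets a test join() instead of sleeping; the `time.sleep(0.3)` above is the weaker workaround when the endpoint keeps the thread private.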
# ── GET /download/stream ───────────────────────────────────────────────────────

def test_download_stream_idle_when_no_download(client):
    """GET /download/stream returns a single idle event when nothing is downloading."""
    r = client.get("/api/models/download/stream")
    assert r.status_code == 200
    # SSE body should contain the idle event
    assert "idle" in r.text


# ── GET /installed ─────────────────────────────────────────────────────────────

def test_installed_empty(client):
    """GET /installed returns [] when models dir is empty."""
    r = client.get("/api/models/installed")
    assert r.status_code == 200
    assert r.json() == []


def test_installed_detects_downloaded_model(client):
    """A subdir with config.json is surfaced as type='downloaded'."""
    from app import models as models_module

    model_dir = models_module._MODELS_DIR / "org--mymodel"
    model_dir.mkdir()
    (model_dir / "config.json").write_text(json.dumps({"model_type": "bert"}), encoding="utf-8")
    (model_dir / "model_info.json").write_text(
        json.dumps({"repo_id": "org/mymodel", "adapter_recommendation": "ZeroShotAdapter"}),
        encoding="utf-8",
    )

    r = client.get("/api/models/installed")
    assert r.status_code == 200
    items = r.json()
    assert len(items) == 1
    assert items[0]["type"] == "downloaded"
    assert items[0]["name"] == "org--mymodel"
    assert items[0]["adapter"] == "ZeroShotAdapter"
    assert items[0]["model_id"] == "org/mymodel"


def test_installed_detects_finetuned_model(client):
    """A subdir with training_info.json is surfaced as type='finetuned'."""
    from app import models as models_module

    model_dir = models_module._MODELS_DIR / "my-finetuned"
    model_dir.mkdir()
    (model_dir / "training_info.json").write_text(
        json.dumps({"base_model": "org/base", "epochs": 5}), encoding="utf-8"
    )

    r = client.get("/api/models/installed")
    assert r.status_code == 200
    items = r.json()
    assert len(items) == 1
    assert items[0]["type"] == "finetuned"
    assert items[0]["name"] == "my-finetuned"


# ── DELETE /installed/{name} ───────────────────────────────────────────────────

def test_delete_installed_removes_directory(client):
    """DELETE /installed/{name} removes the directory and returns {ok: true}."""
    from app import models as models_module

    model_dir = models_module._MODELS_DIR / "org--removeme"
    model_dir.mkdir()
    (model_dir / "config.json").write_text("{}", encoding="utf-8")

    r = client.delete("/api/models/installed/org--removeme")
    assert r.status_code == 200
    assert r.json() == {"ok": True}
    assert not model_dir.exists()


def test_delete_installed_not_found_returns_404(client):
    r = client.delete("/api/models/installed/does-not-exist")
    assert r.status_code == 404


def test_delete_installed_path_traversal_blocked(client):
    """DELETE /installed/../../etc must be blocked (400 or 422)."""
    r = client.delete("/api/models/installed/../../etc")
    assert r.status_code in (400, 404, 422)


def test_delete_installed_dotdot_name_blocked(client):
    """A name containing '..' in any form must be rejected."""
    r = client.delete("/api/models/installed/..%2F..%2Fetc")
    assert r.status_code in (400, 404, 422)


def test_delete_installed_name_with_slash_blocked(client):
    """A name containing a literal '/' after URL decoding must be rejected."""
    from fastapi import HTTPException
    from app.models import delete_installed

    # The router would parse a second '/' as a new path segment, so we test
    # the delete logic directly with a slash-containing name.
    with pytest.raises(HTTPException) as exc_info:
        delete_installed("org/traversal")
    assert exc_info.value.status_code in (400, 404)

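The three traversal tests above pin down the contract for a name check before any filesystem access. A minimal sketch of such a validator (the helper name and `MODELS_DIR` location are assumptions for illustration, not the app's actual code):

```python
from pathlib import Path

MODELS_DIR = Path("/tmp/avocet-models")  # hypothetical root for this sketch

def validate_installed_name(name: str) -> Path:
    # Reject separators and parent references before touching the filesystem.
    if not name or "/" in name or "\\" in name or ".." in name:
        raise ValueError(f"invalid model name: {name!r}")
    target = (MODELS_DIR / name).resolve()
    # Belt and braces: the resolved path must still live under MODELS_DIR.
    if MODELS_DIR.resolve() not in target.parents:
        raise ValueError(f"path escapes models dir: {name!r}")
    return target
```

The `..%2F..%2Fetc` case in the test decodes to `../../etc` and is caught by the `..` check; a FastAPI route would translate the `ValueError` into the 400 the tests allow.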
@ -1,377 +0,0 @@
"""API integration tests for app/sft.py — /api/sft/* endpoints."""
import json
import pytest
from fastapi.testclient import TestClient
from pathlib import Path


@pytest.fixture(autouse=True)
def reset_sft_globals(tmp_path):
    from app import sft as sft_module
    _prev_data = sft_module._SFT_DATA_DIR
    _prev_cfg = sft_module._SFT_CONFIG_DIR
    sft_module.set_sft_data_dir(tmp_path)
    sft_module.set_sft_config_dir(tmp_path)
    yield
    sft_module.set_sft_data_dir(_prev_data)
    sft_module.set_sft_config_dir(_prev_cfg)


@pytest.fixture
def client():
    from app.api import app
    return TestClient(app)


def _make_record(id: str, run_id: str = "2026-04-07-143022") -> dict:
    return {
        "id": id, "source": "cf-orch-benchmark",
        "benchmark_run_id": run_id, "timestamp": "2026-04-07T10:00:00Z",
        "status": "needs_review",
        "prompt_messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write a Python function that adds two numbers."},
        ],
        "model_response": "def add(a, b): return a - b",
        "corrected_response": None,
        "quality_score": 0.2, "failure_reason": "pattern_match: 0/2 matched",
        "task_id": "code-fn", "task_type": "code",
        "task_name": "Code: Write a Python function",
        "model_id": "Qwen/Qwen2.5-3B", "model_name": "Qwen2.5-3B",
        "node_id": "heimdall", "gpu_id": 0, "tokens_per_sec": 38.4,
    }


def _write_run(tmp_path, run_id: str, records: list[dict]) -> Path:
    run_dir = tmp_path / "bench_results" / run_id
    run_dir.mkdir(parents=True)
    sft_path = run_dir / "sft_candidates.jsonl"
    sft_path.write_text(
        "\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8"
    )
    return sft_path


def _write_config(tmp_path, bench_results_dir: Path) -> None:
    import yaml
    cfg = {"sft": {"bench_results_dir": str(bench_results_dir)}}
    (tmp_path / "label_tool.yaml").write_text(
        yaml.dump(cfg, allow_unicode=True), encoding="utf-8"
    )


# ── /api/sft/runs ──────────────────────────────────────────────────────────

def test_runs_returns_empty_when_no_config(client):
    r = client.get("/api/sft/runs")
    assert r.status_code == 200
    assert r.json() == []


def test_runs_returns_available_runs(client, tmp_path):
    _write_run(tmp_path, "2026-04-07-143022", [_make_record("a"), _make_record("b")])
    _write_config(tmp_path, tmp_path / "bench_results")
    r = client.get("/api/sft/runs")
    assert r.status_code == 200
    data = r.json()
    assert len(data) == 1
    assert data[0]["run_id"] == "2026-04-07-143022"
    assert data[0]["candidate_count"] == 2
    assert data[0]["already_imported"] is False


def test_runs_marks_already_imported(client, tmp_path):
    _write_run(tmp_path, "2026-04-07-143022", [_make_record("a")])
    _write_config(tmp_path, tmp_path / "bench_results")
    from app import sft as sft_module
    candidates = sft_module._candidates_file()
    candidates.parent.mkdir(parents=True, exist_ok=True)
    candidates.write_text(
        json.dumps(_make_record("a", run_id="2026-04-07-143022")) + "\n",
        encoding="utf-8"
    )
    r = client.get("/api/sft/runs")
    assert r.json()[0]["already_imported"] is True


# ── /api/sft/import ─────────────────────────────────────────────────────────

def test_import_adds_records(client, tmp_path):
    _write_run(tmp_path, "2026-04-07-143022", [_make_record("a"), _make_record("b")])
    _write_config(tmp_path, tmp_path / "bench_results")
    r = client.post("/api/sft/import", json={"run_id": "2026-04-07-143022"})
    assert r.status_code == 200
    assert r.json() == {"imported": 2, "skipped": 0}


def test_import_is_idempotent(client, tmp_path):
    _write_run(tmp_path, "2026-04-07-143022", [_make_record("a")])
    _write_config(tmp_path, tmp_path / "bench_results")
    client.post("/api/sft/import", json={"run_id": "2026-04-07-143022"})
    r = client.post("/api/sft/import", json={"run_id": "2026-04-07-143022"})
    assert r.json() == {"imported": 0, "skipped": 1}


def test_import_unknown_run_returns_404(client, tmp_path):
    _write_config(tmp_path, tmp_path / "bench_results")
    r = client.post("/api/sft/import", json={"run_id": "nonexistent"})
    assert r.status_code == 404


# ── /api/sft/queue ──────────────────────────────────────────────────────────

def _populate_candidates(tmp_path, records: list[dict]) -> None:
    from app import sft as sft_module
    path = sft_module._candidates_file()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        "\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8"
    )


def test_queue_returns_needs_review_only(client, tmp_path):
    records = [
        _make_record("a"),  # needs_review
        {**_make_record("b"), "status": "approved"},  # should not appear
        {**_make_record("c"), "status": "discarded"},  # should not appear
    ]
    _populate_candidates(tmp_path, records)
    r = client.get("/api/sft/queue")
    assert r.status_code == 200
    data = r.json()
    assert data["total"] == 1
    assert len(data["items"]) == 1
    assert data["items"][0]["id"] == "a"


def test_queue_pagination(client, tmp_path):
    records = [_make_record(str(i)) for i in range(25)]
    _populate_candidates(tmp_path, records)
    r = client.get("/api/sft/queue?page=1&per_page=10")
    data = r.json()
    assert data["total"] == 25
    assert len(data["items"]) == 10
    r2 = client.get("/api/sft/queue?page=3&per_page=10")
    assert len(r2.json()["items"]) == 5


def test_queue_empty_when_no_file(client):
    r = client.get("/api/sft/queue")
    assert r.status_code == 200
    assert r.json() == {"items": [], "total": 0, "page": 1, "per_page": 20}

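The queue shape asserted in the three tests above (filter to `needs_review`, then slice by page) reduces to a few lines. This is a sketch of the expected behaviour under those assertions, not the app's actual implementation:

```python
def queue_page(records: list[dict], page: int = 1, per_page: int = 20) -> dict:
    # Only undecided candidates are served to the card stack.
    items = [r for r in records if r["status"] == "needs_review"]
    start = (page - 1) * per_page
    return {
        "items": items[start:start + per_page],
        "total": len(items),
        "page": page,
        "per_page": per_page,
    }

records = [{"id": str(i), "status": "needs_review"} for i in range(25)]
print(len(queue_page(records, page=3, per_page=10)["items"]))  # → 5
```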
# ── /api/sft/submit ─────────────────────────────────────────────────────────

def test_submit_correct_sets_approved(client, tmp_path):
    _populate_candidates(tmp_path, [_make_record("a")])
    r = client.post("/api/sft/submit", json={
        "id": "a", "action": "correct",
        "corrected_response": "def add(a, b): return a + b",
    })
    assert r.status_code == 200
    from app import sft as sft_module
    records = sft_module._read_candidates()
    assert records[0]["status"] == "approved"
    assert records[0]["corrected_response"] == "def add(a, b): return a + b"


def test_submit_correct_also_appends_to_approved_file(client, tmp_path):
    _populate_candidates(tmp_path, [_make_record("a")])
    client.post("/api/sft/submit", json={
        "id": "a", "action": "correct",
        "corrected_response": "def add(a, b): return a + b",
    })
    from app import sft as sft_module
    from app.utils import read_jsonl
    approved = read_jsonl(sft_module._approved_file())
    assert len(approved) == 1
    assert approved[0]["id"] == "a"


def test_submit_discard_sets_discarded(client, tmp_path):
    _populate_candidates(tmp_path, [_make_record("a")])
    r = client.post("/api/sft/submit", json={"id": "a", "action": "discard"})
    assert r.status_code == 200
    from app import sft as sft_module
    assert sft_module._read_candidates()[0]["status"] == "discarded"


def test_submit_flag_sets_model_rejected(client, tmp_path):
    _populate_candidates(tmp_path, [_make_record("a")])
    r = client.post("/api/sft/submit", json={"id": "a", "action": "flag"})
    assert r.status_code == 200
    from app import sft as sft_module
    assert sft_module._read_candidates()[0]["status"] == "model_rejected"


def test_submit_correct_empty_response_returns_422(client, tmp_path):
    _populate_candidates(tmp_path, [_make_record("a")])
    r = client.post("/api/sft/submit", json={
        "id": "a", "action": "correct", "corrected_response": " ",
    })
    assert r.status_code == 422


def test_submit_correct_null_response_returns_422(client, tmp_path):
    _populate_candidates(tmp_path, [_make_record("a")])
    r = client.post("/api/sft/submit", json={
        "id": "a", "action": "correct", "corrected_response": None,
    })
    assert r.status_code == 422


def test_submit_unknown_id_returns_404(client, tmp_path):
    r = client.post("/api/sft/submit", json={"id": "nope", "action": "discard"})
    assert r.status_code == 404


def test_submit_already_approved_returns_409(client, tmp_path):
    _populate_candidates(tmp_path, [{**_make_record("a"), "status": "approved"}])
    r = client.post("/api/sft/submit", json={"id": "a", "action": "discard"})
    assert r.status_code == 409


def test_submit_correct_stores_failure_category(client, tmp_path):
    _populate_candidates(tmp_path, [_make_record("a")])
    r = client.post("/api/sft/submit", json={
        "id": "a", "action": "correct",
        "corrected_response": "def add(a, b): return a + b",
        "failure_category": "style_violation",
    })
    assert r.status_code == 200
    from app import sft as sft_module
    records = sft_module._read_candidates()
    assert records[0]["failure_category"] == "style_violation"


def test_submit_correct_null_failure_category(client, tmp_path):
    _populate_candidates(tmp_path, [_make_record("a")])
    r = client.post("/api/sft/submit", json={
        "id": "a", "action": "correct",
        "corrected_response": "def add(a, b): return a + b",
    })
    assert r.status_code == 200
    from app import sft as sft_module
    records = sft_module._read_candidates()
    assert records[0]["failure_category"] is None


def test_submit_invalid_failure_category_returns_422(client, tmp_path):
    _populate_candidates(tmp_path, [_make_record("a")])
    r = client.post("/api/sft/submit", json={
        "id": "a", "action": "correct",
        "corrected_response": "def add(a, b): return a + b",
        "failure_category": "nonsense",
    })
    assert r.status_code == 422


# ── /api/sft/undo ────────────────────────────────────────────────────────────

def test_undo_restores_discarded_to_needs_review(client, tmp_path):
    _populate_candidates(tmp_path, [_make_record("a")])
    client.post("/api/sft/submit", json={"id": "a", "action": "discard"})
    r = client.post("/api/sft/undo", json={"id": "a"})
    assert r.status_code == 200
    from app import sft as sft_module
    assert sft_module._read_candidates()[0]["status"] == "needs_review"


def test_undo_removes_approved_from_approved_file(client, tmp_path):
    _populate_candidates(tmp_path, [_make_record("a")])
    client.post("/api/sft/submit", json={
        "id": "a", "action": "correct",
        "corrected_response": "def add(a, b): return a + b",
    })
    client.post("/api/sft/undo", json={"id": "a"})
    from app import sft as sft_module
    from app.utils import read_jsonl
    approved = read_jsonl(sft_module._approved_file())
    assert not any(r["id"] == "a" for r in approved)


def test_undo_already_needs_review_returns_409(client, tmp_path):
    _populate_candidates(tmp_path, [_make_record("a")])
    r = client.post("/api/sft/undo", json={"id": "a"})
    assert r.status_code == 409


# ── /api/sft/export ──────────────────────────────────────────────────────────

def test_export_returns_approved_as_sft_jsonl(client, tmp_path):
    from app import sft as sft_module
    from app.utils import write_jsonl
    approved = {
        **_make_record("a"),
        "status": "approved",
        "corrected_response": "def add(a, b): return a + b",
        "prompt_messages": [
            {"role": "system", "content": "You are a coding assistant."},
            {"role": "user", "content": "Write a Python add function."},
        ],
    }
    write_jsonl(sft_module._approved_file(), [approved])
    _populate_candidates(tmp_path, [approved])

    r = client.get("/api/sft/export")
    assert r.status_code == 200
    assert "application/x-ndjson" in r.headers["content-type"]
    lines = [l for l in r.text.splitlines() if l.strip()]
    assert len(lines) == 1
    record = json.loads(lines[0])
    assert record["messages"][-1] == {
        "role": "assistant", "content": "def add(a, b): return a + b"
    }
    assert record["messages"][0]["role"] == "system"
    assert record["messages"][1]["role"] == "user"

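The export assertions above imply a simple record-to-chat transform: keep the prompt turns and append the human-corrected response as the assistant turn. A sketch of that transform (the real `/api/sft/export` implementation is not shown in this diff, so the function name and body are assumptions):

```python
import json

def to_sft_example(record: dict) -> str:
    # prompt turns plus the corrected answer as the final assistant turn
    messages = list(record["prompt_messages"])
    messages.append({"role": "assistant", "content": record["corrected_response"]})
    return json.dumps({"messages": messages})

record = {
    "prompt_messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a Python add function."},
    ],
    "corrected_response": "def add(a, b): return a + b",
}
line = to_sft_example(record)
print(json.loads(line)["messages"][-1]["role"])  # → assistant
```

One such JSON object per line, served as `application/x-ndjson`, is exactly what the test parses back.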

def test_export_excludes_non_approved(client, tmp_path):
    from app import sft as sft_module
    from app.utils import write_jsonl
    records = [
        {**_make_record("a"), "status": "discarded", "corrected_response": None},
        {**_make_record("b"), "status": "needs_review", "corrected_response": None},
    ]
    write_jsonl(sft_module._approved_file(), records)
    r = client.get("/api/sft/export")
    assert r.text.strip() == ""


def test_export_empty_when_no_approved_file(client):
    r = client.get("/api/sft/export")
    assert r.status_code == 200
    assert r.text.strip() == ""


# ── /api/sft/stats ───────────────────────────────────────────────────────────

def test_stats_counts_by_status(client, tmp_path):
    from app import sft as sft_module
    from app.utils import write_jsonl
    records = [
        _make_record("a"),
        {**_make_record("b"), "status": "approved", "corrected_response": "ok"},
        {**_make_record("c"), "status": "discarded"},
        {**_make_record("d"), "status": "model_rejected"},
    ]
    _populate_candidates(tmp_path, records)
    write_jsonl(sft_module._approved_file(), [records[1]])
    r = client.get("/api/sft/stats")
    assert r.status_code == 200
    data = r.json()
    assert data["total"] == 4
    assert data["by_status"]["needs_review"] == 1
    assert data["by_status"]["approved"] == 1
    assert data["by_status"]["discarded"] == 1
    assert data["by_status"]["model_rejected"] == 1
    assert data["export_ready"] == 1


def test_stats_empty_when_no_data(client):
    r = client.get("/api/sft/stats")
    assert r.status_code == 200
    data = r.json()
    assert data["total"] == 0
    assert data["export_ready"] == 0

@ -1,95 +0,0 @@
"""Unit tests for scripts/sft_import.py — run discovery and JSONL deduplication."""
import json
from pathlib import Path


def _write_candidates(path: Path, records: list[dict]) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8")


def _make_record(id: str, run_id: str = "run1") -> dict:
    return {
        "id": id, "source": "cf-orch-benchmark",
        "benchmark_run_id": run_id, "timestamp": "2026-04-07T10:00:00Z",
        "status": "needs_review", "prompt_messages": [],
        "model_response": "bad", "corrected_response": None,
        "quality_score": 0.3, "failure_reason": "missing patterns",
        "task_id": "code-fn", "task_type": "code", "task_name": "Code: fn",
        "model_id": "Qwen/Qwen2.5-3B", "model_name": "Qwen2.5-3B",
        "node_id": "heimdall", "gpu_id": 0, "tokens_per_sec": 38.4,
    }


def test_discover_runs_empty_when_dir_missing(tmp_path):
    from scripts.sft_import import discover_runs
    result = discover_runs(tmp_path / "nonexistent")
    assert result == []


def test_discover_runs_returns_runs(tmp_path):
    from scripts.sft_import import discover_runs
    run_dir = tmp_path / "2026-04-07-143022"
    _write_candidates(run_dir / "sft_candidates.jsonl", [_make_record("a"), _make_record("b")])
    result = discover_runs(tmp_path)
    assert len(result) == 1
    assert result[0]["run_id"] == "2026-04-07-143022"
    assert result[0]["candidate_count"] == 2
    assert "sft_path" in result[0]


def test_discover_runs_skips_dirs_without_sft_file(tmp_path):
    from scripts.sft_import import discover_runs
    (tmp_path / "2026-04-07-no-sft").mkdir()
    result = discover_runs(tmp_path)
    assert result == []


def test_discover_runs_sorted_newest_first(tmp_path):
    from scripts.sft_import import discover_runs
    for name in ["2026-04-05-120000", "2026-04-07-143022", "2026-04-06-090000"]:
        run_dir = tmp_path / name
        _write_candidates(run_dir / "sft_candidates.jsonl", [_make_record("x")])
    result = discover_runs(tmp_path)
    assert [r["run_id"] for r in result] == [
        "2026-04-07-143022", "2026-04-06-090000", "2026-04-05-120000"
    ]


def test_import_run_imports_new_records(tmp_path):
    from scripts.sft_import import import_run
    sft_path = tmp_path / "run1" / "sft_candidates.jsonl"
    _write_candidates(sft_path, [_make_record("a"), _make_record("b")])
    result = import_run(sft_path, tmp_path)
    assert result == {"imported": 2, "skipped": 0}
    dest = tmp_path / "sft_candidates.jsonl"
    lines = [json.loads(l) for l in dest.read_text().splitlines() if l.strip()]
    assert len(lines) == 2


def test_import_run_deduplicates_on_id(tmp_path):
    from scripts.sft_import import import_run
    sft_path = tmp_path / "run1" / "sft_candidates.jsonl"
    _write_candidates(sft_path, [_make_record("a"), _make_record("b")])
    import_run(sft_path, tmp_path)
    result = import_run(sft_path, tmp_path)  # second import
    assert result == {"imported": 0, "skipped": 2}
    dest = tmp_path / "sft_candidates.jsonl"
    lines = [l for l in dest.read_text().splitlines() if l.strip()]
    assert len(lines) == 2  # no duplicates

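The dedup behaviour pinned down by the two import tests above is: skip any record whose `id` is already present in the destination JSONL, append the rest. A sketch (the function and file names follow the tests themselves; the body is an assumption, and the real script also warns on records missing an `id`, a branch omitted here):

```python
import json
from pathlib import Path

def import_run(sft_path: Path, data_dir: Path) -> dict:
    dest = data_dir / "sft_candidates.jsonl"
    # Collect ids already imported so a re-run is a no-op.
    seen = set()
    if dest.exists():
        for line in dest.read_text(encoding="utf-8").splitlines():
            if line.strip():
                seen.add(json.loads(line)["id"])
    imported = skipped = 0
    with dest.open("a", encoding="utf-8") as fh:
        for line in sft_path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            if record["id"] in seen:
                skipped += 1
                continue
            fh.write(json.dumps(record) + "\n")
            seen.add(record["id"])
            imported += 1
    return {"imported": imported, "skipped": skipped}
```

Appending rather than rewriting keeps earlier labelling state in the destination file untouched.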

def test_import_run_skips_records_missing_id(tmp_path, caplog):
    import logging
    from scripts.sft_import import import_run
    sft_path = tmp_path / "run1" / "sft_candidates.jsonl"
    sft_path.parent.mkdir()
    sft_path.write_text(
        json.dumps({"model_response": "bad", "status": "needs_review"}) + "\n"
        + json.dumps({"id": "abc123", "model_response": "good", "status": "needs_review"}) + "\n"
    )
    with caplog.at_level(logging.WARNING, logger="scripts.sft_import"):
        result = import_run(sft_path, tmp_path)
    assert result == {"imported": 1, "skipped": 0}
    assert "missing 'id'" in caplog.text

@ -66,8 +66,6 @@ const navItems = [
   { path: '/fetch', icon: '📥', label: 'Fetch' },
   { path: '/stats', icon: '📊', label: 'Stats' },
   { path: '/benchmark', icon: '🏁', label: 'Benchmark' },
-  { path: '/models', icon: '🤗', label: 'Models' },
-  { path: '/corrections', icon: '✍️', label: 'Corrections' },
   { path: '/settings', icon: '⚙️', label: 'Settings' },
 ]

@ -1,179 +0,0 @@
import { mount } from '@vue/test-utils'
import SftCard from './SftCard.vue'
import type { SftQueueItem } from '../stores/sft'
import { describe, it, expect } from 'vitest'

const LOW_QUALITY_ITEM: SftQueueItem = {
  id: 'abc', source: 'cf-orch-benchmark', benchmark_run_id: 'run1',
  timestamp: '2026-04-07T10:00:00Z', status: 'needs_review',
  prompt_messages: [
    { role: 'system', content: 'You are a coding assistant.' },
    { role: 'user', content: 'Write a Python add function.' },
  ],
  model_response: 'def add(a, b): return a - b',
  corrected_response: null, quality_score: 0.2,
  failure_reason: 'pattern_match: 0/2 matched',
  failure_category: null,
  task_id: 'code-fn', task_type: 'code', task_name: 'Code: Write a function',
  model_id: 'Qwen/Qwen2.5-3B', model_name: 'Qwen2.5-3B',
  node_id: 'heimdall', gpu_id: 0, tokens_per_sec: 38.4,
}

const MID_QUALITY_ITEM: SftQueueItem = { ...LOW_QUALITY_ITEM, id: 'mid', quality_score: 0.55 }
const HIGH_QUALITY_ITEM: SftQueueItem = { ...LOW_QUALITY_ITEM, id: 'hi', quality_score: 0.72 }

describe('SftCard', () => {
  it('renders model name chip', () => {
    const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
    expect(w.text()).toContain('Qwen2.5-3B')
  })

  it('renders task type chip', () => {
    const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
    expect(w.text()).toContain('code')
  })

  it('renders failure reason', () => {
    const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
    expect(w.text()).toContain('pattern_match: 0/2 matched')
  })

  it('renders model response', () => {
    const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
    expect(w.text()).toContain('def add(a, b): return a - b')
  })

  it('quality chip shows numeric value for low quality', () => {
    const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
    expect(w.text()).toContain('0.20')
  })

  it('quality chip has low-quality class when score < 0.4', () => {
    const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
    expect(w.find('[data-testid="quality-chip"]').classes()).toContain('quality-low')
  })

  it('quality chip has mid-quality class when score is 0.4 to <0.7', () => {
    const w = mount(SftCard, { props: { item: MID_QUALITY_ITEM } })
    expect(w.find('[data-testid="quality-chip"]').classes()).toContain('quality-mid')
  })

  it('quality chip has acceptable class when score >= 0.7', () => {
    const w = mount(SftCard, { props: { item: HIGH_QUALITY_ITEM } })
    expect(w.find('[data-testid="quality-chip"]').classes()).toContain('quality-ok')
  })

  it('clicking Correct button emits correct', async () => {
    const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
    await w.find('[data-testid="correct-btn"]').trigger('click')
    expect(w.emitted('correct')).toBeTruthy()
  })

  it('clicking Discard button then confirming emits discard', async () => {
    const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
    await w.find('[data-testid="discard-btn"]').trigger('click')
    await w.find('[data-testid="confirm-pending-btn"]').trigger('click')
    expect(w.emitted('discard')).toBeTruthy()
  })

  it('clicking Flag Model button then confirming emits flag', async () => {
    const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
    await w.find('[data-testid="flag-btn"]').trigger('click')
    await w.find('[data-testid="confirm-pending-btn"]').trigger('click')
    expect(w.emitted('flag')).toBeTruthy()
  })

  it('correction area hidden initially', () => {
|
|
||||||
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
|
|
||||||
expect(w.find('[data-testid="correction-area"]').exists()).toBe(false)
|
|
||||||
})
|
|
||||||
|
|
||||||
it('correction area shown when correcting prop is true', () => {
|
|
||||||
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM, correcting: true } })
|
|
||||||
expect(w.find('[data-testid="correction-area"]').exists()).toBe(true)
|
|
||||||
})
|
|
||||||
|
|
||||||
it('renders nothing for failure reason when null', () => {
|
|
||||||
const item = { ...LOW_QUALITY_ITEM, failure_reason: null }
|
|
||||||
const w = mount(SftCard, { props: { item } })
|
|
||||||
expect(w.find('.failure-reason').exists()).toBe(false)
|
|
||||||
})
|
|
||||||
|
|
||||||
// ── Failure category chip-group ───────────────────────────────────
|
|
||||||
it('failure category section hidden when not correcting and no pending action', () => {
|
|
||||||
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
|
|
||||||
expect(w.find('[data-testid="failure-category-section"]').exists()).toBe(false)
|
|
||||||
})
|
|
||||||
|
|
||||||
it('failure category section shown when correcting prop is true', () => {
|
|
||||||
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM, correcting: true } })
|
|
||||||
expect(w.find('[data-testid="failure-category-section"]').exists()).toBe(true)
|
|
||||||
})
|
|
||||||
|
|
||||||
it('renders all six category chips when correcting', () => {
|
|
||||||
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM, correcting: true } })
|
|
||||||
const chips = w.findAll('.category-chip')
|
|
||||||
expect(chips).toHaveLength(6)
|
|
||||||
})
|
|
||||||
|
|
||||||
it('clicking a category chip selects it (adds active class)', async () => {
|
|
||||||
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM, correcting: true } })
|
|
||||||
const chip = w.find('[data-testid="category-chip-wrong_answer"]')
|
|
||||||
await chip.trigger('click')
|
|
||||||
expect(chip.classes()).toContain('category-chip--active')
|
|
||||||
})
|
|
||||||
|
|
||||||
it('clicking the active chip again deselects it', async () => {
|
|
||||||
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM, correcting: true } })
|
|
||||||
const chip = w.find('[data-testid="category-chip-hallucination"]')
|
|
||||||
await chip.trigger('click')
|
|
||||||
expect(chip.classes()).toContain('category-chip--active')
|
|
||||||
await chip.trigger('click')
|
|
||||||
expect(chip.classes()).not.toContain('category-chip--active')
|
|
||||||
})
|
|
||||||
|
|
||||||
it('only one chip can be active at a time', async () => {
|
|
||||||
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM, correcting: true } })
|
|
||||||
await w.find('[data-testid="category-chip-wrong_answer"]').trigger('click')
|
|
||||||
await w.find('[data-testid="category-chip-hallucination"]').trigger('click')
|
|
||||||
const active = w.findAll('.category-chip--active')
|
|
||||||
expect(active).toHaveLength(1)
|
|
||||||
expect(active[0].attributes('data-testid')).toBe('category-chip-hallucination')
|
|
||||||
})
|
|
||||||
|
|
||||||
it('clicking Discard shows pending action row with category section', async () => {
|
|
||||||
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
|
|
||||||
await w.find('[data-testid="discard-btn"]').trigger('click')
|
|
||||||
expect(w.find('[data-testid="failure-category-section"]').exists()).toBe(true)
|
|
||||||
expect(w.find('[data-testid="pending-action-row"]').exists()).toBe(true)
|
|
||||||
})
|
|
||||||
|
|
||||||
it('clicking Flag shows pending action row', async () => {
|
|
||||||
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
|
|
||||||
await w.find('[data-testid="flag-btn"]').trigger('click')
|
|
||||||
expect(w.find('[data-testid="pending-action-row"]').exists()).toBe(true)
|
|
||||||
})
|
|
||||||
|
|
||||||
it('confirming discard emits discard with null when no category selected', async () => {
|
|
||||||
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
|
|
||||||
await w.find('[data-testid="discard-btn"]').trigger('click')
|
|
||||||
await w.find('[data-testid="confirm-pending-btn"]').trigger('click')
|
|
||||||
expect(w.emitted('discard')).toBeTruthy()
|
|
||||||
expect(w.emitted('discard')![0]).toEqual([null])
|
|
||||||
})
|
|
||||||
|
|
||||||
it('confirming discard emits discard with selected category', async () => {
|
|
||||||
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
|
|
||||||
await w.find('[data-testid="discard-btn"]').trigger('click')
|
|
||||||
await w.find('[data-testid="category-chip-scoring_artifact"]').trigger('click')
|
|
||||||
await w.find('[data-testid="confirm-pending-btn"]').trigger('click')
|
|
||||||
expect(w.emitted('discard')![0]).toEqual(['scoring_artifact'])
|
|
||||||
})
|
|
||||||
|
|
||||||
it('cancelling pending action hides the pending row', async () => {
|
|
||||||
const w = mount(SftCard, { props: { item: LOW_QUALITY_ITEM } })
|
|
||||||
await w.find('[data-testid="discard-btn"]').trigger('click')
|
|
||||||
await w.find('[data-testid="cancel-pending-btn"]').trigger('click')
|
|
||||||
expect(w.find('[data-testid="pending-action-row"]').exists()).toBe(false)
|
|
||||||
})
|
|
||||||
})
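The discard/flag flow exercised by the last few tests is a small state machine: clicking Discard or Flag stages a pending action, Confirm emits it together with whatever category is selected, and Cancel clears it. A minimal framework-free sketch of that flow — `PendingState`, `stage`, `confirm`, and `cancel` are hypothetical names for illustration, not part of the component's API:

```typescript
// Pure sketch of SftCard's pending-action flow (hypothetical helper names).
type Category = string | null

interface PendingState {
  pending: 'discard' | 'flag' | null
  category: Category
}

// Clicking Discard or Flag stages the action without emitting yet.
function stage(s: PendingState, action: 'discard' | 'flag'): PendingState {
  return { ...s, pending: action }
}

// Confirm emits [action, category] and resets both fields, matching
// confirmPendingAction in the component.
function confirm(s: PendingState): {
  emitted: ['discard' | 'flag', Category] | null
  next: PendingState
} {
  if (!s.pending) return { emitted: null, next: s }
  return { emitted: [s.pending, s.category], next: { pending: null, category: null } }
}

// Cancel clears the staged action but keeps any chip selection.
function cancel(s: PendingState): PendingState {
  return { ...s, pending: null }
}

// Mirrors the spec: discard with no category confirms as ['discard', null].
let s: PendingState = { pending: null, category: null }
s = stage(s, 'discard')
console.log(confirm(s).emitted) // [ 'discard', null ]
console.log(cancel(s).pending)  // null
```

This is only the branching logic; the real component additionally drives visibility of the pending-action row from `pending`.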
@@ -1,393 +0,0 @@
<template>
  <article class="sft-card">
    <!-- Chips row -->
    <div class="chips-row">
      <span class="chip chip-model">{{ item.model_name }}</span>
      <span class="chip chip-task">{{ item.task_type }}</span>
      <span class="chip chip-node">{{ item.node_id }} · GPU {{ item.gpu_id }}</span>
      <span class="chip chip-speed">{{ item.tokens_per_sec.toFixed(1) }} tok/s</span>
      <span
        class="chip quality-chip"
        :class="qualityClass"
        data-testid="quality-chip"
        :title="qualityLabel"
      >
        {{ item.quality_score.toFixed(2) }} · {{ qualityLabel }}
      </span>
    </div>

    <!-- Failure reason -->
    <p v-if="item.failure_reason" class="failure-reason">{{ item.failure_reason }}</p>

    <!-- Prompt (collapsible) -->
    <div class="prompt-section">
      <button
        class="prompt-toggle"
        :aria-expanded="promptExpanded"
        @click="promptExpanded = !promptExpanded"
      >
        {{ promptExpanded ? 'Hide prompt ↑' : 'Show full prompt ↓' }}
      </button>
      <div v-if="promptExpanded" class="prompt-messages">
        <div
          v-for="(msg, i) in item.prompt_messages"
          :key="i"
          class="prompt-message"
          :class="`role-${msg.role}`"
        >
          <span class="role-label">{{ msg.role }}</span>
          <pre class="message-content">{{ msg.content }}</pre>
        </div>
      </div>
    </div>

    <!-- Model response -->
    <div class="model-response-section">
      <p class="section-label">Model output (incorrect)</p>
      <pre class="model-response">{{ item.model_response }}</pre>
    </div>

    <!-- Action bar -->
    <div class="action-bar">
      <button
        data-testid="correct-btn"
        class="btn-correct"
        @click="$emit('correct')"
      >✓ Correct</button>
      <button
        data-testid="discard-btn"
        class="btn-discard"
        @click="emitWithCategory('discard')"
      >✕ Discard</button>
      <button
        data-testid="flag-btn"
        class="btn-flag"
        @click="emitWithCategory('flag')"
      >⚑ Flag Model</button>
    </div>

    <!-- Failure category selector (shown when correcting or acting) -->
    <div
      v-if="correcting || pendingAction"
      class="failure-category-section"
      data-testid="failure-category-section"
    >
      <p class="section-label">Failure category <span class="optional-label">(optional)</span></p>
      <div class="category-chips" role="group" aria-label="Failure category">
        <button
          v-for="cat in FAILURE_CATEGORIES"
          :key="cat.value"
          type="button"
          class="category-chip"
          :class="{ 'category-chip--active': selectedCategory === cat.value }"
          :aria-pressed="selectedCategory === cat.value || undefined"
          :data-testid="'category-chip-' + cat.value"
          @click="toggleCategory(cat.value)"
        >{{ cat.label }}</button>
      </div>

      <!-- Pending discard/flag confirm row -->
      <div v-if="pendingAction" class="pending-action-row" data-testid="pending-action-row">
        <button class="btn-confirm" @click="confirmPendingAction" data-testid="confirm-pending-btn">
          Confirm {{ pendingAction }}
        </button>
        <button class="btn-cancel-pending" @click="cancelPendingAction" data-testid="cancel-pending-btn">
          Cancel
        </button>
      </div>
    </div>

    <!-- Correction area (shown when correcting = true) -->
    <div v-if="correcting" data-testid="correction-area">
      <SftCorrectionArea
        ref="correctionAreaEl"
        :described-by="'sft-failure-' + item.id"
        @submit="handleSubmitCorrection"
        @cancel="$emit('cancel-correction')"
      />
    </div>
  </article>
</template>

<script setup lang="ts">
import { ref, computed } from 'vue'
import type { SftQueueItem, SftFailureCategory } from '../stores/sft'
import SftCorrectionArea from './SftCorrectionArea.vue'

const props = defineProps<{ item: SftQueueItem; correcting?: boolean }>()

const emit = defineEmits<{
  correct: []
  discard: [category: SftFailureCategory | null]
  flag: [category: SftFailureCategory | null]
  'submit-correction': [text: string, category: SftFailureCategory | null]
  'cancel-correction': []
}>()

const FAILURE_CATEGORIES: { value: SftFailureCategory; label: string }[] = [
  { value: 'scoring_artifact', label: 'Scoring artifact' },
  { value: 'style_violation', label: 'Style violation' },
  { value: 'partial_answer', label: 'Partial answer' },
  { value: 'wrong_answer', label: 'Wrong answer' },
  { value: 'format_error', label: 'Format error' },
  { value: 'hallucination', label: 'Hallucination' },
]

const promptExpanded = ref(false)
const correctionAreaEl = ref<InstanceType<typeof SftCorrectionArea> | null>(null)
const selectedCategory = ref<SftFailureCategory | null>(null)
const pendingAction = ref<'discard' | 'flag' | null>(null)

const qualityClass = computed(() => {
  const s = props.item.quality_score
  if (s < 0.4) return 'quality-low'
  if (s < 0.7) return 'quality-mid'
  return 'quality-ok'
})

const qualityLabel = computed(() => {
  const s = props.item.quality_score
  if (s < 0.4) return 'low quality'
  if (s < 0.7) return 'fair'
  return 'acceptable'
})

function toggleCategory(cat: SftFailureCategory) {
  selectedCategory.value = selectedCategory.value === cat ? null : cat
}

function emitWithCategory(action: 'discard' | 'flag') {
  pendingAction.value = action
}

function confirmPendingAction() {
  if (!pendingAction.value) return
  emit(pendingAction.value, selectedCategory.value)
  pendingAction.value = null
  selectedCategory.value = null
}

function cancelPendingAction() {
  pendingAction.value = null
}

function handleSubmitCorrection(text: string) {
  emit('submit-correction', text, selectedCategory.value)
  selectedCategory.value = null
}

function resetCorrection() {
  correctionAreaEl.value?.reset()
  selectedCategory.value = null
  pendingAction.value = null
}

defineExpose({ resetCorrection })
</script>

<style scoped>
.sft-card {
  background: var(--color-surface-raised);
  border: 1px solid var(--color-border);
  border-radius: var(--radius-lg);
  padding: var(--space-4);
  display: flex;
  flex-direction: column;
  gap: var(--space-3);
}

.chips-row {
  display: flex;
  flex-wrap: wrap;
  gap: var(--space-2);
}

.chip {
  padding: var(--space-1) var(--space-2);
  border-radius: var(--radius-full);
  font-size: 0.78rem;
  font-weight: 600;
  white-space: nowrap;
}

.chip-model { background: var(--color-primary-light, #e8f2e7); color: var(--color-primary); }
.chip-task { background: var(--color-surface-alt); color: var(--color-text-muted); }
.chip-node { background: var(--color-surface-alt); color: var(--color-text-muted); }
.chip-speed { background: var(--color-surface-alt); color: var(--color-text-muted); }

.quality-chip { color: #fff; }
.quality-low { background: var(--color-error, #c0392b); }
.quality-mid { background: var(--color-warning, #d4891a); }
.quality-ok { background: var(--color-success, #3a7a32); }

.failure-reason {
  font-size: 0.82rem;
  color: var(--color-text-muted);
  font-style: italic;
}

.prompt-toggle {
  background: none;
  border: none;
  color: var(--color-accent);
  font-size: 0.85rem;
  cursor: pointer;
  padding: 0;
  text-decoration: underline;
}

.prompt-messages {
  margin-top: var(--space-2);
  display: flex;
  flex-direction: column;
  gap: var(--space-2);
}

.prompt-message {
  display: flex;
  flex-direction: column;
  gap: var(--space-1);
}

.role-label {
  font-size: 0.75rem;
  font-weight: 700;
  text-transform: uppercase;
  letter-spacing: 0.05em;
  color: var(--color-text-muted);
}

.message-content {
  font-family: var(--font-mono);
  font-size: 0.82rem;
  white-space: pre-wrap;
  background: var(--color-surface-alt);
  padding: var(--space-2) var(--space-3);
  border-radius: var(--radius-md);
  max-height: 200px;
  overflow-y: auto;
}

.section-label {
  font-size: 0.82rem;
  font-weight: 600;
  color: var(--color-text-muted);
  margin-bottom: var(--space-1);
}

.model-response {
  font-family: var(--font-mono);
  font-size: 0.88rem;
  white-space: pre-wrap;
  background: color-mix(in srgb, var(--color-error, #c0392b) 8%, var(--color-surface-alt));
  border-left: 3px solid var(--color-error, #c0392b);
  padding: var(--space-3);
  border-radius: var(--radius-md);
  max-height: 300px;
  overflow-y: auto;
}

.action-bar {
  display: flex;
  gap: var(--space-3);
  flex-wrap: wrap;
}

.action-bar button {
  padding: var(--space-2) var(--space-4);
  border-radius: var(--radius-md);
  border: 1px solid var(--color-border);
  font-size: 0.9rem;
  cursor: pointer;
  background: var(--color-surface-raised);
  color: var(--color-text);
}

.btn-correct { border-color: var(--color-success); color: var(--color-success); }
.btn-correct:hover { background: color-mix(in srgb, var(--color-success) 10%, transparent); }

.btn-discard { border-color: var(--color-error); color: var(--color-error); }
.btn-discard:hover { background: color-mix(in srgb, var(--color-error) 10%, transparent); }

.btn-flag { border-color: var(--color-warning); color: var(--color-warning); }
.btn-flag:hover { background: color-mix(in srgb, var(--color-warning) 10%, transparent); }

/* ── Failure category selector ─────────────────── */
.failure-category-section {
  display: flex;
  flex-direction: column;
  gap: var(--space-2);
}

.optional-label {
  font-size: 0.75rem;
  font-weight: 400;
  color: var(--color-text-muted);
}

.category-chips {
  display: flex;
  flex-wrap: wrap;
  gap: var(--space-2);
}

.category-chip {
  padding: var(--space-1) var(--space-3);
  border-radius: var(--radius-full);
  border: 1px solid var(--color-border);
  background: var(--color-surface-alt);
  color: var(--color-text-muted);
  font-size: 0.78rem;
  font-weight: 500;
  cursor: pointer;
  transition: background var(--transition), color var(--transition), border-color var(--transition);
}

.category-chip:hover {
  border-color: var(--color-accent);
  color: var(--color-accent);
  background: var(--color-accent-light);
}

.category-chip--active {
  background: var(--color-accent-light);
  border-color: var(--color-accent);
  color: var(--color-accent);
  font-weight: 700;
}

.pending-action-row {
  display: flex;
  gap: var(--space-2);
  margin-top: var(--space-1);
}

.btn-confirm {
  padding: var(--space-1) var(--space-3);
  border-radius: var(--radius-md);
  border: 1px solid var(--color-accent);
  background: var(--color-accent-light);
  color: var(--color-accent);
  font-size: 0.85rem;
  font-weight: 600;
  cursor: pointer;
}

.btn-confirm:hover {
  background: color-mix(in srgb, var(--color-accent) 15%, transparent);
}

.btn-cancel-pending {
  padding: var(--space-1) var(--space-3);
  border-radius: var(--radius-md);
  border: 1px solid var(--color-border);
  background: none;
  color: var(--color-text-muted);
  font-size: 0.85rem;
  cursor: pointer;
}

.btn-cancel-pending:hover {
  background: var(--color-surface-alt);
}
</style>
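The quality chip's banding in `qualityClass`/`qualityLabel` is a three-band threshold on `quality_score` (below 0.4 low, below 0.7 mid, otherwise acceptable). A standalone sketch of that mapping — `bandQuality` and `QualityBand` are hypothetical names for illustration, not exports of the component:

```typescript
// Standalone mirror of the thresholds in SftCard.vue's qualityClass/qualityLabel.
// Both computeds branch on the same cutoffs, so one function covers both.
interface QualityBand {
  cls: string   // CSS class applied to the chip
  label: string // human-readable title/text
}

function bandQuality(score: number): QualityBand {
  if (score < 0.4) return { cls: 'quality-low', label: 'low quality' }
  if (score < 0.7) return { cls: 'quality-mid', label: 'fair' }
  return { cls: 'quality-ok', label: 'acceptable' }
}

// The three fixture items in the spec (0.2 / 0.55 / 0.72) land one per band.
console.log(bandQuality(0.2).cls)  // "quality-low"
console.log(bandQuality(0.55).cls) // "quality-mid"
console.log(bandQuality(0.72).cls) // "quality-ok"
```

Note the boundaries are half-open: a score of exactly 0.4 is mid and exactly 0.7 is acceptable, which matches the `< 0.4` / `< 0.7` comparisons in the component.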
@@ -1,68 +0,0 @@
import { mount } from '@vue/test-utils'
import SftCorrectionArea from './SftCorrectionArea.vue'
import { describe, it, expect } from 'vitest'

describe('SftCorrectionArea', () => {
  it('renders a textarea', () => {
    const w = mount(SftCorrectionArea)
    expect(w.find('textarea').exists()).toBe(true)
  })

  it('submit button is disabled when textarea is empty', () => {
    const w = mount(SftCorrectionArea)
    const btn = w.find('[data-testid="submit-btn"]')
    expect((btn.element as HTMLButtonElement).disabled).toBe(true)
  })

  it('submit button is disabled when textarea is whitespace only', async () => {
    const w = mount(SftCorrectionArea)
    await w.find('textarea').setValue('   ')
    const btn = w.find('[data-testid="submit-btn"]')
    expect((btn.element as HTMLButtonElement).disabled).toBe(true)
  })

  it('submit button is enabled when textarea has content', async () => {
    const w = mount(SftCorrectionArea)
    await w.find('textarea').setValue('def add(a, b): return a + b')
    const btn = w.find('[data-testid="submit-btn"]')
    expect((btn.element as HTMLButtonElement).disabled).toBe(false)
  })

  it('clicking submit emits submit with trimmed text', async () => {
    const w = mount(SftCorrectionArea)
    await w.find('textarea').setValue('  def add(a, b): return a + b  ')
    await w.find('[data-testid="submit-btn"]').trigger('click')
    expect(w.emitted('submit')?.[0]).toEqual(['def add(a, b): return a + b'])
  })

  it('clicking cancel emits cancel', async () => {
    const w = mount(SftCorrectionArea)
    await w.find('[data-testid="cancel-btn"]').trigger('click')
    expect(w.emitted('cancel')).toBeTruthy()
  })

  it('Escape key emits cancel', async () => {
    const w = mount(SftCorrectionArea)
    await w.find('textarea').trigger('keydown', { key: 'Escape' })
    expect(w.emitted('cancel')).toBeTruthy()
  })

  it('Ctrl+Enter emits submit when text is non-empty', async () => {
    const w = mount(SftCorrectionArea)
    await w.find('textarea').setValue('correct answer')
    await w.find('textarea').trigger('keydown', { key: 'Enter', ctrlKey: true })
    expect(w.emitted('submit')?.[0]).toEqual(['correct answer'])
  })

  it('Ctrl+Enter does not emit submit when text is empty', async () => {
    const w = mount(SftCorrectionArea)
    await w.find('textarea').trigger('keydown', { key: 'Enter', ctrlKey: true })
    expect(w.emitted('submit')).toBeFalsy()
  })

  it('omits aria-describedby when describedBy prop is not provided', () => {
    const w = mount(SftCorrectionArea)
    const textarea = w.find('textarea')
    expect(textarea.attributes('aria-describedby')).toBeUndefined()
  })
})
@@ -1,130 +0,0 @@
<template>
  <div class="correction-area">
    <label class="correction-label" for="correction-textarea">
      Write the corrected response:
    </label>
    <textarea
      id="correction-textarea"
      ref="textareaEl"
      v-model="text"
      class="correction-textarea"
      aria-label="Write corrected response"
      aria-required="true"
      :aria-describedby="describedBy || undefined"
      placeholder="Write the response this model should have given..."
      rows="4"
      @keydown.escape="$emit('cancel')"
      @keydown.enter.ctrl.prevent="submitIfValid"
      @keydown.enter.meta.prevent="submitIfValid"
    />
    <div class="correction-actions">
      <button
        data-testid="submit-btn"
        class="btn-submit"
        :disabled="!isValid"
        @click="submitIfValid"
      >
        Submit correction
      </button>
      <button data-testid="cancel-btn" class="btn-cancel" @click="$emit('cancel')">
        Cancel
      </button>
    </div>
  </div>
</template>

<script setup lang="ts">
import { ref, computed, onMounted } from 'vue'

const props = withDefaults(defineProps<{ describedBy?: string }>(), { describedBy: undefined })

const emit = defineEmits<{ submit: [text: string]; cancel: [] }>()

const text = ref('')
const textareaEl = ref<HTMLTextAreaElement | null>(null)
const isValid = computed(() => text.value.trim().length > 0)

onMounted(() => textareaEl.value?.focus())

function submitIfValid() {
  if (isValid.value) emit('submit', text.value.trim())
}

function reset() {
  text.value = ''
}

defineExpose({ reset })
</script>

<style scoped>
.correction-area {
  display: flex;
  flex-direction: column;
  gap: var(--space-3);
  padding: var(--space-4);
  border-top: 1px solid var(--color-border);
  background: var(--color-surface-alt, var(--color-surface));
  border-radius: 0 0 var(--radius-lg) var(--radius-lg);
}

.correction-label {
  font-size: 0.85rem;
  font-weight: 600;
  color: var(--color-text-muted);
}

.correction-textarea {
  width: 100%;
  min-height: 7rem;
  padding: var(--space-3);
  border: 1px solid var(--color-border);
  border-radius: var(--radius-md);
  background: var(--color-surface-raised);
  color: var(--color-text);
  font-family: var(--font-mono);
  font-size: 0.88rem;
  line-height: 1.5;
  resize: vertical;
}

.correction-textarea:focus {
  outline: 2px solid var(--color-primary);
  outline-offset: 1px;
}

.correction-actions {
  display: flex;
  gap: var(--space-3);
  align-items: center;
}

.btn-submit {
  padding: var(--space-2) var(--space-4);
  background: var(--color-primary);
  color: var(--color-text-inverse, #fff);
  border: none;
  border-radius: var(--radius-md);
  font-size: 0.9rem;
  cursor: pointer;
}

.btn-submit:disabled {
  opacity: 0.45;
  cursor: not-allowed;
}

.btn-submit:not(:disabled):hover {
  background: var(--color-primary-hover, var(--color-primary));
}

.btn-cancel {
  background: none;
  border: none;
  color: var(--color-text-muted);
  font-size: 0.9rem;
  cursor: pointer;
  text-decoration: underline;
  padding: 0;
}
</style>
@@ -1,42 +0,0 @@
// src/composables/useSftKeyboard.ts
import { onUnmounted, getCurrentInstance } from 'vue'

interface Options {
  onCorrect: () => void
  onDiscard: () => void
  onFlag: () => void
  onEscape: () => void
  onSubmit: () => void
  isEditing: () => boolean // returns true when correction area is open
}

export function useSftKeyboard(opts: Options) {
  function handler(e: KeyboardEvent) {
    // Never intercept keys when focus is in an input (correction textarea handles its own keys)
    if (e.target instanceof HTMLInputElement) return
    // Likewise for textareas: the correction textarea handles Escape and Ctrl+Enter itself
    if (e.target instanceof HTMLTextAreaElement) return

    const k = e.key.toLowerCase()

    // When the correction area is open, only handle Escape
    if (opts.isEditing()) {
      if (k === 'escape') opts.onEscape()
      return
    }

    if (k === 'c') { opts.onCorrect(); return }
    if (k === 'd') { opts.onDiscard(); return }
    if (k === 'f') { opts.onFlag(); return }
    if (k === 'escape') { opts.onEscape(); return }
  }

  window.addEventListener('keydown', handler)
  const cleanup = () => window.removeEventListener('keydown', handler)

  if (getCurrentInstance()) {
    onUnmounted(cleanup)
  }

  return { cleanup }
}
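The handler's precedence (text fields win, then editing mode narrows everything to Escape, then the c/d/f shortcuts) can be exercised outside Vue as a pure function. A sketch under the assumption that only the branching matters — `dispatchKey`, `SftAction`, and the `inTextField` flag (standing in for the `HTMLInputElement`/`HTMLTextAreaElement` target checks) are hypothetical:

```typescript
type SftAction = 'correct' | 'discard' | 'flag' | 'escape' | null

// Pure mirror of the keydown branching in useSftKeyboard.ts.
function dispatchKey(key: string, isEditing: boolean, inTextField: boolean): SftAction {
  // Text fields handle their own keys; never intercept.
  if (inTextField) return null

  const k = key.toLowerCase()

  // Correction area open: only Escape passes through.
  if (isEditing) {
    return k === 'escape' ? 'escape' : null
  }

  if (k === 'c') return 'correct'
  if (k === 'd') return 'discard'
  if (k === 'f') return 'flag'
  if (k === 'escape') return 'escape'
  return null
}

console.log(dispatchKey('C', false, false))     // "correct" (case-insensitive)
console.log(dispatchKey('d', true, false))      // null — editing mode swallows shortcuts
console.log(dispatchKey('Escape', true, false)) // "escape"
console.log(dispatchKey('c', false, true))      // null — focus is in a text field
```

Factoring the branching out like this is also one way the composable could be unit-tested without mounting a component or dispatching real DOM events.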
@@ -6,8 +6,6 @@ const FetchView = () => import('../views/FetchView.vue')
 const StatsView = () => import('../views/StatsView.vue')
 const BenchmarkView = () => import('../views/BenchmarkView.vue')
 const SettingsView = () => import('../views/SettingsView.vue')
-const CorrectionsView = () => import('../views/CorrectionsView.vue')
-const ModelsView = () => import('../views/ModelsView.vue')

 export const router = createRouter({
   history: createWebHashHistory(),
@@ -16,8 +14,6 @@ export const router = createRouter({
     { path: '/fetch', component: FetchView, meta: { title: 'Fetch' } },
     { path: '/stats', component: StatsView, meta: { title: 'Stats' } },
     { path: '/benchmark', component: BenchmarkView, meta: { title: 'Benchmark' } },
-    { path: '/models', component: ModelsView, meta: { title: 'Models' } },
-    { path: '/corrections', component: CorrectionsView, meta: { title: 'Corrections' } },
     { path: '/settings', component: SettingsView, meta: { title: 'Settings' } },
   ],
 })
|
||||||
|
|
|
||||||
|
|
@@ -1,78 +0,0 @@
-import { setActivePinia, createPinia } from 'pinia'
-import { useSftStore } from './sft'
-import type { SftQueueItem } from './sft'
-import { beforeEach, describe, it, expect } from 'vitest'
-
-function makeMockItem(overrides: Partial<SftQueueItem> = {}): SftQueueItem {
-  return {
-    id: 'abc',
-    source: 'cf-orch-benchmark',
-    benchmark_run_id: 'run1',
-    timestamp: '2026-04-07T10:00:00Z',
-    status: 'needs_review',
-    prompt_messages: [
-      { role: 'system', content: 'You are a coding assistant.' },
-      { role: 'user', content: 'Write a Python add function.' },
-    ],
-    model_response: 'def add(a, b): return a - b',
-    corrected_response: null,
-    quality_score: 0.2,
-    failure_reason: 'pattern_match: 0/2 matched',
-    task_id: 'code-fn',
-    task_type: 'code',
-    task_name: 'Code: Write a Python function',
-    model_id: 'Qwen/Qwen2.5-3B',
-    model_name: 'Qwen2.5-3B',
-    node_id: 'heimdall',
-    gpu_id: 0,
-    tokens_per_sec: 38.4,
-    ...overrides,
-  }
-}
-
-describe('useSftStore', () => {
-  beforeEach(() => setActivePinia(createPinia()))
-
-  it('starts with empty queue', () => {
-    const store = useSftStore()
-    expect(store.queue).toEqual([])
-    expect(store.current).toBeNull()
-  })
-
-  it('current returns first item', () => {
-    const store = useSftStore()
-    store.queue = [makeMockItem()]
-    expect(store.current?.id).toBe('abc')
-  })
-
-  it('removeCurrentFromQueue removes first item', () => {
-    const store = useSftStore()
-    const second = makeMockItem({ id: 'def' })
-    store.queue = [makeMockItem(), second]
-    store.removeCurrentFromQueue()
-    expect(store.queue[0].id).toBe('def')
-  })
-
-  it('restoreItem adds to front of queue', () => {
-    const store = useSftStore()
-    const second = makeMockItem({ id: 'def' })
-    store.queue = [second]
-    store.restoreItem(makeMockItem())
-    expect(store.queue[0].id).toBe('abc')
-    expect(store.queue[1].id).toBe('def')
-  })
-
-  it('setLastAction records the action', () => {
-    const store = useSftStore()
-    store.setLastAction('discard', makeMockItem())
-    expect(store.lastAction?.type).toBe('discard')
-    expect(store.lastAction?.item.id).toBe('abc')
-  })
-
-  it('clearLastAction nulls lastAction', () => {
-    const store = useSftStore()
-    store.setLastAction('flag', makeMockItem())
-    store.clearLastAction()
-    expect(store.lastAction).toBeNull()
-  })
-})
@@ -1,72 +0,0 @@
-// src/stores/sft.ts
-import { defineStore } from 'pinia'
-import { computed, ref } from 'vue'
-
-export type SftFailureCategory =
-  | 'scoring_artifact'
-  | 'style_violation'
-  | 'partial_answer'
-  | 'wrong_answer'
-  | 'format_error'
-  | 'hallucination'
-
-export interface SftQueueItem {
-  id: string
-  source: 'cf-orch-benchmark'
-  benchmark_run_id: string
-  timestamp: string
-  status: 'needs_review' | 'approved' | 'discarded' | 'model_rejected'
-  prompt_messages: { role: string; content: string }[]
-  model_response: string
-  corrected_response: string | null
-  quality_score: number // 0.0 to 1.0
-  failure_reason: string | null
-  failure_category: SftFailureCategory | null
-  task_id: string
-  task_type: string
-  task_name: string
-  model_id: string
-  model_name: string
-  node_id: string
-  gpu_id: number
-  tokens_per_sec: number
-}
-
-export interface SftLastAction {
-  type: 'correct' | 'discard' | 'flag'
-  item: SftQueueItem
-  failure_category?: SftFailureCategory | null
-}
-
-export const useSftStore = defineStore('sft', () => {
-  const queue = ref<SftQueueItem[]>([])
-  const totalRemaining = ref(0)
-  const lastAction = ref<SftLastAction | null>(null)
-
-  const current = computed(() => queue.value[0] ?? null)
-
-  function removeCurrentFromQueue() {
-    queue.value.shift()
-  }
-
-  function setLastAction(
-    type: SftLastAction['type'],
-    item: SftQueueItem,
-    failure_category?: SftFailureCategory | null,
-  ) {
-    lastAction.value = { type, item, failure_category }
-  }
-
-  function clearLastAction() {
-    lastAction.value = null
-  }
-
-  function restoreItem(item: SftQueueItem) {
-    queue.value.unshift(item)
-  }
-
-  return {
-    queue, totalRemaining, lastAction, current,
-    removeCurrentFromQueue, setLastAction, clearLastAction, restoreItem,
-  }
-})
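Review note: the store above is plain head-of-array bookkeeping (`shift` to consume, `unshift` to restore). A Pinia-free sketch of the same semantics; the `Item` type and function names here are illustrative only:

```typescript
interface Item { id: string }

// Consume the current (first) item, returning the new queue.
function removeCurrent(queue: Item[]): Item[] {
  return queue.slice(1)
}

// Put an undone item back at the front so it becomes current again.
function restore(queue: Item[], item: Item): Item[] {
  return [item, ...queue]
}
```

Written immutably here for testability; the store mutates `queue.value` in place, which is equivalent for this access pattern.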
@@ -24,54 +24,6 @@
       </div>
     </header>
 
-    <!-- Model Picker -->
-    <details class="model-picker" ref="pickerEl">
-      <summary class="picker-summary">
-        <span class="picker-title">🎯 Model Selection</span>
-        <span class="picker-badge">{{ pickerSummaryText }}</span>
-      </summary>
-      <div class="picker-body">
-        <div v-if="modelsLoading" class="picker-loading">Loading models…</div>
-        <div v-else-if="Object.keys(modelCategories).length === 0" class="picker-empty">
-          No models found — check API connection.
-        </div>
-        <template v-else>
-          <div
-            v-for="(models, category) in modelCategories"
-            :key="category"
-            class="picker-category"
-          >
-            <label class="picker-cat-header">
-              <input
-                type="checkbox"
-                :checked="isCategoryAllSelected(models)"
-                :indeterminate="isCategoryIndeterminate(models)"
-                @change="toggleCategory(models, ($event.target as HTMLInputElement).checked)"
-              />
-              <span class="picker-cat-name">{{ category }}</span>
-              <span class="picker-cat-count">({{ models.length }})</span>
-            </label>
-            <div v-if="models.length === 0" class="picker-no-models">No models installed</div>
-            <div v-else class="picker-model-list">
-              <label
-                v-for="m in models"
-                :key="m.name"
-                class="picker-model-row"
-              >
-                <input
-                  type="checkbox"
-                  :checked="selectedModels.has(m.name)"
-                  @change="toggleModel(m.name, ($event.target as HTMLInputElement).checked)"
-                />
-                <span class="picker-model-name" :title="m.repo_id ?? m.name">{{ m.name }}</span>
-                <span class="picker-adapter-type">{{ m.adapter_type }}</span>
-              </label>
-            </div>
-          </div>
-        </template>
-      </div>
-    </details>
-
     <!-- Trained models badge row -->
     <div v-if="fineTunedModels.length > 0" class="trained-models-row">
       <span class="trained-label">Trained:</span>
@@ -272,16 +224,6 @@ const LABEL_META: Record<string, { emoji: string }> = {
 }
 
 // ── Types ────────────────────────────────────────────────────────────────────
-interface AvailableModel {
-  name: string
-  repo_id?: string
-  adapter_type: string
-}
-
-interface ModelCategoriesResponse {
-  categories: Record<string, AvailableModel[]>
-}
-
 interface FineTunedModel {
   name: string
   base_model_id?: string
@@ -312,13 +254,6 @@ const runError = ref('')
 const includeSlow = ref(false)
 const logEl = ref<HTMLElement | null>(null)
 
-// Model picker state
-const modelCategories = ref<Record<string, AvailableModel[]>>({})
-const selectedModels = ref<Set<string>>(new Set())
-const allModels = ref<string[]>([])
-const modelsLoading = ref(false)
-const pickerEl = ref<HTMLDetailsElement | null>(null)
-
 // Fine-tune state
 const fineTunedModels = ref<FineTunedModel[]>([])
 const ftModel = ref('deberta-small')
@@ -339,52 +274,6 @@ async function cancelFinetune() {
   await fetch('/api/finetune/cancel', { method: 'POST' }).catch(() => {})
 }
 
-// ── Model picker computed ─────────────────────────────────────────────────────
-const pickerSummaryText = computed(() => {
-  const total = allModels.value.length
-  if (total === 0) return 'No models available'
-  const selected = selectedModels.value.size
-  if (selected === total) return `All models (${total})`
-  return `${selected} of ${total} selected`
-})
-
-function isCategoryAllSelected(models: AvailableModel[]): boolean {
-  return models.length > 0 && models.every(m => selectedModels.value.has(m.name))
-}
-
-function isCategoryIndeterminate(models: AvailableModel[]): boolean {
-  const someSelected = models.some(m => selectedModels.value.has(m.name))
-  return someSelected && !isCategoryAllSelected(models)
-}
-
-function toggleModel(name: string, checked: boolean) {
-  const next = new Set(selectedModels.value)
-  if (checked) next.add(name)
-  else next.delete(name)
-  selectedModels.value = next
-}
-
-function toggleCategory(models: AvailableModel[], checked: boolean) {
-  const next = new Set(selectedModels.value)
-  for (const m of models) {
-    if (checked) next.add(m.name)
-    else next.delete(m.name)
-  }
-  selectedModels.value = next
-}
-
-async function loadModelCategories() {
-  modelsLoading.value = true
-  const { data } = await useApiFetch<ModelCategoriesResponse>('/api/benchmark/models')
-  modelsLoading.value = false
-  if (data?.categories) {
-    modelCategories.value = data.categories
-    const flat = Object.values(data.categories).flat().map(m => m.name)
-    allModels.value = flat
-    selectedModels.value = new Set(flat)
-  }
-}
-
 // ── Derived ──────────────────────────────────────────────────────────────────
 const modelNames = computed(() => Object.keys(results.value?.models ?? {}))
 const modelCount = computed(() => modelNames.value.length)
@@ -466,16 +355,7 @@ function startBenchmark() {
   runError.value = ''
   runCancelled.value = false
 
-  const params = new URLSearchParams()
-  if (includeSlow.value) params.set('include_slow', 'true')
-  // Only send model_names when a subset is selected (not all, not none)
-  const total = allModels.value.length
-  const selected = selectedModels.value.size
-  if (total > 0 && selected > 0 && selected < total) {
-    params.set('model_names', [...selectedModels.value].join(','))
-  }
-  const qs = params.toString()
-  const url = `/api/benchmark/run${qs ? `?${qs}` : ''}`
+  const url = `/api/benchmark/run${includeSlow.value ? '?include_slow=true' : ''}`
   useApiSSE(
     url,
     async (event) => {
@@ -547,7 +427,6 @@ function startFinetune() {
 onMounted(() => {
   loadResults()
   loadFineTunedModels()
-  loadModelCategories()
 })
 </script>
 
@@ -883,134 +762,6 @@ onMounted(() => {
   font-weight: 700;
 }
 
-/* ── Model Picker ───────────────────────────────────────── */
-.model-picker {
-  border: 1px solid var(--color-border, #d0d7e8);
-  border-radius: 0.5rem;
-  overflow: hidden;
-}
-
-.picker-summary {
-  display: flex;
-  align-items: center;
-  gap: 0.6rem;
-  padding: 0.65rem 0.9rem;
-  cursor: pointer;
-  user-select: none;
-  list-style: none;
-  background: var(--color-surface-raised, #e4ebf5);
-}
-.picker-summary::-webkit-details-marker { display: none; }
-.picker-summary::before { content: '▶ '; font-size: 0.65rem; color: var(--color-text-secondary, #6b7a99); }
-details[open] .picker-summary::before { content: '▼ '; }
-
-.picker-title {
-  font-size: 0.9rem;
-  font-weight: 600;
-  color: var(--color-text, #1a2338);
-}
-
-.picker-badge {
-  font-size: 0.75rem;
-  color: var(--color-text-secondary, #6b7a99);
-  background: var(--color-surface, #fff);
-  border: 1px solid var(--color-border, #d0d7e8);
-  padding: 0.15rem 0.5rem;
-  border-radius: 1rem;
-  font-family: var(--font-mono, monospace);
-  margin-left: auto;
-}
-
-.picker-body {
-  padding: 0.75rem;
-  border-top: 1px solid var(--color-border, #d0d7e8);
-  display: flex;
-  flex-direction: column;
-  gap: 0.75rem;
-}
-
-.picker-loading,
-.picker-empty {
-  font-size: 0.85rem;
-  color: var(--color-text-secondary, #6b7a99);
-  padding: 0.5rem 0;
-}
-
-.picker-category {
-  display: flex;
-  flex-direction: column;
-  gap: 0.3rem;
-}
-
-.picker-cat-header {
-  display: flex;
-  align-items: center;
-  gap: 0.45rem;
-  font-size: 0.82rem;
-  font-weight: 700;
-  color: var(--color-text, #1a2338);
-  text-transform: uppercase;
-  letter-spacing: 0.04em;
-  cursor: pointer;
-}
-
-.picker-cat-count {
-  font-weight: 400;
-  color: var(--color-text-secondary, #6b7a99);
-  font-family: var(--font-mono, monospace);
-  font-size: 0.75rem;
-  text-transform: none;
-  letter-spacing: 0;
-}
-
-.picker-no-models {
-  font-size: 0.78rem;
-  color: var(--color-text-secondary, #6b7a99);
-  opacity: 0.65;
-  padding-left: 1.4rem;
-  font-style: italic;
-}
-
-.picker-model-list {
-  display: flex;
-  flex-wrap: wrap;
-  gap: 0.35rem 0.75rem;
-  padding-left: 1.4rem;
-}
-
-.picker-model-row {
-  display: flex;
-  align-items: center;
-  gap: 0.35rem;
-  font-size: 0.82rem;
-  cursor: pointer;
-  color: var(--color-text, #1a2338);
-}
-
-.picker-model-name {
-  font-family: var(--font-mono, monospace);
-  font-size: 0.78rem;
-  white-space: nowrap;
-  max-width: 18ch;
-  overflow: hidden;
-  text-overflow: ellipsis;
-}
-
-.picker-adapter-type {
-  font-size: 0.68rem;
-  color: var(--color-text-secondary, #6b7a99);
-  background: var(--color-surface-raised, #e4ebf5);
-  border: 1px solid var(--color-border, #d0d7e8);
-  border-radius: 0.25rem;
-  padding: 0.05rem 0.3rem;
-  font-family: var(--font-mono, monospace);
-}
-
-@media (max-width: 600px) {
-  .picker-model-list { padding-left: 0; }
-  .picker-model-name { max-width: 14ch; }
-}
-
 /* ── Fine-tune section ──────────────────────────────────── */
 .ft-section {
   border: 1px solid var(--color-border, #d0d7e8);
@@ -1,328 +0,0 @@
-<template>
-  <div class="corrections-view">
-    <header class="cv-header">
-      <span class="queue-count">
-        <template v-if="loading">Loading…</template>
-        <template v-else-if="store.totalRemaining > 0">
-          {{ store.totalRemaining }} remaining
-        </template>
-        <span v-else class="queue-empty-label">All caught up</span>
-      </span>
-      <div class="header-actions">
-        <button @click="handleUndo" :disabled="!store.lastAction" class="btn-action">↩ Undo</button>
-      </div>
-    </header>
-
-    <!-- States -->
-    <div v-if="loading" class="skeleton-card" aria-label="Loading candidates" />
-
-    <div v-else-if="apiError" class="error-display" role="alert">
-      <p>Couldn't reach Avocet API.</p>
-      <button @click="fetchBatch" class="btn-action">Retry</button>
-    </div>
-
-    <div v-else-if="!store.current" class="empty-state">
-      <p>No candidates need review.</p>
-      <p class="empty-hint">Import a benchmark run from the Settings tab to get started.</p>
-    </div>
-
-    <template v-else>
-      <div class="card-wrapper">
-        <SftCard
-          :item="store.current"
-          :correcting="correcting"
-          @correct="startCorrection"
-          @discard="handleDiscard"
-          @flag="handleFlag"
-          @submit-correction="handleCorrect"
-          @cancel-correction="correcting = false"
-          ref="sftCardEl"
-        />
-      </div>
-    </template>
-
-    <!-- Stats footer -->
-    <footer v-if="stats" class="stats-footer">
-      <span class="stat">✓ {{ stats.by_status?.approved ?? 0 }} approved</span>
-      <span class="stat">✕ {{ stats.by_status?.discarded ?? 0 }} discarded</span>
-      <span class="stat">⚑ {{ stats.by_status?.model_rejected ?? 0 }} flagged</span>
-      <a
-        v-if="(stats.export_ready ?? 0) > 0"
-        :href="exportUrl"
-        download
-        class="btn-export"
-      >
-        ⬇ Export {{ stats.export_ready }} corrections
-      </a>
-    </footer>
-
-    <!-- Undo toast (inline — UndoToast.vue uses label store's LastAction shape, not SFT's) -->
-    <div v-if="store.lastAction" class="undo-toast">
-      <span>Last: {{ store.lastAction.type }}</span>
-      <button @click="handleUndo" class="btn-undo">↩ Undo</button>
-      <button @click="store.clearLastAction()" class="btn-dismiss">✕</button>
-    </div>
-  </div>
-</template>
-
-<script setup lang="ts">
-import { ref, onMounted } from 'vue'
-import { useSftStore } from '../stores/sft'
-import type { SftFailureCategory } from '../stores/sft'
-import { useSftKeyboard } from '../composables/useSftKeyboard'
-import SftCard from '../components/SftCard.vue'
-
-const store = useSftStore()
-const loading = ref(false)
-const apiError = ref(false)
-const correcting = ref(false)
-const stats = ref<Record<string, any> | null>(null)
-const exportUrl = '/api/sft/export'
-const sftCardEl = ref<InstanceType<typeof SftCard> | null>(null)
-
-useSftKeyboard({
-  onCorrect: () => { if (store.current && !correcting.value) correcting.value = true },
-  onDiscard: () => { if (store.current && !correcting.value) handleDiscard() },
-  onFlag: () => { if (store.current && !correcting.value) handleFlag() },
-  onEscape: () => { correcting.value = false },
-  onSubmit: () => {},
-  isEditing: () => correcting.value,
-})
-
-async function fetchBatch() {
-  loading.value = true
-  apiError.value = false
-  try {
-    const res = await fetch('/api/sft/queue?per_page=20')
-    if (!res.ok) throw new Error('API error')
-    const data = await res.json()
-    store.queue = data.items
-    store.totalRemaining = data.total
-  } catch {
-    apiError.value = true
-  } finally {
-    loading.value = false
-  }
-}
-
-async function fetchStats() {
-  try {
-    const res = await fetch('/api/sft/stats')
-    if (res.ok) stats.value = await res.json()
-  } catch { /* ignore */ }
-}
-
-function startCorrection() {
-  correcting.value = true
-}
-
-async function handleCorrect(text: string, category: SftFailureCategory | null = null) {
-  if (!store.current) return
-  const item = store.current
-  correcting.value = false
-  try {
-    const body: Record<string, unknown> = { id: item.id, action: 'correct', corrected_response: text }
-    if (category != null) body.failure_category = category
-    const res = await fetch('/api/sft/submit', {
-      method: 'POST',
-      headers: { 'Content-Type': 'application/json' },
-      body: JSON.stringify(body),
-    })
-    if (!res.ok) throw new Error(`HTTP ${res.status}`)
-    store.removeCurrentFromQueue()
-    store.setLastAction('correct', item, category)
-    store.totalRemaining = Math.max(0, store.totalRemaining - 1)
-    fetchStats()
-    if (store.queue.length < 5) fetchBatch()
-  } catch (err) {
-    console.error('handleCorrect failed:', err)
-  }
-}
-
-async function handleDiscard(category: SftFailureCategory | null = null) {
-  if (!store.current) return
-  const item = store.current
-  try {
-    const body: Record<string, unknown> = { id: item.id, action: 'discard' }
-    if (category != null) body.failure_category = category
-    const res = await fetch('/api/sft/submit', {
-      method: 'POST',
-      headers: { 'Content-Type': 'application/json' },
-      body: JSON.stringify(body),
-    })
-    if (!res.ok) throw new Error(`HTTP ${res.status}`)
-    store.removeCurrentFromQueue()
-    store.setLastAction('discard', item, category)
-    store.totalRemaining = Math.max(0, store.totalRemaining - 1)
-    fetchStats()
-    if (store.queue.length < 5) fetchBatch()
-  } catch (err) {
-    console.error('handleDiscard failed:', err)
-  }
-}
-
-async function handleFlag(category: SftFailureCategory | null = null) {
-  if (!store.current) return
-  const item = store.current
-  try {
-    const body: Record<string, unknown> = { id: item.id, action: 'flag' }
-    if (category != null) body.failure_category = category
-    const res = await fetch('/api/sft/submit', {
-      method: 'POST',
-      headers: { 'Content-Type': 'application/json' },
-      body: JSON.stringify(body),
-    })
-    if (!res.ok) throw new Error(`HTTP ${res.status}`)
-    store.removeCurrentFromQueue()
-    store.setLastAction('flag', item, category)
-    store.totalRemaining = Math.max(0, store.totalRemaining - 1)
-    fetchStats()
-    if (store.queue.length < 5) fetchBatch()
-  } catch (err) {
-    console.error('handleFlag failed:', err)
-  }
-}
-
-async function handleUndo() {
-  if (!store.lastAction) return
-  const action = store.lastAction
-  const { item } = action
-  try {
-    const res = await fetch('/api/sft/undo', {
-      method: 'POST',
-      headers: { 'Content-Type': 'application/json' },
-      body: JSON.stringify({ id: item.id }),
-    })
-    if (!res.ok) throw new Error(`HTTP ${res.status}`)
-    store.restoreItem(item)
-    store.totalRemaining++
-    store.clearLastAction()
-    fetchStats()
-  } catch (err) {
-    // Backend did not restore — clear the undo UI without restoring queue state
-    console.error('handleUndo failed:', err)
-    store.clearLastAction()
-  }
-}
-
-onMounted(() => {
-  fetchBatch()
-  fetchStats()
-})
-</script>
-
-<style scoped>
-.corrections-view {
-  display: flex;
-  flex-direction: column;
-  min-height: 100dvh;
-  padding: var(--space-4);
-  gap: var(--space-4);
-  max-width: 760px;
-  margin: 0 auto;
-}
-
-.cv-header {
-  display: flex;
-  justify-content: space-between;
-  align-items: center;
-}
-
-.queue-count {
-  font-size: 1rem;
-  font-weight: 600;
-  color: var(--color-text);
-}
-
-.queue-empty-label { color: var(--color-text-muted); }
-
-.btn-action {
-  padding: var(--space-2) var(--space-3);
-  border: 1px solid var(--color-border);
-  border-radius: var(--radius-md);
-  background: var(--color-surface-raised);
-  cursor: pointer;
-  font-size: 0.88rem;
-}
-
-.btn-action:disabled { opacity: 0.4; cursor: not-allowed; }
-
-.skeleton-card {
-  height: 320px;
-  background: var(--color-surface-alt);
-  border-radius: var(--radius-lg);
-  animation: pulse 1.5s ease-in-out infinite;
-}
-
-@keyframes pulse {
-  0%, 100% { opacity: 1; }
-  50% { opacity: 0.5; }
-}
-
-@media (prefers-reduced-motion: reduce) {
-  .skeleton-card { animation: none; }
-}
-
-.error-display, .empty-state {
-  text-align: center;
-  padding: var(--space-12);
-  color: var(--color-text-muted);
-}
-
-.empty-hint { font-size: 0.88rem; margin-top: var(--space-2); }
-
-.stats-footer {
-  display: flex;
-  gap: var(--space-4);
-  align-items: center;
-  flex-wrap: wrap;
-  padding: var(--space-3) 0;
-  border-top: 1px solid var(--color-border-light);
-  font-size: 0.85rem;
-  color: var(--color-text-muted);
-}
-
-.btn-export {
-  margin-left: auto;
-  padding: var(--space-2) var(--space-3);
-  background: var(--color-primary);
-  color: var(--color-text-inverse, #fff);
-  border-radius: var(--radius-md);
-  text-decoration: none;
-  font-size: 0.88rem;
-}
-
-.undo-toast {
-  position: fixed;
-  bottom: var(--space-6);
-  left: 50%;
-  transform: translateX(-50%);
-  display: flex;
-  align-items: center;
-  gap: var(--space-3);
-  background: var(--color-surface-raised);
-  border: 1px solid var(--color-border);
-  border-radius: var(--radius-md);
-  padding: var(--space-3) var(--space-4);
-  box-shadow: 0 4px 12px rgba(0,0,0,0.15);
-  font-size: 0.9rem;
-}
-
-.btn-undo {
-  background: var(--color-primary);
-  color: var(--color-text-inverse, #fff);
-  border: none;
-  border-radius: var(--radius-sm);
-  padding: var(--space-1) var(--space-3);
-  cursor: pointer;
-  font-size: 0.88rem;
-}
-
-.btn-dismiss {
-  background: none;
-  border: none;
-  color: var(--color-text-muted);
-  cursor: pointer;
-  font-size: 1rem;
-}
-</style>
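Review note: the three submit handlers above repeat the same post-submit bookkeeping (clamp the remaining count at zero, refill when the local queue drops under five items). A hypothetical pure sketch of that shared rule, in case it ever gets factored out; the name `afterSubmit` and the refill threshold constant are assumptions drawn from the code above, not part of this PR:

```typescript
// Shared post-submit bookkeeping: totalRemaining never goes negative,
// and a refill is requested once fewer than 5 items remain locally.
const REFILL_THRESHOLD = 5

function afterSubmit(
  queueLength: number,
  totalRemaining: number,
): { totalRemaining: number; shouldRefill: boolean } {
  return {
    totalRemaining: Math.max(0, totalRemaining - 1),
    shouldRefill: queueLength < REFILL_THRESHOLD,
  }
}
```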
@ -1,826 +0,0 @@
<template>
  <div class="models-view">
    <h1 class="page-title">🤗 Models</h1>

    <!-- ── 1. HF Lookup ───────────────────────────────── -->
    <section class="section">
      <h2 class="section-title">HuggingFace Lookup</h2>

      <div class="lookup-row">
        <input
          v-model="lookupInput"
          type="text"
          class="lookup-input"
          placeholder="org/model or huggingface.co/org/model"
          :disabled="lookupLoading"
          @keydown.enter="doLookup"
          aria-label="HuggingFace model ID"
        />
        <button
          class="btn-primary"
          :disabled="lookupLoading || !lookupInput.trim()"
          @click="doLookup"
        >
          {{ lookupLoading ? 'Looking up…' : 'Lookup' }}
        </button>
      </div>

      <div v-if="lookupError" class="error-notice" role="alert">
        {{ lookupError }}
      </div>

      <div v-if="lookupResult" class="preview-card">
        <div class="preview-header">
          <span class="preview-repo-id">{{ lookupResult.repo_id }}</span>
          <div class="badge-group">
            <span v-if="lookupResult.already_installed" class="badge badge-success">Installed</span>
            <span v-if="lookupResult.already_queued" class="badge badge-info">In queue</span>
          </div>
        </div>

        <div class="preview-meta">
          <span v-if="lookupResult.pipeline_tag" class="chip chip-pipeline">
            {{ lookupResult.pipeline_tag }}
          </span>
          <span v-if="lookupResult.adapter_recommendation" class="chip chip-adapter">
            {{ lookupResult.adapter_recommendation }}
          </span>
          <span v-if="lookupResult.size != null" class="preview-size">
            {{ humanBytes(lookupResult.size) }}
          </span>
        </div>

        <p v-if="lookupResult.description" class="preview-desc">
          {{ lookupResult.description }}
        </p>

        <button
          class="btn-primary btn-add-queue"
          :disabled="lookupResult.already_installed || lookupResult.already_queued || addingToQueue"
          @click="addToQueue"
        >
          {{ addingToQueue ? 'Adding…' : 'Add to queue' }}
        </button>
      </div>
    </section>

    <!-- ── 2. Approval Queue ──────────────────────────── -->
    <section class="section">
      <h2 class="section-title">Approval Queue</h2>

      <div v-if="pendingModels.length === 0" class="empty-notice">
        No models waiting for approval.
      </div>

      <div v-for="model in pendingModels" :key="model.id" class="model-card">
        <div class="model-card-header">
          <span class="model-repo-id">{{ model.repo_id }}</span>
          <button
            class="btn-dismiss"
            :aria-label="`Dismiss ${model.repo_id}`"
            @click="dismissModel(model.id)"
          >
            ✕
          </button>
        </div>
        <div class="model-meta">
          <span v-if="model.pipeline_tag" class="chip chip-pipeline">{{ model.pipeline_tag }}</span>
          <span v-if="model.adapter_recommendation" class="chip chip-adapter">{{ model.adapter_recommendation }}</span>
        </div>
        <div class="model-card-actions">
          <button class="btn-primary btn-sm" @click="approveModel(model.id)">
            Approve download
          </button>
        </div>
      </div>
    </section>

    <!-- ── 3. Active Downloads ────────────────────────── -->
    <section class="section">
      <h2 class="section-title">Active Downloads</h2>

      <div v-if="downloadingModels.length === 0" class="empty-notice">
        No active downloads.
      </div>

      <div v-for="model in downloadingModels" :key="model.id" class="model-card">
        <div class="model-card-header">
          <span class="model-repo-id">{{ model.repo_id }}</span>
          <span v-if="downloadErrors[model.id]" class="badge badge-error">Error</span>
        </div>
        <div class="model-meta">
          <span v-if="model.pipeline_tag" class="chip chip-pipeline">{{ model.pipeline_tag }}</span>
        </div>

        <div v-if="downloadErrors[model.id]" class="download-error" role="alert">
          {{ downloadErrors[model.id] }}
        </div>
        <div v-else class="progress-wrap" :aria-label="`Download progress for ${model.repo_id}`">
          <div
            class="progress-bar"
            :style="{ width: `${downloadProgress[model.id] ?? 0}%` }"
            role="progressbar"
            :aria-valuenow="downloadProgress[model.id] ?? 0"
            aria-valuemin="0"
            aria-valuemax="100"
          />
          <span class="progress-label">
            {{ downloadProgress[model.id] == null ? 'Preparing…' : `${downloadProgress[model.id]}%` }}
          </span>
        </div>
      </div>
    </section>

    <!-- ── 4. Installed Models ────────────────────────── -->
    <section class="section">
      <h2 class="section-title">Installed Models</h2>

      <div v-if="installedModels.length === 0" class="empty-notice">
        No models installed yet.
      </div>

      <div v-else class="installed-table-wrap">
        <table class="installed-table">
          <thead>
            <tr>
              <th>Name</th>
              <th>Type</th>
              <th>Adapter</th>
              <th>Size</th>
              <th></th>
            </tr>
          </thead>
          <tbody>
            <tr v-for="model in installedModels" :key="model.name">
              <td class="td-name">{{ model.name }}</td>
              <td>
                <span
                  class="badge"
                  :class="model.type === 'finetuned' ? 'badge-accent' : 'badge-info'"
                >
                  {{ model.type }}
                </span>
              </td>
              <td>{{ model.adapter ?? '—' }}</td>
              <td>{{ humanBytes(model.size) }}</td>
              <td>
                <button
                  class="btn-danger btn-sm"
                  @click="deleteInstalled(model.name)"
                >
                  Delete
                </button>
              </td>
            </tr>
          </tbody>
        </table>
      </div>
    </section>
  </div>
</template>

<script setup lang="ts">
import { ref, computed, onMounted, onUnmounted } from 'vue'

// ── Type definitions ──────────────────────────────────

interface LookupResult {
  repo_id: string
  pipeline_tag: string | null
  adapter_recommendation: string | null
  size: number | null
  description: string | null
  already_installed: boolean
  already_queued: boolean
}

interface QueuedModel {
  id: string
  repo_id: string
  status: 'pending' | 'downloading' | 'done' | 'error'
  pipeline_tag: string | null
  adapter_recommendation: string | null
}

interface InstalledModel {
  name: string
  type: 'finetuned' | 'downloaded'
  adapter: string | null
  size: number
}

interface SseProgressEvent {
  model_id: string
  pct: number | null
  status: 'progress' | 'done' | 'error'
  message?: string
}

// ── State ─────────────────────────────────────────────

const lookupInput = ref('')
const lookupLoading = ref(false)
const lookupError = ref<string | null>(null)
const lookupResult = ref<LookupResult | null>(null)
const addingToQueue = ref(false)

const queuedModels = ref<QueuedModel[]>([])
const installedModels = ref<InstalledModel[]>([])

const downloadProgress = ref<Record<string, number>>({})
const downloadErrors = ref<Record<string, string>>({})

let pollInterval: ReturnType<typeof setInterval> | null = null
let sseSource: EventSource | null = null

// ── Derived ───────────────────────────────────────────

const pendingModels = computed(() =>
  queuedModels.value.filter(m => m.status === 'pending')
)

const downloadingModels = computed(() =>
  queuedModels.value.filter(m => m.status === 'downloading')
)

// ── Helpers ───────────────────────────────────────────

function humanBytes(bytes: number | null): string {
  if (bytes == null) return '—'
  const units = ['B', 'KB', 'MB', 'GB', 'TB']
  let value = bytes
  let unitIndex = 0
  while (value >= 1024 && unitIndex < units.length - 1) {
    value /= 1024
    unitIndex++
  }
  return `${value.toFixed(unitIndex === 0 ? 0 : 1)} ${units[unitIndex]}`
}

function normalizeRepoId(raw: string): string {
  return raw.trim().replace(/^https?:\/\/huggingface\.co\//, '')
}

// ── API calls ─────────────────────────────────────────

async function doLookup() {
  const repoId = normalizeRepoId(lookupInput.value)
  if (!repoId) return

  lookupLoading.value = true
  lookupError.value = null
  lookupResult.value = null

  try {
    const res = await fetch(`/api/models/lookup?repo_id=${encodeURIComponent(repoId)}`)
    if (res.status === 404) {
      lookupError.value = 'Model not found on HuggingFace.'
      return
    }
    if (res.status === 502) {
      lookupError.value = 'HuggingFace unreachable. Check your connection and try again.'
      return
    }
    if (!res.ok) {
      lookupError.value = `Lookup failed (HTTP ${res.status}).`
      return
    }
    lookupResult.value = await res.json() as LookupResult
  } catch {
    lookupError.value = 'Network error. Is the Avocet API running?'
  } finally {
    lookupLoading.value = false
  }
}

async function addToQueue() {
  if (!lookupResult.value) return
  addingToQueue.value = true
  try {
    const res = await fetch('/api/models/queue', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ repo_id: lookupResult.value.repo_id }),
    })
    if (res.ok) {
      lookupResult.value = { ...lookupResult.value, already_queued: true }
      await loadQueue()
    }
  } catch { /* ignore — already_queued badge won't flip, user can retry */ }
  finally {
    addingToQueue.value = false
  }
}

async function approveModel(id: string) {
  try {
    const res = await fetch(`/api/models/queue/${encodeURIComponent(id)}/approve`, { method: 'POST' })
    if (res.ok) {
      await loadQueue()
      startSse()
    }
  } catch { /* ignore */ }
}

async function dismissModel(id: string) {
  try {
    const res = await fetch(`/api/models/queue/${encodeURIComponent(id)}`, { method: 'DELETE' })
    if (res.ok) {
      queuedModels.value = queuedModels.value.filter(m => m.id !== id)
    }
  } catch { /* ignore */ }
}

async function deleteInstalled(name: string) {
  if (!window.confirm(`Delete installed model "${name}"? This cannot be undone.`)) return
  try {
    const res = await fetch(`/api/models/installed/${encodeURIComponent(name)}`, { method: 'DELETE' })
    if (res.ok) {
      installedModels.value = installedModels.value.filter(m => m.name !== name)
    }
  } catch { /* ignore */ }
}

async function loadQueue() {
  try {
    const res = await fetch('/api/models/queue')
    if (res.ok) queuedModels.value = await res.json() as QueuedModel[]
  } catch { /* non-fatal */ }
}

async function loadInstalled() {
  try {
    const res = await fetch('/api/models/installed')
    if (res.ok) installedModels.value = await res.json() as InstalledModel[]
  } catch { /* non-fatal */ }
}

// ── SSE for download progress ─────────────────────────

function startSse() {
  if (sseSource) return // already connected

  sseSource = new EventSource('/api/models/download/stream')

  sseSource.addEventListener('message', (e: MessageEvent) => {
    let event: SseProgressEvent
    try {
      event = JSON.parse(e.data as string) as SseProgressEvent
    } catch {
      return
    }

    const { model_id, pct, status, message } = event

    if (status === 'progress' && pct != null) {
      downloadProgress.value = { ...downloadProgress.value, [model_id]: pct }
    } else if (status === 'done') {
      const updated = { ...downloadProgress.value }
      delete updated[model_id]
      downloadProgress.value = updated

      queuedModels.value = queuedModels.value.filter(m => m.id !== model_id)
      loadInstalled()
    } else if (status === 'error') {
      downloadErrors.value = {
        ...downloadErrors.value,
        [model_id]: message ?? 'Download failed.',
      }
    }
  })

  sseSource.onerror = () => {
    sseSource?.close()
    sseSource = null
  }
}

function stopSse() {
  sseSource?.close()
  sseSource = null
}

// ── Polling ───────────────────────────────────────────

function startPollingIfDownloading() {
  if (pollInterval) return
  pollInterval = setInterval(async () => {
    await loadQueue()
    if (downloadingModels.value.length === 0) {
      stopPolling()
    }
  }, 5000)
}

function stopPolling() {
  if (pollInterval) {
    clearInterval(pollInterval)
    pollInterval = null
  }
}

// ── Lifecycle ─────────────────────────────────────────

onMounted(async () => {
  await Promise.all([loadQueue(), loadInstalled()])

  if (downloadingModels.value.length > 0) {
    startSse()
    startPollingIfDownloading()
  }
})

onUnmounted(() => {
  stopPolling()
  stopSse()
})
</script>

<style scoped>
.models-view {
  max-width: 760px;
  margin: 0 auto;
  padding: 1.5rem 1rem 4rem;
  display: flex;
  flex-direction: column;
  gap: 2rem;
}

.page-title {
  font-family: var(--font-display, var(--font-body, sans-serif));
  font-size: 1.4rem;
  font-weight: 700;
  color: var(--color-primary, #2d5a27);
}

/* ── Sections ── */
.section {
  display: flex;
  flex-direction: column;
  gap: 0.75rem;
}

.section-title {
  font-size: 1rem;
  font-weight: 600;
  color: var(--color-text, #1a2338);
  padding-bottom: 0.4rem;
  border-bottom: 1px solid var(--color-border, #a8b8d0);
}

/* ── Lookup row ── */
.lookup-row {
  display: flex;
  gap: 0.5rem;
  flex-wrap: wrap;
}

.lookup-input {
  flex: 1;
  min-width: 0;
  padding: 0.45rem 0.7rem;
  border: 1px solid var(--color-border, #a8b8d0);
  border-radius: var(--radius-md, 0.5rem);
  background: var(--color-surface-raised, #f5f7fc);
  color: var(--color-text, #1a2338);
  font-size: 0.9rem;
  font-family: var(--font-body, sans-serif);
}

.lookup-input:disabled {
  opacity: 0.6;
}

.lookup-input::placeholder {
  color: var(--color-text-muted, #4a5c7a);
}

/* ── Notices ── */
.error-notice {
  padding: 0.6rem 0.8rem;
  background: color-mix(in srgb, var(--color-error, #c0392b) 12%, transparent);
  border: 1px solid color-mix(in srgb, var(--color-error, #c0392b) 30%, transparent);
  border-radius: var(--radius-md, 0.5rem);
  color: var(--color-error, #c0392b);
  font-size: 0.88rem;
}

.empty-notice {
  color: var(--color-text-muted, #4a5c7a);
  font-size: 0.9rem;
  padding: 0.75rem;
  border: 1px dashed var(--color-border, #a8b8d0);
  border-radius: var(--radius-md, 0.5rem);
}

/* ── Preview card ── */
.preview-card {
  border: 1px solid var(--color-border, #a8b8d0);
  border-radius: var(--radius-lg, 1rem);
  background: var(--color-surface-raised, #f5f7fc);
  padding: 1rem;
  display: flex;
  flex-direction: column;
  gap: 0.6rem;
  box-shadow: var(--shadow-sm);
}

.preview-header {
  display: flex;
  align-items: flex-start;
  justify-content: space-between;
  gap: 0.5rem;
  flex-wrap: wrap;
}

.preview-repo-id {
  font-family: var(--font-mono, monospace);
  font-size: 0.95rem;
  font-weight: 600;
  color: var(--color-text, #1a2338);
  word-break: break-all;
}

.preview-meta {
  display: flex;
  gap: 0.4rem;
  flex-wrap: wrap;
  align-items: center;
}

.preview-size {
  font-size: 0.8rem;
  color: var(--color-text-muted, #4a5c7a);
  margin-left: 0.25rem;
}

.preview-desc {
  font-size: 0.875rem;
  color: var(--color-text-muted, #4a5c7a);
  line-height: 1.5;
  margin: 0;
  display: -webkit-box;
  -webkit-line-clamp: 3;
  -webkit-box-orient: vertical;
  overflow: hidden;
}

.btn-add-queue {
  align-self: flex-start;
}

/* ── Model cards (queue + downloads) ── */
.model-card {
  border: 1px solid var(--color-border, #a8b8d0);
  border-radius: var(--radius-md, 0.5rem);
  background: var(--color-surface-raised, #f5f7fc);
  padding: 0.75rem 1rem;
  display: flex;
  flex-direction: column;
  gap: 0.5rem;
  box-shadow: var(--shadow-sm);
}

.model-card-header {
  display: flex;
  align-items: center;
  justify-content: space-between;
  gap: 0.5rem;
}

.model-repo-id {
  font-family: var(--font-mono, monospace);
  font-size: 0.9rem;
  font-weight: 600;
  color: var(--color-text, #1a2338);
  word-break: break-all;
}

.model-meta {
  display: flex;
  gap: 0.4rem;
  flex-wrap: wrap;
}

.model-card-actions {
  display: flex;
  gap: 0.5rem;
  flex-wrap: wrap;
  padding-top: 0.25rem;
}

/* ── Progress bar ── */
.progress-wrap {
  position: relative;
  height: 1.5rem;
  background: var(--color-surface-alt, #dde4f0);
  border-radius: var(--radius-full, 9999px);
  overflow: hidden;
}

.progress-bar {
  position: absolute;
  top: 0;
  left: 0;
  height: 100%;
  background: var(--color-accent, #c4732a);
  border-radius: var(--radius-full, 9999px);
  transition: width 300ms ease;
}

.progress-label {
  position: absolute;
  inset: 0;
  display: flex;
  align-items: center;
  justify-content: center;
  font-size: 0.75rem;
  font-weight: 600;
  color: var(--color-text, #1a2338);
  pointer-events: none;
}

.download-error {
  font-size: 0.85rem;
  color: var(--color-error, #c0392b);
  padding: 0.4rem 0.5rem;
  background: color-mix(in srgb, var(--color-error, #c0392b) 10%, transparent);
  border-radius: var(--radius-sm, 0.25rem);
}

/* ── Installed table ── */
.installed-table-wrap {
  overflow-x: auto;
}

.installed-table {
  width: 100%;
  border-collapse: collapse;
  font-size: 0.875rem;
}

.installed-table th {
  text-align: left;
  padding: 0.4rem 0.6rem;
  color: var(--color-text-muted, #4a5c7a);
  font-size: 0.78rem;
  font-weight: 600;
  text-transform: uppercase;
  letter-spacing: 0.03em;
  border-bottom: 1px solid var(--color-border, #a8b8d0);
  white-space: nowrap;
}

.installed-table td {
  padding: 0.55rem 0.6rem;
  border-bottom: 1px solid var(--color-border-light, #ccd5e6);
  vertical-align: middle;
}

.td-name {
  font-family: var(--font-mono, monospace);
  font-size: 0.85rem;
  word-break: break-all;
}

/* ── Badges ── */
.badge-group {
  display: flex;
  gap: 0.35rem;
  flex-wrap: wrap;
  align-items: center;
}

.badge {
  display: inline-flex;
  align-items: center;
  padding: 0.15rem 0.55rem;
  border-radius: var(--radius-full, 9999px);
  font-size: 0.72rem;
  font-weight: 700;
  letter-spacing: 0.02em;
  text-transform: uppercase;
  white-space: nowrap;
}

.badge-success {
  background: color-mix(in srgb, var(--color-success, #3a7a32) 15%, transparent);
  color: var(--color-success, #3a7a32);
}

.badge-info {
  background: color-mix(in srgb, var(--color-info, #1e6091) 15%, transparent);
  color: var(--color-info, #1e6091);
}

.badge-accent {
  background: color-mix(in srgb, var(--color-accent, #c4732a) 15%, transparent);
  color: var(--color-accent, #c4732a);
}

.badge-error {
  background: color-mix(in srgb, var(--color-error, #c0392b) 15%, transparent);
  color: var(--color-error, #c0392b);
}

/* ── Chips ── */
.chip {
  display: inline-flex;
  align-items: center;
  padding: 0.15rem 0.5rem;
  border-radius: var(--radius-full, 9999px);
  font-size: 0.75rem;
  font-weight: 600;
  background: var(--color-surface-alt, #dde4f0);
  white-space: nowrap;
}

.chip-pipeline {
  color: var(--color-primary, #2d5a27);
  background: color-mix(in srgb, var(--color-primary, #2d5a27) 12%, var(--color-surface-alt, #dde4f0));
}

.chip-adapter {
  color: var(--color-accent, #c4732a);
  background: color-mix(in srgb, var(--color-accent, #c4732a) 12%, var(--color-surface-alt, #dde4f0));
}

/* ── Buttons ── */
.btn-primary, .btn-danger {
  padding: 0.4rem 0.9rem;
  border-radius: var(--radius-md, 0.5rem);
  font-size: 0.85rem;
  cursor: pointer;
  border: 1px solid;
  font-family: var(--font-body, sans-serif);
  transition: background var(--transition, 200ms ease), color var(--transition, 200ms ease);
}

.btn-sm {
  padding: 0.25rem 0.65rem;
  font-size: 0.8rem;
}

.btn-primary {
  border-color: var(--color-primary, #2d5a27);
  background: var(--color-primary, #2d5a27);
  color: var(--color-text-inverse, #eaeff8);
}

.btn-primary:hover:not(:disabled) {
  background: var(--color-primary-hover, #234820);
  border-color: var(--color-primary-hover, #234820);
}

.btn-primary:disabled {
  opacity: 0.5;
  cursor: not-allowed;
}

.btn-danger {
  border-color: var(--color-error, #c0392b);
  background: transparent;
  color: var(--color-error, #c0392b);
}

.btn-danger:hover {
  background: color-mix(in srgb, var(--color-error, #c0392b) 10%, transparent);
}

.btn-dismiss {
  border: none;
  background: transparent;
  color: var(--color-text-muted, #4a5c7a);
  cursor: pointer;
  font-size: 0.9rem;
  padding: 0.15rem 0.4rem;
  border-radius: var(--radius-sm, 0.25rem);
  flex-shrink: 0;
  transition: color var(--transition, 200ms ease), background var(--transition, 200ms ease);
}

.btn-dismiss:hover {
  color: var(--color-error, #c0392b);
  background: color-mix(in srgb, var(--color-error, #c0392b) 10%, transparent);
}

/* ── Responsive ── */
@media (max-width: 480px) {
  .lookup-row {
    flex-direction: column;
  }

  .lookup-input {
    width: 100%;
  }

  .btn-primary:not(.btn-sm) {
    width: 100%;
  }

  .installed-table th:nth-child(3),
  .installed-table td:nth-child(3) {
    display: none; /* hide Adapter column on very narrow screens */
  }
}
</style>
@ -110,63 +110,6 @@
      </label>
    </section>

    <!-- cf-orch SFT Integration section -->
    <section class="section">
      <h2 class="section-title">cf-orch Integration</h2>
      <p class="section-desc">
        Import SFT (supervised fine-tuning) candidates from cf-orch benchmark runs.
      </p>

      <div class="field-row">
        <label class="field field-grow">
          <span>bench_results_dir</span>
          <input
            id="bench-results-dir"
            v-model="benchResultsDir"
            type="text"
            placeholder="/path/to/circuitforge-orch/scripts/bench_results"
          />
        </label>
      </div>

      <div class="account-actions">
        <button class="btn-primary" @click="saveSftConfig">Save</button>
        <button class="btn-secondary" @click="scanRuns">Scan for runs</button>
        <span v-if="saveStatus" class="save-status">{{ saveStatus }}</span>
      </div>

      <table v-if="runs.length > 0" class="runs-table">
        <thead>
          <tr>
            <th>Timestamp</th>
            <th>Candidates</th>
            <th>Imported</th>
            <th></th>
          </tr>
        </thead>
        <tbody>
          <tr v-for="run in runs" :key="run.run_id">
            <td>{{ run.timestamp }}</td>
            <td>{{ run.candidate_count }}</td>
            <td>{{ run.already_imported ? '✓' : '—' }}</td>
            <td>
              <button
                class="btn-import"
                :disabled="run.already_imported || importingRunId === run.run_id"
                @click="importRun(run.run_id)"
              >
                {{ importingRunId === run.run_id ? 'Importing…' : 'Import' }}
              </button>
            </td>
          </tr>
        </tbody>
      </table>

      <div v-if="importResult" class="import-result">
        Imported {{ importResult.imported }}, skipped {{ importResult.skipped }}.
      </div>
    </section>

    <!-- Save / Reload -->
    <div class="save-bar">
      <button class="btn-primary" :disabled="saving" @click="save">
@ -199,64 +142,6 @@ const saveOk = ref(true)
const richMotion = ref(localStorage.getItem('cf-avocet-rich-motion') !== 'false')
const keyHints = ref(localStorage.getItem('cf-avocet-key-hints') !== 'false')

// SFT integration state
const benchResultsDir = ref('')
const runs = ref<Array<{ run_id: string; timestamp: string; candidate_count: number; already_imported: boolean }>>([])
const importingRunId = ref<string | null>(null)
const importResult = ref<{ imported: number; skipped: number } | null>(null)
const saveStatus = ref('')

async function loadSftConfig() {
  try {
    const res = await fetch('/api/sft/config')
    if (res.ok) {
      const data = await res.json()
      benchResultsDir.value = data.bench_results_dir ?? ''
    }
  } catch {
    // non-fatal — leave field empty
  }
}

async function saveSftConfig() {
  saveStatus.value = 'Saving…'
  try {
    const res = await fetch('/api/sft/config', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ bench_results_dir: benchResultsDir.value }),
    })
    saveStatus.value = res.ok ? 'Saved.' : 'Error saving.'
  } catch {
    saveStatus.value = 'Error saving.'
  }
  setTimeout(() => { saveStatus.value = '' }, 2000)
}

async function scanRuns() {
  try {
    const res = await fetch('/api/sft/runs')
    if (res.ok) runs.value = await res.json()
  } catch { /* ignore */ }
}

async function importRun(runId: string) {
  importingRunId.value = runId
  importResult.value = null
  try {
    const res = await fetch('/api/sft/import', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ run_id: runId }),
    })
    if (res.ok) {
      importResult.value = await res.json()
      scanRuns() // refresh already_imported flags
    }
  } catch { /* ignore */ }
  importingRunId.value = null
}

async function reload() {
  const { data } = await useApiFetch<{ accounts: Account[]; max_per_account: number }>('/api/config')
  if (data) {
@ -334,10 +219,7 @@ function onKeyHintsChange() {
  document.documentElement.classList.toggle('hide-key-hints', !keyHints.value)
}

onMounted(() => {
  reload()
  loadSftConfig()
})
</script>

<style scoped>
@ -546,61 +428,4 @@ onMounted(() => {
|
||||||
border: 1px dashed var(--color-border, #d0d7e8);
|
border: 1px dashed var(--color-border, #d0d7e8);
|
||||||
border-radius: 0.5rem;
|
border-radius: 0.5rem;
|
||||||
}
|
}
|
||||||
|
|
||||||
.section-desc {
|
|
||||||
color: var(--color-text-secondary, #6b7a99);
|
|
||||||
font-size: 0.88rem;
|
|
||||||
line-height: 1.5;
|
|
||||||
}
|
|
||||||
|
|
||||||
.field-input {
|
|
||||||
padding: 0.4rem 0.6rem;
|
|
||||||
border: 1px solid var(--color-border, #d0d7e8);
|
|
||||||
border-radius: 0.375rem;
|
|
||||||
background: var(--color-surface, #fff);
|
|
||||||
color: var(--color-text, #1a2338);
|
|
||||||
font-size: 0.9rem;
|
|
||||||
font-family: var(--font-body, sans-serif);
|
|
||||||
width: 100%;
|
|
||||||
}
|
|
||||||
|
|
||||||
.runs-table {
|
|
||||||
width: 100%;
|
|
||||||
border-collapse: collapse;
|
|
||||||
margin-top: var(--space-3, 0.75rem);
|
|
||||||
font-size: 0.88rem;
|
|
||||||
}
|
|
||||||
|
|
||||||
.runs-table th,
|
|
||||||
.runs-table td {
|
|
||||||
padding: var(--space-2, 0.5rem) var(--space-3, 0.75rem);
|
|
||||||
text-align: left;
|
|
||||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
|
||||||
}
|
|
||||||
|
|
||||||
.btn-import {
|
|
||||||
padding: var(--space-1, 0.25rem) var(--space-3, 0.75rem);
|
|
||||||
border: 1px solid var(--app-primary, #2A6080);
|
|
||||||
border-radius: var(--radius-sm, 0.25rem);
|
|
||||||
background: none;
|
|
||||||
color: var(--app-primary, #2A6080);
|
|
||||||
cursor: pointer;
|
|
||||||
font-size: 0.85rem;
|
|
||||||
}
|
|
||||||
|
|
||||||
.btn-import:disabled {
|
|
||||||
opacity: 0.45;
|
|
||||||
cursor: not-allowed;
|
|
||||||
}
|
|
||||||
|
|
||||||
.import-result {
|
|
||||||
margin-top: var(--space-2, 0.5rem);
|
|
||||||
font-size: 0.88rem;
|
|
||||||
color: var(--color-text-secondary, #6b7a99);
|
|
||||||
}
|
|
||||||
|
|
||||||
.save-status {
|
|
||||||
font-size: 0.85rem;
|
|
||||||
color: var(--color-text-secondary, #6b7a99);
|
|
||||||
}
|
|
||||||
</style>
|
</style>
|

@@ -35,39 +35,6 @@
      </div>
    </div>

    <!-- Benchmark Results -->
    <template v-if="benchRows.length > 0">
      <h2 class="section-title">🏁 Benchmark Results</h2>
      <div class="bench-table-wrap">
        <table class="bench-table">
          <thead>
            <tr>
              <th class="bt-model-col">Model</th>
              <th
                v-for="m in BENCH_METRICS"
                :key="m.key as string"
                class="bt-metric-col"
              >{{ m.label }}</th>
            </tr>
          </thead>
          <tbody>
            <tr v-for="row in benchRows" :key="row.name">
              <td class="bt-model-cell" :title="row.name">{{ row.name }}</td>
              <td
                v-for="m in BENCH_METRICS"
                :key="m.key as string"
                class="bt-metric-cell"
                :class="{ 'bt-best': bestByMetric[m.key as string] === row.name }"
              >
                {{ formatMetric(row.result[m.key]) }}
              </td>
            </tr>
          </tbody>
        </table>
      </div>
      <p class="bench-hint">Highlighted cells are the best-scoring model per metric.</p>
    </template>

    <div class="file-info">
      <span class="file-path">Score file: <code>data/email_score.jsonl</code></span>
      <span class="file-size">{{ fileSizeLabel }}</span>

@@ -87,18 +54,10 @@
import { ref, computed, onMounted } from 'vue'
import { useApiFetch } from '../composables/useApi'

interface BenchmarkModelResult {
  accuracy?: number
  macro_f1?: number
  weighted_f1?: number
  [key: string]: number | undefined
}

interface StatsResponse {
  total: number
  counts: Record<string, number>
  score_file_bytes: number
  benchmark_results?: Record<string, BenchmarkModelResult>
}

// Canonical label order + metadata

@@ -149,42 +108,6 @@ const fileSizeLabel = computed(() => {
  return `${(b / 1024 / 1024).toFixed(2)} MB`
})

// Benchmark results helpers
const BENCH_METRICS: Array<{ key: keyof BenchmarkModelResult; label: string }> = [
  { key: 'accuracy', label: 'Accuracy' },
  { key: 'macro_f1', label: 'Macro F1' },
  { key: 'weighted_f1', label: 'Weighted F1' },
]

const benchRows = computed(() => {
  const br = stats.value.benchmark_results
  if (!br || Object.keys(br).length === 0) return []
  return Object.entries(br).map(([name, result]) => ({ name, result }))
})

// Find the best model name for each metric
const bestByMetric = computed((): Record<string, string> => {
  const result: Record<string, string> = {}
  for (const { key } of BENCH_METRICS) {
    let bestName = ''
    let bestVal = -Infinity
    for (const { name, result: r } of benchRows.value) {
      const v = r[key]
      if (v != null && v > bestVal) { bestVal = v; bestName = name }
    }
    result[key as string] = bestName
  }
  return result
})
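Extracted as a plain function over hypothetical benchmark rows, the best-per-metric reduction above behaves like this (strict `>` means an earlier row keeps the highlight on ties; the model names and scores here are illustrative only):

```typescript
// Plain-function version of the bestByMetric computed, for illustration.
type Row = { name: string; result: Record<string, number | undefined> }

function bestPerMetric(rows: Row[], metrics: string[]): Record<string, string> {
  const best: Record<string, string> = {}
  for (const key of metrics) {
    let bestName = ''
    let bestVal = -Infinity
    for (const { name, result } of rows) {
      const v = result[key]
      // Strictly greater: on a tie, the earlier row wins the highlight
      if (v != null && v > bestVal) { bestVal = v; bestName = name }
    }
    best[key] = bestName
  }
  return best
}

// Illustrative rows — not real benchmark output
const sampleRows: Row[] = [
  { name: 'bge-reranker-v2-m3', result: { accuracy: 0.91, macro_f1: 0.88 } },
  { name: 'deberta-v3-nli',     result: { accuracy: 0.89, macro_f1: 0.90 } },
]
```

Missing metrics are simply skipped, so a model absent from one column can still win another.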

function formatMetric(v: number | undefined): string {
  if (v == null) return '—'
  // Values in 0-1 range: format as percentage
  if (v <= 1) return `${(v * 100).toFixed(1)}%`
  // Already a percentage
  return `${v.toFixed(1)}%`
}
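Standalone, the formatter above can be sanity-checked: values at or below 1 are treated as fractions and scaled, larger values are assumed to already be percentages (values stored as exactly 1 render as 100%, a known ambiguity of the heuristic):

```typescript
// Same logic as the component's formatMetric, reproduced for a quick check.
function formatMetric(v: number | undefined): string {
  if (v == null) return '—'
  // Fractions in [0, 1] are scaled to percent
  if (v <= 1) return `${(v * 100).toFixed(1)}%`
  // Anything larger is assumed to already be a percentage
  return `${v.toFixed(1)}%`
}

// formatMetric(0.8123)  → '81.2%'
// formatMetric(81.23)   → '81.2%'
// formatMetric(undefined) → '—'
```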

async function load() {
  loading.value = true
  error.value = ''

@@ -311,79 +234,6 @@ onMounted(load)
  padding: 1rem;
}

/* ── Benchmark Results ──────────────────────────── */
.section-title {
  font-family: var(--font-display, var(--font-body, sans-serif));
  font-size: 1.05rem;
  font-weight: 700;
  color: var(--app-primary, #2A6080);
  margin: 0;
}

.bench-table-wrap {
  overflow-x: auto;
  border: 1px solid var(--color-border, #d0d7e8);
  border-radius: 0.5rem;
}

.bench-table {
  border-collapse: collapse;
  width: 100%;
  font-size: 0.82rem;
}

.bt-model-col {
  text-align: left;
  padding: 0.45rem 0.75rem;
  background: var(--color-surface-raised, #e4ebf5);
  border-bottom: 1px solid var(--color-border, #d0d7e8);
  font-weight: 600;
  min-width: 12rem;
}

.bt-metric-col {
  text-align: right;
  padding: 0.45rem 0.75rem;
  background: var(--color-surface-raised, #e4ebf5);
  border-bottom: 1px solid var(--color-border, #d0d7e8);
  font-weight: 600;
  white-space: nowrap;
  min-width: 6rem;
}

.bt-model-cell {
  padding: 0.4rem 0.75rem;
  border-top: 1px solid var(--color-border, #d0d7e8);
  font-family: var(--font-mono, monospace);
  font-size: 0.76rem;
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
  max-width: 16rem;
  color: var(--color-text, #1a2338);
}

.bt-metric-cell {
  padding: 0.4rem 0.75rem;
  border-top: 1px solid var(--color-border, #d0d7e8);
  text-align: right;
  font-family: var(--font-mono, monospace);
  font-variant-numeric: tabular-nums;
  color: var(--color-text, #1a2338);
}

.bt-metric-cell.bt-best {
  color: var(--color-success, #3a7a32);
  font-weight: 700;
  background: color-mix(in srgb, var(--color-success, #3a7a32) 8%, transparent);
}

.bench-hint {
  font-size: 0.75rem;
  color: var(--color-text-secondary, #6b7a99);
  margin: 0;
}

@media (max-width: 480px) {
  .bar-row {
    grid-template-columns: 1.5rem 1fr 1fr 3rem;

@@ -4,11 +4,6 @@ import UnoCSS from 'unocss/vite'
export default defineConfig({
  plugins: [vue(), UnoCSS()],
  server: {
    proxy: {
      '/api': 'http://localhost:8503',
    },
  },
  test: {
    environment: 'jsdom',
    globals: true,