Compare commits
4 commits
e6b64d6efe
...
5a0ba92fc6
| Author | SHA1 | Date | |
|---|---|---|---|
| 5a0ba92fc6 | |||
| ea3da701c6 | |||
| ddb56efb89 | |||
| cc24cd0d7d |
21 changed files with 9096 additions and 2019 deletions
106
README.md
Normal file
106
README.md
Normal file
|
|
@ -0,0 +1,106 @@
|
|||
# Avocet — Email Classifier Training Tool
|
||||
|
||||
> *Part of the CircuitForge LLC internal infrastructure suite.*
|
||||
|
||||
**Status:** Internal beta — label tool and benchmark harness complete. Used to build training data for Peregrine's email classifier.
|
||||
|
||||
---
|
||||
|
||||
## What it does
|
||||
|
||||
Avocet is the data pipeline for building and benchmarking email classifiers. It has two layers:
|
||||
|
||||
**No LLM required.** Avocet uses zero-shot HuggingFace classification models — no API key, no cloud inference, no GPU required for the label tool. The benchmark harness can optionally export LLM-labeled emails from a Peregrine staging DB, but human labeling via the card-stack UI is the primary workflow.
|
||||
|
||||
**Layer 1 — Label tool**
|
||||
Card-stack UI for building ground-truth classifier benchmark data. Fetch emails from one or more IMAP accounts (with targeted date-range and sender/subject filters), review them card-by-card, and label each with a job-search category. Labeled output feeds the benchmark harness.
|
||||
|
||||
**Layer 2 — Benchmark harness**
|
||||
Scores HuggingFace zero-shot classification models against the labeled dataset. Supports slow/large model inclusion, visual side-by-side comparison on live emails, and export of LLM-labeled emails from a Peregrine staging DB.
|
||||
|
||||
---
|
||||
|
||||
## Labels
|
||||
|
||||
| Label | Key |
|
||||
|-------|-----|
|
||||
| `interview_scheduled` | 1 |
|
||||
| `offer_received` | 2 |
|
||||
| `rejected` | 3 |
|
||||
| `positive_response` | 4 |
|
||||
| `survey_received` | 5 |
|
||||
| `neutral` | 6 |
|
||||
| `event_rescheduled` | 7 |
|
||||
| `unrelated` | 8 |
|
||||
| `digest` | 9 |
|
||||
|
||||
---
|
||||
|
||||
## Stack
|
||||
|
||||
| Layer | Tech |
|
||||
|-------|------|
|
||||
| Label UI | Streamlit (port 8503, auto-increments on collision) |
|
||||
| Benchmark | Python + HuggingFace Transformers |
|
||||
| Email fetch | IMAP (multi-account, targeted date/sender/subject filter) |
|
||||
| Data | JSONL (`data/email_label_queue.jsonl`, `data/email_score.jsonl`) |
|
||||
| Config | `config/label_tool.yaml` (gitignored — see `.example`) |
|
||||
|
||||
Conda environments:
|
||||
- `job-seeker` — label tool UI
|
||||
- `job-seeker-classifiers` — benchmark harness (separate env for heavy deps)
|
||||
|
||||
---
|
||||
|
||||
## Running
|
||||
|
||||
```bash
|
||||
./manage.sh start # start label tool UI (port collision-safe from 8503)
|
||||
./manage.sh stop # stop
|
||||
./manage.sh restart # restart
|
||||
./manage.sh status # show running state and port
|
||||
./manage.sh logs # tail label tool log
|
||||
./manage.sh open # open in browser
|
||||
```
|
||||
|
||||
Benchmark:
|
||||
```bash
|
||||
./manage.sh benchmark --list-models # list available zero-shot models
|
||||
./manage.sh score # score models against labeled JSONL
|
||||
./manage.sh score --include-slow # include large/slow models
|
||||
./manage.sh compare --limit 30 # visual comparison on live IMAP emails
|
||||
```
|
||||
|
||||
Dev:
|
||||
```bash
|
||||
./manage.sh test # run pytest suite
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Data flow
|
||||
|
||||
```
|
||||
IMAP accounts → fetch (targeted or wide) → email_label_queue.jsonl
|
||||
→ label tool card UI → email_score.jsonl
|
||||
→ benchmark harness → model rankings
|
||||
→ best model → Peregrine classifier adapter
|
||||
```
|
||||
|
||||
Targeted fetch: date range + sender/subject filter for pulling historical emails on specific senders or topics without flooding the queue.
|
||||
|
||||
Discard: removes an email from the queue without writing to the score file — for emails that don't belong in the training set.
|
||||
|
||||
---
|
||||
|
||||
## Classifier adapters
|
||||
|
||||
`app/classifier_adapters.py` provides a common interface for swapping classifier backends. Falls back to the label name when no `LABEL_DESCRIPTIONS` entry is configured for a label (RerankerAdapter).
|
||||
|
||||
---
|
||||
|
||||
## License
|
||||
|
||||
BSL 1.1 — internal tool, not user-facing.
|
||||
|
||||
© 2026 Circuit Forge LLC
|
||||
|
|
@ -155,6 +155,9 @@ app.include_router(cforch_router, prefix="/api/cforch")
|
|||
from app.imitate import router as imitate_router
|
||||
app.include_router(imitate_router, prefix="/api/imitate")
|
||||
|
||||
from app.style import router as style_router
|
||||
app.include_router(style_router, prefix="/api/style")
|
||||
|
||||
# In-memory last-action store (single user, local tool — in-memory is fine)
|
||||
_last_action: dict | None = None
|
||||
|
||||
|
|
|
|||
310
app/imitate.py
310
app/imitate.py
|
|
@ -11,6 +11,7 @@ override _CONFIG_DIR and _DATA_DIR via set_config_dir() / set_data_dir() in test
|
|||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import base64
|
||||
import json
|
||||
import logging
|
||||
import time
|
||||
|
|
@ -21,6 +22,7 @@ from typing import Any
|
|||
from urllib.error import URLError
|
||||
from urllib.request import Request, urlopen
|
||||
|
||||
import httpx
|
||||
import yaml
|
||||
from fastapi import APIRouter, HTTPException
|
||||
from fastapi.responses import StreamingResponse
|
||||
|
|
@ -87,6 +89,45 @@ def _ollama_url(cfg: dict) -> str:
|
|||
return cfg.get("ollama_url") or cforch.get("ollama_url") or "http://localhost:11434"
|
||||
|
||||
|
||||
def _cforch_url() -> str:
|
||||
cforch = _load_cforch_config()
|
||||
return cforch.get("coordinator_url") or "http://localhost:7700"
|
||||
|
||||
|
||||
def _cforch_catalog(cforch_base: str) -> list[dict]:
|
||||
"""Fetch the live cf-text catalog from cf-orch.
|
||||
|
||||
Filters out proxy entries (ollama://, vllm://, http://) — those models are
|
||||
served by their own services and should not be allocated via cf-text.
|
||||
Returns only models with real file-system paths that cf-text can load directly.
|
||||
"""
|
||||
try:
|
||||
resp = httpx.get(
|
||||
f"{cforch_base}/api/services/cf-text/catalog",
|
||||
params={"node_id": "heimdall"},
|
||||
timeout=5.0,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
raw = resp.json()
|
||||
result = []
|
||||
for model_id, entry in raw.items():
|
||||
if not isinstance(entry, dict):
|
||||
continue
|
||||
path = entry.get("path", "")
|
||||
# Skip proxy entries — they're routed through other services
|
||||
if "://" in path:
|
||||
continue
|
||||
result.append({
|
||||
"id": model_id,
|
||||
"vram_mb": entry.get("vram_mb", 0),
|
||||
"description": entry.get("description", ""),
|
||||
})
|
||||
return result
|
||||
except Exception as exc:
|
||||
logger.warning("Could not fetch cf-orch catalog: %s", exc)
|
||||
return []
|
||||
|
||||
|
||||
def _http_get_json(url: str, timeout: int = 5) -> Any:
|
||||
"""Fetch JSON from url; raise URLError on failure."""
|
||||
req = Request(url, headers={"Accept": "application/json"})
|
||||
|
|
@ -104,18 +145,29 @@ def _is_online(base_url: str, health_path: str = "/api/health") -> bool:
|
|||
|
||||
|
||||
def _extract_sample(
|
||||
raw: Any, text_fields: list[str], sample_index: int = 0
|
||||
raw: Any,
|
||||
text_fields: list[str],
|
||||
sample_index: int = 0,
|
||||
sample_key: str | None = None,
|
||||
) -> dict[str, Any]:
|
||||
"""Pull one item from a list or dict response and extract text_fields."""
|
||||
"""Pull one item from a list or dict response and extract text_fields.
|
||||
|
||||
sample_key: if provided, unwrap raw[sample_key] before looking for a list.
|
||||
Falls back to a set of conventional envelope keys if sample_key is absent.
|
||||
"""
|
||||
item: dict[str, Any]
|
||||
if isinstance(raw, list):
|
||||
if not raw:
|
||||
return {}
|
||||
item = raw[min(sample_index, len(raw) - 1)]
|
||||
elif isinstance(raw, dict):
|
||||
# may be {items: [...]} or the item itself
|
||||
for key in ("items", "results", "data", "jobs", "listings", "pantry",
|
||||
"saved_searches", "entries", "calls", "records"):
|
||||
# Use declared sample_key first, then fall back to conventional names.
|
||||
_ENVELOPE_KEYS = (
|
||||
"samples", "items", "results", "data", "jobs", "listings",
|
||||
"pantry", "saved_searches", "entries", "calls", "records",
|
||||
)
|
||||
search_keys = ([sample_key] if sample_key else []) + list(_ENVELOPE_KEYS)
|
||||
for key in search_keys:
|
||||
if key in raw and isinstance(raw[key], list):
|
||||
lst = raw[key]
|
||||
item = lst[min(sample_index, len(lst) - 1)] if lst else {}
|
||||
|
|
@ -141,24 +193,49 @@ def _sse(data: dict) -> str:
|
|||
return f"data: {json.dumps(data)}\n\n"
|
||||
|
||||
|
||||
def _fetch_image_b64(image_url: str) -> str:
|
||||
"""Download an image URL and return it as a base64 string for ollama.
|
||||
|
||||
Returns empty string on any failure — a missing image is non-fatal;
|
||||
the model will still run against the text prompt alone.
|
||||
"""
|
||||
try:
|
||||
req = Request(image_url, headers={"User-Agent": "Avocet/1.0"})
|
||||
with urlopen(req, timeout=10) as resp:
|
||||
return base64.b64encode(resp.read()).decode("ascii")
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to fetch image %s: %s", image_url, exc)
|
||||
return ""
|
||||
|
||||
|
||||
def _run_ollama_streaming(
|
||||
ollama_base: str,
|
||||
model_id: str,
|
||||
prompt: str,
|
||||
temperature: float,
|
||||
system: str = "",
|
||||
images: list[str] | None = None,
|
||||
) -> tuple[str, int]:
|
||||
"""Call ollama /api/generate with stream=True; return (full_response, elapsed_ms).
|
||||
"""Call ollama /api/generate with stream=False; return (full_response, elapsed_ms).
|
||||
|
||||
Blocks until the model finishes; yields nothing — streaming is handled by
|
||||
the SSE generator in run_imitate().
|
||||
|
||||
system: optional system prompt passed as a separate field to ollama.
|
||||
images: list of base64-encoded image strings (vision models only).
|
||||
"""
|
||||
url = f"{ollama_base.rstrip('/')}/api/generate"
|
||||
payload = json.dumps({
|
||||
body: dict = {
|
||||
"model": model_id,
|
||||
"prompt": prompt,
|
||||
"stream": False,
|
||||
"options": {"temperature": temperature},
|
||||
}).encode("utf-8")
|
||||
}
|
||||
if system:
|
||||
body["system"] = system
|
||||
if images:
|
||||
body["images"] = images
|
||||
payload = json.dumps(body).encode("utf-8")
|
||||
req = Request(url, data=payload, method="POST",
|
||||
headers={"Content-Type": "application/json"})
|
||||
t0 = time.time()
|
||||
|
|
@ -172,6 +249,122 @@ def _run_ollama_streaming(
|
|||
raise RuntimeError(str(exc)) from exc
|
||||
|
||||
|
||||
def _run_cftext(
|
||||
cforch_base: str,
|
||||
model_id: str,
|
||||
prompt: str,
|
||||
system: str,
|
||||
temperature: float,
|
||||
startup_timeout_s: float = 180.0,
|
||||
) -> tuple[str, int, bool]:
|
||||
"""Allocate cf-text via cf-orch, generate, release. Returns (response, elapsed_ms, cold_started).
|
||||
|
||||
Raises RuntimeError on allocation failure or generation error.
|
||||
cold_started=True means the service was launched from scratch (caller may log this).
|
||||
|
||||
Cold-start detection uses coordinator state signals (running/stopped) rather than
|
||||
polling the service health endpoint — this fails fast on model load errors instead
|
||||
of waiting out the full timeout.
|
||||
"""
|
||||
# Allocate
|
||||
alloc_resp = httpx.post(
|
||||
f"{cforch_base}/api/services/cf-text/allocate",
|
||||
json={
|
||||
"model_candidates": [model_id],
|
||||
"caller": "avocet",
|
||||
"pipeline": "imitate",
|
||||
},
|
||||
timeout=30.0,
|
||||
)
|
||||
alloc_resp.raise_for_status()
|
||||
data = alloc_resp.json()
|
||||
service_url: str = data["url"]
|
||||
allocation_id: str = data.get("allocation_id", "")
|
||||
node_id: str = data.get("node_id", "")
|
||||
gpu_id: int | None = data.get("gpu_id")
|
||||
cold_started = data.get("started", False) and not data.get("warm", True)
|
||||
|
||||
# Wait for ready using coordinator state signals
|
||||
if cold_started:
|
||||
deadline = time.monotonic() + startup_timeout_s
|
||||
probe_misses = 0
|
||||
while time.monotonic() < deadline:
|
||||
try:
|
||||
status = httpx.get(
|
||||
f"{cforch_base}/api/services/cf-text/status", timeout=5.0
|
||||
)
|
||||
if status.is_success:
|
||||
instances = status.json().get("instances", [])
|
||||
match = next(
|
||||
(i for i in instances
|
||||
if i.get("node_id") == node_id and i.get("gpu_id") == gpu_id),
|
||||
None,
|
||||
)
|
||||
if match:
|
||||
probe_misses = 0
|
||||
state = match.get("state", "")
|
||||
if state == "running":
|
||||
break
|
||||
elif state == "stopped":
|
||||
if allocation_id:
|
||||
httpx.delete(
|
||||
f"{cforch_base}/api/services/cf-text/allocations/{allocation_id}",
|
||||
timeout=5.0,
|
||||
)
|
||||
raise RuntimeError(f"cf-text failed to load {model_id!r} (service stopped)")
|
||||
else:
|
||||
probe_misses += 1
|
||||
if probe_misses >= 6:
|
||||
# Coordinator hasn't registered instance yet — fall back to health poll
|
||||
try:
|
||||
if httpx.get(f"{service_url}/health", timeout=3.0).is_success:
|
||||
break
|
||||
except Exception:
|
||||
pass
|
||||
except RuntimeError:
|
||||
raise
|
||||
except Exception:
|
||||
pass
|
||||
time.sleep(2.0)
|
||||
else:
|
||||
if allocation_id:
|
||||
httpx.delete(f"{cforch_base}/api/services/cf-text/allocations/{allocation_id}", timeout=5.0)
|
||||
raise RuntimeError(f"cf-text cold start timed out after {startup_timeout_s:.0f}s")
|
||||
|
||||
# Generate
|
||||
messages: list[dict] = []
|
||||
if system:
|
||||
messages.append({"role": "system", "content": system})
|
||||
messages.append({"role": "user", "content": prompt})
|
||||
|
||||
t0 = time.time()
|
||||
try:
|
||||
gen_resp = httpx.post(
|
||||
f"{service_url}/v1/chat/completions",
|
||||
json={
|
||||
"model": model_id,
|
||||
"messages": messages,
|
||||
"max_tokens": 300,
|
||||
"temperature": temperature,
|
||||
"stream": False,
|
||||
},
|
||||
timeout=120.0,
|
||||
)
|
||||
gen_resp.raise_for_status()
|
||||
elapsed_ms = int((time.time() - t0) * 1000)
|
||||
content = gen_resp.json()["choices"][0]["message"]["content"]
|
||||
return content.strip(), elapsed_ms, cold_started
|
||||
except Exception as exc:
|
||||
elapsed_ms = int((time.time() - t0) * 1000)
|
||||
raise RuntimeError(str(exc)) from exc
|
||||
finally:
|
||||
if allocation_id:
|
||||
try:
|
||||
httpx.delete(f"{cforch_base}/api/services/cf-text/allocations/{allocation_id}", timeout=5.0)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
|
||||
# ── GET /products ──────────────────────────────────────────────────────────────
|
||||
|
||||
@router.get("/products")
|
||||
|
|
@ -226,52 +419,96 @@ def get_sample(product_id: str, index: int = 0) -> dict:
|
|||
raise HTTPException(502, f"Bad response from product API: {exc}") from exc
|
||||
|
||||
text_fields = product.get("text_fields", []) or []
|
||||
extracted = _extract_sample(raw, text_fields, index)
|
||||
sample_key = product.get("sample_key") or None
|
||||
extracted = _extract_sample(raw, text_fields, index, sample_key=sample_key)
|
||||
if not extracted:
|
||||
raise HTTPException(404, "No sample items returned by product API")
|
||||
|
||||
prompt_template = product.get("prompt_template", "{text}")
|
||||
prompt = prompt_template.replace("{text}", extracted["text"])
|
||||
# Also substitute any {field_name} placeholders from the raw item fields.
|
||||
item = extracted.get("item", {})
|
||||
for field, val in item.items():
|
||||
prompt = prompt.replace(f"{{{field}}}", str(val) if val is not None else "")
|
||||
|
||||
# Expose system_prompt and image_url if the product API returns them.
|
||||
# system_prompt: Peregrine, Snipe (vision analysis instructions)
|
||||
# image_url: Snipe listing photos — Avocet downloads + base64-encodes at run time
|
||||
item = extracted.get("item", {})
|
||||
system_prompt = str(item.get("system_prompt", "")) if isinstance(item, dict) else ""
|
||||
image_url = str(item.get("image_url", "")) if isinstance(item, dict) else ""
|
||||
|
||||
return {
|
||||
"product_id": product_id,
|
||||
"sample_index": index,
|
||||
"text": extracted["text"],
|
||||
"prompt": prompt,
|
||||
"raw_item": extracted.get("item", {}),
|
||||
"system_prompt": system_prompt,
|
||||
"image_url": image_url,
|
||||
"raw_item": item,
|
||||
}
|
||||
|
||||
|
||||
# ── GET /catalog ───────────────────────────────────────────────────────────────
|
||||
|
||||
@router.get("/catalog")
|
||||
def get_catalog() -> dict:
|
||||
"""Return the live cf-text model catalog from cf-orch coordinator."""
|
||||
models = _cforch_catalog(_cforch_url())
|
||||
return {"models": models}
|
||||
|
||||
|
||||
# ── GET /run (SSE) ─────────────────────────────────────────────────────────────
|
||||
|
||||
@router.get("/run")
|
||||
def run_imitate(
|
||||
prompt: str = "",
|
||||
model_ids: str = "", # comma-separated ollama model IDs
|
||||
model_ids: str = "", # comma-separated ollama model IDs
|
||||
cf_text_model_ids: str = "", # comma-separated cf-text model IDs (via cf-orch)
|
||||
temperature: float = 0.7,
|
||||
product_id: str = "",
|
||||
system: str = "", # optional system prompt
|
||||
image_url: str = "", # optional image URL for vision models
|
||||
) -> StreamingResponse:
|
||||
"""Run a prompt through selected ollama models and stream results as SSE."""
|
||||
"""Run a prompt through selected ollama models and stream results as SSE.
|
||||
|
||||
If image_url is provided, the image is downloaded once and passed to every
|
||||
model as a base64-encoded blob — allowing vision-capable local models to
|
||||
evaluate listing photos the same way Snipe's background task pipeline does.
|
||||
"""
|
||||
|
||||
if not prompt.strip():
|
||||
raise HTTPException(422, "prompt is required")
|
||||
|
||||
ids = [m.strip() for m in model_ids.split(",") if m.strip()]
|
||||
if not ids:
|
||||
raise HTTPException(422, "model_ids is required")
|
||||
ollama_ids = [m.strip() for m in model_ids.split(",") if m.strip()]
|
||||
cftext_ids = [m.strip() for m in cf_text_model_ids.split(",") if m.strip()]
|
||||
if not ollama_ids and not cftext_ids:
|
||||
raise HTTPException(422, "model_ids or cf_text_model_ids is required")
|
||||
|
||||
cfg = _load_imitate_config()
|
||||
ollama_base = _ollama_url(cfg)
|
||||
cforch_base = _cforch_url()
|
||||
system_ctx = system.strip() or ""
|
||||
total_models = len(ollama_ids) + len(cftext_ids)
|
||||
|
||||
# Download image once before streaming — shared across ollama vision models
|
||||
images: list[str] = []
|
||||
if image_url.strip():
|
||||
b64 = _fetch_image_b64(image_url.strip())
|
||||
if b64:
|
||||
images = [b64]
|
||||
|
||||
def generate():
|
||||
results: list[dict] = []
|
||||
yield _sse({"type": "start", "total_models": len(ids)})
|
||||
yield _sse({"type": "start", "total_models": total_models, "has_image": bool(images)})
|
||||
|
||||
for model_id in ids:
|
||||
yield _sse({"type": "model_start", "model": model_id})
|
||||
# Ollama models
|
||||
for model_id in ollama_ids:
|
||||
yield _sse({"type": "model_start", "model": model_id, "service": "ollama"})
|
||||
try:
|
||||
response, elapsed_ms = _run_ollama_streaming(
|
||||
ollama_base, model_id, prompt, temperature
|
||||
ollama_base, model_id, prompt, temperature,
|
||||
system=system_ctx, images=images or None,
|
||||
)
|
||||
result = {
|
||||
"model": model_id,
|
||||
|
|
@ -289,6 +526,41 @@ def run_imitate(
|
|||
results.append(result)
|
||||
yield _sse({"type": "model_done", **result})
|
||||
|
||||
# cf-text models via cf-orch — fan out in parallel when multiple models selected
|
||||
if cftext_ids:
|
||||
from concurrent.futures import ThreadPoolExecutor, as_completed
|
||||
|
||||
# Announce all models upfront so the UI can show loading states immediately
|
||||
for model_id in cftext_ids:
|
||||
yield _sse({"type": "model_start", "model": model_id, "service": "cf-text"})
|
||||
|
||||
with ThreadPoolExecutor(max_workers=len(cftext_ids)) as pool:
|
||||
future_to_model = {
|
||||
pool.submit(_run_cftext, cforch_base, mid, prompt, system_ctx, temperature): mid
|
||||
for mid in cftext_ids
|
||||
}
|
||||
for future in as_completed(future_to_model):
|
||||
model_id = future_to_model[future]
|
||||
try:
|
||||
response, elapsed_ms, cold_started = future.result()
|
||||
if cold_started:
|
||||
yield _sse({"type": "model_coldstart", "model": model_id})
|
||||
result = {
|
||||
"model": model_id,
|
||||
"response": response,
|
||||
"elapsed_ms": elapsed_ms,
|
||||
"error": None,
|
||||
}
|
||||
except Exception as exc:
|
||||
result = {
|
||||
"model": model_id,
|
||||
"response": "",
|
||||
"elapsed_ms": 0,
|
||||
"error": str(exc),
|
||||
}
|
||||
results.append(result)
|
||||
yield _sse({"type": "model_done", **result})
|
||||
|
||||
yield _sse({"type": "complete", "results": results})
|
||||
|
||||
return StreamingResponse(
|
||||
|
|
|
|||
686
app/models.py
686
app/models.py
|
|
@ -14,11 +14,12 @@ from __future__ import annotations
|
|||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import shutil
|
||||
import threading
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
from typing import Any, TypedDict
|
||||
from uuid import uuid4
|
||||
|
||||
import httpx
|
||||
|
|
@ -39,21 +40,67 @@ _ROOT = Path(__file__).parent.parent
|
|||
_MODELS_DIR: Path = _ROOT / "models"
|
||||
_QUEUE_DIR: Path = _ROOT / "data"
|
||||
|
||||
# Service-specific model destinations.
|
||||
# cf-text models land on the NFS-mounted shared asset store so every cluster
|
||||
# node can reach them without a separate download. Avocet classifiers stay local
|
||||
# because they are fine-tuned in-place and are only consumed by avocet itself.
|
||||
# Override via CF_TEXT_MODELS_DIR env var (useful for dev / non-NFS setups).
|
||||
_CF_TEXT_MODELS_DIR: Path = Path(
|
||||
os.environ.get("CF_TEXT_MODELS_DIR", "/Library/Assets/LLM/cf-text/models")
|
||||
)
|
||||
|
||||
# Directory containing per-node YAML profiles for cf-orch.
|
||||
# Auto-registration writes new catalog entries here on model download.
|
||||
_CF_ORCH_PROFILES_DIR: Path = Path(
|
||||
os.environ.get(
|
||||
"CF_ORCH_PROFILES_DIR",
|
||||
"/Library/Development/CircuitForge/circuitforge-orch/circuitforge_orch/profiles/nodes",
|
||||
)
|
||||
)
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
# ── Download progress shared state ────────────────────────────────────────────
|
||||
# Updated by the background download thread; read by GET /download/stream.
|
||||
_download_progress: dict[str, Any] = {}
|
||||
|
||||
# ── HF pipeline_tag → adapter recommendation ──────────────────────────────────
|
||||
_TAG_TO_ADAPTER: dict[str, str] = {
|
||||
"zero-shot-classification": "ZeroShotAdapter",
|
||||
"text-classification": "ZeroShotAdapter",
|
||||
"natural-language-inference": "ZeroShotAdapter",
|
||||
"sentence-similarity": "RerankerAdapter",
|
||||
"text-ranking": "RerankerAdapter",
|
||||
"text-generation": "GenerationAdapter",
|
||||
"text2text-generation": "GenerationAdapter",
|
||||
# ── HF pipeline_tag → CF service info ────────────────────────────────────────
|
||||
|
||||
|
||||
class _TagInfo(TypedDict):
|
||||
adapter: str | None # Avocet adapter class, or None if handled by another service
|
||||
role: str # Human-readable model role (classifier, stt, tts, vision, …)
|
||||
service: str # CF service that consumes this model type
|
||||
|
||||
|
||||
_TAG_TO_INFO: dict[str, _TagInfo] = {
|
||||
# Avocet email classifiers
|
||||
"zero-shot-classification": {"adapter": "ZeroShotAdapter", "role": "classifier", "service": "avocet"},
|
||||
"text-classification": {"adapter": "ZeroShotAdapter", "role": "classifier", "service": "avocet"},
|
||||
"natural-language-inference": {"adapter": "ZeroShotAdapter", "role": "classifier", "service": "avocet"},
|
||||
"sentence-similarity": {"adapter": "RerankerAdapter", "role": "reranker", "service": "avocet"},
|
||||
"text-ranking": {"adapter": "RerankerAdapter", "role": "reranker", "service": "avocet"},
|
||||
"text-generation": {"adapter": "GenerationAdapter", "role": "generator", "service": "cf-text"},
|
||||
"text2text-generation": {"adapter": "GenerationAdapter", "role": "generator", "service": "cf-text"},
|
||||
"summarization": {"adapter": "GenerationAdapter", "role": "generator", "service": "cf-text"},
|
||||
# STT — cf-stt speech recognition service
|
||||
"automatic-speech-recognition": {"adapter": None, "role": "stt", "service": "cf-stt"},
|
||||
# Audio language models — audio + text → text (understanding, QA, captioning)
|
||||
"audio-text-to-text": {"adapter": None, "role": "alm", "service": "cf-stt"},
|
||||
# Audio classification — cf-voice sidecar context stream
|
||||
"audio-classification": {"adapter": None, "role": "classifier", "service": "cf-voice"},
|
||||
# TTS — cf-tts text-to-speech service
|
||||
"text-to-speech": {"adapter": None, "role": "tts", "service": "cf-tts"},
|
||||
# Vision — cf-vision image classification / embedding / VLM service
|
||||
"image-classification": {"adapter": None, "role": "vision", "service": "cf-vision"},
|
||||
"zero-shot-image-classification": {"adapter": None, "role": "vision", "service": "cf-vision"},
|
||||
"image-feature-extraction": {"adapter": None, "role": "embedding", "service": "cf-vision"},
|
||||
"image-text-to-text": {"adapter": None, "role": "vlm", "service": "cf-vision"},
|
||||
"visual-question-answering": {"adapter": None, "role": "vlm", "service": "cf-vision"},
|
||||
# Image generation — cf-image (text → image; distinct from cf-vision image understanding)
|
||||
"text-to-image": {"adapter": None, "role": "image-gen", "service": "cf-image"},
|
||||
# Embedding — cf-core shared embedding layer
|
||||
"feature-extraction": {"adapter": None, "role": "embedding", "service": "cf-core"},
|
||||
}
|
||||
|
||||
|
||||
|
|
@ -84,14 +131,31 @@ def _write_queue(records: list[dict]) -> None:
|
|||
|
||||
|
||||
def _safe_model_name(repo_id: str) -> str:
|
||||
"""Convert repo_id to a filesystem-safe directory name (HF convention)."""
|
||||
"""Convert repo_id to a filesystem-safe directory name.
|
||||
|
||||
Uses the HuggingFace Hub convention: owner/model-name → owner--model-name.
|
||||
This matches what snapshot_download produces under local_dir and what
|
||||
cf-orch uses when constructing model paths for cf-text allocations.
|
||||
"""
|
||||
return repo_id.replace("/", "--")
|
||||
|
||||
|
||||
def _is_installed(repo_id: str) -> bool:
|
||||
"""Check if a model is already downloaded in _MODELS_DIR."""
|
||||
def _model_dir_for(repo_id: str, service: str | None) -> Path:
|
||||
"""Return the download destination directory for a model.
|
||||
|
||||
cf-text models → NFS shared asset store (_CF_TEXT_MODELS_DIR) so every
|
||||
cluster node can load them without a separate download.
|
||||
All other services (avocet classifiers, fine-tunes) → local _MODELS_DIR.
|
||||
"""
|
||||
safe_name = _safe_model_name(repo_id)
|
||||
model_dir = _MODELS_DIR / safe_name
|
||||
if service == "cf-text":
|
||||
return _CF_TEXT_MODELS_DIR / safe_name
|
||||
return _MODELS_DIR / safe_name
|
||||
|
||||
|
||||
def _is_installed(repo_id: str, service: str | None = None) -> bool:
|
||||
"""Check if a model is already downloaded in the appropriate destination."""
|
||||
model_dir = _model_dir_for(repo_id, service)
|
||||
return model_dir.exists() and (
|
||||
(model_dir / "config.json").exists()
|
||||
or (model_dir / "training_info.json").exists()
|
||||
|
|
@ -125,48 +189,289 @@ def _get_queue_entry(entry_id: str) -> dict | None:
|
|||
return None
|
||||
|
||||
|
||||
# ── cf-orch catalog auto-registration ─────────────────────────────────────────
|
||||
|
||||
|
||||
def _catalog_key(repo_id: str) -> str:
|
||||
"""Derive a readable catalog key from repo_id.
|
||||
|
||||
ibm-granite/granite-4.1-8b → granite-4.1-8b
|
||||
facebook/bart-large-cnn → bart-large-cnn
|
||||
"""
|
||||
return repo_id.split("/", 1)[-1].lower()
|
||||
|
||||
|
||||
def _insert_catalog_entry(content: str, entry_lines: str) -> str:
|
||||
"""Insert entry_lines at the end of the cf-text.catalog section.
|
||||
|
||||
Scans line by line to preserve all comments and original formatting.
|
||||
Returns content unchanged if the catalog section cannot be located.
|
||||
"""
|
||||
lines = content.splitlines(keepends=True)
|
||||
|
||||
in_cf_text = False
|
||||
in_catalog = False
|
||||
|
||||
for i, line in enumerate(lines):
|
||||
stripped = line.lstrip()
|
||||
indent = len(line) - len(stripped)
|
||||
blank_or_comment = not stripped or stripped.startswith("#")
|
||||
|
||||
if not in_cf_text:
|
||||
if indent == 2 and stripped.startswith("cf-text:"):
|
||||
in_cf_text = True
|
||||
continue
|
||||
|
||||
if not in_catalog:
|
||||
if indent == 4 and stripped.startswith("catalog:"):
|
||||
in_catalog = True
|
||||
elif not blank_or_comment and indent <= 2:
|
||||
# Left cf-text section without finding a catalog
|
||||
return content
|
||||
continue
|
||||
|
||||
# Inside catalog: first non-blank/comment line with indent < 6 ends it
|
||||
if not blank_or_comment and indent < 6:
|
||||
prefix = "\n" if lines[i - 1].strip() else ""
|
||||
lines.insert(i, prefix + entry_lines)
|
||||
return "".join(lines)
|
||||
|
||||
# Catalog ran to EOF — append there
|
||||
if in_catalog:
|
||||
prefix = "\n" if lines and lines[-1].strip() else ""
|
||||
lines.append(prefix + entry_lines)
|
||||
return "".join(lines)
|
||||
|
||||
return content
|
||||
|
||||
|
||||
def _register_in_node_catalogs(
|
||||
repo_id: str,
|
||||
local_path: Path,
|
||||
vram_mb_fp16: int,
|
||||
role: str,
|
||||
) -> list[str]:
|
||||
"""Insert a cf-text catalog entry into every eligible node YAML.
|
||||
|
||||
A node is eligible when:
|
||||
- It has a ``cf-text.catalog`` section
|
||||
- The model fits within the node's ``cf-text.max_mb`` at FP16 *or* 4-bit
|
||||
- Neither the model key nor the local path is already in the catalog
|
||||
|
||||
Returns the list of node names that were updated.
|
||||
"""
|
||||
try:
|
||||
import yaml # lazy — not in the critical import path
|
||||
except ImportError:
|
||||
logger.warning("PyYAML not available — skipping catalog registration for %s", repo_id)
|
||||
return []
|
||||
|
||||
profiles_dir = _CF_ORCH_PROFILES_DIR
|
||||
if not profiles_dir.exists():
|
||||
logger.warning(
|
||||
"cf-orch profiles dir not found: %s — skipping catalog registration", profiles_dir
|
||||
)
|
||||
return []
|
||||
|
||||
model_key = _catalog_key(repo_id)
|
||||
local_path_str = str(local_path)
|
||||
vram_4bit = round(vram_mb_fp16 / 4 * 1.1)
|
||||
updated: list[str] = []
|
||||
|
||||
for yaml_file in sorted(profiles_dir.glob("*.yaml")):
|
||||
try:
|
||||
content = yaml_file.read_text(encoding="utf-8")
|
||||
data = yaml.safe_load(content)
|
||||
|
||||
cf_text = (data.get("services") or {}).get("cf-text")
|
||||
if not cf_text:
|
||||
continue
|
||||
|
||||
max_mb: int = cf_text.get("max_mb", 0)
|
||||
catalog: dict = cf_text.get("catalog") or {}
|
||||
|
||||
# Skip if key already exists
|
||||
if model_key in catalog:
|
||||
logger.debug("Key %r already in %s — skipping", model_key, yaml_file.name)
|
||||
continue
|
||||
|
||||
# Skip if any existing entry already points at this path (or a file within it)
|
||||
registered_paths = {
|
||||
str(entry.get("path", ""))
|
||||
for entry in catalog.values()
|
||||
if isinstance(entry, dict)
|
||||
}
|
||||
if local_path_str in registered_paths or any(
|
||||
p.startswith(local_path_str + "/") for p in registered_paths
|
||||
):
|
||||
logger.debug("Path %s already registered in %s — skipping", local_path_str, yaml_file.name)
|
||||
continue
|
||||
|
||||
# Determine whether model fits at FP16 or needs 4-bit
|
||||
if vram_mb_fp16 <= max_mb:
|
||||
vram_for_node = vram_mb_fp16
|
||||
needs_4bit = False
|
||||
elif vram_4bit <= max_mb:
|
||||
vram_for_node = vram_4bit
|
||||
needs_4bit = True
|
||||
else:
|
||||
logger.debug(
|
||||
"%s too large for %s (fp16=%d MB, 4bit=%d MB, max=%d MB)",
|
||||
repo_id, yaml_file.name, vram_mb_fp16, vram_4bit, max_mb,
|
||||
)
|
||||
continue
|
||||
|
||||
desc = f"{repo_id} ({role}, downloaded via avocet)"
|
||||
if needs_4bit:
|
||||
desc += " — CF_TEXT_4BIT=1 required"
|
||||
|
||||
vram_comment = (
|
||||
f" # 4-bit estimate; FP16 footprint is {vram_mb_fp16} MB"
|
||||
if needs_4bit
|
||||
else f" # FP16 file-size estimate"
|
||||
)
|
||||
entry_block = (
|
||||
f" # auto-registered by avocet on download\n"
|
||||
f" {model_key}:\n"
|
||||
f" path: {local_path_str}\n"
|
||||
f" vram_mb: {vram_for_node}{vram_comment}\n"
|
||||
f" description: \"{desc}\"\n"
|
||||
)
|
||||
|
||||
new_content = _insert_catalog_entry(content, entry_block)
|
||||
if new_content == content:
|
||||
logger.warning("Could not find catalog insertion point in %s", yaml_file.name)
|
||||
continue
|
||||
|
||||
yaml_file.write_text(new_content, encoding="utf-8")
|
||||
updated.append(yaml_file.stem)
|
||||
logger.info(
|
||||
"Registered %s in %s (vram_mb=%d, 4bit=%s)",
|
||||
model_key, yaml_file.name, vram_for_node, needs_4bit,
|
||||
)
|
||||
|
||||
except Exception as exc:
|
||||
logger.warning("Could not update %s: %s", yaml_file.name, exc)
|
||||
|
||||
return updated
|
||||
|
||||
|
||||
# ── Background download ────────────────────────────────────────────────────────
|
||||
|
||||
def _run_download(entry_id: str, repo_id: str, pipeline_tag: str | None, adapter_recommendation: str | None) -> None:
|
||||
"""Background thread: download model via huggingface_hub.snapshot_download."""
|
||||
def _poll_disk_progress(local_dir: Path, total_bytes: int, stop_event: threading.Event) -> None:
|
||||
"""Side-thread: poll local_dir size every 2s and update _download_progress.
|
||||
|
||||
snapshot_download is a blocking call with no progress callback, so we watch
|
||||
the destination directory grow on disk as a proxy for download progress.
|
||||
total_bytes=0 means we don't know the target size; pct stays 0 until done.
|
||||
"""
|
||||
import time
|
||||
while not stop_event.is_set():
|
||||
try:
|
||||
downloaded = sum(
|
||||
f.stat().st_size for f in local_dir.rglob("*") if f.is_file()
|
||||
)
|
||||
_download_progress["downloaded_bytes"] = downloaded
|
||||
if total_bytes > 0:
|
||||
_download_progress["total_bytes"] = total_bytes
|
||||
_download_progress["pct"] = min(downloaded / total_bytes * 100, 99.0)
|
||||
except Exception:
|
||||
pass
|
||||
time.sleep(2)
|
||||
|
||||
|
||||
def _run_download(
|
||||
entry_id: str,
|
||||
repo_id: str,
|
||||
pipeline_tag: str | None,
|
||||
adapter_recommendation: str | None,
|
||||
role: str | None = None,
|
||||
service: str | None = None,
|
||||
model_size_bytes: int = 0,
|
||||
) -> None:
|
||||
"""Background thread: download model via huggingface_hub.snapshot_download.
|
||||
|
||||
model_size_bytes is the sum of file sizes reported by the HF API (siblings).
|
||||
It is used to estimate vram_mb and written to model_info.json so cf-orch can
|
||||
budget VRAM when allocating a cf-text instance for this model.
|
||||
"""
|
||||
global _download_progress
|
||||
safe_name = _safe_model_name(repo_id)
|
||||
local_dir = _MODELS_DIR / safe_name
|
||||
local_dir = _model_dir_for(repo_id, service)
|
||||
|
||||
_download_progress = {
|
||||
"active": True,
|
||||
"repo_id": repo_id,
|
||||
"downloaded_bytes": 0,
|
||||
"total_bytes": 0,
|
||||
"total_bytes": model_size_bytes,
|
||||
"pct": 0.0,
|
||||
"done": False,
|
||||
"error": None,
|
||||
}
|
||||
|
||||
stop_poll = threading.Event()
|
||||
poll_thread = threading.Thread(
|
||||
target=_poll_disk_progress,
|
||||
args=(local_dir, model_size_bytes, stop_poll),
|
||||
daemon=True,
|
||||
name=f"model-poll-{entry_id}",
|
||||
)
|
||||
|
||||
try:
|
||||
if snapshot_download is None:
|
||||
raise RuntimeError("huggingface_hub is not installed")
|
||||
|
||||
local_dir.mkdir(parents=True, exist_ok=True)
|
||||
poll_thread.start()
|
||||
snapshot_download(
|
||||
repo_id=repo_id,
|
||||
local_dir=str(local_dir),
|
||||
)
|
||||
|
||||
# Write model_info.json alongside downloaded files
|
||||
# Estimate VRAM from reported file size.
|
||||
# HF siblings sizes are pre-quantisation file sizes; add 10% for KV cache
|
||||
# and runtime overhead. Falls back to a stat of the local dir if 0.
|
||||
if model_size_bytes == 0:
|
||||
model_size_bytes = sum(
|
||||
f.stat().st_size for f in local_dir.rglob("*") if f.is_file()
|
||||
)
|
||||
vram_mb = int(model_size_bytes / (1024 * 1024) * 1.1)
|
||||
|
||||
# Write model_info.json alongside downloaded files.
|
||||
# local_path + vram_mb are read by cf-orch at allocation time to resolve
|
||||
# the full model path and grant the correct VRAM lease.
|
||||
model_info = {
|
||||
"repo_id": repo_id,
|
||||
"pipeline_tag": pipeline_tag,
|
||||
"adapter_recommendation": adapter_recommendation,
|
||||
"role": role,
|
||||
"service": service,
|
||||
"model_size_bytes": model_size_bytes,
|
||||
"vram_mb": vram_mb,
|
||||
"local_path": str(local_dir),
|
||||
"downloaded_at": datetime.now(timezone.utc).isoformat(),
|
||||
}
|
||||
local_dir.mkdir(parents=True, exist_ok=True)
|
||||
(local_dir / "model_info.json").write_text(
|
||||
json.dumps(model_info, indent=2), encoding="utf-8"
|
||||
)
|
||||
|
||||
# Auto-register cf-text models in the cf-orch node YAML catalogs so they
|
||||
# appear in the benchmark model list without a manual YAML edit.
|
||||
if service == "cf-text":
|
||||
registered_on = _register_in_node_catalogs(
|
||||
repo_id=repo_id,
|
||||
local_path=local_dir,
|
||||
vram_mb_fp16=vram_mb,
|
||||
role=role or "generator",
|
||||
)
|
||||
if registered_on:
|
||||
logger.info(
|
||||
"Auto-registered %s in node catalogs: %s",
|
||||
repo_id, ", ".join(registered_on),
|
||||
)
|
||||
|
||||
_download_progress["done"] = True
|
||||
_download_progress["pct"] = 100.0
|
||||
_update_queue_entry(entry_id, {"status": "ready"})
|
||||
_update_queue_entry(entry_id, {"status": "ready", "local_path": str(local_dir)})
|
||||
|
||||
except Exception as exc:
|
||||
logger.exception("Download failed for %s: %s", repo_id, exc)
|
||||
|
|
@ -174,6 +479,7 @@ def _run_download(entry_id: str, repo_id: str, pipeline_tag: str | None, adapter
|
|||
_download_progress["done"] = True
|
||||
_update_queue_entry(entry_id, {"status": "failed", "error": str(exc)})
|
||||
finally:
|
||||
stop_poll.set()
|
||||
_download_progress["active"] = False
|
||||
|
||||
|
||||
|
|
@ -199,11 +505,15 @@ def lookup_model(repo_id: str) -> dict:
|
|||
|
||||
data = resp.json()
|
||||
pipeline_tag = data.get("pipeline_tag")
|
||||
adapter_recommendation = _TAG_TO_ADAPTER.get(pipeline_tag) if pipeline_tag else None
|
||||
tag_info = _TAG_TO_INFO.get(pipeline_tag) if pipeline_tag else None
|
||||
adapter_recommendation = tag_info["adapter"] if tag_info else None
|
||||
role = tag_info["role"] if tag_info else None
|
||||
service = tag_info["service"] if tag_info else None
|
||||
|
||||
# Determine compatibility and surface a human-readable warning
|
||||
_supported = ", ".join(sorted(_TAG_TO_ADAPTER.keys()))
|
||||
if adapter_recommendation is not None:
|
||||
_supported = ", ".join(sorted(_TAG_TO_INFO.keys()))
|
||||
if tag_info is not None:
|
||||
# Any recognized tag is compatible — avocet adapters or another CF service
|
||||
compatible = True
|
||||
warning: str | None = None
|
||||
elif pipeline_tag is None:
|
||||
|
|
@ -216,7 +526,7 @@ def lookup_model(repo_id: str) -> dict:
|
|||
else:
|
||||
compatible = False
|
||||
warning = (
|
||||
f"\"{pipeline_tag}\" models are not supported by Avocet's email classification adapters. "
|
||||
f"\"{pipeline_tag}\" models are not yet supported by the CircuitForge model ecosystem. "
|
||||
f"Supported task types: {_supported}."
|
||||
)
|
||||
logger.warning("Unsupported pipeline_tag %r for %s", pipeline_tag, repo_id)
|
||||
|
|
@ -234,6 +544,8 @@ def lookup_model(repo_id: str) -> dict:
|
|||
"repo_id": repo_id,
|
||||
"pipeline_tag": pipeline_tag,
|
||||
"adapter_recommendation": adapter_recommendation,
|
||||
"role": role,
|
||||
"service": service,
|
||||
"compatible": compatible,
|
||||
"warning": warning,
|
||||
"model_size_bytes": model_size_bytes,
|
||||
|
|
@ -261,12 +573,18 @@ class QueueAddRequest(BaseModel):
|
|||
repo_id: str
|
||||
pipeline_tag: str | None = None
|
||||
adapter_recommendation: str | None = None
|
||||
role: str | None = None
|
||||
service: str | None = None
|
||||
# Sum of file sizes from HF API siblings list; 0 if unknown.
|
||||
# Stored in the queue entry so approve can pass it to _run_download
|
||||
# without a second HF API round-trip.
|
||||
model_size_bytes: int = 0
|
||||
|
||||
|
||||
@router.post("/queue", status_code=201)
|
||||
def add_to_queue(req: QueueAddRequest) -> dict:
|
||||
"""Add a model to the approval queue with status 'pending'."""
|
||||
if _is_installed(req.repo_id):
|
||||
if _is_installed(req.repo_id, service=req.service):
|
||||
raise HTTPException(409, f"{req.repo_id!r} is already installed")
|
||||
if _is_queued(req.repo_id):
|
||||
raise HTTPException(409, f"{req.repo_id!r} is already in the queue")
|
||||
|
|
@ -276,6 +594,9 @@ def add_to_queue(req: QueueAddRequest) -> dict:
|
|||
"repo_id": req.repo_id,
|
||||
"pipeline_tag": req.pipeline_tag,
|
||||
"adapter_recommendation": req.adapter_recommendation,
|
||||
"role": req.role,
|
||||
"service": req.service,
|
||||
"model_size_bytes": req.model_size_bytes,
|
||||
"status": "pending",
|
||||
"queued_at": datetime.now(timezone.utc).isoformat(),
|
||||
}
|
||||
|
|
@ -300,7 +621,15 @@ def approve_queue_entry(entry_id: str) -> dict:
|
|||
|
||||
thread = threading.Thread(
|
||||
target=_run_download,
|
||||
args=(entry_id, entry["repo_id"], entry.get("pipeline_tag"), entry.get("adapter_recommendation")),
|
||||
args=(
|
||||
entry_id,
|
||||
entry["repo_id"],
|
||||
entry.get("pipeline_tag"),
|
||||
entry.get("adapter_recommendation"),
|
||||
entry.get("role"),
|
||||
entry.get("service"),
|
||||
entry.get("model_size_bytes", 0),
|
||||
),
|
||||
daemon=True,
|
||||
name=f"model-download-{entry_id}",
|
||||
)
|
||||
|
|
@ -368,81 +697,286 @@ def download_stream() -> StreamingResponse:
|
|||
)
|
||||
|
||||
|
||||
# ── POST /sync-catalogs ────────────────────────────────────────────────────────
|
||||
|
||||
@router.post("/sync-catalogs")
|
||||
def sync_catalogs() -> dict:
|
||||
"""Scan all installed cf-text models and register any missing from node YAMLs.
|
||||
|
||||
Reads model_info.json from each directory in the cf-text models dir and calls
|
||||
_register_in_node_catalogs() for each. Idempotent — skips models already
|
||||
present by key or path.
|
||||
|
||||
Returns a summary of registrations performed.
|
||||
"""
|
||||
if not _CF_TEXT_MODELS_DIR.exists():
|
||||
return {"registered": {}, "skipped": [], "message": "cf-text models dir not found"}
|
||||
|
||||
registered: dict[str, list[str]] = {}
|
||||
skipped: list[str] = []
|
||||
|
||||
for model_dir in sorted(_CF_TEXT_MODELS_DIR.iterdir()):
|
||||
if not model_dir.is_dir():
|
||||
continue
|
||||
info_file = model_dir / "model_info.json"
|
||||
if not info_file.exists():
|
||||
skipped.append(model_dir.name)
|
||||
continue
|
||||
|
||||
try:
|
||||
info = json.loads(info_file.read_text(encoding="utf-8"))
|
||||
except Exception as exc:
|
||||
logger.warning("Could not read model_info.json for %s: %s", model_dir.name, exc)
|
||||
skipped.append(model_dir.name)
|
||||
continue
|
||||
|
||||
if info.get("service") != "cf-text":
|
||||
skipped.append(model_dir.name)
|
||||
continue
|
||||
|
||||
repo_id = info.get("repo_id", model_dir.name)
|
||||
vram_mb = info.get("vram_mb", 0)
|
||||
role = info.get("role", "generator")
|
||||
|
||||
updated_nodes = _register_in_node_catalogs(
|
||||
repo_id=repo_id,
|
||||
local_path=model_dir,
|
||||
vram_mb_fp16=vram_mb,
|
||||
role=role,
|
||||
)
|
||||
if updated_nodes:
|
||||
registered[repo_id] = updated_nodes
|
||||
else:
|
||||
skipped.append(repo_id)
|
||||
|
||||
return {
|
||||
"registered": registered,
|
||||
"skipped": skipped,
|
||||
"message": (
|
||||
f"Registered {len(registered)} model(s) on "
|
||||
f"{sum(len(v) for v in registered.values())} node(s)"
|
||||
if registered
|
||||
else "All models already registered (or no eligible nodes found)"
|
||||
),
|
||||
}
|
||||
|
||||
|
||||
# ── GET /installed ─────────────────────────────────────────────────────────────
|
||||
|
||||
@router.get("/installed")
|
||||
def list_installed() -> list[dict]:
|
||||
"""Scan _MODELS_DIR and return info on each installed model."""
|
||||
if not _MODELS_DIR.exists():
|
||||
return []
|
||||
"""Scan all model directories and return info on each installed model.
|
||||
|
||||
Scans both the local avocet models dir (classifiers, fine-tunes) and the
|
||||
shared NFS cf-text models dir, deduplicating by directory path.
|
||||
|
||||
Falls back to queue entry data when model_info.json has null service/role,
|
||||
so models downloaded before the pipeline_tag registry existed still group
|
||||
correctly in the UI.
|
||||
"""
|
||||
scan_dirs = [_MODELS_DIR]
|
||||
if _CF_TEXT_MODELS_DIR != _MODELS_DIR and _CF_TEXT_MODELS_DIR.exists():
|
||||
scan_dirs.append(_CF_TEXT_MODELS_DIR)
|
||||
|
||||
# Build a lookup from safe directory name → queue entry for fallback enrichment.
|
||||
queue_by_safe_name: dict[str, dict] = {
|
||||
_safe_model_name(r["repo_id"]): r
|
||||
for r in _read_queue()
|
||||
if r.get("repo_id") and r.get("status") not in ("dismissed",)
|
||||
}
|
||||
|
||||
results: list[dict] = []
|
||||
for sub in _MODELS_DIR.iterdir():
|
||||
if not sub.is_dir():
|
||||
seen: set[Path] = set()
|
||||
|
||||
for scan_dir in scan_dirs:
|
||||
if not scan_dir.exists():
|
||||
continue
|
||||
for sub in scan_dir.iterdir():
|
||||
if not sub.is_dir() or sub in seen:
|
||||
continue
|
||||
seen.add(sub)
|
||||
|
||||
has_training_info = (sub / "training_info.json").exists()
|
||||
has_config = (sub / "config.json").exists()
|
||||
has_model_info = (sub / "model_info.json").exists()
|
||||
has_training_info = (sub / "training_info.json").exists()
|
||||
has_config = (sub / "config.json").exists()
|
||||
has_model_info = (sub / "model_info.json").exists()
|
||||
|
||||
if not (has_training_info or has_config or has_model_info):
|
||||
continue
|
||||
if not (has_training_info or has_config or has_model_info):
|
||||
continue
|
||||
|
||||
model_type = "finetuned" if has_training_info else "downloaded"
|
||||
model_type = "finetuned" if has_training_info else "downloaded"
|
||||
|
||||
# Compute directory size
|
||||
size_bytes = sum(f.stat().st_size for f in sub.rglob("*") if f.is_file())
|
||||
# Compute directory size
|
||||
size_bytes = sum(f.stat().st_size for f in sub.rglob("*") if f.is_file())
|
||||
|
||||
# Load adapter/model_id from model_info.json or training_info.json
|
||||
adapter: str | None = None
|
||||
model_id: str | None = None
|
||||
adapter: str | None = None
|
||||
model_id: str | None = None
|
||||
role: str | None = None
|
||||
service: str | None = None
|
||||
vram_mb: int | None = None
|
||||
|
||||
if has_model_info:
|
||||
try:
|
||||
info = json.loads((sub / "model_info.json").read_text(encoding="utf-8"))
|
||||
adapter = info.get("adapter_recommendation")
|
||||
model_id = info.get("repo_id")
|
||||
except Exception:
|
||||
pass
|
||||
elif has_training_info:
|
||||
try:
|
||||
info = json.loads((sub / "training_info.json").read_text(encoding="utf-8"))
|
||||
adapter = info.get("adapter")
|
||||
model_id = info.get("base_model") or info.get("model_id")
|
||||
except Exception:
|
||||
pass
|
||||
if has_model_info:
|
||||
try:
|
||||
info = json.loads((sub / "model_info.json").read_text(encoding="utf-8"))
|
||||
adapter = info.get("adapter_recommendation")
|
||||
model_id = info.get("repo_id")
|
||||
role = info.get("role")
|
||||
service = info.get("service")
|
||||
vram_mb = info.get("vram_mb")
|
||||
except Exception:
|
||||
pass
|
||||
elif has_training_info:
|
||||
try:
|
||||
info = json.loads((sub / "training_info.json").read_text(encoding="utf-8"))
|
||||
adapter = info.get("adapter")
|
||||
model_id = info.get("base_model") or info.get("model_id")
|
||||
role = info.get("role", "classifier")
|
||||
service = info.get("service", "avocet")
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
results.append({
|
||||
"name": sub.name,
|
||||
"path": str(sub),
|
||||
"type": model_type,
|
||||
"adapter": adapter,
|
||||
"size_bytes": size_bytes,
|
||||
"model_id": model_id,
|
||||
})
|
||||
# Fall back to queue entry when model_info.json has null service/role.
|
||||
# This covers models downloaded before the pipeline_tag registry existed.
|
||||
if (role is None or service is None) and sub.name in queue_by_safe_name:
|
||||
q = queue_by_safe_name[sub.name]
|
||||
role = role or q.get("role")
|
||||
service = service or q.get("service")
|
||||
model_id = model_id or q.get("repo_id")
|
||||
|
||||
# Last resort: re-derive from pipeline_tag if we still have no service.
|
||||
if service is None and model_id:
|
||||
hf_url = f"https://huggingface.co/api/models/{model_id}"
|
||||
# Only attempt if we have a pipeline_tag cached somewhere.
|
||||
for q in queue_by_safe_name.values():
|
||||
if q.get("repo_id") == model_id and q.get("pipeline_tag"):
|
||||
tag_info = _TAG_TO_INFO.get(q["pipeline_tag"])
|
||||
if tag_info:
|
||||
role = role or tag_info["role"]
|
||||
service = service or tag_info["service"]
|
||||
break
|
||||
|
||||
results.append({
|
||||
"name": sub.name,
|
||||
"path": str(sub),
|
||||
"type": model_type,
|
||||
"adapter": adapter,
|
||||
"role": role,
|
||||
"service": service,
|
||||
"size_bytes": size_bytes,
|
||||
"vram_mb": vram_mb,
|
||||
"model_id": model_id,
|
||||
})
|
||||
|
||||
return results
|
||||
|
||||
|
||||
# ── PATCH /installed/{name} ────────────────────────────────────────────────────
|
||||
|
||||
class InstalledModelPatch(BaseModel):
|
||||
service: str
|
||||
role: str
|
||||
|
||||
|
||||
@router.patch("/installed/{name}")
|
||||
def patch_installed(name: str, body: InstalledModelPatch) -> dict:
|
||||
"""Manually assign service and role to an installed model.
|
||||
|
||||
Writes the updated values back to model_info.json so they survive restarts,
|
||||
and updates any matching queue entry so the UI shows the correct chip.
|
||||
"""
|
||||
if "/" in name or "\\" in name or ".." in name or not name or name.startswith("."):
|
||||
raise HTTPException(400, f"Invalid model name {name!r}")
|
||||
|
||||
candidate_dirs = [_MODELS_DIR]
|
||||
if _CF_TEXT_MODELS_DIR != _MODELS_DIR:
|
||||
candidate_dirs.append(_CF_TEXT_MODELS_DIR)
|
||||
|
||||
model_path: Path | None = None
|
||||
for base in candidate_dirs:
|
||||
candidate = base / name
|
||||
try:
|
||||
candidate.resolve().relative_to(base.resolve())
|
||||
except ValueError:
|
||||
raise HTTPException(400, f"Path traversal detected for name {name!r}")
|
||||
if candidate.exists():
|
||||
model_path = candidate
|
||||
break
|
||||
|
||||
if model_path is None:
|
||||
raise HTTPException(404, f"Installed model {name!r} not found")
|
||||
|
||||
info_path = model_path / "model_info.json"
|
||||
if info_path.exists():
|
||||
try:
|
||||
info = json.loads(info_path.read_text(encoding="utf-8"))
|
||||
except Exception:
|
||||
info = {}
|
||||
else:
|
||||
info = {}
|
||||
|
||||
info["service"] = body.service
|
||||
info["role"] = body.role
|
||||
info_path.write_text(json.dumps(info, indent=2), encoding="utf-8")
|
||||
|
||||
# Mirror the update into any matching queue entry.
|
||||
records = _read_queue()
|
||||
updated = False
|
||||
for r in records:
|
||||
local = r.get("local_path", "")
|
||||
matches = (local and Path(local).name == name) or _safe_model_name(r.get("repo_id", "")) == name
|
||||
if matches and r.get("status") not in ("dismissed",):
|
||||
r["service"] = body.service
|
||||
r["role"] = body.role
|
||||
updated = True
|
||||
if updated:
|
||||
_write_queue(records)
|
||||
|
||||
return {"ok": True, "service": body.service, "role": body.role}
|
||||
|
||||
|
||||
# ── DELETE /installed/{name} ───────────────────────────────────────────────────
|
||||
|
||||
@router.delete("/installed/{name}")
|
||||
def delete_installed(name: str) -> dict:
|
||||
"""Remove an installed model directory by name. Blocks path traversal."""
|
||||
# Validate: single path component, no slashes or '..'
|
||||
"""Remove an installed model directory by name. Blocks path traversal.
|
||||
|
||||
Searches both the local avocet models dir and the shared cf-text models dir.
|
||||
Also dismisses any matching queue entry so the UI doesn't show a stale "ready" card.
|
||||
"""
|
||||
if "/" in name or "\\" in name or ".." in name or not name or name.startswith("."):
|
||||
raise HTTPException(400, f"Invalid model name {name!r}: must be a single directory name with no path separators or '..'")
|
||||
|
||||
model_path = _MODELS_DIR / name
|
||||
# Search both model directories
|
||||
candidate_dirs = [_MODELS_DIR]
|
||||
if _CF_TEXT_MODELS_DIR != _MODELS_DIR:
|
||||
candidate_dirs.append(_CF_TEXT_MODELS_DIR)
|
||||
|
||||
# Extra safety: confirm resolved path is inside _MODELS_DIR
|
||||
try:
|
||||
model_path.resolve().relative_to(_MODELS_DIR.resolve())
|
||||
except ValueError:
|
||||
raise HTTPException(400, f"Path traversal detected for name {name!r}")
|
||||
model_path: Path | None = None
|
||||
for base in candidate_dirs:
|
||||
candidate = base / name
|
||||
try:
|
||||
candidate.resolve().relative_to(base.resolve())
|
||||
except ValueError:
|
||||
raise HTTPException(400, f"Path traversal detected for name {name!r}")
|
||||
if candidate.exists():
|
||||
model_path = candidate
|
||||
break
|
||||
|
||||
if not model_path.exists():
|
||||
raise HTTPException(404, f"Installed model {name!r} not found")
|
||||
if model_path is None:
|
||||
raise HTTPException(404, f"Installed model {name!r} not found in any model directory")
|
||||
|
||||
shutil.rmtree(model_path)
|
||||
|
||||
# Dismiss any queue entries whose local_path matches, or whose repo_id maps to this dir name.
|
||||
records = _read_queue()
|
||||
updated = False
|
||||
for r in records:
|
||||
local = r.get("local_path", "")
|
||||
matches_path = local and Path(local).name == name
|
||||
matches_name = _safe_model_name(r.get("repo_id", "")) == name
|
||||
if (matches_path or matches_name) and r.get("status") != "dismissed":
|
||||
r["status"] = "dismissed"
|
||||
updated = True
|
||||
if updated:
|
||||
_write_queue(records)
|
||||
|
||||
return {"ok": True}
|
||||
|
|
|
|||
427
app/style.py
Normal file
427
app/style.py
Normal file
|
|
@ -0,0 +1,427 @@
|
|||
"""Avocet — Writing style benchmark integration API.
|
||||
|
||||
Wraps scripts/benchmark_style.py and exposes it via the Avocet API.
|
||||
Connection config (coordinator_url, ollama_url, python_bin) is read
|
||||
from label_tool.yaml under the `cforch:` key — the same block used
|
||||
by cforch.py, so no new config section is needed.
|
||||
|
||||
All endpoints are registered on `router` (a FastAPI APIRouter).
|
||||
api.py includes this router with prefix="/api/style".
|
||||
|
||||
Module-level globals (_BENCH_RUNNING, _bench_proc) follow the same
|
||||
testability pattern as cforch.py.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import logging
|
||||
import subprocess as _subprocess
|
||||
import uuid
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import httpx
|
||||
import yaml
|
||||
from fastapi import APIRouter, HTTPException, Query
|
||||
from fastapi.responses import StreamingResponse
|
||||
from pydantic import BaseModel
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_ROOT = Path(__file__).parent.parent
|
||||
_CONFIG_DIR: Path | None = None # override in tests via set_config_dir()
|
||||
_BENCH_RUNNING: bool = False
|
||||
_bench_proc: Any = None
|
||||
|
||||
_BENCH_SCRIPT = _ROOT / "scripts" / "benchmark_style.py"
|
||||
_RESULTS_DIR = _ROOT / "benchmark_results"
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
|
||||
# ── Testability seams ──────────────────────────────────────────────────────────
|
||||
|
||||
def set_config_dir(path: Path | None) -> None:
|
||||
global _CONFIG_DIR
|
||||
_CONFIG_DIR = path
|
||||
|
||||
|
||||
# ── Internal helpers ───────────────────────────────────────────────────────────
|
||||
|
||||
def _config_file() -> Path:
|
||||
if _CONFIG_DIR is not None:
|
||||
return _CONFIG_DIR / "label_tool.yaml"
|
||||
return _ROOT / "config" / "label_tool.yaml"
|
||||
|
||||
|
||||
def _load_config() -> dict:
|
||||
"""Read label_tool.yaml cforch section for coordinator/ollama/python config."""
|
||||
f = _config_file()
|
||||
file_cfg: dict = {}
|
||||
if f.exists():
|
||||
try:
|
||||
raw = yaml.safe_load(f.read_text(encoding="utf-8")) or {}
|
||||
file_cfg = raw.get("cforch", {}) or {}
|
||||
except yaml.YAMLError as exc:
|
||||
logger.warning("Failed to parse style config %s: %s", f, exc)
|
||||
return {
|
||||
"coordinator_url": file_cfg.get("coordinator_url", "http://10.1.10.71:7700"),
|
||||
"ollama_url": file_cfg.get("ollama_url", "http://localhost:11434"),
|
||||
"python_bin": file_cfg.get("python_bin", "/devl/miniconda3/envs/cf/bin/python"),
|
||||
}
|
||||
|
||||
|
||||
# ── GET /models ────────────────────────────────────────────────────────────────
|
||||
|
||||
@router.get("/models")
|
||||
def get_models() -> dict:
|
||||
"""Return available models grouped by source.
|
||||
|
||||
- ollama: fetched live from /api/tags (includes any models downloaded
|
||||
via the Models view — automatically in sync)
|
||||
- cf_text: fetched from cf-orch catalog endpoint (requires node profile
|
||||
entry + coordinator restart when new GGUFs are added)
|
||||
"""
|
||||
cfg = _load_config()
|
||||
|
||||
# Ollama models — live query so newly downloaded models appear immediately
|
||||
ollama_models: list[dict] = []
|
||||
try:
|
||||
resp = httpx.get(f"{cfg['ollama_url']}/api/tags", timeout=5.0)
|
||||
resp.raise_for_status()
|
||||
for m in resp.json().get("models", []):
|
||||
name = m.get("name", "")
|
||||
if name:
|
||||
size_bytes = m.get("size", 0)
|
||||
ollama_models.append({
|
||||
"id": name,
|
||||
"name": name,
|
||||
"source": "ollama",
|
||||
"size_mb": round(size_bytes / (1024 * 1024)) if size_bytes else None,
|
||||
"vram_mb": None,
|
||||
})
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to fetch ollama models: %s", exc)
|
||||
|
||||
# cf-text catalog — fetched from cf-orch coordinator
|
||||
cftext_models: list[dict] = []
|
||||
try:
|
||||
resp = httpx.get(
|
||||
f"{cfg['coordinator_url']}/api/services/cf-text/catalog",
|
||||
timeout=5.0,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
for model_id, entry in resp.json().items():
|
||||
if isinstance(entry, dict):
|
||||
cftext_models.append({
|
||||
"id": model_id,
|
||||
"name": model_id,
|
||||
"source": "cf-text",
|
||||
"vram_mb": entry.get("vram_mb"),
|
||||
"description": entry.get("description", ""),
|
||||
})
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to fetch cf-text catalog: %s", exc)
|
||||
|
||||
return {"ollama": ollama_models, "cf_text": cftext_models}
|
||||
|
||||
|
||||
# ── GET /run ───────────────────────────────────────────────────────────────────
|
||||
|
||||
@router.get("/run")
|
||||
def run_style_benchmark(
|
||||
models: str = Query("", description="Comma-separated model IDs (empty = all)"),
|
||||
use_cforch: bool = Query(False),
|
||||
max_vram: int = Query(7200, description="Max VRAM MB for cf-orch OOM filter"),
|
||||
include_large: bool = Query(False, description="Include large (30B+) ollama models"),
|
||||
workers: int = Query(1, description="Parallel workers — run N models simultaneously"),
|
||||
) -> StreamingResponse:
|
||||
"""Spawn benchmark_style.py and stream stdout as SSE progress events.
|
||||
|
||||
On successful completion, emits a final `type: result` event containing
|
||||
the parsed JSON from the newest style_*.json file.
|
||||
"""
|
||||
global _BENCH_RUNNING, _bench_proc
|
||||
|
||||
if _BENCH_RUNNING:
|
||||
raise HTTPException(409, "A writing style benchmark is already running")
|
||||
|
||||
cfg = _load_config()
|
||||
python_bin = cfg["python_bin"]
|
||||
|
||||
def generate():
|
||||
global _BENCH_RUNNING, _bench_proc
|
||||
|
||||
if not _BENCH_SCRIPT.exists():
|
||||
yield f"data: {json.dumps({'type': 'error', 'message': f'benchmark_style.py not found at {_BENCH_SCRIPT}'})}\n\n"
|
||||
return
|
||||
|
||||
cmd = [python_bin, str(_BENCH_SCRIPT), "run"]
|
||||
|
||||
if models:
|
||||
cmd.extend(["--models", ",".join(m.strip() for m in models.split(",") if m.strip())])
|
||||
if use_cforch:
|
||||
cmd.extend(["--cforch", "--cforch-url", cfg["coordinator_url"],
|
||||
"--max-vram", str(max_vram)])
|
||||
if include_large:
|
||||
cmd.append("--include-large")
|
||||
if workers > 1:
|
||||
cmd.extend(["--workers", str(workers)])
|
||||
|
||||
_BENCH_RUNNING = True
|
||||
try:
|
||||
proc = _subprocess.Popen(
|
||||
cmd,
|
||||
stdout=_subprocess.PIPE,
|
||||
stderr=_subprocess.STDOUT,
|
||||
text=True,
|
||||
bufsize=1,
|
||||
cwd=str(_ROOT),
|
||||
)
|
||||
_bench_proc = proc
|
||||
try:
|
||||
for line in proc.stdout:
|
||||
line = line.rstrip()
|
||||
if line:
|
||||
yield f"data: {json.dumps({'type': 'progress', 'message': line})}\n\n"
|
||||
proc.wait()
|
||||
if proc.returncode == 0:
|
||||
result_files = sorted(_RESULTS_DIR.glob("style_*.json"))
|
||||
if result_files:
|
||||
try:
|
||||
results = json.loads(result_files[-1].read_text(encoding="utf-8"))
|
||||
yield f"data: {json.dumps({'type': 'result', 'results': results, 'filename': result_files[-1].name})}\n\n"
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to read style results: %s", exc)
|
||||
yield f"data: {json.dumps({'type': 'complete'})}\n\n"
|
||||
else:
|
||||
yield f"data: {json.dumps({'type': 'error', 'message': f'Process exited with code {proc.returncode}'})}\n\n"
|
||||
finally:
|
||||
_bench_proc = None
|
||||
except Exception as exc:
|
||||
yield f"data: {json.dumps({'type': 'error', 'message': str(exc)})}\n\n"
|
||||
finally:
|
||||
_BENCH_RUNNING = False
|
||||
|
||||
return StreamingResponse(
|
||||
generate(),
|
||||
media_type="text/event-stream",
|
||||
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
|
||||
)
|
||||
|
||||
|
||||
# ── GET /results ───────────────────────────────────────────────────────────────
|
||||
|
||||
@router.get("/results")
|
||||
def list_results() -> list[dict]:
|
||||
"""List past writing style benchmark runs, newest first.
|
||||
|
||||
Returns lightweight summaries (date, model count, top score).
|
||||
Use /results/{filename} to fetch full model-level detail.
|
||||
"""
|
||||
if not _RESULTS_DIR.exists():
|
||||
return []
|
||||
|
||||
runs: list[dict] = []
|
||||
for f in sorted(_RESULTS_DIR.glob("style_*.json"), reverse=True):
|
||||
stem = f.stem # style_2026-04-22_1502
|
||||
date_str = stem.removeprefix("style_") # 2026-04-22_1502
|
||||
try:
|
||||
date_part, time_part = date_str.split("_")
|
||||
display_date = f"{date_part} {time_part[:2]}:{time_part[2:]}"
|
||||
except Exception:
|
||||
display_date = date_str
|
||||
|
||||
try:
|
||||
results = json.loads(f.read_text(encoding="utf-8"))
|
||||
top_score = max((r.get("avg_score", 0) for r in results), default=0)
|
||||
model_count = len(results)
|
||||
except Exception:
|
||||
top_score = 0
|
||||
model_count = 0
|
||||
|
||||
runs.append({
|
||||
"filename": f.name,
|
||||
"date": display_date,
|
||||
"model_count": model_count,
|
||||
"top_score": round(top_score, 1),
|
||||
})
|
||||
|
||||
return runs
|
||||
|
||||
|
||||
@router.get("/results/latest")
|
||||
def get_latest_results() -> list[dict]:
|
||||
"""Return the latest writing style benchmark result list."""
|
||||
if not _RESULTS_DIR.exists():
|
||||
raise HTTPException(404, "No benchmark results found")
|
||||
files = sorted(_RESULTS_DIR.glob("style_*.json"))
|
||||
if not files:
|
||||
raise HTTPException(404, "No benchmark results found")
|
||||
try:
|
||||
return json.loads(files[-1].read_text(encoding="utf-8"))
|
||||
except Exception as exc:
|
||||
raise HTTPException(500, f"Failed to read results: {exc}") from exc
|
||||
|
||||
|
||||
@router.get("/results/{filename}")
|
||||
def get_results_by_filename(filename: str) -> list[dict]:
|
||||
"""Return writing style benchmark results for a specific run file."""
|
||||
if not filename.startswith("style_") or not filename.endswith(".json"):
|
||||
raise HTTPException(400, "Invalid filename — expected style_*.json")
|
||||
f = _RESULTS_DIR / filename
|
||||
if not f.exists():
|
||||
raise HTTPException(404, f"Results file not found: {filename}")
|
||||
try:
|
||||
return json.loads(f.read_text(encoding="utf-8"))
|
||||
except Exception as exc:
|
||||
raise HTTPException(500, f"Failed to read results: {exc}") from exc
|
||||
|
||||
|
||||
# ── POST /send-to-corrections ──────────────────────────────────────────────────
|
||||
|
||||
class SendToCorrectionsRequest(BaseModel):
|
||||
filename: str # style_YYYY-MM-DD_HHMM.json — the source run file
|
||||
model_ids: list[str] = [] # empty = all models in the run
|
||||
|
||||
|
||||
@router.post("/send-to-corrections")
|
||||
def send_to_corrections(req: SendToCorrectionsRequest) -> dict:
|
||||
"""Push writing style benchmark outputs into the SFT corrections queue.
|
||||
|
||||
Each prompt_result from the selected models becomes one SFT candidate
|
||||
with status='needs_review'. Duplicates are skipped via the 'id' field
|
||||
(hash of model_id + tag).
|
||||
"""
|
||||
if not req.filename.startswith("style_") or not req.filename.endswith(".json"):
|
||||
raise HTTPException(400, "Invalid filename")
|
||||
|
||||
src = _RESULTS_DIR / req.filename
|
||||
if not src.exists():
|
||||
raise HTTPException(404, f"Results file not found: {req.filename}")
|
||||
|
||||
try:
|
||||
run_results: list[dict] = json.loads(src.read_text(encoding="utf-8"))
|
||||
except Exception as exc:
|
||||
raise HTTPException(500, f"Failed to read results: {exc}") from exc
|
||||
|
||||
# Resolve sft_candidates.jsonl path (same logic as sft.py)
|
||||
sft_data_dir = _ROOT / "data"
|
||||
sft_file = sft_data_dir / "sft_candidates.jsonl"
|
||||
|
||||
# Load existing IDs to deduplicate
|
||||
existing_ids: set[str] = set()
|
||||
if sft_file.exists():
|
||||
for line in sft_file.read_text(encoding="utf-8").splitlines():
|
||||
line = line.strip()
|
||||
if line:
|
||||
try:
|
||||
existing_ids.add(json.loads(line)["id"])
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
run_id = req.filename.removesuffix(".json") # style_2026-04-22_1502
|
||||
timestamp = datetime.now(tz=timezone.utc).isoformat()
|
||||
|
||||
new_candidates: list[dict] = []
|
||||
for model_result in run_results:
|
||||
model_id = model_result.get("model_id", "")
|
||||
if req.model_ids and model_id not in req.model_ids:
|
||||
continue
|
||||
for pr in model_result.get("prompt_results", []):
|
||||
tag = pr.get("tag", "")
|
||||
# Stable id: deterministic hash of run + model + prompt tag
|
||||
candidate_id = str(uuid.uuid5(
|
||||
uuid.NAMESPACE_URL,
|
||||
f"style-benchmark/{run_id}/{model_id}/{tag}",
|
||||
))
|
||||
if candidate_id in existing_ids:
|
||||
continue
|
||||
|
||||
score_pct = pr.get("score", 0.0) / 100.0
|
||||
signals = pr.get("signals", {})
|
||||
|
||||
# Build the prompt message list matching the benchmark's actual request
|
||||
prompt_messages = [
|
||||
{"role": "system", "content": _STYLE_SYSTEM_PROMPT},
|
||||
{"role": "user", "content": pr.get("user_prompt", tag)},
|
||||
]
|
||||
|
||||
new_candidates.append({
|
||||
"id": candidate_id,
|
||||
"source": "style-benchmark",
|
||||
"benchmark_run_id": run_id,
|
||||
"timestamp": timestamp,
|
||||
"status": "needs_review",
|
||||
"prompt_messages": prompt_messages,
|
||||
"model_response": pr.get("output", ""),
|
||||
"corrected_response": None,
|
||||
"quality_score": round(score_pct, 4),
|
||||
"failure_reason": _build_failure_reason(pr, signals),
|
||||
"failure_category": None,
|
||||
"task_id": f"style/{tag}",
|
||||
"task_type": "style-match",
|
||||
"task_name": tag.replace("_", " ").title(),
|
||||
"model_id": model_id,
|
||||
"model_name": model_id,
|
||||
"node_id": "",
|
||||
"gpu_id": 0,
|
||||
"tokens_per_sec": 0,
|
||||
})
|
||||
existing_ids.add(candidate_id)
|
||||
|
||||
if new_candidates:
|
||||
sft_data_dir.mkdir(parents=True, exist_ok=True)
|
||||
with open(sft_file, "a", encoding="utf-8") as fh:
|
||||
for c in new_candidates:
|
||||
fh.write(json.dumps(c) + "\n")
|
||||
|
||||
return {"imported": len(new_candidates), "skipped": 0}
|
||||
|
||||
|
||||
# Excerpt of the system prompt used in benchmark_style.py — reproduced here
|
||||
# so the SFT candidate captures the full generation context.
|
||||
_STYLE_SYSTEM_PROMPT = (
|
||||
"You are a writing assistant. Your job is to write a Reddit reply that matches "
|
||||
"the voice, tone, and style of the provided samples exactly.\n\n"
|
||||
"Voice characteristics:\n"
|
||||
"- Casual engineer tone. Short punchy sentences.\n"
|
||||
"- No em dashes. No semicolons. No filler phrases.\n"
|
||||
"- Direct. Opinionated. Community-first."
|
||||
)
|
||||
|
||||
|
||||
def _build_failure_reason(pr: dict, signals: dict) -> str | None:
|
||||
"""Return a human-readable failure reason string if there are violations."""
|
||||
reasons = []
|
||||
if signals.get("em_dash_count", 0) > 0:
|
||||
reasons.append(f"{signals['em_dash_count']} em dash(es)")
|
||||
if signals.get("semicolon_count", 0) > 0:
|
||||
reasons.append(f"{signals['semicolon_count']} semicolon(s)")
|
||||
if signals.get("filler_hits"):
|
||||
reasons.append(f"filler phrases: {', '.join(signals['filler_hits'])}")
|
||||
if not pr.get("output", "").strip():
|
||||
reasons.append("empty output")
|
||||
return "; ".join(reasons) if reasons else None
|
||||
|
||||
|
||||
# ── POST /cancel ───────────────────────────────────────────────────────────────
|
||||
|
||||
@router.post("/cancel")
|
||||
def cancel_style_benchmark() -> dict:
|
||||
"""Kill the running writing style benchmark subprocess."""
|
||||
global _BENCH_RUNNING, _bench_proc
|
||||
|
||||
if not _BENCH_RUNNING:
|
||||
raise HTTPException(404, "No writing style benchmark is currently running")
|
||||
|
||||
if _bench_proc is not None:
|
||||
try:
|
||||
_bench_proc.terminate()
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to terminate style benchmark: %s", exc)
|
||||
|
||||
_BENCH_RUNNING = False
|
||||
_bench_proc = None
|
||||
return {"status": "cancelled"}
|
||||
427
app/voice.py
Normal file
427
app/voice.py
Normal file
|
|
@ -0,0 +1,427 @@
|
|||
"""Avocet — Voice benchmark integration API.
|
||||
|
||||
Wraps scripts/benchmark_voice.py and exposes it via the Avocet API.
|
||||
Connection config (coordinator_url, ollama_url, python_bin) is read
|
||||
from label_tool.yaml under the `cforch:` key — the same block used
|
||||
by cforch.py, so no new config section is needed.
|
||||
|
||||
All endpoints are registered on `router` (a FastAPI APIRouter).
|
||||
api.py includes this router with prefix="/api/voice".
|
||||
|
||||
Module-level globals (_BENCH_RUNNING, _bench_proc) follow the same
|
||||
testability pattern as cforch.py.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import logging
|
||||
import subprocess as _subprocess
|
||||
import uuid
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import httpx
|
||||
import yaml
|
||||
from fastapi import APIRouter, HTTPException, Query
|
||||
from fastapi.responses import StreamingResponse
|
||||
from pydantic import BaseModel
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_ROOT = Path(__file__).parent.parent
|
||||
_CONFIG_DIR: Path | None = None # override in tests via set_config_dir()
|
||||
_BENCH_RUNNING: bool = False
|
||||
_bench_proc: Any = None
|
||||
|
||||
_BENCH_SCRIPT = _ROOT / "scripts" / "benchmark_voice.py"
|
||||
_RESULTS_DIR = _ROOT / "benchmark_results"
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
|
||||
# ── Testability seams ──────────────────────────────────────────────────────────
|
||||
|
||||
def set_config_dir(path: Path | None) -> None:
|
||||
global _CONFIG_DIR
|
||||
_CONFIG_DIR = path
|
||||
|
||||
|
||||
# ── Internal helpers ───────────────────────────────────────────────────────────
|
||||
|
||||
def _config_file() -> Path:
|
||||
if _CONFIG_DIR is not None:
|
||||
return _CONFIG_DIR / "label_tool.yaml"
|
||||
return _ROOT / "config" / "label_tool.yaml"
|
||||
|
||||
|
||||
def _load_config() -> dict:
|
||||
"""Read label_tool.yaml cforch section for coordinator/ollama/python config."""
|
||||
f = _config_file()
|
||||
file_cfg: dict = {}
|
||||
if f.exists():
|
||||
try:
|
||||
raw = yaml.safe_load(f.read_text(encoding="utf-8")) or {}
|
||||
file_cfg = raw.get("cforch", {}) or {}
|
||||
except yaml.YAMLError as exc:
|
||||
logger.warning("Failed to parse voice config %s: %s", f, exc)
|
||||
return {
|
||||
"coordinator_url": file_cfg.get("coordinator_url", "http://10.1.10.71:7700"),
|
||||
"ollama_url": file_cfg.get("ollama_url", "http://localhost:11434"),
|
||||
"python_bin": file_cfg.get("python_bin", "/devl/miniconda3/envs/cf/bin/python"),
|
||||
}
|
||||
|
||||
|
||||
# ── GET /models ────────────────────────────────────────────────────────────────
|
||||
|
||||
@router.get("/models")
|
||||
def get_models() -> dict:
|
||||
"""Return available models grouped by source.
|
||||
|
||||
- ollama: fetched live from /api/tags (includes any models downloaded
|
||||
via the Models view — automatically in sync)
|
||||
- cf_text: fetched from cf-orch catalog endpoint (requires node profile
|
||||
entry + coordinator restart when new GGUFs are added)
|
||||
"""
|
||||
cfg = _load_config()
|
||||
|
||||
# Ollama models — live query so newly downloaded models appear immediately
|
||||
ollama_models: list[dict] = []
|
||||
try:
|
||||
resp = httpx.get(f"{cfg['ollama_url']}/api/tags", timeout=5.0)
|
||||
resp.raise_for_status()
|
||||
for m in resp.json().get("models", []):
|
||||
name = m.get("name", "")
|
||||
if name:
|
||||
size_bytes = m.get("size", 0)
|
||||
ollama_models.append({
|
||||
"id": name,
|
||||
"name": name,
|
||||
"source": "ollama",
|
||||
"size_mb": round(size_bytes / (1024 * 1024)) if size_bytes else None,
|
||||
"vram_mb": None,
|
||||
})
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to fetch ollama models: %s", exc)
|
||||
|
||||
# cf-text catalog — fetched from cf-orch coordinator
|
||||
cftext_models: list[dict] = []
|
||||
try:
|
||||
resp = httpx.get(
|
||||
f"{cfg['coordinator_url']}/api/services/cf-text/catalog",
|
||||
timeout=5.0,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
for model_id, entry in resp.json().items():
|
||||
if isinstance(entry, dict):
|
||||
cftext_models.append({
|
||||
"id": model_id,
|
||||
"name": model_id,
|
||||
"source": "cf-text",
|
||||
"vram_mb": entry.get("vram_mb"),
|
||||
"description": entry.get("description", ""),
|
||||
})
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to fetch cf-text catalog: %s", exc)
|
||||
|
||||
return {"ollama": ollama_models, "cf_text": cftext_models}
|
||||
|
||||
|
||||
# ── GET /run ───────────────────────────────────────────────────────────────────
|
||||
|
||||
@router.get("/run")
|
||||
def run_voice_benchmark(
|
||||
models: str = Query("", description="Comma-separated model IDs (empty = all)"),
|
||||
use_cforch: bool = Query(False),
|
||||
max_vram: int = Query(7200, description="Max VRAM MB for cf-orch OOM filter"),
|
||||
include_large: bool = Query(False, description="Include large (30B+) ollama models"),
|
||||
workers: int = Query(1, description="Parallel workers — run N models simultaneously"),
|
||||
) -> StreamingResponse:
|
||||
"""Spawn benchmark_voice.py and stream stdout as SSE progress events.
|
||||
|
||||
On successful completion, emits a final `type: result` event containing
|
||||
the parsed JSON from the newest voice_*.json file.
|
||||
"""
|
||||
global _BENCH_RUNNING, _bench_proc
|
||||
|
||||
if _BENCH_RUNNING:
|
||||
raise HTTPException(409, "A voice benchmark is already running")
|
||||
|
||||
cfg = _load_config()
|
||||
python_bin = cfg["python_bin"]
|
||||
|
||||
def generate():
|
||||
global _BENCH_RUNNING, _bench_proc
|
||||
|
||||
if not _BENCH_SCRIPT.exists():
|
||||
yield f"data: {json.dumps({'type': 'error', 'message': f'benchmark_voice.py not found at {_BENCH_SCRIPT}'})}\n\n"
|
||||
return
|
||||
|
||||
cmd = [python_bin, str(_BENCH_SCRIPT), "run"]
|
||||
|
||||
if models:
|
||||
cmd.extend(["--models", ",".join(m.strip() for m in models.split(",") if m.strip())])
|
||||
if use_cforch:
|
||||
cmd.extend(["--cforch", "--cforch-url", cfg["coordinator_url"],
|
||||
"--max-vram", str(max_vram)])
|
||||
if include_large:
|
||||
cmd.append("--include-large")
|
||||
if workers > 1:
|
||||
cmd.extend(["--workers", str(workers)])
|
||||
|
||||
_BENCH_RUNNING = True
|
||||
try:
|
||||
proc = _subprocess.Popen(
|
||||
cmd,
|
||||
stdout=_subprocess.PIPE,
|
||||
stderr=_subprocess.STDOUT,
|
||||
text=True,
|
||||
bufsize=1,
|
||||
cwd=str(_ROOT),
|
||||
)
|
||||
_bench_proc = proc
|
||||
try:
|
||||
for line in proc.stdout:
|
||||
line = line.rstrip()
|
||||
if line:
|
||||
yield f"data: {json.dumps({'type': 'progress', 'message': line})}\n\n"
|
||||
proc.wait()
|
||||
if proc.returncode == 0:
|
||||
result_files = sorted(_RESULTS_DIR.glob("voice_*.json"))
|
||||
if result_files:
|
||||
try:
|
||||
results = json.loads(result_files[-1].read_text(encoding="utf-8"))
|
||||
yield f"data: {json.dumps({'type': 'result', 'results': results, 'filename': result_files[-1].name})}\n\n"
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to read voice results: %s", exc)
|
||||
yield f"data: {json.dumps({'type': 'complete'})}\n\n"
|
||||
else:
|
||||
yield f"data: {json.dumps({'type': 'error', 'message': f'Process exited with code {proc.returncode}'})}\n\n"
|
||||
finally:
|
||||
_bench_proc = None
|
||||
except Exception as exc:
|
||||
yield f"data: {json.dumps({'type': 'error', 'message': str(exc)})}\n\n"
|
||||
finally:
|
||||
_BENCH_RUNNING = False
|
||||
|
||||
return StreamingResponse(
|
||||
generate(),
|
||||
media_type="text/event-stream",
|
||||
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
|
||||
)
|
||||
|
||||
|
||||
# ── GET /results ───────────────────────────────────────────────────────────────
|
||||
|
||||
@router.get("/results")
|
||||
def list_results() -> list[dict]:
|
||||
"""List past voice benchmark runs, newest first.
|
||||
|
||||
Returns lightweight summaries (date, model count, top score).
|
||||
Use /results/{filename} to fetch full model-level detail.
|
||||
"""
|
||||
if not _RESULTS_DIR.exists():
|
||||
return []
|
||||
|
||||
runs: list[dict] = []
|
||||
for f in sorted(_RESULTS_DIR.glob("voice_*.json"), reverse=True):
|
||||
stem = f.stem # voice_2026-04-22_1502
|
||||
date_str = stem.removeprefix("voice_") # 2026-04-22_1502
|
||||
try:
|
||||
date_part, time_part = date_str.split("_")
|
||||
display_date = f"{date_part} {time_part[:2]}:{time_part[2:]}"
|
||||
except Exception:
|
||||
display_date = date_str
|
||||
|
||||
try:
|
||||
results = json.loads(f.read_text(encoding="utf-8"))
|
||||
top_score = max((r.get("avg_score", 0) for r in results), default=0)
|
||||
model_count = len(results)
|
||||
except Exception:
|
||||
top_score = 0
|
||||
model_count = 0
|
||||
|
||||
runs.append({
|
||||
"filename": f.name,
|
||||
"date": display_date,
|
||||
"model_count": model_count,
|
||||
"top_score": round(top_score, 1),
|
||||
})
|
||||
|
||||
return runs
|
||||
|
||||
|
||||
@router.get("/results/latest")
|
||||
def get_latest_results() -> list[dict]:
|
||||
"""Return the latest voice benchmark result list."""
|
||||
if not _RESULTS_DIR.exists():
|
||||
raise HTTPException(404, "No benchmark results found")
|
||||
files = sorted(_RESULTS_DIR.glob("voice_*.json"))
|
||||
if not files:
|
||||
raise HTTPException(404, "No benchmark results found")
|
||||
try:
|
||||
return json.loads(files[-1].read_text(encoding="utf-8"))
|
||||
except Exception as exc:
|
||||
raise HTTPException(500, f"Failed to read results: {exc}") from exc
|
||||
|
||||
|
||||
@router.get("/results/{filename}")
|
||||
def get_results_by_filename(filename: str) -> list[dict]:
|
||||
"""Return voice benchmark results for a specific run file."""
|
||||
if not filename.startswith("voice_") or not filename.endswith(".json"):
|
||||
raise HTTPException(400, "Invalid filename — expected voice_*.json")
|
||||
f = _RESULTS_DIR / filename
|
||||
if not f.exists():
|
||||
raise HTTPException(404, f"Results file not found: {filename}")
|
||||
try:
|
||||
return json.loads(f.read_text(encoding="utf-8"))
|
||||
except Exception as exc:
|
||||
raise HTTPException(500, f"Failed to read results: {exc}") from exc
|
||||
|
||||
|
||||
# ── POST /send-to-corrections ──────────────────────────────────────────────────
|
||||
|
||||
class SendToCorrectionsRequest(BaseModel):
|
||||
filename: str # voice_YYYY-MM-DD_HHMM.json — the source run file
|
||||
model_ids: list[str] = [] # empty = all models in the run
|
||||
|
||||
|
||||
@router.post("/send-to-corrections")
|
||||
def send_to_corrections(req: SendToCorrectionsRequest) -> dict:
|
||||
"""Push voice benchmark outputs into the SFT corrections queue.
|
||||
|
||||
Each prompt_result from the selected models becomes one SFT candidate
|
||||
with status='needs_review'. Duplicates are skipped via the 'id' field
|
||||
(hash of model_id + tag).
|
||||
"""
|
||||
if not req.filename.startswith("voice_") or not req.filename.endswith(".json"):
|
||||
raise HTTPException(400, "Invalid filename")
|
||||
|
||||
src = _RESULTS_DIR / req.filename
|
||||
if not src.exists():
|
||||
raise HTTPException(404, f"Results file not found: {req.filename}")
|
||||
|
||||
try:
|
||||
run_results: list[dict] = json.loads(src.read_text(encoding="utf-8"))
|
||||
except Exception as exc:
|
||||
raise HTTPException(500, f"Failed to read results: {exc}") from exc
|
||||
|
||||
# Resolve sft_candidates.jsonl path (same logic as sft.py)
|
||||
sft_data_dir = _ROOT / "data"
|
||||
sft_file = sft_data_dir / "sft_candidates.jsonl"
|
||||
|
||||
# Load existing IDs to deduplicate
|
||||
existing_ids: set[str] = set()
|
||||
if sft_file.exists():
|
||||
for line in sft_file.read_text(encoding="utf-8").splitlines():
|
||||
line = line.strip()
|
||||
if line:
|
||||
try:
|
||||
existing_ids.add(json.loads(line)["id"])
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
run_id = req.filename.removesuffix(".json") # voice_2026-04-22_1502
|
||||
timestamp = datetime.now(tz=timezone.utc).isoformat()
|
||||
|
||||
new_candidates: list[dict] = []
|
||||
for model_result in run_results:
|
||||
model_id = model_result.get("model_id", "")
|
||||
if req.model_ids and model_id not in req.model_ids:
|
||||
continue
|
||||
for pr in model_result.get("prompt_results", []):
|
||||
tag = pr.get("tag", "")
|
||||
# Stable id: deterministic hash of run + model + prompt tag
|
||||
candidate_id = str(uuid.uuid5(
|
||||
uuid.NAMESPACE_URL,
|
||||
f"voice-benchmark/{run_id}/{model_id}/{tag}",
|
||||
))
|
||||
if candidate_id in existing_ids:
|
||||
continue
|
||||
|
||||
score_pct = pr.get("score", 0.0) / 100.0
|
||||
signals = pr.get("signals", {})
|
||||
|
||||
# Build the prompt message list matching the benchmark's actual request
|
||||
prompt_messages = [
|
||||
{"role": "system", "content": _VOICE_SYSTEM_PROMPT},
|
||||
{"role": "user", "content": pr.get("user_prompt", tag)},
|
||||
]
|
||||
|
||||
new_candidates.append({
|
||||
"id": candidate_id,
|
||||
"source": "voice-benchmark",
|
||||
"benchmark_run_id": run_id,
|
||||
"timestamp": timestamp,
|
||||
"status": "needs_review",
|
||||
"prompt_messages": prompt_messages,
|
||||
"model_response": pr.get("output", ""),
|
||||
"corrected_response": None,
|
||||
"quality_score": round(score_pct, 4),
|
||||
"failure_reason": _build_failure_reason(pr, signals),
|
||||
"failure_category": None,
|
||||
"task_id": f"voice/{tag}",
|
||||
"task_type": "voice-match",
|
||||
"task_name": tag.replace("_", " ").title(),
|
||||
"model_id": model_id,
|
||||
"model_name": model_id,
|
||||
"node_id": "",
|
||||
"gpu_id": 0,
|
||||
"tokens_per_sec": 0,
|
||||
})
|
||||
existing_ids.add(candidate_id)
|
||||
|
||||
if new_candidates:
|
||||
sft_data_dir.mkdir(parents=True, exist_ok=True)
|
||||
with open(sft_file, "a", encoding="utf-8") as fh:
|
||||
for c in new_candidates:
|
||||
fh.write(json.dumps(c) + "\n")
|
||||
|
||||
return {"imported": len(new_candidates), "skipped": 0}
|
||||
|
||||
|
||||
# Excerpt of the system prompt used in benchmark_voice.py — reproduced here
|
||||
# so the SFT candidate captures the full generation context.
|
||||
_VOICE_SYSTEM_PROMPT = (
|
||||
"You are a writing assistant. Your job is to write a Reddit reply that matches "
|
||||
"the voice, tone, and style of the provided samples exactly.\n\n"
|
||||
"Voice characteristics:\n"
|
||||
"- Casual engineer tone. Short punchy sentences.\n"
|
||||
"- No em dashes. No semicolons. No filler phrases.\n"
|
||||
"- Direct. Opinionated. Community-first."
|
||||
)
|
||||
|
||||
|
||||
def _build_failure_reason(pr: dict, signals: dict) -> str | None:
|
||||
"""Return a human-readable failure reason string if there are violations."""
|
||||
reasons = []
|
||||
if signals.get("em_dash_count", 0) > 0:
|
||||
reasons.append(f"{signals['em_dash_count']} em dash(es)")
|
||||
if signals.get("semicolon_count", 0) > 0:
|
||||
reasons.append(f"{signals['semicolon_count']} semicolon(s)")
|
||||
if signals.get("filler_hits"):
|
||||
reasons.append(f"filler phrases: {', '.join(signals['filler_hits'])}")
|
||||
if not pr.get("output", "").strip():
|
||||
reasons.append("empty output")
|
||||
return "; ".join(reasons) if reasons else None
|
||||
|
||||
|
||||
# ── POST /cancel ───────────────────────────────────────────────────────────────
|
||||
|
||||
@router.post("/cancel")
|
||||
def cancel_voice_benchmark() -> dict:
|
||||
"""Kill the running voice benchmark subprocess."""
|
||||
global _BENCH_RUNNING, _bench_proc
|
||||
|
||||
if not _BENCH_RUNNING:
|
||||
raise HTTPException(404, "No voice benchmark is currently running")
|
||||
|
||||
if _bench_proc is not None:
|
||||
try:
|
||||
_bench_proc.terminate()
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to terminate voice benchmark: %s", exc)
|
||||
|
||||
_BENCH_RUNNING = False
|
||||
_bench_proc = None
|
||||
return {"status": "cancelled"}
|
||||
|
|
@ -57,11 +57,32 @@ imitate:
|
|||
- id: peregrine
|
||||
name: Peregrine
|
||||
icon: "🦅"
|
||||
description: Job search assistant
|
||||
base_url: http://localhost:8502
|
||||
sample_endpoint: /api/jobs
|
||||
text_fields: [title, description]
|
||||
prompt_template: "Analyze this job listing and identify key requirements:\n\n{text}"
|
||||
description: Job search assistant — live job listings
|
||||
base_url: http://localhost:8601
|
||||
health_path: /api/jobs/counts
|
||||
sample_endpoint: /api/jobs?status=pending&limit=5
|
||||
text_fields: [title, company, description]
|
||||
prompt_template: "Analyze this job listing and identify the key requirements, must-have skills, and any culture signals that would help tailor an application:\n\n{text}"
|
||||
|
||||
- id: osprey
|
||||
name: Osprey
|
||||
icon: "📞"
|
||||
description: Gov't hold-line automation — recent call records
|
||||
base_url: http://localhost:8520
|
||||
health_path: /api/health
|
||||
sample_endpoint: /api/calls/recent
|
||||
text_fields: [agency, issue, notes]
|
||||
prompt_template: "Draft a clear, professional follow-up letter for this government hold-line call. Include what was discussed, what action the agency committed to, and a polite deadline for response:\n\n{text}"
|
||||
|
||||
- id: linnet
|
||||
name: Linnet
|
||||
icon: "🐦"
|
||||
description: Real-time tone annotation — Elcor-style subtext for ND users
|
||||
base_url: http://localhost:8522
|
||||
health_path: /health
|
||||
sample_endpoint: /samples
|
||||
text_fields: [text, context]
|
||||
prompt_template: "Annotate the emotional tone and subtext of the following text using explicit Elcor-style markers (e.g. [SINCERELY], [UNCERTAIN], [FRUSTRATED]). Identify implied emotions, potential sarcasm, and any ambiguity that might be misread by neurodivergent readers:\n\n{text}"
|
||||
|
||||
- id: kiwi
|
||||
name: Kiwi
|
||||
|
|
|
|||
26
manage.sh
26
manage.sh
|
|
@ -90,6 +90,12 @@ usage() {
|
|||
echo -e " ${GREEN}score [args]${NC} Shortcut: --score [args]"
|
||||
echo -e " ${GREEN}compare [args]${NC} Shortcut: --compare [args]"
|
||||
echo ""
|
||||
echo " Writing Style Benchmark:"
|
||||
echo -e " ${GREEN}style-bench [args]${NC} Run benchmark_style.py (args passed through)"
|
||||
echo -e " ${GREEN}style-list${NC} List available ollama models for style bench"
|
||||
echo -e " ${GREEN}style-run [args]${NC} Run writing style benchmark (--models, --samples, --include-large, --scan-disk PATH, --cforch)"
|
||||
echo -e " ${GREEN}style-last${NC} Print most recent writing style benchmark report"
|
||||
echo ""
|
||||
echo " Dev:"
|
||||
echo -e " ${GREEN}dev${NC} Hot-reload: uvicorn --reload (:8503) + Vite HMR (:5173)"
|
||||
echo -e " ${GREEN}test${NC} Run pytest suite"
|
||||
|
|
@ -249,6 +255,26 @@ case "$CMD" in
|
|||
exec "$0" benchmark --compare "$@"
|
||||
;;
|
||||
|
||||
style-bench)
|
||||
info "Running writing style benchmark (${ENV_BM})…"
|
||||
if [[ ! -x "$PYTHON_BM" ]]; then
|
||||
error "Python not found in ${ENV_BM} env at ${PYTHON_BM}"
|
||||
fi
|
||||
"$PYTHON_BM" scripts/benchmark_style.py "$@"
|
||||
;;
|
||||
|
||||
style-list)
|
||||
exec "$0" style-bench --list-models
|
||||
;;
|
||||
|
||||
style-run)
|
||||
exec "$0" style-bench --run "$@"
|
||||
;;
|
||||
|
||||
style-last)
|
||||
exec "$0" style-bench --show-last
|
||||
;;
|
||||
|
||||
help|--help|-h)
|
||||
usage
|
||||
;;
|
||||
|
|
|
|||
952
scripts/benchmark_style.py
Normal file
952
scripts/benchmark_style.py
Normal file
|
|
@ -0,0 +1,952 @@
|
|||
#!/usr/bin/env python
|
||||
"""
|
||||
Writing style benchmark harness -- score local text-gen models for writing style match.
|
||||
|
||||
Runs each model against a set of test prompts, extracts style signals from the
|
||||
outputs, compares them to a style corpus, and produces a ranked markdown table.
|
||||
|
||||
Usage:
|
||||
# List available ollama models
|
||||
conda run -n cf python scripts/benchmark_style.py --list-models
|
||||
|
||||
# Run against all models with default test prompts
|
||||
conda run -n cf python scripts/benchmark_style.py --run
|
||||
|
||||
# Run specific models only
|
||||
conda run -n cf python scripts/benchmark_style.py --run --models mistral:7b,llama3.1:8b
|
||||
|
||||
# Use a custom corpus directory
|
||||
conda run -n cf python scripts/benchmark_style.py --run --samples data/style_corpus/
|
||||
|
||||
# Print last results table
|
||||
conda run -n cf python scripts/benchmark_style.py --show-last
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import re
|
||||
import sys
|
||||
import time
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import httpx
|
||||
|
||||
_ROOT = Path(__file__).parent.parent
|
||||
_CORPUS_DIR = _ROOT / "data" / "style_corpus"
|
||||
_RESULTS_DIR = _ROOT / "benchmark_results"
|
||||
_OLLAMA_URL = "http://localhost:11434"
|
||||
_CFORCH_URL = "http://localhost:7700"
|
||||
|
||||
# Subdirectories under --scan-disk root that may contain GGUFs
|
||||
_SCAN_SUBDIRS = ["textgen/models", "llama.cpp/models", "cf-text/models", "vllm/models"]
|
||||
|
||||
# ── Filler phrases that should be absent from good style-match output ──────────
|
||||
FILLER_PHRASES: list[str] = [
|
||||
"delve", "certainly", "absolutely", "i apologize", "i'd be happy to",
|
||||
"of course", "great question", "i understand", "let me know if",
|
||||
"feel free to", "it's important to note", "it's worth noting",
|
||||
"in conclusion", "to summarize", "in summary",
|
||||
]
|
||||
|
||||
# ── Test prompts: (thread_title, thread_body, context_tag) ───────────────────
|
||||
# These are representative threads that Magpie might reply to.
|
||||
# Extend this list with real examples as the corpus grows.
|
||||
TEST_PROMPTS: list[dict[str, str]] = [
|
||||
{
|
||||
"tag": "selfhosted_ai_fatigue",
|
||||
"thread_title": "Anyone else getting tired of re-explaining their setup every time an AI model forgets?",
|
||||
"thread_body": (
|
||||
"Every session I start over. My whole hardware setup, what tools I use, "
|
||||
"what I've already tried. It's exhausting. There has to be a better way."
|
||||
),
|
||||
},
|
||||
{
|
||||
"tag": "privacy_local_llm",
|
||||
"thread_title": "What's the point of running local LLMs if the apps still phone home?",
|
||||
"thread_body": (
|
||||
"I went through all the trouble of setting up ollama and now I find out "
|
||||
"the frontend I'm using is sending telemetry. Kind of defeats the purpose."
|
||||
),
|
||||
},
|
||||
{
|
||||
"tag": "solarpunk_tech",
|
||||
"thread_title": "What does solarpunk computing actually look like in practice?",
|
||||
"thread_body": (
|
||||
"I keep seeing the aesthetic but not a lot of concrete examples of "
|
||||
"people living it out with their tech choices. What does it mean day to day?"
|
||||
),
|
||||
},
|
||||
{
|
||||
"tag": "nd_tools",
|
||||
"thread_title": "Tools that actually help with executive function vs ones that just add friction",
|
||||
"thread_body": (
|
||||
"I've tried a dozen productivity apps and most of them require more "
|
||||
"executive function to maintain than they save. What actually sticks for you?"
|
||||
),
|
||||
},
|
||||
{
|
||||
"tag": "data_ownership",
|
||||
"thread_title": "Who actually owns your data when you use a 'free' AI tool?",
|
||||
"thread_body": (
|
||||
"Read the ToS on three different AI assistants today. In all three cases "
|
||||
"your inputs can be used for training, shared with partners, and retained "
|
||||
"indefinitely. At what point does 'free' just mean you're the product?"
|
||||
),
|
||||
},
|
||||
{
|
||||
"tag": "digital_culture",
|
||||
"thread_title": "The internet used to feel like it belonged to everyone. What happened?",
|
||||
"thread_body": (
|
||||
"I grew up on forums, IRC, personal homepages. Now everything is a platform "
|
||||
"owned by someone trying to extract value from the community that built it. "
|
||||
"Is the fediverse / self-hosting movement actually reversing this or just "
|
||||
"a niche hobby?"
|
||||
),
|
||||
},
|
||||
]
|
||||
|
||||
GENERATION_PARAMS: dict[str, Any] = {
|
||||
"temperature": 0.7,
|
||||
"top_p": 0.9,
|
||||
"num_predict": 300,
|
||||
}
|
||||
|
||||
SYSTEM_PROMPT = (
|
||||
"You are a writing assistant. Your job is to write a Reddit reply that matches "
|
||||
"the voice, tone, and style of the provided samples exactly.\n\n"
|
||||
"Voice characteristics:\n"
|
||||
"- Casual engineer tone. Short punchy sentences.\n"
|
||||
"- No hype, no buzzwords, no em dashes, no semicolons.\n"
|
||||
"- Community-first perspective. Solarpunk values.\n"
|
||||
"- Direct and opinionated. No throat-clearing or filler.\n"
|
||||
"- When relevant, mention personal experience with real tools.\n\n"
|
||||
"Write ONLY the reply. No preamble, no 'Here is a reply:', no meta-commentary."
|
||||
)
|
||||
|
||||
|
||||
# ── Style signal extraction ───────────────────────────────────────────────────
|
||||
|
||||
@dataclass
|
||||
class StyleSignals:
|
||||
"""Quantitative style signals extracted from a text sample."""
|
||||
sentence_count: int = 0
|
||||
word_count: int = 0
|
||||
avg_sentence_length: float = 0.0
|
||||
em_dash_count: int = 0
|
||||
semicolon_count: int = 0
|
||||
filler_hits: list[str] = field(default_factory=list)
|
||||
question_ratio: float = 0.0 # fraction of sentences ending in '?'
|
||||
first_person_ratio: float = 0.0 # fraction of sentences starting with 'I'
|
||||
avg_word_length: float = 0.0
|
||||
|
||||
|
||||
def extract_signals(text: str) -> StyleSignals:
|
||||
"""Extract style signals from a text sample."""
|
||||
text = text.strip()
|
||||
if text.startswith("[ERROR:"):
|
||||
return StyleSignals() # zero-score sentinel — caller checks for empty output
|
||||
sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
|
||||
words = text.split()
|
||||
|
||||
if not sentences:
|
||||
return StyleSignals()
|
||||
|
||||
avg_sentence_length = len(words) / len(sentences) if sentences else 0.0
|
||||
avg_word_length = (sum(len(w.strip('.,!?;:"\'')) for w in words) / len(words)) if words else 0.0
|
||||
|
||||
em_dash_count = text.count('\u2014') + text.count(' -- ') + text.count('--')
|
||||
semicolon_count = text.count(';')
|
||||
|
||||
filler_hits = [p for p in FILLER_PHRASES if p.lower() in text.lower()]
|
||||
|
||||
question_ratio = sum(1 for s in sentences if s.endswith('?')) / len(sentences)
|
||||
first_person_ratio = sum(1 for s in sentences if re.match(r"^I\b", s)) / len(sentences)
|
||||
|
||||
return StyleSignals(
|
||||
sentence_count=len(sentences),
|
||||
word_count=len(words),
|
||||
avg_sentence_length=avg_sentence_length,
|
||||
em_dash_count=em_dash_count,
|
||||
semicolon_count=semicolon_count,
|
||||
filler_hits=filler_hits,
|
||||
question_ratio=question_ratio,
|
||||
first_person_ratio=first_person_ratio,
|
||||
avg_word_length=avg_word_length,
|
||||
)
|
||||
|
||||
|
||||
def build_corpus_profile(corpus_dir: Path) -> StyleSignals | None:
|
||||
"""Aggregate style signals across all corpus samples into a target profile."""
|
||||
samples = list(corpus_dir.glob("*.txt"))
|
||||
if not samples:
|
||||
return None
|
||||
|
||||
all_signals = [extract_signals(p.read_text(encoding="utf-8")) for p in samples]
|
||||
n = len(all_signals)
|
||||
|
||||
return StyleSignals(
|
||||
sentence_count=int(sum(s.sentence_count for s in all_signals) / n),
|
||||
word_count=int(sum(s.word_count for s in all_signals) / n),
|
||||
avg_sentence_length=sum(s.avg_sentence_length for s in all_signals) / n,
|
||||
em_dash_count=int(sum(s.em_dash_count for s in all_signals) / n),
|
||||
semicolon_count=int(sum(s.semicolon_count for s in all_signals) / n),
|
||||
question_ratio=sum(s.question_ratio for s in all_signals) / n,
|
||||
first_person_ratio=sum(s.first_person_ratio for s in all_signals) / n,
|
||||
avg_word_length=sum(s.avg_word_length for s in all_signals) / n,
|
||||
)
|
||||
|
||||
|
||||
def score_against_profile(output_signals: StyleSignals, profile: StyleSignals | None) -> float:
|
||||
"""Score a model output against the corpus profile. Returns 0-100.
|
||||
|
||||
Penalties:
|
||||
- Em dashes / semicolons: -5 each occurrence (hard CF style violation)
|
||||
- Filler phrases: -8 each hit (strong signal of non-style output)
|
||||
- Sentence length delta: proportional penalty (target: close to corpus avg)
|
||||
- Word length delta: smaller penalty
|
||||
|
||||
When no corpus profile is available, falls back to absolute signal scores only.
|
||||
"""
|
||||
score = 100.0
|
||||
|
||||
# Hard violations -- always penalised regardless of corpus
|
||||
score -= output_signals.em_dash_count * 5
|
||||
score -= output_signals.semicolon_count * 3
|
||||
score -= len(output_signals.filler_hits) * 8
|
||||
|
||||
if profile is not None:
|
||||
# Sentence length delta: penalise proportionally
|
||||
length_delta = abs(output_signals.avg_sentence_length - profile.avg_sentence_length)
|
||||
score -= min(length_delta * 2, 20)
|
||||
|
||||
# Question ratio delta
|
||||
question_delta = abs(output_signals.question_ratio - profile.question_ratio)
|
||||
score -= min(question_delta * 10, 10)
|
||||
|
||||
return max(0.0, score)
|
||||
|
||||
|
||||
# ── Ollama generation ─────────────────────────────────────────────────────────
|
||||
|
||||
_CFORCH_NODE_ID = "heimdall"
|
||||
|
||||
|
||||
def cforch_list_catalog(
|
||||
cforch_url: str = _CFORCH_URL,
|
||||
node_id: str = _CFORCH_NODE_ID,
|
||||
) -> dict[str, int]:
|
||||
"""Return the cf-text catalog from cf-orch as {model_id: vram_mb}.
|
||||
|
||||
Uses ?node_id= to request the catalog from a specific node's profile,
|
||||
avoiding cross-node catalog shadowing when multiple nodes define catalogs
|
||||
for the same service.
|
||||
"""
|
||||
try:
|
||||
resp = httpx.get(
|
||||
f"{cforch_url}/api/services/cf-text/catalog",
|
||||
params={"node_id": node_id} if node_id else {},
|
||||
timeout=10.0,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
raw = resp.json()
|
||||
return {
|
||||
model_id: (entry.get("vram_mb", 0) if isinstance(entry, dict) else 0)
|
||||
for model_id, entry in raw.items()
|
||||
}
|
||||
except Exception as exc:
|
||||
print(f"[warn] Could not reach cf-orch catalog at {cforch_url}: {exc}", file=sys.stderr)
|
||||
return {}
|
||||
|
||||
|
||||
def _cforch_allocate_service(
|
||||
service: str,
|
||||
model_id: str,
|
||||
cforch_url: str,
|
||||
startup_timeout_s: float,
|
||||
health_path: str,
|
||||
) -> tuple[str, str] | None:
|
||||
"""Generic cf-orch allocate + state-signal wait. Returns (service_url, allocation_id) or None.
|
||||
|
||||
After allocating, waits for the coordinator's service state to reach 'running'.
|
||||
Fails immediately if the state reaches 'stopped' (crashed load) — no waiting out
|
||||
the full timeout for a model that already failed.
|
||||
Falls back to health-polling if the coordinator doesn't expose a matching instance
|
||||
(e.g. older coordinator version or service not yet registered in probe loop).
|
||||
"""
|
||||
try:
|
||||
resp = httpx.post(
|
||||
f"{cforch_url}/api/services/{service}/allocate",
|
||||
json={
|
||||
"model_candidates": [model_id],
|
||||
"caller": "avocet",
|
||||
"pipeline": "style_benchmark",
|
||||
},
|
||||
timeout=120.0,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
data = resp.json()
|
||||
service_url: str = data["url"]
|
||||
allocation_id: str = data.get("allocation_id", "")
|
||||
node_id: str = data.get("node_id", "")
|
||||
gpu_id: int | None = data.get("gpu_id")
|
||||
|
||||
if data.get("started", False) and not data.get("warm", True):
|
||||
print(f" [cold start] waiting for {service} to load {model_id!r}...", end=" ", flush=True)
|
||||
t0 = time.monotonic()
|
||||
deadline = t0 + startup_timeout_s
|
||||
probe_misses = 0 # consecutive polls with no matching instance in status
|
||||
|
||||
while time.monotonic() < deadline:
|
||||
try:
|
||||
status = httpx.get(
|
||||
f"{cforch_url}/api/services/{service}/status", timeout=5.0
|
||||
)
|
||||
if status.is_success:
|
||||
instances = status.json().get("instances", [])
|
||||
# Find our specific instance by node+gpu
|
||||
match = next(
|
||||
(i for i in instances
|
||||
if i.get("node_id") == node_id and i.get("gpu_id") == gpu_id),
|
||||
None,
|
||||
)
|
||||
if match:
|
||||
probe_misses = 0
|
||||
state = match.get("state", "")
|
||||
if state == "running":
|
||||
elapsed = time.monotonic() - t0
|
||||
print(f"ready ({elapsed:.0f}s)", flush=True)
|
||||
return service_url, allocation_id
|
||||
elif state == "stopped":
|
||||
print(f"failed (service stopped — model load error)", flush=True)
|
||||
return None
|
||||
# state == "starting" or unknown → keep waiting
|
||||
else:
|
||||
probe_misses += 1
|
||||
# After a grace period with no instance visible, fall back to
|
||||
# direct health-poll (coordinator may not have probed yet)
|
||||
if probe_misses >= 6:
|
||||
try:
|
||||
health = httpx.get(f"{service_url}{health_path}", timeout=3.0)
|
||||
if health.is_success:
|
||||
elapsed = time.monotonic() - t0
|
||||
print(f"ready via health ({elapsed:.0f}s)", flush=True)
|
||||
return service_url, allocation_id
|
||||
except Exception:
|
||||
pass
|
||||
except Exception:
|
||||
pass
|
||||
time.sleep(3.0)
|
||||
|
||||
elapsed = time.monotonic() - t0
|
||||
print(f"timed out after {elapsed:.0f}s", flush=True)
|
||||
return None
|
||||
|
||||
return service_url, allocation_id
|
||||
except Exception as exc:
|
||||
print(f"[warn] cf-orch allocation failed for {model_id!r} ({service}): {exc}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
|
||||
def cforch_allocate(
|
||||
model_id: str,
|
||||
cforch_url: str = _CFORCH_URL,
|
||||
startup_timeout_s: float = 180.0,
|
||||
) -> tuple[str, str] | None:
|
||||
"""Allocate a cf-text instance for model_id. Returns (service_url, allocation_id) or None."""
|
||||
return _cforch_allocate_service("cf-text", model_id, cforch_url, startup_timeout_s, "/health")
|
||||
|
||||
|
||||
def cforch_allocate_vllm(
|
||||
model_id: str,
|
||||
cforch_url: str = _CFORCH_URL,
|
||||
startup_timeout_s: float = 300.0,
|
||||
) -> tuple[str, str] | None:
|
||||
"""Allocate a vllm instance for model_id. Returns (service_url, allocation_id) or None.
|
||||
|
||||
vllm exposes an OpenAI-compatible API — generate_cftext() works unchanged
|
||||
against the returned service_url. Startup timeout is longer (300s) because
|
||||
vllm loads large model weights from disk before becoming ready.
|
||||
"""
|
||||
return _cforch_allocate_service("vllm", model_id, cforch_url, startup_timeout_s, "/health")
|
||||
|
||||
|
||||
def cforch_release(allocation_id: str, cforch_url: str = _CFORCH_URL) -> None:
|
||||
"""Release a cf-orch allocation."""
|
||||
if not allocation_id:
|
||||
return
|
||||
try:
|
||||
httpx.delete(f"{cforch_url}/api/services/cf-text/allocations/{allocation_id}", timeout=10.0)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
|
||||
def generate_cftext(
|
||||
service_url: str,
|
||||
model_id: str,
|
||||
prompt: str,
|
||||
system: str = "",
|
||||
) -> tuple[str, float]:
|
||||
"""Call cf-text via OpenAI-compatible /v1/chat/completions. Returns (text, elapsed_ms)."""
|
||||
messages: list[dict[str, str]] = []
|
||||
if system:
|
||||
messages.append({"role": "system", "content": system})
|
||||
messages.append({"role": "user", "content": prompt})
|
||||
|
||||
payload: dict[str, Any] = {
|
||||
"model": model_id,
|
||||
"messages": messages,
|
||||
"max_tokens": GENERATION_PARAMS.get("num_predict", 300),
|
||||
"temperature": GENERATION_PARAMS.get("temperature", 0.7),
|
||||
"top_p": GENERATION_PARAMS.get("top_p", 0.9),
|
||||
"stream": False,
|
||||
}
|
||||
|
||||
t0 = time.monotonic()
|
||||
try:
|
||||
resp = httpx.post(
|
||||
f"{service_url.rstrip('/')}/v1/chat/completions",
|
||||
json=payload,
|
||||
timeout=180.0,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
elapsed_ms = (time.monotonic() - t0) * 1000
|
||||
content = resp.json()["choices"][0]["message"]["content"]
|
||||
return content.strip(), elapsed_ms
|
||||
except Exception as exc:
|
||||
elapsed_ms = (time.monotonic() - t0) * 1000
|
||||
return f"[ERROR: {exc}]", elapsed_ms
|
||||
|
||||
|
||||
def generate(model_id: str, prompt: str, system: str = "") -> tuple[str, float]:
|
||||
"""Call ollama /api/generate. Returns (text, elapsed_ms)."""
|
||||
payload: dict[str, Any] = {
|
||||
"model": model_id,
|
||||
"prompt": prompt,
|
||||
"stream": False,
|
||||
"options": GENERATION_PARAMS,
|
||||
}
|
||||
if system:
|
||||
payload["system"] = system
|
||||
|
||||
t0 = time.monotonic()
|
||||
try:
|
||||
resp = httpx.post(
|
||||
f"{_OLLAMA_URL}/api/generate",
|
||||
json=payload,
|
||||
timeout=120.0,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
elapsed_ms = (time.monotonic() - t0) * 1000
|
||||
return resp.json().get("response", "").strip(), elapsed_ms
|
||||
except Exception as exc:
|
||||
elapsed_ms = (time.monotonic() - t0) * 1000
|
||||
return f"[ERROR: {exc}]", elapsed_ms
|
||||
|
||||
|
||||
def find_disk_ggufs(llm_root: Path) -> list[Path]:
|
||||
"""Recursively find .gguf files under known subdirs of llm_root.
|
||||
|
||||
Skips vocab-only GGUFs (ggml-vocab-*) which aren't standalone models.
|
||||
"""
|
||||
found: list[Path] = []
|
||||
search_dirs = [llm_root / sub for sub in _SCAN_SUBDIRS] + [llm_root]
|
||||
seen: set[Path] = set()
|
||||
for base in search_dirs:
|
||||
if not base.exists():
|
||||
continue
|
||||
for gguf in base.rglob("*.gguf"):
|
||||
if gguf in seen:
|
||||
continue
|
||||
seen.add(gguf)
|
||||
if gguf.name.startswith("ggml-vocab-"):
|
||||
continue
|
||||
found.append(gguf)
|
||||
return sorted(found)
|
||||
|
||||
|
||||
def gguf_to_ollama_tag(gguf_path: Path) -> str:
|
||||
"""Derive a stable ollama tag from a GGUF path.
|
||||
|
||||
Uses parent dir name + stem to avoid collisions, e.g.:
|
||||
claude-3.7-sonnet-reasoning-gemma3-12B/foo.Q8_0.gguf
|
||||
→ bench-claude-3.7-sonnet-reasoning-gemma3-12b-foo-q8-0
|
||||
"""
|
||||
parent = gguf_path.parent.name.lower()
|
||||
stem = gguf_path.stem.lower()
|
||||
# If stem is contained in parent (common pattern), just use parent
|
||||
slug = parent if stem.replace("-", "").replace("_", "") in parent.replace("-", "").replace("_", "") else f"{parent}-{stem}"
|
||||
slug = re.sub(r"[^a-z0-9]+", "-", slug).strip("-")
|
||||
return f"bench-{slug}:latest"
|
||||
|
||||
|
||||
def register_gguf(gguf_path: Path, tag: str) -> bool:
|
||||
"""Create a temporary ollama model entry from a GGUF file. Returns True on success."""
|
||||
import subprocess
|
||||
import tempfile
|
||||
modelfile = f"FROM {gguf_path.resolve()}\n"
|
||||
with tempfile.NamedTemporaryFile(mode="w", suffix=".Modelfile", delete=False) as f:
|
||||
f.write(modelfile)
|
||||
modelfile_path = f.name
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["ollama", "create", tag, "-f", modelfile_path],
|
||||
capture_output=True, text=True, timeout=60,
|
||||
)
|
||||
return result.returncode == 0
|
||||
except Exception as exc:
|
||||
print(f"[warn] Could not register {gguf_path.name}: {exc}", file=sys.stderr)
|
||||
return False
|
||||
finally:
|
||||
Path(modelfile_path).unlink(missing_ok=True)
|
||||
|
||||
|
||||
def deregister_gguf(tag: str) -> None:
|
||||
"""Remove a temporary ollama model entry."""
|
||||
import subprocess
|
||||
try:
|
||||
subprocess.run(["ollama", "rm", tag], capture_output=True, timeout=30)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
|
||||
def backfill_disk_models(
|
||||
llm_root: Path,
|
||||
existing_tags: set[str],
|
||||
max_vram_mb: int = 0,
|
||||
) -> list[str]:
|
||||
"""Register GGUFs from disk that aren't already in ollama. Returns new tags.
|
||||
|
||||
max_vram_mb: skip files whose size exceeds this threshold (0 = no limit).
|
||||
GGUF file size is a reliable VRAM proxy -- quantized weights load ~1:1.
|
||||
"""
|
||||
ggufs = find_disk_ggufs(llm_root)
|
||||
if not ggufs:
|
||||
print(f"No .gguf files found under {llm_root}", file=sys.stderr)
|
||||
return []
|
||||
|
||||
new_tags: list[str] = []
|
||||
skipped_oom = 0
|
||||
for gguf in ggufs:
|
||||
size_mb = gguf.stat().st_size // (1024 * 1024)
|
||||
if max_vram_mb and size_mb > max_vram_mb:
|
||||
print(f" [skip-oom] {gguf.name} ({size_mb} MB > {max_vram_mb} MB limit)")
|
||||
skipped_oom += 1
|
||||
continue
|
||||
tag = gguf_to_ollama_tag(gguf)
|
||||
if tag in existing_tags:
|
||||
print(f" [skip] {gguf.name} already registered as {tag}")
|
||||
continue
|
||||
print(f" [register] {gguf.name} ({size_mb} MB) → {tag} ...", end=" ", flush=True)
|
||||
if register_gguf(gguf, tag):
|
||||
print("ok")
|
||||
new_tags.append(tag)
|
||||
else:
|
||||
print("failed")
|
||||
|
||||
if skipped_oom:
|
||||
print(f" [info] {skipped_oom} GGUF(s) skipped (exceed {max_vram_mb} MB VRAM limit)")
|
||||
return new_tags
|
||||
|
||||
|
||||
def list_ollama_models() -> list[str]:
|
||||
"""Return model names from ollama /api/tags, filtered to text-gen candidates."""
|
||||
try:
|
||||
resp = httpx.get(f"{_OLLAMA_URL}/api/tags", timeout=10.0)
|
||||
resp.raise_for_status()
|
||||
models = resp.json().get("models", [])
|
||||
# Exclude embedding-only models
|
||||
exclude = {"mxbai-embed-large", "nomic-embed-text", "all-minilm"}
|
||||
return [
|
||||
m["name"] for m in models
|
||||
if not any(x in m["name"].lower() for x in exclude)
|
||||
]
|
||||
except Exception as exc:
|
||||
print(f"[warn] Could not reach ollama: {exc}", file=sys.stderr)
|
||||
return []
|
||||
|
||||
|
||||
# ── Run benchmark ─────────────────────────────────────────────────────────────
|
||||
|
||||
@dataclass
|
||||
class ModelResult:
|
||||
model_id: str
|
||||
prompt_results: list[dict[str, Any]] = field(default_factory=list)
|
||||
avg_score: float = 0.0
|
||||
avg_latency_ms: float = 0.0
|
||||
total_filler_hits: int = 0
|
||||
total_em_dashes: int = 0
|
||||
total_semicolons: int = 0
|
||||
|
||||
|
||||
def _bench_one_model(
|
||||
model_id: str,
|
||||
prompts: list[dict[str, str]],
|
||||
profile: Any,
|
||||
use_cforch: bool,
|
||||
cforch_url: str,
|
||||
use_vllm: bool = False,
|
||||
) -> "ModelResult | None":
|
||||
"""Run all prompts for a single model. Thread-safe — all output is prefixed with model_id.
|
||||
|
||||
Dispatch priority:
|
||||
use_vllm=True → allocate vllm via cf-orch, then generate_cftext() (OpenAI-compatible)
|
||||
use_cforch=True → allocate cf-text via cf-orch, then generate_cftext()
|
||||
else → direct ollama generate()
|
||||
Both vllm and cf-text expose /v1/chat/completions so generate_cftext() works for both.
|
||||
"""
|
||||
prefix = f"[{model_id}]"
|
||||
result = ModelResult(model_id=model_id)
|
||||
|
||||
service_url: str | None = None
|
||||
allocation_id: str = ""
|
||||
if use_vllm:
|
||||
alloc = cforch_allocate_vllm(model_id, cforch_url)
|
||||
if alloc is None:
|
||||
print(f"{prefix} [skip] vllm allocation failed", flush=True)
|
||||
return None
|
||||
service_url, allocation_id = alloc
|
||||
print(f"{prefix} vllm allocated: {service_url}", flush=True)
|
||||
elif use_cforch:
|
||||
alloc = cforch_allocate(model_id, cforch_url)
|
||||
if alloc is None:
|
||||
print(f"{prefix} [skip] cf-orch allocation failed", flush=True)
|
||||
return None
|
||||
service_url, allocation_id = alloc
|
||||
print(f"{prefix} allocated: {service_url}", flush=True)
|
||||
|
||||
try:
|
||||
for prompt_def in prompts:
|
||||
tag = prompt_def["tag"]
|
||||
user_prompt = (
|
||||
f"Thread: {prompt_def['thread_title']}\n\n"
|
||||
f"{prompt_def['thread_body']}\n\n"
|
||||
f"Write a reply:"
|
||||
)
|
||||
print(f"{prefix} [{tag}] generating...", flush=True)
|
||||
|
||||
if (use_cforch or use_vllm) and service_url:
|
||||
# Both cf-text and vllm expose /v1/chat/completions — same call
|
||||
output, elapsed_ms = generate_cftext(service_url, model_id, user_prompt, system=SYSTEM_PROMPT)
|
||||
else:
|
||||
output, elapsed_ms = generate(model_id, user_prompt, system=SYSTEM_PROMPT)
|
||||
|
||||
signals = extract_signals(output)
|
||||
score = score_against_profile(signals, profile)
|
||||
|
||||
print(f"{prefix} [{tag}] {score:.0f}/100 ({elapsed_ms:.0f}ms)", flush=True)
|
||||
if signals.filler_hits:
|
||||
print(f"{prefix} ⚠ filler: {signals.filler_hits}", flush=True)
|
||||
if signals.em_dash_count:
|
||||
print(f"{prefix} ⚠ em-dashes: {signals.em_dash_count}", flush=True)
|
||||
|
||||
result.prompt_results.append({
|
||||
"tag": tag,
|
||||
"user_prompt": user_prompt,
|
||||
"output": output,
|
||||
"signals": {
|
||||
"avg_sentence_length": signals.avg_sentence_length,
|
||||
"em_dash_count": signals.em_dash_count,
|
||||
"semicolon_count": signals.semicolon_count,
|
||||
"filler_hits": signals.filler_hits,
|
||||
"question_ratio": signals.question_ratio,
|
||||
"word_count": signals.word_count,
|
||||
},
|
||||
"score": score,
|
||||
"latency_ms": elapsed_ms,
|
||||
})
|
||||
finally:
|
||||
if (use_cforch or use_vllm) and allocation_id:
|
||||
cforch_release(allocation_id, cforch_url)
|
||||
|
||||
if not result.prompt_results:
|
||||
return None
|
||||
|
||||
scores = [r["score"] for r in result.prompt_results]
|
||||
latencies = [r["latency_ms"] for r in result.prompt_results]
|
||||
result.avg_score = sum(scores) / len(scores)
|
||||
result.avg_latency_ms = sum(latencies) / len(latencies)
|
||||
result.total_filler_hits = sum(len(r["signals"]["filler_hits"]) for r in result.prompt_results)
|
||||
result.total_em_dashes = sum(r["signals"]["em_dash_count"] for r in result.prompt_results)
|
||||
result.total_semicolons = sum(r["signals"]["semicolon_count"] for r in result.prompt_results)
|
||||
|
||||
print(f"{prefix} done — avg score {result.avg_score:.0f}/100", flush=True)
|
||||
return result
|
||||
|
||||
|
||||
def run_benchmark(
|
||||
model_ids: list[str],
|
||||
corpus_dir: Path,
|
||||
prompts: list[dict[str, str]],
|
||||
use_cforch: bool = False,
|
||||
use_vllm: bool = False,
|
||||
cforch_url: str = _CFORCH_URL,
|
||||
workers: int = 1,
|
||||
) -> list[ModelResult]:
|
||||
profile = build_corpus_profile(corpus_dir)
|
||||
if profile:
|
||||
print(f"Corpus profile loaded from {corpus_dir} ({len(list(corpus_dir.glob('*.txt')))} samples)")
|
||||
print(f" Target avg sentence length: {profile.avg_sentence_length:.1f} words")
|
||||
else:
|
||||
print(f"[warn] No corpus samples found in {corpus_dir} -- scoring on hard violations only")
|
||||
|
||||
backend = "vllm via cf-orch" if use_vllm else ("cf-text via cf-orch" if use_cforch else "ollama")
|
||||
print(f" Backend: {backend}")
|
||||
|
||||
effective_workers = min(workers, len(model_ids)) if model_ids else 1
|
||||
print(f" Workers: {effective_workers} (of {len(model_ids)} models)", flush=True)
|
||||
|
||||
results: list[ModelResult] = []
|
||||
|
||||
if effective_workers <= 1:
|
||||
# Sequential path — simpler output, easier to follow for single-model runs
|
||||
for model_id in model_ids:
|
||||
print(f"\n{'='*60}\nModel: {model_id}", flush=True)
|
||||
r = _bench_one_model(model_id, prompts, profile, use_cforch, cforch_url, use_vllm)
|
||||
if r:
|
||||
results.append(r)
|
||||
else:
|
||||
from concurrent.futures import ThreadPoolExecutor, as_completed
|
||||
print(f" Fanning out {len(model_ids)} models across {effective_workers} workers...", flush=True)
|
||||
with ThreadPoolExecutor(max_workers=effective_workers) as pool:
|
||||
futures = {
|
||||
pool.submit(_bench_one_model, mid, prompts, profile, use_cforch, cforch_url, use_vllm): mid
|
||||
for mid in model_ids
|
||||
}
|
||||
for future in as_completed(futures):
|
||||
r = future.result()
|
||||
if r:
|
||||
results.append(r)
|
||||
|
||||
return sorted(results, key=lambda r: r.avg_score, reverse=True)
|
||||
|
||||
|
||||
# ── Markdown report ───────────────────────────────────────────────────────────
|
||||
|
||||
def render_report(results: list[ModelResult], corpus_dir: Path) -> str:
|
||||
date_str = datetime.now().strftime("%Y-%m-%d %H:%M")
|
||||
lines: list[str] = [
|
||||
f"# Writing Style Benchmark Results",
|
||||
f"",
|
||||
f"**Date:** {date_str} ",
|
||||
f"**Corpus:** `{corpus_dir}` ",
|
||||
f"**Models tested:** {len(results)} ",
|
||||
f"**Prompts per model:** {len(TEST_PROMPTS)}",
|
||||
f"",
|
||||
f"## Rankings",
|
||||
f"",
|
||||
f"| Rank | Model | Score | Latency | Em-dashes | Fillers | Semicolons |",
|
||||
f"|------|-------|-------|---------|-----------|---------|------------|",
|
||||
]
|
||||
|
||||
for i, r in enumerate(results, 1):
|
||||
medal = {1: "🥇", 2: "🥈", 3: "🥉"}.get(i, f"#{i}")
|
||||
lines.append(
|
||||
f"| {medal} | `{r.model_id}` | {r.avg_score:.0f}/100 "
|
||||
f"| {r.avg_latency_ms:.0f}ms "
|
||||
f"| {r.total_em_dashes} "
|
||||
f"| {r.total_filler_hits} "
|
||||
f"| {r.total_semicolons} |"
|
||||
)
|
||||
|
||||
lines += ["", "## Sample Outputs", ""]
|
||||
|
||||
for r in results[:3]: # top 3 only to keep report readable
|
||||
lines += [f"### `{r.model_id}` (avg score: {r.avg_score:.0f})", ""]
|
||||
for pr in r.prompt_results:
|
||||
lines += [
|
||||
f"**Prompt:** {pr['tag']} ",
|
||||
f"**Score:** {pr['score']:.0f}/100 ",
|
||||
f"",
|
||||
f"```",
|
||||
pr["output"],
|
||||
f"```",
|
||||
f"",
|
||||
]
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def save_report(results: list[ModelResult], corpus_dir: Path) -> Path:
|
||||
_RESULTS_DIR.mkdir(exist_ok=True)
|
||||
date_str = datetime.now().strftime("%Y-%m-%d_%H%M")
|
||||
report_path = _RESULTS_DIR / f"style_{date_str}.md"
|
||||
report_path.write_text(render_report(results, corpus_dir), encoding="utf-8")
|
||||
|
||||
# Also save raw JSON for programmatic use
|
||||
json_path = _RESULTS_DIR / f"style_{date_str}.json"
|
||||
json_path.write_text(
|
||||
json.dumps(
|
||||
[
|
||||
{
|
||||
"model_id": r.model_id,
|
||||
"avg_score": r.avg_score,
|
||||
"avg_latency_ms": r.avg_latency_ms,
|
||||
"total_filler_hits": r.total_filler_hits,
|
||||
"total_em_dashes": r.total_em_dashes,
|
||||
"total_semicolons": r.total_semicolons,
|
||||
"prompt_results": r.prompt_results,
|
||||
}
|
||||
for r in results
|
||||
],
|
||||
indent=2,
|
||||
),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
return report_path
|
||||
|
||||
|
||||
# ── CLI commands ──────────────────────────────────────────────────────────────
|
||||
|
||||
def cmd_list_models(_args: argparse.Namespace) -> None:
|
||||
models = list_ollama_models()
|
||||
if not models:
|
||||
print("No models found (is ollama running?)")
|
||||
return
|
||||
print(f"{len(models)} models available:\n")
|
||||
for m in models:
|
||||
print(f" {m}")
|
||||
|
||||
|
||||
def cmd_run(args: argparse.Namespace) -> None:
|
||||
corpus_dir = Path(args.samples)
|
||||
if not corpus_dir.exists():
|
||||
print(f"[error] Corpus directory not found: {corpus_dir}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
max_vram_mb: int = getattr(args, "max_vram", 7200)
|
||||
use_cforch: bool = getattr(args, "cforch", False)
|
||||
use_vllm: bool = getattr(args, "vllm", False)
|
||||
cforch_url: str = getattr(args, "cforch_url", _CFORCH_URL)
|
||||
registered_tags: list[str] = []
|
||||
|
||||
def _filter_ollama_by_size(ids: list[str], include_large: bool) -> list[str]:
|
||||
"""Apply name-pattern size filter to ollama model list."""
|
||||
if include_large:
|
||||
return ids
|
||||
skip_patterns = ["270b", "70b", "32b", "30b", "21b", "20b", "deepseek-r1"]
|
||||
filtered = [m for m in ids if not any(p in m.lower() for p in skip_patterns)]
|
||||
skipped = len(ids) - len(filtered)
|
||||
if skipped:
|
||||
print(f"[info] Skipped {skipped} large model(s) by name pattern. "
|
||||
"Pass --include-large to include them.")
|
||||
return filtered
|
||||
|
||||
if args.models and args.models != "all":
|
||||
model_ids = [m.strip() for m in args.models.split(",") if m.strip()]
|
||||
elif use_cforch:
|
||||
# cf-orch path: pull model list from catalog, filter by vram_mb
|
||||
catalog = cforch_list_catalog(cforch_url)
|
||||
if not catalog:
|
||||
print("[warn] cf-orch catalog empty or unreachable -- falling back to ollama models")
|
||||
use_cforch = False
|
||||
model_ids = _filter_ollama_by_size(list_ollama_models(), args.include_large)
|
||||
if not model_ids:
|
||||
print("[error] No models found. Pass --models explicitly or check ollama.", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
else:
|
||||
before = list(catalog.items())
|
||||
allowed = {mid: mb for mid, mb in before if mb == 0 or mb <= max_vram_mb}
|
||||
skipped_oom = {mid: mb for mid, mb in before if mid not in allowed}
|
||||
model_ids = list(allowed.keys())
|
||||
print(f"[info] cf-orch catalog: {len(before)} model(s), "
|
||||
f"{len(allowed)} within {max_vram_mb} MB VRAM limit")
|
||||
if skipped_oom:
|
||||
print(f"[info] Skipped (OOM risk): "
|
||||
+ ", ".join(f"{mid} ({mb} MB)" for mid, mb in sorted(skipped_oom.items())))
|
||||
else:
|
||||
# Ollama path
|
||||
model_ids = list_ollama_models()
|
||||
if not model_ids:
|
||||
print("[error] No models found. Pass --models explicitly or check ollama.", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
# Backfill GGUFs from disk before filtering -- skips files that exceed VRAM limit
|
||||
if getattr(args, "scan_disk", None):
|
||||
llm_root = Path(args.scan_disk)
|
||||
print(f"\nScanning {llm_root} for unregistered GGUFs (limit: {max_vram_mb} MB)...")
|
||||
registered_tags = backfill_disk_models(llm_root, set(model_ids), max_vram_mb=max_vram_mb)
|
||||
model_ids = list_ollama_models() # re-fetch with new registrations
|
||||
|
||||
model_ids = _filter_ollama_by_size(model_ids, args.include_large)
|
||||
|
||||
print(f"\nRunning writing style benchmark on {len(model_ids)} model(s)...")
|
||||
try:
|
||||
results = run_benchmark(model_ids, corpus_dir, TEST_PROMPTS, use_cforch=use_cforch, use_vllm=use_vllm, cforch_url=cforch_url, workers=args.workers)
|
||||
report_path = save_report(results, corpus_dir)
|
||||
print(f"\n{'='*60}")
|
||||
print(f"Results saved to: {report_path}")
|
||||
print(f"\n{render_report(results, corpus_dir)}")
|
||||
finally:
|
||||
if registered_tags:
|
||||
print(f"\nCleaning up {len(registered_tags)} temporary ollama registrations...")
|
||||
for tag in registered_tags:
|
||||
deregister_gguf(tag)
|
||||
|
||||
|
||||
def cmd_show_last(_args: argparse.Namespace) -> None:
|
||||
reports = sorted(_RESULTS_DIR.glob("style_*.md"), reverse=True)
|
||||
if not reports:
|
||||
print("No benchmark results found. Run --run first.")
|
||||
return
|
||||
print(reports[0].read_text(encoding="utf-8"))
|
||||
|
||||
|
||||
# ── Entry point ───────────────────────────────────────────────────────────────
|
||||
|
||||
def main() -> None:
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Writing style benchmark harness for local text-gen models",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
)
|
||||
sub = parser.add_subparsers(dest="cmd")
|
||||
|
||||
sub.add_parser("list-models", help="List available ollama models")
|
||||
|
||||
run_p = sub.add_parser("run", help="Run the benchmark")
|
||||
run_p.add_argument("--models", default="all", help="Comma-separated model IDs, or 'all'")
|
||||
run_p.add_argument("--samples", default=str(_CORPUS_DIR), help="Path to style corpus directory")
|
||||
run_p.add_argument("--include-large", action="store_true", help="Include models >20B params")
|
||||
run_p.add_argument("--scan-disk", metavar="LLM_ROOT", help="Scan directory for GGUFs not yet in ollama (e.g. /Library/Assets/LLM)")
|
||||
run_p.add_argument("--cforch", action="store_true", help="Route generation through cf-orch/cf-text instead of direct ollama")
|
||||
run_p.add_argument("--vllm", action="store_true", help="Route generation through cf-orch/vllm (OpenAI-compatible) instead of ollama")
|
||||
run_p.add_argument("--cforch-url", default=_CFORCH_URL, help=f"cf-orch coordinator URL (default: {_CFORCH_URL})")
|
||||
run_p.add_argument("--max-vram", type=int, default=7200, metavar="MB",
|
||||
help="Skip models whose VRAM footprint exceeds this limit in MB (default: 7200)")
|
||||
run_p.add_argument("--workers", type=int, default=1, metavar="N",
|
||||
help="Parallel workers — run N models simultaneously (default: 1; use 4+ with cf-orch)")
|
||||
|
||||
sub.add_parser("show-last", help="Print the most recent benchmark report")
|
||||
|
||||
# Also support legacy --list-models / --run / --show-last flags for manage.sh compat
|
||||
parser.add_argument("--list-models", action="store_true")
|
||||
parser.add_argument("--run", action="store_true")
|
||||
parser.add_argument("--show-last", action="store_true")
|
||||
parser.add_argument("--models", default="all")
|
||||
parser.add_argument("--samples", default=str(_CORPUS_DIR))
|
||||
parser.add_argument("--include-large", action="store_true")
|
||||
parser.add_argument("--scan-disk", metavar="LLM_ROOT")
|
||||
parser.add_argument("--cforch", action="store_true")
|
||||
parser.add_argument("--vllm", action="store_true")
|
||||
parser.add_argument("--cforch-url", default=_CFORCH_URL)
|
||||
parser.add_argument("--max-vram", type=int, default=7200, metavar="MB")
|
||||
parser.add_argument("--workers", type=int, default=1, metavar="N")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
if args.cmd == "list-models" or args.list_models:
|
||||
cmd_list_models(args)
|
||||
elif args.cmd == "run" or args.run:
|
||||
cmd_run(args)
|
||||
elif args.cmd == "show-last" or args.show_last:
|
||||
cmd_show_last(args)
|
||||
else:
|
||||
parser.print_help()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
909
scripts/benchmark_voice.py
Normal file
909
scripts/benchmark_voice.py
Normal file
|
|
@ -0,0 +1,909 @@
|
|||
#!/usr/bin/env python
|
||||
"""
|
||||
Voice benchmark harness -- score local text-gen models for writing style match.
|
||||
|
||||
Runs each model against a set of test prompts, extracts style signals from the
|
||||
outputs, compares them to a voice corpus, and produces a ranked markdown table.
|
||||
|
||||
Usage:
|
||||
# List available ollama models
|
||||
conda run -n cf python scripts/benchmark_voice.py --list-models
|
||||
|
||||
# Run against all models with default test prompts
|
||||
conda run -n cf python scripts/benchmark_voice.py --run
|
||||
|
||||
# Run specific models only
|
||||
conda run -n cf python scripts/benchmark_voice.py --run --models mistral:7b,llama3.1:8b
|
||||
|
||||
# Use a custom corpus directory
|
||||
conda run -n cf python scripts/benchmark_voice.py --run --samples data/voice_corpus/
|
||||
|
||||
# Print last results table
|
||||
conda run -n cf python scripts/benchmark_voice.py --show-last
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import re
|
||||
import sys
|
||||
import time
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import httpx
|
||||
|
||||
_ROOT = Path(__file__).parent.parent
|
||||
_CORPUS_DIR = _ROOT / "data" / "voice_corpus"
|
||||
_RESULTS_DIR = _ROOT / "benchmark_results"
|
||||
_OLLAMA_URL = "http://localhost:11434"
|
||||
_CFORCH_URL = "http://localhost:7700"
|
||||
|
||||
# Subdirectories under --scan-disk root that may contain GGUFs
|
||||
_SCAN_SUBDIRS = ["textgen/models", "llama.cpp/models", "cf-text/models", "vllm/models"]
|
||||
|
||||
# ── Filler phrases that should be absent from good voice-match output ─────────
|
||||
FILLER_PHRASES: list[str] = [
|
||||
"delve", "certainly", "absolutely", "i apologize", "i'd be happy to",
|
||||
"of course", "great question", "i understand", "let me know if",
|
||||
"feel free to", "it's important to note", "it's worth noting",
|
||||
"in conclusion", "to summarize", "in summary",
|
||||
]
|
||||
|
||||
# ── Test prompts: (thread_title, thread_body, context_tag) ───────────────────
|
||||
# These are representative threads that Magpie might reply to.
|
||||
# Extend this list with real examples as the corpus grows.
|
||||
TEST_PROMPTS: list[dict[str, str]] = [
|
||||
{
|
||||
"tag": "selfhosted_ai_fatigue",
|
||||
"thread_title": "Anyone else getting tired of re-explaining their setup every time an AI model forgets?",
|
||||
"thread_body": (
|
||||
"Every session I start over. My whole hardware setup, what tools I use, "
|
||||
"what I've already tried. It's exhausting. There has to be a better way."
|
||||
),
|
||||
},
|
||||
{
|
||||
"tag": "privacy_local_llm",
|
||||
"thread_title": "What's the point of running local LLMs if the apps still phone home?",
|
||||
"thread_body": (
|
||||
"I went through all the trouble of setting up ollama and now I find out "
|
||||
"the frontend I'm using is sending telemetry. Kind of defeats the purpose."
|
||||
),
|
||||
},
|
||||
{
|
||||
"tag": "solarpunk_tech",
|
||||
"thread_title": "What does solarpunk computing actually look like in practice?",
|
||||
"thread_body": (
|
||||
"I keep seeing the aesthetic but not a lot of concrete examples of "
|
||||
"people living it out with their tech choices. What does it mean day to day?"
|
||||
),
|
||||
},
|
||||
{
|
||||
"tag": "nd_tools",
|
||||
"thread_title": "Tools that actually help with executive function vs ones that just add friction",
|
||||
"thread_body": (
|
||||
"I've tried a dozen productivity apps and most of them require more "
|
||||
"executive function to maintain than they save. What actually sticks for you?"
|
||||
),
|
||||
},
|
||||
{
|
||||
"tag": "data_ownership",
|
||||
"thread_title": "Who actually owns your data when you use a 'free' AI tool?",
|
||||
"thread_body": (
|
||||
"Read the ToS on three different AI assistants today. In all three cases "
|
||||
"your inputs can be used for training, shared with partners, and retained "
|
||||
"indefinitely. At what point does 'free' just mean you're the product?"
|
||||
),
|
||||
},
|
||||
{
|
||||
"tag": "digital_culture",
|
||||
"thread_title": "The internet used to feel like it belonged to everyone. What happened?",
|
||||
"thread_body": (
|
||||
"I grew up on forums, IRC, personal homepages. Now everything is a platform "
|
||||
"owned by someone trying to extract value from the community that built it. "
|
||||
"Is the fediverse / self-hosting movement actually reversing this or just "
|
||||
"a niche hobby?"
|
||||
),
|
||||
},
|
||||
]
|
||||
|
||||
GENERATION_PARAMS: dict[str, Any] = {
|
||||
"temperature": 0.7,
|
||||
"top_p": 0.9,
|
||||
"num_predict": 300,
|
||||
}
|
||||
|
||||
SYSTEM_PROMPT = (
|
||||
"You are a writing assistant. Your job is to write a Reddit reply that matches "
|
||||
"the voice, tone, and style of the provided samples exactly.\n\n"
|
||||
"Voice characteristics:\n"
|
||||
"- Casual engineer tone. Short punchy sentences.\n"
|
||||
"- No hype, no buzzwords, no em dashes, no semicolons.\n"
|
||||
"- Community-first perspective. Solarpunk values.\n"
|
||||
"- Direct and opinionated. No throat-clearing or filler.\n"
|
||||
"- When relevant, mention personal experience with real tools.\n\n"
|
||||
"Write ONLY the reply. No preamble, no 'Here is a reply:', no meta-commentary."
|
||||
)
|
||||
|
||||
|
||||
# ── Style signal extraction ───────────────────────────────────────────────────
|
||||
|
||||
@dataclass
|
||||
class StyleSignals:
|
||||
"""Quantitative style signals extracted from a text sample."""
|
||||
sentence_count: int = 0
|
||||
word_count: int = 0
|
||||
avg_sentence_length: float = 0.0
|
||||
em_dash_count: int = 0
|
||||
semicolon_count: int = 0
|
||||
filler_hits: list[str] = field(default_factory=list)
|
||||
question_ratio: float = 0.0 # fraction of sentences ending in '?'
|
||||
first_person_ratio: float = 0.0 # fraction of sentences starting with 'I'
|
||||
avg_word_length: float = 0.0
|
||||
|
||||
|
||||
def extract_signals(text: str) -> StyleSignals:
|
||||
"""Extract style signals from a text sample."""
|
||||
text = text.strip()
|
||||
if text.startswith("[ERROR:"):
|
||||
return StyleSignals() # zero-score sentinel — caller checks for empty output
|
||||
sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
|
||||
words = text.split()
|
||||
|
||||
if not sentences:
|
||||
return StyleSignals()
|
||||
|
||||
avg_sentence_length = len(words) / len(sentences) if sentences else 0.0
|
||||
avg_word_length = (sum(len(w.strip('.,!?;:"\'')) for w in words) / len(words)) if words else 0.0
|
||||
|
||||
em_dash_count = text.count('\u2014') + text.count(' -- ') + text.count('--')
|
||||
semicolon_count = text.count(';')
|
||||
|
||||
filler_hits = [p for p in FILLER_PHRASES if p.lower() in text.lower()]
|
||||
|
||||
question_ratio = sum(1 for s in sentences if s.endswith('?')) / len(sentences)
|
||||
first_person_ratio = sum(1 for s in sentences if re.match(r"^I\b", s)) / len(sentences)
|
||||
|
||||
return StyleSignals(
|
||||
sentence_count=len(sentences),
|
||||
word_count=len(words),
|
||||
avg_sentence_length=avg_sentence_length,
|
||||
em_dash_count=em_dash_count,
|
||||
semicolon_count=semicolon_count,
|
||||
filler_hits=filler_hits,
|
||||
question_ratio=question_ratio,
|
||||
first_person_ratio=first_person_ratio,
|
||||
avg_word_length=avg_word_length,
|
||||
)
|
||||
|
||||
|
||||
def build_corpus_profile(corpus_dir: Path) -> StyleSignals | None:
|
||||
"""Aggregate style signals across all corpus samples into a target profile."""
|
||||
samples = list(corpus_dir.glob("*.txt"))
|
||||
if not samples:
|
||||
return None
|
||||
|
||||
all_signals = [extract_signals(p.read_text(encoding="utf-8")) for p in samples]
|
||||
n = len(all_signals)
|
||||
|
||||
return StyleSignals(
|
||||
sentence_count=int(sum(s.sentence_count for s in all_signals) / n),
|
||||
word_count=int(sum(s.word_count for s in all_signals) / n),
|
||||
avg_sentence_length=sum(s.avg_sentence_length for s in all_signals) / n,
|
||||
em_dash_count=int(sum(s.em_dash_count for s in all_signals) / n),
|
||||
semicolon_count=int(sum(s.semicolon_count for s in all_signals) / n),
|
||||
question_ratio=sum(s.question_ratio for s in all_signals) / n,
|
||||
first_person_ratio=sum(s.first_person_ratio for s in all_signals) / n,
|
||||
avg_word_length=sum(s.avg_word_length for s in all_signals) / n,
|
||||
)
|
||||
|
||||
|
||||
def score_against_profile(output_signals: StyleSignals, profile: StyleSignals | None) -> float:
|
||||
"""Score a model output against the corpus profile. Returns 0-100.
|
||||
|
||||
Penalties:
|
||||
- Em dashes / semicolons: -5 each occurrence (hard CF style violation)
|
||||
- Filler phrases: -8 each hit (strong signal of non-voice output)
|
||||
- Sentence length delta: proportional penalty (target: close to corpus avg)
|
||||
- Word length delta: smaller penalty
|
||||
|
||||
When no corpus profile is available, falls back to absolute signal scores only.
|
||||
"""
|
||||
score = 100.0
|
||||
|
||||
# Hard violations -- always penalised regardless of corpus
|
||||
score -= output_signals.em_dash_count * 5
|
||||
score -= output_signals.semicolon_count * 3
|
||||
score -= len(output_signals.filler_hits) * 8
|
||||
|
||||
if profile is not None:
|
||||
# Sentence length delta: penalise proportionally
|
||||
length_delta = abs(output_signals.avg_sentence_length - profile.avg_sentence_length)
|
||||
score -= min(length_delta * 2, 20)
|
||||
|
||||
# Question ratio delta
|
||||
question_delta = abs(output_signals.question_ratio - profile.question_ratio)
|
||||
score -= min(question_delta * 10, 10)
|
||||
|
||||
return max(0.0, score)
|
||||
|
||||
|
||||
# ── Ollama generation ─────────────────────────────────────────────────────────
|
||||
|
||||
_CFORCH_NODE_ID = "heimdall"
|
||||
|
||||
|
||||
def cforch_list_catalog(
|
||||
cforch_url: str = _CFORCH_URL,
|
||||
node_id: str = _CFORCH_NODE_ID,
|
||||
) -> dict[str, int]:
|
||||
"""Return the cf-text catalog from cf-orch as {model_id: vram_mb}.
|
||||
|
||||
Uses ?node_id= to request the catalog from a specific node's profile,
|
||||
avoiding cross-node catalog shadowing when multiple nodes define catalogs
|
||||
for the same service.
|
||||
"""
|
||||
try:
|
||||
resp = httpx.get(
|
||||
f"{cforch_url}/api/services/cf-text/catalog",
|
||||
params={"node_id": node_id} if node_id else {},
|
||||
timeout=10.0,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
raw = resp.json()
|
||||
return {
|
||||
model_id: (entry.get("vram_mb", 0) if isinstance(entry, dict) else 0)
|
||||
for model_id, entry in raw.items()
|
||||
}
|
||||
except Exception as exc:
|
||||
print(f"[warn] Could not reach cf-orch catalog at {cforch_url}: {exc}", file=sys.stderr)
|
||||
return {}
|
||||
|
||||
|
||||
def _cforch_allocate_service(
|
||||
service: str,
|
||||
model_id: str,
|
||||
cforch_url: str,
|
||||
startup_timeout_s: float,
|
||||
health_path: str,
|
||||
) -> tuple[str, str] | None:
|
||||
"""Generic cf-orch allocate + health-poll. Returns (service_url, allocation_id) or None."""
|
||||
try:
|
||||
resp = httpx.post(
|
||||
f"{cforch_url}/api/services/{service}/allocate",
|
||||
json={
|
||||
"model_candidates": [model_id],
|
||||
"caller": "avocet",
|
||||
"pipeline": "voice_benchmark",
|
||||
},
|
||||
timeout=120.0,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
data = resp.json()
|
||||
service_url: str = data["url"]
|
||||
allocation_id: str = data.get("allocation_id", "")
|
||||
|
||||
if data.get("started", False) and not data.get("warm", True):
|
||||
label = service
|
||||
print(f" [cold start] waiting for {label} to load {model_id!r}...", end=" ", flush=True)
|
||||
deadline = time.monotonic() + startup_timeout_s
|
||||
while time.monotonic() < deadline:
|
||||
try:
|
||||
health = httpx.get(f"{service_url}{health_path}", timeout=3.0)
|
||||
if health.is_success:
|
||||
print(f"ready ({time.monotonic() - (deadline - startup_timeout_s):.0f}s)", flush=True)
|
||||
break
|
||||
except Exception:
|
||||
pass
|
||||
time.sleep(2.0)
|
||||
else:
|
||||
print(f"timed out after {startup_timeout_s:.0f}s", flush=True)
|
||||
return None
|
||||
|
||||
return service_url, allocation_id
|
||||
except Exception as exc:
|
||||
print(f"[warn] cf-orch allocation failed for {model_id!r} ({service}): {exc}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
|
||||
def cforch_allocate(
|
||||
model_id: str,
|
||||
cforch_url: str = _CFORCH_URL,
|
||||
startup_timeout_s: float = 180.0,
|
||||
) -> tuple[str, str] | None:
|
||||
"""Allocate a cf-text instance for model_id. Returns (service_url, allocation_id) or None."""
|
||||
return _cforch_allocate_service("cf-text", model_id, cforch_url, startup_timeout_s, "/health")
|
||||
|
||||
|
||||
def cforch_allocate_vllm(
|
||||
model_id: str,
|
||||
cforch_url: str = _CFORCH_URL,
|
||||
startup_timeout_s: float = 300.0,
|
||||
) -> tuple[str, str] | None:
|
||||
"""Allocate a vllm instance for model_id. Returns (service_url, allocation_id) or None.
|
||||
|
||||
vllm exposes an OpenAI-compatible API — generate_cftext() works unchanged
|
||||
against the returned service_url. Startup timeout is longer (300s) because
|
||||
vllm loads large model weights from disk before becoming ready.
|
||||
"""
|
||||
return _cforch_allocate_service("vllm", model_id, cforch_url, startup_timeout_s, "/health")
|
||||
|
||||
|
||||
def cforch_release(allocation_id: str, cforch_url: str = _CFORCH_URL) -> None:
|
||||
"""Release a cf-orch allocation."""
|
||||
if not allocation_id:
|
||||
return
|
||||
try:
|
||||
httpx.post(f"{cforch_url}/api/leases/{allocation_id}/release", timeout=10.0)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
|
||||
def generate_cftext(
|
||||
service_url: str,
|
||||
model_id: str,
|
||||
prompt: str,
|
||||
system: str = "",
|
||||
) -> tuple[str, float]:
|
||||
"""Call cf-text via OpenAI-compatible /v1/chat/completions. Returns (text, elapsed_ms)."""
|
||||
messages: list[dict[str, str]] = []
|
||||
if system:
|
||||
messages.append({"role": "system", "content": system})
|
||||
messages.append({"role": "user", "content": prompt})
|
||||
|
||||
payload: dict[str, Any] = {
|
||||
"model": model_id,
|
||||
"messages": messages,
|
||||
"max_tokens": GENERATION_PARAMS.get("num_predict", 300),
|
||||
"temperature": GENERATION_PARAMS.get("temperature", 0.7),
|
||||
"top_p": GENERATION_PARAMS.get("top_p", 0.9),
|
||||
"stream": False,
|
||||
}
|
||||
|
||||
t0 = time.monotonic()
|
||||
try:
|
||||
resp = httpx.post(
|
||||
f"{service_url.rstrip('/')}/v1/chat/completions",
|
||||
json=payload,
|
||||
timeout=180.0,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
elapsed_ms = (time.monotonic() - t0) * 1000
|
||||
content = resp.json()["choices"][0]["message"]["content"]
|
||||
return content.strip(), elapsed_ms
|
||||
except Exception as exc:
|
||||
elapsed_ms = (time.monotonic() - t0) * 1000
|
||||
return f"[ERROR: {exc}]", elapsed_ms
|
||||
|
||||
|
||||
def generate(model_id: str, prompt: str, system: str = "") -> tuple[str, float]:
|
||||
"""Call ollama /api/generate. Returns (text, elapsed_ms)."""
|
||||
payload: dict[str, Any] = {
|
||||
"model": model_id,
|
||||
"prompt": prompt,
|
||||
"stream": False,
|
||||
"options": GENERATION_PARAMS,
|
||||
}
|
||||
if system:
|
||||
payload["system"] = system
|
||||
|
||||
t0 = time.monotonic()
|
||||
try:
|
||||
resp = httpx.post(
|
||||
f"{_OLLAMA_URL}/api/generate",
|
||||
json=payload,
|
||||
timeout=120.0,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
elapsed_ms = (time.monotonic() - t0) * 1000
|
||||
return resp.json().get("response", "").strip(), elapsed_ms
|
||||
except Exception as exc:
|
||||
elapsed_ms = (time.monotonic() - t0) * 1000
|
||||
return f"[ERROR: {exc}]", elapsed_ms
|
||||
|
||||
|
||||
def find_disk_ggufs(llm_root: Path) -> list[Path]:
|
||||
"""Recursively find .gguf files under known subdirs of llm_root.
|
||||
|
||||
Skips vocab-only GGUFs (ggml-vocab-*) which aren't standalone models.
|
||||
"""
|
||||
found: list[Path] = []
|
||||
search_dirs = [llm_root / sub for sub in _SCAN_SUBDIRS] + [llm_root]
|
||||
seen: set[Path] = set()
|
||||
for base in search_dirs:
|
||||
if not base.exists():
|
||||
continue
|
||||
for gguf in base.rglob("*.gguf"):
|
||||
if gguf in seen:
|
||||
continue
|
||||
seen.add(gguf)
|
||||
if gguf.name.startswith("ggml-vocab-"):
|
||||
continue
|
||||
found.append(gguf)
|
||||
return sorted(found)
|
||||
|
||||
|
||||
def gguf_to_ollama_tag(gguf_path: Path) -> str:
|
||||
"""Derive a stable ollama tag from a GGUF path.
|
||||
|
||||
Uses parent dir name + stem to avoid collisions, e.g.:
|
||||
claude-3.7-sonnet-reasoning-gemma3-12B/foo.Q8_0.gguf
|
||||
→ bench-claude-3.7-sonnet-reasoning-gemma3-12b-foo-q8-0
|
||||
"""
|
||||
parent = gguf_path.parent.name.lower()
|
||||
stem = gguf_path.stem.lower()
|
||||
# If stem is contained in parent (common pattern), just use parent
|
||||
slug = parent if stem.replace("-", "").replace("_", "") in parent.replace("-", "").replace("_", "") else f"{parent}-{stem}"
|
||||
slug = re.sub(r"[^a-z0-9]+", "-", slug).strip("-")
|
||||
return f"bench-{slug}:latest"
|
||||
|
||||
|
||||
def register_gguf(gguf_path: Path, tag: str) -> bool:
|
||||
"""Create a temporary ollama model entry from a GGUF file. Returns True on success."""
|
||||
import subprocess
|
||||
import tempfile
|
||||
modelfile = f"FROM {gguf_path.resolve()}\n"
|
||||
with tempfile.NamedTemporaryFile(mode="w", suffix=".Modelfile", delete=False) as f:
|
||||
f.write(modelfile)
|
||||
modelfile_path = f.name
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["ollama", "create", tag, "-f", modelfile_path],
|
||||
capture_output=True, text=True, timeout=60,
|
||||
)
|
||||
return result.returncode == 0
|
||||
except Exception as exc:
|
||||
print(f"[warn] Could not register {gguf_path.name}: {exc}", file=sys.stderr)
|
||||
return False
|
||||
finally:
|
||||
Path(modelfile_path).unlink(missing_ok=True)
|
||||
|
||||
|
||||
def deregister_gguf(tag: str) -> None:
|
||||
"""Remove a temporary ollama model entry."""
|
||||
import subprocess
|
||||
try:
|
||||
subprocess.run(["ollama", "rm", tag], capture_output=True, timeout=30)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
|
||||
def backfill_disk_models(
|
||||
llm_root: Path,
|
||||
existing_tags: set[str],
|
||||
max_vram_mb: int = 0,
|
||||
) -> list[str]:
|
||||
"""Register GGUFs from disk that aren't already in ollama. Returns new tags.
|
||||
|
||||
max_vram_mb: skip files whose size exceeds this threshold (0 = no limit).
|
||||
GGUF file size is a reliable VRAM proxy -- quantized weights load ~1:1.
|
||||
"""
|
||||
ggufs = find_disk_ggufs(llm_root)
|
||||
if not ggufs:
|
||||
print(f"No .gguf files found under {llm_root}", file=sys.stderr)
|
||||
return []
|
||||
|
||||
new_tags: list[str] = []
|
||||
skipped_oom = 0
|
||||
for gguf in ggufs:
|
||||
size_mb = gguf.stat().st_size // (1024 * 1024)
|
||||
if max_vram_mb and size_mb > max_vram_mb:
|
||||
print(f" [skip-oom] {gguf.name} ({size_mb} MB > {max_vram_mb} MB limit)")
|
||||
skipped_oom += 1
|
||||
continue
|
||||
tag = gguf_to_ollama_tag(gguf)
|
||||
if tag in existing_tags:
|
||||
print(f" [skip] {gguf.name} already registered as {tag}")
|
||||
continue
|
||||
print(f" [register] {gguf.name} ({size_mb} MB) → {tag} ...", end=" ", flush=True)
|
||||
if register_gguf(gguf, tag):
|
||||
print("ok")
|
||||
new_tags.append(tag)
|
||||
else:
|
||||
print("failed")
|
||||
|
||||
if skipped_oom:
|
||||
print(f" [info] {skipped_oom} GGUF(s) skipped (exceed {max_vram_mb} MB VRAM limit)")
|
||||
return new_tags
|
||||
|
||||
|
||||
def list_ollama_models() -> list[str]:
|
||||
"""Return model names from ollama /api/tags, filtered to text-gen candidates."""
|
||||
try:
|
||||
resp = httpx.get(f"{_OLLAMA_URL}/api/tags", timeout=10.0)
|
||||
resp.raise_for_status()
|
||||
models = resp.json().get("models", [])
|
||||
# Exclude embedding-only models
|
||||
exclude = {"mxbai-embed-large", "nomic-embed-text", "all-minilm"}
|
||||
return [
|
||||
m["name"] for m in models
|
||||
if not any(x in m["name"].lower() for x in exclude)
|
||||
]
|
||||
except Exception as exc:
|
||||
print(f"[warn] Could not reach ollama: {exc}", file=sys.stderr)
|
||||
return []
|
||||
|
||||
|
||||
# ── Run benchmark ─────────────────────────────────────────────────────────────
|
||||
|
||||
@dataclass
|
||||
class ModelResult:
|
||||
model_id: str
|
||||
prompt_results: list[dict[str, Any]] = field(default_factory=list)
|
||||
avg_score: float = 0.0
|
||||
avg_latency_ms: float = 0.0
|
||||
total_filler_hits: int = 0
|
||||
total_em_dashes: int = 0
|
||||
total_semicolons: int = 0
|
||||
|
||||
|
||||
def _bench_one_model(
|
||||
model_id: str,
|
||||
prompts: list[dict[str, str]],
|
||||
profile: Any,
|
||||
use_cforch: bool,
|
||||
cforch_url: str,
|
||||
use_vllm: bool = False,
|
||||
) -> "ModelResult | None":
|
||||
"""Run all prompts for a single model. Thread-safe — all output is prefixed with model_id.
|
||||
|
||||
Dispatch priority:
|
||||
use_vllm=True → allocate vllm via cf-orch, then generate_cftext() (OpenAI-compatible)
|
||||
use_cforch=True → allocate cf-text via cf-orch, then generate_cftext()
|
||||
else → direct ollama generate()
|
||||
Both vllm and cf-text expose /v1/chat/completions so generate_cftext() works for both.
|
||||
"""
|
||||
prefix = f"[{model_id}]"
|
||||
result = ModelResult(model_id=model_id)
|
||||
|
||||
service_url: str | None = None
|
||||
allocation_id: str = ""
|
||||
if use_vllm:
|
||||
alloc = cforch_allocate_vllm(model_id, cforch_url)
|
||||
if alloc is None:
|
||||
print(f"{prefix} [skip] vllm allocation failed", flush=True)
|
||||
return None
|
||||
service_url, allocation_id = alloc
|
||||
print(f"{prefix} vllm allocated: {service_url}", flush=True)
|
||||
elif use_cforch:
|
||||
alloc = cforch_allocate(model_id, cforch_url)
|
||||
if alloc is None:
|
||||
print(f"{prefix} [skip] cf-orch allocation failed", flush=True)
|
||||
return None
|
||||
service_url, allocation_id = alloc
|
||||
print(f"{prefix} allocated: {service_url}", flush=True)
|
||||
|
||||
try:
|
||||
for prompt_def in prompts:
|
||||
tag = prompt_def["tag"]
|
||||
user_prompt = (
|
||||
f"Thread: {prompt_def['thread_title']}\n\n"
|
||||
f"{prompt_def['thread_body']}\n\n"
|
||||
f"Write a reply:"
|
||||
)
|
||||
print(f"{prefix} [{tag}] generating...", flush=True)
|
||||
|
||||
if (use_cforch or use_vllm) and service_url:
|
||||
# Both cf-text and vllm expose /v1/chat/completions — same call
|
||||
output, elapsed_ms = generate_cftext(service_url, model_id, user_prompt, system=SYSTEM_PROMPT)
|
||||
else:
|
||||
output, elapsed_ms = generate(model_id, user_prompt, system=SYSTEM_PROMPT)
|
||||
|
||||
signals = extract_signals(output)
|
||||
score = score_against_profile(signals, profile)
|
||||
|
||||
print(f"{prefix} [{tag}] {score:.0f}/100 ({elapsed_ms:.0f}ms)", flush=True)
|
||||
if signals.filler_hits:
|
||||
print(f"{prefix} ⚠ filler: {signals.filler_hits}", flush=True)
|
||||
if signals.em_dash_count:
|
||||
print(f"{prefix} ⚠ em-dashes: {signals.em_dash_count}", flush=True)
|
||||
|
||||
result.prompt_results.append({
|
||||
"tag": tag,
|
||||
"user_prompt": user_prompt,
|
||||
"output": output,
|
||||
"signals": {
|
||||
"avg_sentence_length": signals.avg_sentence_length,
|
||||
"em_dash_count": signals.em_dash_count,
|
||||
"semicolon_count": signals.semicolon_count,
|
||||
"filler_hits": signals.filler_hits,
|
||||
"question_ratio": signals.question_ratio,
|
||||
"word_count": signals.word_count,
|
||||
},
|
||||
"score": score,
|
||||
"latency_ms": elapsed_ms,
|
||||
})
|
||||
finally:
|
||||
if use_cforch and allocation_id:
|
||||
cforch_release(allocation_id, cforch_url)
|
||||
|
||||
if not result.prompt_results:
|
||||
return None
|
||||
|
||||
scores = [r["score"] for r in result.prompt_results]
|
||||
latencies = [r["latency_ms"] for r in result.prompt_results]
|
||||
result.avg_score = sum(scores) / len(scores)
|
||||
result.avg_latency_ms = sum(latencies) / len(latencies)
|
||||
result.total_filler_hits = sum(len(r["signals"]["filler_hits"]) for r in result.prompt_results)
|
||||
result.total_em_dashes = sum(r["signals"]["em_dash_count"] for r in result.prompt_results)
|
||||
result.total_semicolons = sum(r["signals"]["semicolon_count"] for r in result.prompt_results)
|
||||
|
||||
print(f"{prefix} done — avg score {result.avg_score:.0f}/100", flush=True)
|
||||
return result
|
||||
|
||||
|
||||
def run_benchmark(
|
||||
model_ids: list[str],
|
||||
corpus_dir: Path,
|
||||
prompts: list[dict[str, str]],
|
||||
use_cforch: bool = False,
|
||||
use_vllm: bool = False,
|
||||
cforch_url: str = _CFORCH_URL,
|
||||
workers: int = 1,
|
||||
) -> list[ModelResult]:
|
||||
profile = build_corpus_profile(corpus_dir)
|
||||
if profile:
|
||||
print(f"Corpus profile loaded from {corpus_dir} ({len(list(corpus_dir.glob('*.txt')))} samples)")
|
||||
print(f" Target avg sentence length: {profile.avg_sentence_length:.1f} words")
|
||||
else:
|
||||
print(f"[warn] No corpus samples found in {corpus_dir} -- scoring on hard violations only")
|
||||
|
||||
backend = "vllm via cf-orch" if use_vllm else ("cf-text via cf-orch" if use_cforch else "ollama")
|
||||
print(f" Backend: {backend}")
|
||||
|
||||
effective_workers = min(workers, len(model_ids)) if model_ids else 1
|
||||
print(f" Workers: {effective_workers} (of {len(model_ids)} models)", flush=True)
|
||||
|
||||
results: list[ModelResult] = []
|
||||
|
||||
if effective_workers <= 1:
|
||||
# Sequential path — simpler output, easier to follow for single-model runs
|
||||
for model_id in model_ids:
|
||||
print(f"\n{'='*60}\nModel: {model_id}", flush=True)
|
||||
r = _bench_one_model(model_id, prompts, profile, use_cforch, cforch_url, use_vllm)
|
||||
if r:
|
||||
results.append(r)
|
||||
else:
|
||||
from concurrent.futures import ThreadPoolExecutor, as_completed
|
||||
print(f" Fanning out {len(model_ids)} models across {effective_workers} workers...", flush=True)
|
||||
with ThreadPoolExecutor(max_workers=effective_workers) as pool:
|
||||
futures = {
|
||||
pool.submit(_bench_one_model, mid, prompts, profile, use_cforch, cforch_url, use_vllm): mid
|
||||
for mid in model_ids
|
||||
}
|
||||
for future in as_completed(futures):
|
||||
r = future.result()
|
||||
if r:
|
||||
results.append(r)
|
||||
|
||||
return sorted(results, key=lambda r: r.avg_score, reverse=True)
|
||||
|
||||
|
||||
# ── Markdown report ───────────────────────────────────────────────────────────
|
||||
|
||||
def render_report(results: list[ModelResult], corpus_dir: Path) -> str:
|
||||
date_str = datetime.now().strftime("%Y-%m-%d %H:%M")
|
||||
lines: list[str] = [
|
||||
f"# Voice Benchmark Results",
|
||||
f"",
|
||||
f"**Date:** {date_str} ",
|
||||
f"**Corpus:** `{corpus_dir}` ",
|
||||
f"**Models tested:** {len(results)} ",
|
||||
f"**Prompts per model:** {len(TEST_PROMPTS)}",
|
||||
f"",
|
||||
f"## Rankings",
|
||||
f"",
|
||||
f"| Rank | Model | Score | Latency | Em-dashes | Fillers | Semicolons |",
|
||||
f"|------|-------|-------|---------|-----------|---------|------------|",
|
||||
]
|
||||
|
||||
for i, r in enumerate(results, 1):
|
||||
medal = {1: "🥇", 2: "🥈", 3: "🥉"}.get(i, f"#{i}")
|
||||
lines.append(
|
||||
f"| {medal} | `{r.model_id}` | {r.avg_score:.0f}/100 "
|
||||
f"| {r.avg_latency_ms:.0f}ms "
|
||||
f"| {r.total_em_dashes} "
|
||||
f"| {r.total_filler_hits} "
|
||||
f"| {r.total_semicolons} |"
|
||||
)
|
||||
|
||||
lines += ["", "## Sample Outputs", ""]
|
||||
|
||||
for r in results[:3]: # top 3 only to keep report readable
|
||||
lines += [f"### `{r.model_id}` (avg score: {r.avg_score:.0f})", ""]
|
||||
for pr in r.prompt_results:
|
||||
lines += [
|
||||
f"**Prompt:** {pr['tag']} ",
|
||||
f"**Score:** {pr['score']:.0f}/100 ",
|
||||
f"",
|
||||
f"```",
|
||||
pr["output"],
|
||||
f"```",
|
||||
f"",
|
||||
]
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def save_report(results: list[ModelResult], corpus_dir: Path) -> Path:
|
||||
_RESULTS_DIR.mkdir(exist_ok=True)
|
||||
date_str = datetime.now().strftime("%Y-%m-%d_%H%M")
|
||||
report_path = _RESULTS_DIR / f"voice_{date_str}.md"
|
||||
report_path.write_text(render_report(results, corpus_dir), encoding="utf-8")
|
||||
|
||||
# Also save raw JSON for programmatic use
|
||||
json_path = _RESULTS_DIR / f"voice_{date_str}.json"
|
||||
json_path.write_text(
|
||||
json.dumps(
|
||||
[
|
||||
{
|
||||
"model_id": r.model_id,
|
||||
"avg_score": r.avg_score,
|
||||
"avg_latency_ms": r.avg_latency_ms,
|
||||
"total_filler_hits": r.total_filler_hits,
|
||||
"total_em_dashes": r.total_em_dashes,
|
||||
"total_semicolons": r.total_semicolons,
|
||||
"prompt_results": r.prompt_results,
|
||||
}
|
||||
for r in results
|
||||
],
|
||||
indent=2,
|
||||
),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
return report_path
|
||||
|
||||
|
||||
# ── CLI commands ──────────────────────────────────────────────────────────────
|
||||
|
||||
def cmd_list_models(_args: argparse.Namespace) -> None:
|
||||
models = list_ollama_models()
|
||||
if not models:
|
||||
print("No models found (is ollama running?)")
|
||||
return
|
||||
print(f"{len(models)} models available:\n")
|
||||
for m in models:
|
||||
print(f" {m}")
|
||||
|
||||
|
||||
def cmd_run(args: argparse.Namespace) -> None:
|
||||
corpus_dir = Path(args.samples)
|
||||
if not corpus_dir.exists():
|
||||
print(f"[error] Corpus directory not found: {corpus_dir}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
max_vram_mb: int = getattr(args, "max_vram", 7200)
|
||||
use_cforch: bool = getattr(args, "cforch", False)
|
||||
use_vllm: bool = getattr(args, "vllm", False)
|
||||
cforch_url: str = getattr(args, "cforch_url", _CFORCH_URL)
|
||||
registered_tags: list[str] = []
|
||||
|
||||
def _filter_ollama_by_size(ids: list[str], include_large: bool) -> list[str]:
|
||||
"""Apply name-pattern size filter to ollama model list."""
|
||||
if include_large:
|
||||
return ids
|
||||
skip_patterns = ["270b", "70b", "32b", "30b", "21b", "20b", "deepseek-r1"]
|
||||
filtered = [m for m in ids if not any(p in m.lower() for p in skip_patterns)]
|
||||
skipped = len(ids) - len(filtered)
|
||||
if skipped:
|
||||
print(f"[info] Skipped {skipped} large model(s) by name pattern. "
|
||||
"Pass --include-large to include them.")
|
||||
return filtered
|
||||
|
||||
if args.models and args.models != "all":
|
||||
model_ids = [m.strip() for m in args.models.split(",") if m.strip()]
|
||||
elif use_cforch:
|
||||
# cf-orch path: pull model list from catalog, filter by vram_mb
|
||||
catalog = cforch_list_catalog(cforch_url)
|
||||
if not catalog:
|
||||
print("[warn] cf-orch catalog empty or unreachable -- falling back to ollama models")
|
||||
use_cforch = False
|
||||
model_ids = _filter_ollama_by_size(list_ollama_models(), args.include_large)
|
||||
if not model_ids:
|
||||
print("[error] No models found. Pass --models explicitly or check ollama.", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
else:
|
||||
before = list(catalog.items())
|
||||
allowed = {mid: mb for mid, mb in before if mb == 0 or mb <= max_vram_mb}
|
||||
skipped_oom = {mid: mb for mid, mb in before if mid not in allowed}
|
||||
model_ids = list(allowed.keys())
|
||||
print(f"[info] cf-orch catalog: {len(before)} model(s), "
|
||||
f"{len(allowed)} within {max_vram_mb} MB VRAM limit")
|
||||
if skipped_oom:
|
||||
print(f"[info] Skipped (OOM risk): "
|
||||
+ ", ".join(f"{mid} ({mb} MB)" for mid, mb in sorted(skipped_oom.items())))
|
||||
else:
|
||||
# Ollama path
|
||||
model_ids = list_ollama_models()
|
||||
if not model_ids:
|
||||
print("[error] No models found. Pass --models explicitly or check ollama.", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
# Backfill GGUFs from disk before filtering -- skips files that exceed VRAM limit
|
||||
if getattr(args, "scan_disk", None):
|
||||
llm_root = Path(args.scan_disk)
|
||||
print(f"\nScanning {llm_root} for unregistered GGUFs (limit: {max_vram_mb} MB)...")
|
||||
registered_tags = backfill_disk_models(llm_root, set(model_ids), max_vram_mb=max_vram_mb)
|
||||
model_ids = list_ollama_models() # re-fetch with new registrations
|
||||
|
||||
model_ids = _filter_ollama_by_size(model_ids, args.include_large)
|
||||
|
||||
print(f"\nRunning voice benchmark on {len(model_ids)} model(s)...")
|
||||
try:
|
||||
results = run_benchmark(model_ids, corpus_dir, TEST_PROMPTS, use_cforch=use_cforch, use_vllm=use_vllm, cforch_url=cforch_url, workers=args.workers)
|
||||
report_path = save_report(results, corpus_dir)
|
||||
print(f"\n{'='*60}")
|
||||
print(f"Results saved to: {report_path}")
|
||||
print(f"\n{render_report(results, corpus_dir)}")
|
||||
finally:
|
||||
if registered_tags:
|
||||
print(f"\nCleaning up {len(registered_tags)} temporary ollama registrations...")
|
||||
for tag in registered_tags:
|
||||
deregister_gguf(tag)
|
||||
|
||||
|
||||
def cmd_show_last(_args: argparse.Namespace) -> None:
|
||||
reports = sorted(_RESULTS_DIR.glob("voice_*.md"), reverse=True)
|
||||
if not reports:
|
||||
print("No benchmark results found. Run --run first.")
|
||||
return
|
||||
print(reports[0].read_text(encoding="utf-8"))
|
||||
|
||||
|
||||
# ── Entry point ───────────────────────────────────────────────────────────────
|
||||
|
||||
def main() -> None:
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Voice benchmark harness for local text-gen models",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
)
|
||||
sub = parser.add_subparsers(dest="cmd")
|
||||
|
||||
sub.add_parser("list-models", help="List available ollama models")
|
||||
|
||||
run_p = sub.add_parser("run", help="Run the benchmark")
|
||||
run_p.add_argument("--models", default="all", help="Comma-separated model IDs, or 'all'")
|
||||
run_p.add_argument("--samples", default=str(_CORPUS_DIR), help="Path to voice corpus directory")
|
||||
run_p.add_argument("--include-large", action="store_true", help="Include models >20B params")
|
||||
run_p.add_argument("--scan-disk", metavar="LLM_ROOT", help="Scan directory for GGUFs not yet in ollama (e.g. /Library/Assets/LLM)")
|
||||
run_p.add_argument("--cforch", action="store_true", help="Route generation through cf-orch/cf-text instead of direct ollama")
|
||||
run_p.add_argument("--vllm", action="store_true", help="Route generation through cf-orch/vllm (OpenAI-compatible) instead of ollama")
|
||||
run_p.add_argument("--cforch-url", default=_CFORCH_URL, help=f"cf-orch coordinator URL (default: {_CFORCH_URL})")
|
||||
run_p.add_argument("--max-vram", type=int, default=7200, metavar="MB",
|
||||
help="Skip models whose VRAM footprint exceeds this limit in MB (default: 7200)")
|
||||
run_p.add_argument("--workers", type=int, default=1, metavar="N",
|
||||
help="Parallel workers — run N models simultaneously (default: 1; use 4+ with cf-orch)")
|
||||
|
||||
sub.add_parser("show-last", help="Print the most recent benchmark report")
|
||||
|
||||
# Also support legacy --list-models / --run / --show-last flags for manage.sh compat
|
||||
parser.add_argument("--list-models", action="store_true")
|
||||
parser.add_argument("--run", action="store_true")
|
||||
parser.add_argument("--show-last", action="store_true")
|
||||
parser.add_argument("--models", default="all")
|
||||
parser.add_argument("--samples", default=str(_CORPUS_DIR))
|
||||
parser.add_argument("--include-large", action="store_true")
|
||||
parser.add_argument("--scan-disk", metavar="LLM_ROOT")
|
||||
parser.add_argument("--cforch", action="store_true")
|
||||
parser.add_argument("--vllm", action="store_true")
|
||||
parser.add_argument("--cforch-url", default=_CFORCH_URL)
|
||||
parser.add_argument("--max-vram", type=int, default=7200, metavar="MB")
|
||||
parser.add_argument("--workers", type=int, default=1, metavar="N")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
if args.cmd == "list-models" or args.list_models:
|
||||
cmd_list_models(args)
|
||||
elif args.cmd == "run" or args.run:
|
||||
cmd_run(args)
|
||||
elif args.cmd == "show-last" or args.show_last:
|
||||
cmd_show_last(args)
|
||||
else:
|
||||
parser.print_help()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
355
scripts/gather_corpus.py
Normal file
355
scripts/gather_corpus.py
Normal file
|
|
@ -0,0 +1,355 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
Corpus gatherer for the voice benchmark fine-tune pipeline.
|
||||
|
||||
Pulls writing samples from multiple sources and drops .txt files into
|
||||
data/voice_corpus/ in the format expected by benchmark_voice.py.
|
||||
|
||||
Sources:
|
||||
- Reddit: u/pyr0ball post history + comment history (public JSON API)
|
||||
- Campaign copy: claude-bridge/reddit-poster/campaigns/*.py (BODY strings)
|
||||
- Documents: brainmap, homeprojects notes, selected personal writing
|
||||
- Discord: requires manual export (see instructions below)
|
||||
|
||||
Usage:
|
||||
# Full gather (Reddit + local sources)
|
||||
conda run -n cf python scripts/gather_corpus.py
|
||||
|
||||
# Reddit only
|
||||
conda run -n cf python scripts/gather_corpus.py --source reddit
|
||||
|
||||
# Local files only (no network)
|
||||
conda run -n cf python scripts/gather_corpus.py --source local
|
||||
|
||||
# Process a Discord data export zip
|
||||
conda run -n cf python scripts/gather_corpus.py --discord /path/to/discord-export.zip
|
||||
|
||||
Discord export instructions:
|
||||
Discord Settings → Privacy & Safety → Request all my data
|
||||
Wait for email, download zip, then run with --discord flag.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import ast
|
||||
import json
|
||||
import re
|
||||
import time
|
||||
import zipfile
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import httpx
|
||||
|
||||
# ------------------------------------------------------------------ #
|
||||
# Paths
|
||||
# ------------------------------------------------------------------ #
|
||||
|
||||
_ROOT = Path(__file__).parent.parent
|
||||
_CORPUS_DIR = _ROOT / "data" / "style_corpus"
|
||||
_CLAUDE_BRIDGE = Path("/Library/Development/CircuitForge/claude-bridge")
|
||||
_DOCUMENTS = Path("/Library/Documents")
|
||||
|
||||
_REDDIT_USER = "pyr0ball"
|
||||
_USER_AGENT = "Avocet/0.1 corpus-gatherer (CircuitForge; personal research)"
|
||||
_REDDIT_BASE = "https://www.reddit.com"
|
||||
|
||||
# Minimum character length to include a sample (filters out one-liners)
|
||||
_MIN_LENGTH = 80
|
||||
|
||||
# Phrases that suggest AI-generated content — skip these
|
||||
_AI_TELLS = [
|
||||
"certainly!", "absolutely!", "great question", "i'd be happy to",
|
||||
"i apologize for", "it's worth noting", "in conclusion,",
|
||||
"feel free to reach out",
|
||||
]
|
||||
|
||||
|
||||
# ------------------------------------------------------------------ #
|
||||
# Helpers
|
||||
# ------------------------------------------------------------------ #
|
||||
|
||||
def _is_ai_generated(text: str) -> bool:
|
||||
lower = text.lower()
|
||||
return any(phrase in lower for phrase in _AI_TELLS)
|
||||
|
||||
|
||||
def _clean(text: str) -> str:
|
||||
"""Strip Reddit formatting artifacts and normalize whitespace."""
|
||||
text = re.sub(r"\[deleted\]|\[removed\]", "", text)
|
||||
text = re.sub(r"\s+", " ", text).strip()
|
||||
return text
|
||||
|
||||
|
||||
def _write_corpus_file(filename: str, samples: list[str], source_label: str) -> None:
|
||||
"""Write samples to a corpus .txt file with minimal separators."""
|
||||
path = _CORPUS_DIR / filename
|
||||
kept = [s for s in samples if len(s) >= _MIN_LENGTH and not _is_ai_generated(s)]
|
||||
if not kept:
|
||||
print(f" [skip] {filename} — no samples passed filters")
|
||||
return
|
||||
separator = "\n\n---\n\n"
|
||||
path.write_text(separator.join(kept), encoding="utf-8")
|
||||
print(f" [ok] {filename} — {len(kept)} samples ({path.stat().st_size // 1024}KB)")
|
||||
|
||||
|
||||
# ------------------------------------------------------------------ #
|
||||
# Reddit source
|
||||
# ------------------------------------------------------------------ #
|
||||
|
||||
def _reddit_fetch_page(
|
||||
client: httpx.Client,
|
||||
listing_type: str,
|
||||
after: str | None,
|
||||
) -> tuple[list[dict[str, Any]], str | None]:
|
||||
"""Fetch one page of a user's submitted posts or comments."""
|
||||
params: dict[str, Any] = {"limit": 100, "raw_json": 1}
|
||||
if after:
|
||||
params["after"] = after
|
||||
url = f"{_REDDIT_BASE}/user/{_REDDIT_USER}/{listing_type}.json"
|
||||
resp = client.get(url, params=params)
|
||||
resp.raise_for_status()
|
||||
data = resp.json()
|
||||
children = data["data"]["children"]
|
||||
new_after = data["data"].get("after")
|
||||
return [c["data"] for c in children], new_after
|
||||
|
||||
|
||||
def _reddit_fetch_all(listing_type: str, max_items: int = 1000) -> list[dict[str, Any]]:
|
||||
"""Paginate through a user listing until exhausted or max_items reached."""
|
||||
items: list[dict[str, Any]] = []
|
||||
after: str | None = None
|
||||
with httpx.Client(
|
||||
headers={"User-Agent": _USER_AGENT},
|
||||
follow_redirects=True,
|
||||
timeout=20.0,
|
||||
) as client:
|
||||
while len(items) < max_items:
|
||||
try:
|
||||
page, after = _reddit_fetch_page(client, listing_type, after)
|
||||
except httpx.HTTPStatusError as exc:
|
||||
# Reddit blocks unauthenticated pagination after the first page;
|
||||
# save what we have rather than crashing.
|
||||
print(f" stopped at {len(items)} {listing_type} (HTTP {exc.response.status_code})")
|
||||
break
|
||||
if not page:
|
||||
break
|
||||
items.extend(page)
|
||||
print(f" fetched {len(items)} {listing_type}...")
|
||||
if not after:
|
||||
break
|
||||
time.sleep(1.0) # respect rate limit
|
||||
return items
|
||||
|
||||
|
||||
def gather_reddit() -> None:
|
||||
print("Fetching Reddit history for u/pyr0ball...")
|
||||
|
||||
# Posts (submitted)
|
||||
print(" Posts:")
|
||||
posts = _reddit_fetch_all("submitted")
|
||||
post_texts: list[str] = []
|
||||
for p in posts:
|
||||
body = _clean(p.get("selftext", "") or "")
|
||||
title = _clean(p.get("title", ""))
|
||||
if len(body) >= _MIN_LENGTH:
|
||||
post_texts.append(f"{title}\n\n{body}")
|
||||
elif len(title) >= 20:
|
||||
# Title-only posts (link posts) — include title as micro-sample
|
||||
post_texts.append(title)
|
||||
_write_corpus_file("social_post_reddit.txt", post_texts, "reddit/submitted")
|
||||
|
||||
# Comments
|
||||
print(" Comments:")
|
||||
comments = _reddit_fetch_all("comments")
|
||||
comment_texts: list[str] = []
|
||||
for c in comments:
|
||||
body = _clean(c.get("body", "") or "")
|
||||
if body and body not in ("[deleted]", "[removed]"):
|
||||
comment_texts.append(body)
|
||||
_write_corpus_file("social_reply_reddit_comments.txt", comment_texts, "reddit/comments")
|
||||
|
||||
print(f" Done. {len(posts)} posts, {len(comments)} comments fetched.")
|
||||
|
||||
|
||||
# ------------------------------------------------------------------ #
|
||||
# Campaign copy source (claude-bridge)
|
||||
# ------------------------------------------------------------------ #
|
||||
|
||||
def _extract_body_from_campaign(py_file: Path) -> str | None:
|
||||
"""
|
||||
Parse a campaign Python file and extract the BODY string literal.
|
||||
Uses AST to handle multi-line strings safely.
|
||||
"""
|
||||
try:
|
||||
tree = ast.parse(py_file.read_text(encoding="utf-8"))
|
||||
for node in ast.walk(tree):
|
||||
if isinstance(node, ast.Assign):
|
||||
for target in node.targets:
|
||||
if isinstance(target, ast.Name) and target.id == "BODY":
|
||||
if isinstance(node.value, ast.Constant):
|
||||
return str(node.value.value)
|
||||
except (SyntaxError, UnicodeDecodeError):
|
||||
pass
|
||||
return None
|
||||
|
||||
|
||||
def gather_campaigns() -> None:
|
||||
campaigns_dir = _CLAUDE_BRIDGE / "reddit-poster" / "campaigns"
|
||||
if not campaigns_dir.exists():
|
||||
print(f" [skip] campaigns dir not found: {campaigns_dir}")
|
||||
return
|
||||
|
||||
print("Gathering campaign copy from claude-bridge...")
|
||||
samples: list[str] = []
|
||||
for py_file in sorted(campaigns_dir.glob("*.py")):
|
||||
body = _extract_body_from_campaign(py_file)
|
||||
if body:
|
||||
samples.append(body.strip())
|
||||
print(f" {py_file.name} — {len(body)} chars")
|
||||
|
||||
_write_corpus_file("narrative_campaign_copy.txt", samples, "claude-bridge/campaigns")
|
||||
|
||||
|
||||
# ------------------------------------------------------------------ #
|
||||
# Documents source
|
||||
# ------------------------------------------------------------------ #
|
||||
|
||||
def gather_documents() -> None:
|
||||
print("Gathering local Documents...")
|
||||
samples: list[str] = []
|
||||
|
||||
# brainmap — personal planning/thinking notes
|
||||
brainmap = _DOCUMENTS / "brainmap_v1.md"
|
||||
if brainmap.exists():
|
||||
text = _clean(brainmap.read_text(encoding="utf-8"))
|
||||
if len(text) >= _MIN_LENGTH:
|
||||
samples.append(text)
|
||||
print(f" brainmap_v1.md — {len(text)} chars")
|
||||
|
||||
# HomeProjects handoff notes — casual technical prose
|
||||
for handoff in sorted((_DOCUMENTS / "HomeProjects").glob("handoff*.md")):
|
||||
text = _clean(handoff.read_text(encoding="utf-8", errors="replace"))
|
||||
if len(text) >= _MIN_LENGTH:
|
||||
samples.append(text)
|
||||
print(f" {handoff.name} — {len(text)} chars")
|
||||
|
||||
# Personal letters (Closet folder) — intimate prose voice
|
||||
closet = _DOCUMENTS / "Closet"
|
||||
if closet.exists():
|
||||
for letter in closet.glob("*.md"):
|
||||
text = _clean(letter.read_text(encoding="utf-8", errors="replace"))
|
||||
if len(text) >= _MIN_LENGTH and not _is_ai_generated(text):
|
||||
samples.append(text)
|
||||
print(f" {letter.name} — {len(text)} chars")
|
||||
|
||||
_write_corpus_file("narrative_personal_docs.txt", samples, "documents")
|
||||
|
||||
|
||||
# ------------------------------------------------------------------ #
|
||||
# Discord export source
|
||||
# ------------------------------------------------------------------ #
|
||||
|
||||
def gather_discord(export_zip: Path) -> None:
|
||||
"""
|
||||
Process a Discord data export zip (from Settings → Privacy & Safety → Request all my data).
|
||||
|
||||
Expected zip structure:
|
||||
messages/
|
||||
c{channel_id}/
|
||||
messages.json -- list of {ID, Timestamp, Contents, Attachments}
|
||||
account/
|
||||
user.json -- {username, ...}
|
||||
"""
|
||||
print(f"Processing Discord export: {export_zip}")
|
||||
samples: list[str] = []
|
||||
message_count = 0
|
||||
|
||||
with zipfile.ZipFile(export_zip) as zf:
|
||||
# Find all messages.json files
|
||||
message_files = [n for n in zf.namelist() if n.endswith("/messages.json")]
|
||||
print(f" Found {len(message_files)} channel(s)")
|
||||
|
||||
for mf in message_files:
|
||||
try:
|
||||
data = json.loads(zf.read(mf))
|
||||
except (json.JSONDecodeError, KeyError):
|
||||
continue
|
||||
|
||||
for msg in data:
|
||||
content = _clean(msg.get("Contents", "") or "")
|
||||
# Skip system messages, bot commands, very short messages
|
||||
if (
|
||||
len(content) < _MIN_LENGTH
|
||||
or content.startswith("/")
|
||||
or content.startswith("!")
|
||||
or _is_ai_generated(content)
|
||||
):
|
||||
continue
|
||||
# Skip messages that are just URLs or attachments
|
||||
if re.match(r"^https?://\S+$", content):
|
||||
continue
|
||||
samples.append(content)
|
||||
message_count += 1
|
||||
|
||||
print(f" {message_count} messages → {len(samples)} passed filters")
|
||||
_write_corpus_file("social_reply_discord.txt", samples, "discord")
|
||||
|
||||
|
||||
# ------------------------------------------------------------------ #
|
||||
# Entrypoint
|
||||
# ------------------------------------------------------------------ #
|
||||
|
||||
def main() -> None:
|
||||
parser = argparse.ArgumentParser(description="Gather writing corpus for voice benchmark")
|
||||
parser.add_argument(
|
||||
"--source",
|
||||
choices=["reddit", "local", "all"],
|
||||
default="all",
|
||||
help="Which sources to gather (default: all)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--discord",
|
||||
type=Path,
|
||||
metavar="ZIP",
|
||||
help="Path to Discord data export zip",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
|
||||
_CORPUS_DIR.mkdir(parents=True, exist_ok=True)
|
||||
print(f"Output: {_CORPUS_DIR}\n")
|
||||
|
||||
if args.source in ("reddit", "all"):
|
||||
gather_reddit()
|
||||
print()
|
||||
|
||||
if args.source in ("local", "all"):
|
||||
gather_campaigns()
|
||||
print()
|
||||
gather_documents()
|
||||
print()
|
||||
|
||||
if args.discord:
|
||||
if not args.discord.exists():
|
||||
print(f"Error: Discord export not found: {args.discord}")
|
||||
else:
|
||||
gather_discord(args.discord)
|
||||
print()
|
||||
|
||||
if not args.discord and args.source in ("local", "all"):
|
||||
print("Discord: manual step required")
|
||||
print(" 1. Discord Settings → Privacy & Safety → Request all my data")
|
||||
print(" 2. Download the zip from the email link")
|
||||
print(" 3. Run: python scripts/gather_corpus.py --discord /path/to/package.zip")
|
||||
print()
|
||||
|
||||
# Summary
|
||||
corpus_files = sorted(_CORPUS_DIR.glob("*.txt"))
|
||||
total_chars = sum(f.stat().st_size for f in corpus_files)
|
||||
print(f"Corpus: {len(corpus_files)} file(s), {total_chars // 1024}KB total")
|
||||
for f in corpus_files:
|
||||
print(f" {f.name}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
|
|
@ -122,17 +122,88 @@ def test_lookup_returns_correct_shape(client):
|
|||
assert data["already_queued"] is False
|
||||
|
||||
|
||||
def test_lookup_unknown_pipeline_tag_returns_null_adapter(client):
|
||||
"""An unrecognised pipeline_tag yields adapter_recommendation=null."""
|
||||
def test_lookup_unknown_pipeline_tag_returns_null_adapter_and_incompatible(client):
|
||||
"""An unrecognised pipeline_tag yields adapter_recommendation=null and compatible=False."""
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.status_code = 200
|
||||
mock_resp.json.return_value = _make_hf_response("org/m", "audio-classification")
|
||||
mock_resp.json.return_value = _make_hf_response("org/m", "reinforcement-learning")
|
||||
|
||||
with patch("app.models.httpx.get", return_value=mock_resp):
|
||||
r = client.get("/api/models/lookup", params={"repo_id": "org/m"})
|
||||
|
||||
assert r.status_code == 200
|
||||
assert r.json()["adapter_recommendation"] is None
|
||||
data = r.json()
|
||||
assert data["adapter_recommendation"] is None
|
||||
assert data["compatible"] is False
|
||||
assert data["role"] is None
|
||||
assert data["service"] is None
|
||||
assert "CircuitForge model ecosystem" in data["warning"]
|
||||
|
||||
|
||||
def test_lookup_stt_tag_returns_compatible_with_cf_stt_service(client):
|
||||
"""automatic-speech-recognition tag yields compatible=True, service=cf-stt."""
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.status_code = 200
|
||||
mock_resp.json.return_value = _make_hf_response("openai/whisper-base", "automatic-speech-recognition")
|
||||
|
||||
with patch("app.models.httpx.get", return_value=mock_resp):
|
||||
r = client.get("/api/models/lookup", params={"repo_id": "openai/whisper-base"})
|
||||
|
||||
assert r.status_code == 200
|
||||
data = r.json()
|
||||
assert data["compatible"] is True
|
||||
assert data["adapter_recommendation"] is None
|
||||
assert data["role"] == "stt"
|
||||
assert data["service"] == "cf-stt"
|
||||
assert data["warning"] is None
|
||||
|
||||
|
||||
def test_lookup_vision_tag_returns_compatible_with_cf_vision_service(client):
|
||||
"""image-classification tag yields compatible=True, service=cf-vision."""
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.status_code = 200
|
||||
mock_resp.json.return_value = _make_hf_response("google/siglip-base", "image-classification")
|
||||
|
||||
with patch("app.models.httpx.get", return_value=mock_resp):
|
||||
r = client.get("/api/models/lookup", params={"repo_id": "google/siglip-base"})
|
||||
|
||||
assert r.status_code == 200
|
||||
data = r.json()
|
||||
assert data["compatible"] is True
|
||||
assert data["role"] == "vision"
|
||||
assert data["service"] == "cf-vision"
|
||||
|
||||
|
||||
def test_lookup_audio_classification_tag_returns_cf_voice_service(client):
|
||||
"""audio-classification tag yields compatible=True, service=cf-voice."""
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.status_code = 200
|
||||
mock_resp.json.return_value = _make_hf_response("org/audio-model", "audio-classification")
|
||||
|
||||
with patch("app.models.httpx.get", return_value=mock_resp):
|
||||
r = client.get("/api/models/lookup", params={"repo_id": "org/audio-model"})
|
||||
|
||||
assert r.status_code == 200
|
||||
data = r.json()
|
||||
assert data["compatible"] is True
|
||||
assert data["role"] == "classifier"
|
||||
assert data["service"] == "cf-voice"
|
||||
|
||||
|
||||
def test_lookup_embedding_tag_returns_compatible_with_cf_core_service(client):
|
||||
"""feature-extraction tag yields compatible=True, service=cf-core."""
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.status_code = 200
|
||||
mock_resp.json.return_value = _make_hf_response("BAAI/bge-small-en", "feature-extraction")
|
||||
|
||||
with patch("app.models.httpx.get", return_value=mock_resp):
|
||||
r = client.get("/api/models/lookup", params={"repo_id": "BAAI/bge-small-en"})
|
||||
|
||||
assert r.status_code == 200
|
||||
data = r.json()
|
||||
assert data["compatible"] is True
|
||||
assert data["role"] == "embedding"
|
||||
assert data["service"] == "cf-core"
|
||||
|
||||
|
||||
def test_lookup_already_queued_flag(client):
|
||||
|
|
@ -181,6 +252,26 @@ def test_queue_add_returns_entry_fields(client):
|
|||
assert entry["adapter_recommendation"] == "ZeroShotAdapter"
|
||||
|
||||
|
||||
def test_queue_preserves_role_and_service(client):
|
||||
"""POST /queue with role/service fields round-trips them through GET /queue."""
|
||||
r = client.post("/api/models/queue", json={
|
||||
"repo_id": "openai/whisper-base",
|
||||
"pipeline_tag": "automatic-speech-recognition",
|
||||
"adapter_recommendation": None,
|
||||
"role": "stt",
|
||||
"service": "cf-stt",
|
||||
})
|
||||
assert r.status_code == 201
|
||||
entry = r.json()
|
||||
assert entry["role"] == "stt"
|
||||
assert entry["service"] == "cf-stt"
|
||||
|
||||
r2 = client.get("/api/models/queue")
|
||||
items = r2.json()
|
||||
assert items[0]["role"] == "stt"
|
||||
assert items[0]["service"] == "cf-stt"
|
||||
|
||||
|
||||
# ── POST /queue — 409 duplicate ────────────────────────────────────────────────
|
||||
|
||||
def test_queue_duplicate_returns_409(client):
|
||||
|
|
@ -317,7 +408,12 @@ def test_installed_detects_downloaded_model(client, tmp_path):
|
|||
model_dir.mkdir()
|
||||
(model_dir / "config.json").write_text(json.dumps({"model_type": "bert"}), encoding="utf-8")
|
||||
(model_dir / "model_info.json").write_text(
|
||||
json.dumps({"repo_id": "org/mymodel", "adapter_recommendation": "ZeroShotAdapter"}),
|
||||
json.dumps({
|
||||
"repo_id": "org/mymodel",
|
||||
"adapter_recommendation": "ZeroShotAdapter",
|
||||
"role": "classifier",
|
||||
"service": "avocet",
|
||||
}),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
|
|
@ -329,6 +425,51 @@ def test_installed_detects_downloaded_model(client, tmp_path):
|
|||
assert items[0]["name"] == "org--mymodel"
|
||||
assert items[0]["adapter"] == "ZeroShotAdapter"
|
||||
assert items[0]["model_id"] == "org/mymodel"
|
||||
assert items[0]["role"] == "classifier"
|
||||
assert items[0]["service"] == "avocet"
|
||||
|
||||
|
||||
def test_installed_stt_model_surfaces_role_and_service(client):
|
||||
"""A downloaded STT model's role/service are returned by GET /installed."""
|
||||
from app import models as models_module
|
||||
|
||||
model_dir = models_module._MODELS_DIR / "openai--whisper-base"
|
||||
model_dir.mkdir()
|
||||
(model_dir / "config.json").write_text(json.dumps({"model_type": "whisper"}), encoding="utf-8")
|
||||
(model_dir / "model_info.json").write_text(
|
||||
json.dumps({
|
||||
"repo_id": "openai/whisper-base",
|
||||
"adapter_recommendation": None,
|
||||
"role": "stt",
|
||||
"service": "cf-stt",
|
||||
}),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
r = client.get("/api/models/installed")
|
||||
assert r.status_code == 200
|
||||
items = r.json()
|
||||
assert items[0]["role"] == "stt"
|
||||
assert items[0]["service"] == "cf-stt"
|
||||
assert items[0]["adapter"] is None
|
||||
|
||||
|
||||
def test_installed_finetuned_model_defaults_to_avocet_service(client):
|
||||
"""Fine-tuned models with no role/service in training_info default to avocet/classifier."""
|
||||
from app import models as models_module
|
||||
|
||||
model_dir = models_module._MODELS_DIR / "my-finetuned-v2"
|
||||
model_dir.mkdir()
|
||||
(model_dir / "training_info.json").write_text(
|
||||
json.dumps({"base_model": "microsoft/deberta-v3-base", "epochs": 3}),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
r = client.get("/api/models/installed")
|
||||
assert r.status_code == 200
|
||||
items = r.json()
|
||||
assert items[0]["role"] == "classifier"
|
||||
assert items[0]["service"] == "avocet"
|
||||
|
||||
|
||||
def test_installed_detects_finetuned_model(client):
|
||||
|
|
|
|||
4
web/.gitignore
vendored
4
web/.gitignore
vendored
|
|
@ -22,3 +22,7 @@ dist-ssr
|
|||
*.njsproj
|
||||
*.sln
|
||||
*.sw?
|
||||
|
||||
# Local environment overrides
|
||||
.env
|
||||
|
||||
|
|
|
|||
File diff suppressed because it is too large
Load diff
1026
web/src/views/ClassifierTab.vue
Normal file
1026
web/src/views/ClassifierTab.vue
Normal file
File diff suppressed because it is too large
Load diff
708
web/src/views/CompareTab.vue
Normal file
708
web/src/views/CompareTab.vue
Normal file
|
|
@ -0,0 +1,708 @@
|
|||
<template>
|
||||
<div class="compare-tab">
|
||||
|
||||
<!-- Source toggle -->
|
||||
<div class="source-toggle" role="group" aria-label="Prompt source">
|
||||
<button class="source-btn" :class="{ active: promptSource === 'tasks' }" @click="promptSource = 'tasks'">
|
||||
📋 cf-orch Tasks
|
||||
</button>
|
||||
<button class="source-btn" :class="{ active: promptSource === 'style' }" @click="promptSource = 'style'">
|
||||
✍️ Writing Style Prompts
|
||||
</button>
|
||||
</div>
|
||||
|
||||
<!-- Task selector (cf-orch tasks) -->
|
||||
<details v-if="promptSource === 'tasks'" class="model-picker" open>
|
||||
<summary class="picker-summary">
|
||||
<span class="picker-title">📋 Pick a Task</span>
|
||||
<span class="picker-badge">{{ cmpSelectedTask ? cmpSelectedTask.name : 'None selected' }}</span>
|
||||
</summary>
|
||||
<div class="picker-body">
|
||||
<div v-if="llmTasksLoading" class="picker-loading">Loading tasks…</div>
|
||||
<div v-else-if="llmTasks.length === 0" class="picker-empty">No tasks found — check cforch config.</div>
|
||||
<template v-else>
|
||||
<div v-for="(tasks, type) in llmTasksByType" :key="type" class="picker-category">
|
||||
<span class="picker-cat-name picker-cat-section">{{ type }}</span>
|
||||
<div class="picker-model-list">
|
||||
<label v-for="t in tasks" :key="t.id" class="picker-model-row">
|
||||
<input
|
||||
type="radio"
|
||||
name="cmp-task"
|
||||
:checked="cmpSelectedTask?.id === t.id"
|
||||
@change="selectCmpTask(t)"
|
||||
/>
|
||||
<span class="picker-model-name" :title="t.name">{{ t.name }}</span>
|
||||
</label>
|
||||
</div>
|
||||
</div>
|
||||
</template>
|
||||
</div>
|
||||
</details>
|
||||
|
||||
<!-- Writing style prompt selector -->
|
||||
<details v-if="promptSource === 'style'" class="model-picker" open>
|
||||
<summary class="picker-summary">
|
||||
<span class="picker-title">✍️ Pick a Writing Style Prompt</span>
|
||||
<span class="picker-badge">{{ selectedVoicePrompt ? selectedVoicePrompt.tag : 'None selected' }}</span>
|
||||
</summary>
|
||||
<div class="picker-body">
|
||||
<div class="picker-model-list style-prompt-list">
|
||||
<label v-for="vp in STYLE_PROMPTS" :key="vp.tag" class="picker-model-row style-prompt-row">
|
||||
<input
|
||||
type="radio"
|
||||
name="cmp-style-prompt"
|
||||
:checked="selectedVoicePrompt?.tag === vp.tag"
|
||||
@change="selectVoicePrompt(vp)"
|
||||
/>
|
||||
<span class="style-prompt-tag">{{ vp.tag }}</span>
|
||||
<span class="style-prompt-title">{{ vp.thread_title }}</span>
|
||||
</label>
|
||||
</div>
|
||||
</div>
|
||||
</details>
|
||||
|
||||
<!-- Prompt editor + model picker (shown once a prompt source is ready) -->
|
||||
<template v-if="promptSource === 'tasks' ? !!cmpSelectedTask : !!selectedVoicePrompt">
|
||||
<label class="prompt-label" for="cmp-prompt">Prompt</label>
|
||||
<textarea
|
||||
id="cmp-prompt"
|
||||
class="cmp-prompt-editor"
|
||||
v-model="cmpPrompt"
|
||||
rows="6"
|
||||
/>
|
||||
|
||||
<!-- Ollama model picker -->
|
||||
<details class="model-picker" open>
|
||||
<summary class="picker-summary">
|
||||
<span class="picker-title">🤖 Ollama Models</span>
|
||||
<span class="picker-badge">{{ cmpSelectedModels.size }} / {{ ollamaLlmModels.length }}</span>
|
||||
</summary>
|
||||
<div class="picker-body">
|
||||
<label class="picker-cat-header">
|
||||
<input
|
||||
type="checkbox"
|
||||
:checked="cmpSelectedModels.size === ollamaLlmModels.length"
|
||||
:indeterminate="cmpSelectedModels.size > 0 && cmpSelectedModels.size < ollamaLlmModels.length"
|
||||
@change="toggleAllCmpModels(($event.target as HTMLInputElement).checked)"
|
||||
/>
|
||||
<span class="picker-cat-name">All ollama models</span>
|
||||
</label>
|
||||
<div class="picker-model-list">
|
||||
<label v-for="m in ollamaLlmModels" :key="m.id" class="picker-model-row">
|
||||
<input
|
||||
type="checkbox"
|
||||
:checked="cmpSelectedModels.has(m.id)"
|
||||
@change="toggleCmpModel(m.id, ($event.target as HTMLInputElement).checked)"
|
||||
/>
|
||||
<span class="picker-model-name">{{ m.name }}</span>
|
||||
<span class="picker-adapter-type">{{ m.tags.slice(0, 3).join(', ') }}</span>
|
||||
</label>
|
||||
</div>
|
||||
</div>
|
||||
</details>
|
||||
|
||||
<!-- Run controls -->
|
||||
<div class="run-controls">
|
||||
<button
|
||||
class="btn-run"
|
||||
:disabled="cmpRunning || cmpSelectedModels.size === 0"
|
||||
@click="startCompare"
|
||||
>{{ cmpRunning ? '⏳ Running…' : '⚖️ Compare Models' }}</button>
|
||||
<button v-if="cmpRunning" class="btn-cancel" @click="cancelCompare">✕ Cancel</button>
|
||||
</div>
|
||||
|
||||
<!-- Progress log -->
|
||||
<div v-if="cmpLog.length > 0" class="run-log">
|
||||
<div class="log-lines">
|
||||
<div v-for="(line, i) in cmpLog" :key="i" class="log-line">{{ line }}</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Side-by-side results -->
|
||||
<template v-if="cmpResults.length > 0">
|
||||
<h2 class="chart-title">Side-by-Side Responses</h2>
|
||||
<div class="cmp-results-grid">
|
||||
<div
|
||||
v-for="r in cmpResults"
|
||||
:key="r.model"
|
||||
class="cmp-result-card"
|
||||
:class="{ 'cmp-error': !!r.error }"
|
||||
>
|
||||
<div class="cmp-result-header">
|
||||
<span class="cmp-model-name">{{ r.model }}</span>
|
||||
<span class="cmp-meta">
|
||||
<template v-if="r.error"><span class="err-badge">error</span></template>
|
||||
<template v-else>{{ (r.elapsed_ms / 1000).toFixed(1) }}s</template>
|
||||
</span>
|
||||
</div>
|
||||
<pre v-if="r.error" class="cmp-error-text">{{ r.error }}</pre>
|
||||
<pre v-else class="cmp-response">{{ r.response }}</pre>
|
||||
</div>
|
||||
</div>
|
||||
</template>
|
||||
</template>
|
||||
|
||||
</div>
|
||||
</template>
|
||||
|
||||
<script setup lang="ts">
|
||||
import { ref, computed, onMounted } from 'vue'
|
||||
import { useApiFetch } from '../composables/useApi'
|
||||
|
||||
// ── Types ───────────────────────────────────────────────────────────────────
|
||||
interface CfOrchTask {
|
||||
id: string
|
||||
name: string
|
||||
type: string
|
||||
prompt: string
|
||||
system: string
|
||||
}
|
||||
|
||||
interface CfOrchModel {
|
||||
name: string
|
||||
id: string
|
||||
service: string
|
||||
tags: string[]
|
||||
vram_estimate_mb?: number
|
||||
}
|
||||
|
||||
interface CmpResult {
|
||||
model: string
|
||||
response: string
|
||||
elapsed_ms: number
|
||||
error: string | null
|
||||
}
|
||||
|
||||
interface VoicePrompt {
|
||||
tag: string
|
||||
thread_title: string
|
||||
thread_body: string
|
||||
}
|
||||
|
||||
// ── Writing style prompts (mirrors TEST_PROMPTS in benchmark_style.py) ──────
|
||||
const STYLE_SYSTEM = "You are a writing assistant. Your job is to write a Reddit reply that matches the user's voice — casual, direct, community-first. No em dashes. No filler phrases. No semicolons. Short punchy sentences."
|
||||
|
||||
const STYLE_PROMPTS: VoicePrompt[] = [
|
||||
{
|
||||
tag: 'selfhosted_ai_fatigue',
|
||||
thread_title: "Anyone else getting tired of re-explaining their setup every time an AI model forgets?",
|
||||
thread_body: "Every session I start over. My whole hardware setup, what tools I use, what I've already tried. It's exhausting. There has to be a better way.",
|
||||
},
|
||||
{
|
||||
tag: 'privacy_local_llm',
|
||||
thread_title: "What's the point of running local LLMs if the apps still phone home?",
|
||||
thread_body: "I went through all the trouble of setting up ollama and now I find out the frontend I'm using is sending telemetry. Kind of defeats the purpose.",
|
||||
},
|
||||
{
|
||||
tag: 'solarpunk_tech',
|
||||
thread_title: "What does solarpunk computing actually look like in practice?",
|
||||
thread_body: "I keep seeing the aesthetic but not a lot of concrete examples of people living it out with their tech choices. What does it mean day to day?",
|
||||
},
|
||||
{
|
||||
tag: 'nd_tools',
|
||||
thread_title: "Tools that actually help with executive function vs ones that just add friction",
|
||||
thread_body: "I've tried a dozen productivity apps and most of them require more executive function to maintain than they save. What actually sticks for you?",
|
||||
},
|
||||
{
|
||||
tag: 'data_ownership',
|
||||
thread_title: "Who actually owns your data when you use a 'free' AI tool?",
|
||||
thread_body: "Read the ToS on three different AI assistants today. In all three cases your inputs can be used for training, shared with partners, and retained indefinitely. Is this just accepted now?",
|
||||
},
|
||||
{
|
||||
tag: 'digital_culture',
|
||||
thread_title: "The internet used to feel like it belonged to everyone. What happened?",
|
||||
thread_body: "I grew up on forums, IRC, personal homepages. Now everything is a platform owned by someone trying to extract value from the community that built it.",
|
||||
},
|
||||
]
|
||||
|
||||
// ── State ───────────────────────────────────────────────────────────────────
|
||||
const llmTasks = ref<CfOrchTask[]>([])
|
||||
const llmTasksLoading = ref(false)
|
||||
const llmModels = ref<CfOrchModel[]>([])
|
||||
|
||||
const promptSource = ref<'tasks' | 'style'>('tasks')
|
||||
const cmpSelectedTask = ref<CfOrchTask | null>(null)
|
||||
const selectedVoicePrompt = ref<VoicePrompt | null>(null)
|
||||
const cmpSystemPrompt = ref('')
|
||||
const cmpPrompt = ref('')
|
||||
const cmpSelectedModels = ref<Set<string>>(new Set())
|
||||
const cmpRunning = ref(false)
|
||||
const cmpLog = ref<string[]>([])
|
||||
const cmpResults = ref<CmpResult[]>([])
|
||||
const cmpEventSource = ref<EventSource | null>(null)
|
||||
|
||||
// ── Computed ────────────────────────────────────────────────────────────────
|
||||
const ollamaLlmModels = computed(() =>
|
||||
llmModels.value.filter(m => m.service === 'ollama')
|
||||
)
|
||||
|
||||
const llmTasksByType = computed((): Record<string, CfOrchTask[]> => {
|
||||
const groups: Record<string, CfOrchTask[]> = {}
|
||||
for (const t of llmTasks.value) {
|
||||
if (!groups[t.type]) groups[t.type] = []
|
||||
groups[t.type].push(t)
|
||||
}
|
||||
return groups
|
||||
})
|
||||
|
||||
// ── Helpers ─────────────────────────────────────────────────────────────────
|
||||
function selectCmpTask(t: CfOrchTask) {
|
||||
cmpSelectedTask.value = t
|
||||
cmpPrompt.value = t.prompt || ''
|
||||
cmpSystemPrompt.value = t.system || ''
|
||||
cmpResults.value = []
|
||||
cmpLog.value = []
|
||||
}
|
||||
|
||||
function selectVoicePrompt(vp: VoicePrompt) {
|
||||
selectedVoicePrompt.value = vp
|
||||
cmpPrompt.value = `Thread: ${vp.thread_title}\n\n${vp.thread_body}\n\nWrite a reply:`
|
||||
cmpSystemPrompt.value = STYLE_SYSTEM
|
||||
cmpResults.value = []
|
||||
cmpLog.value = []
|
||||
}
|
||||
|
||||
function toggleCmpModel(id: string, checked: boolean) {
|
||||
const next = new Set(cmpSelectedModels.value)
|
||||
checked ? next.add(id) : next.delete(id)
|
||||
cmpSelectedModels.value = next
|
||||
}
|
||||
|
||||
function toggleAllCmpModels(checked: boolean) {
|
||||
cmpSelectedModels.value = checked
|
||||
? new Set(ollamaLlmModels.value.map(m => m.id))
|
||||
: new Set()
|
||||
}
|
||||
|
||||
// ── Data loaders ──────────────────────────────────────────────────────────────
|
||||
async function loadLlmTasks() {
|
||||
llmTasksLoading.value = true
|
||||
const { data } = await useApiFetch<{ tasks: CfOrchTask[]; types: string[] }>('/api/cforch/tasks')
|
||||
llmTasksLoading.value = false
|
||||
if (data?.tasks) {
|
||||
llmTasks.value = data.tasks
|
||||
}
|
||||
}
|
||||
|
||||
async function loadLlmModels() {
|
||||
const { data } = await useApiFetch<{ models: CfOrchModel[] }>('/api/cforch/models')
|
||||
if (data?.models) {
|
||||
llmModels.value = data.models
|
||||
// Pre-select all ollama models
|
||||
cmpSelectedModels.value = new Set(
|
||||
data.models.filter(m => m.service === 'ollama').map(m => m.id)
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
// ── Run / cancel ──────────────────────────────────────────────────────────────
|
||||
function startCompare() {
|
||||
if (!cmpPrompt.value.trim() || cmpSelectedModels.value.size === 0) return
|
||||
cmpRunning.value = true
|
||||
cmpResults.value = []
|
||||
cmpLog.value = []
|
||||
|
||||
const params = new URLSearchParams({
|
||||
prompt: cmpPrompt.value,
|
||||
model_ids: [...cmpSelectedModels.value].join(','),
|
||||
system: cmpSystemPrompt.value,
|
||||
})
|
||||
|
||||
const es = new EventSource(`/api/imitate/run?${params}`)
|
||||
cmpEventSource.value = es
|
||||
|
||||
es.onmessage = (event: MessageEvent) => {
|
||||
try {
|
||||
const msg = JSON.parse(event.data)
|
||||
if (msg.type === 'start') {
|
||||
cmpLog.value.push(`Comparing ${msg.total_models} models…`)
|
||||
} else if (msg.type === 'model_start') {
|
||||
cmpLog.value.push(`→ ${msg.model}…`)
|
||||
} else if (msg.type === 'model_done') {
|
||||
const status = msg.error
|
||||
? `✕ ${msg.error}`
|
||||
: `✓ ${(msg.elapsed_ms / 1000).toFixed(1)}s`
|
||||
cmpLog.value.push(` ${msg.model}: ${status}`)
|
||||
cmpResults.value.push({
|
||||
model: msg.model,
|
||||
response: msg.response,
|
||||
elapsed_ms: msg.elapsed_ms,
|
||||
error: msg.error ?? null,
|
||||
})
|
||||
} else if (msg.type === 'complete') {
|
||||
cmpRunning.value = false
|
||||
es.close()
|
||||
}
|
||||
} catch { /* ignore malformed frames */ }
|
||||
}
|
||||
|
||||
es.onerror = () => {
|
||||
cmpLog.value.push('Connection error.')
|
||||
cmpRunning.value = false
|
||||
es.close()
|
||||
cmpEventSource.value = null
|
||||
}
|
||||
}
|
||||
|
||||
function cancelCompare() {
|
||||
cmpEventSource.value?.close()
|
||||
cmpEventSource.value = null
|
||||
cmpRunning.value = false
|
||||
cmpLog.value.push('Cancelled.')
|
||||
}
|
||||
|
||||
onMounted(() => {
|
||||
loadLlmTasks()
|
||||
loadLlmModels()
|
||||
})
|
||||
</script>
|
||||
|
||||
<style scoped>
|
||||
.compare-tab {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 1.75rem;
|
||||
}
|
||||
|
||||
/* ── Source toggle ──────────────────────────────────────── */
|
||||
.source-toggle {
|
||||
display: inline-flex;
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.5rem;
|
||||
overflow: hidden;
|
||||
align-self: flex-start;
|
||||
}
|
||||
|
||||
.source-btn {
|
||||
padding: 0.4rem 1rem;
|
||||
font-size: 0.83rem;
|
||||
font-family: var(--font-body, sans-serif);
|
||||
font-weight: 500;
|
||||
border: none;
|
||||
background: var(--color-surface, #fff);
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
cursor: pointer;
|
||||
transition: background 0.15s, color 0.15s;
|
||||
}
|
||||
.source-btn:not(:last-child) { border-right: 1px solid var(--color-border, #d0d7e8); }
|
||||
.source-btn.active { background: var(--app-primary, #2A6080); color: #fff; }
|
||||
.source-btn:not(.active):hover { background: var(--color-surface-raised, #e4ebf5); }
|
||||
|
||||
/* ── Voice prompt list ──────────────────────────────────── */
|
||||
.style-prompt-list { flex-direction: column !important; flex-wrap: nowrap !important; padding-left: 0 !important; gap: 0.4rem !important; }
|
||||
|
||||
.style-prompt-row {
|
||||
flex-direction: column !important;
|
||||
align-items: flex-start !important;
|
||||
gap: 0.15rem !important;
|
||||
padding: 0.5rem 0.6rem;
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.35rem;
|
||||
background: var(--color-surface, #f4f7fc);
|
||||
cursor: pointer;
|
||||
transition: background 0.1s;
|
||||
}
|
||||
.style-prompt-row:hover { background: var(--color-surface-raised, #e4ebf5); }
|
||||
.style-prompt-row:has(input:checked) {
|
||||
background: color-mix(in srgb, var(--app-primary, #2A6080) 10%, transparent);
|
||||
border-color: var(--app-primary, #2A6080);
|
||||
}
|
||||
.style-prompt-row input { display: none; }
|
||||
|
||||
.style-prompt-tag {
|
||||
font-family: var(--font-mono, monospace);
|
||||
font-size: 0.72rem;
|
||||
color: var(--app-primary, #2A6080);
|
||||
font-weight: 700;
|
||||
text-transform: uppercase;
|
||||
letter-spacing: 0.04em;
|
||||
}
|
||||
|
||||
.style-prompt-title {
|
||||
font-size: 0.83rem;
|
||||
color: var(--color-text, #1a2338);
|
||||
line-height: 1.4;
|
||||
}
|
||||
|
||||
/* ── Buttons ────────────────────────────────────────────── */
|
||||
.btn-run {
|
||||
padding: 0.45rem 1.1rem;
|
||||
border-radius: 0.375rem;
|
||||
border: none;
|
||||
background: var(--app-primary, #2A6080);
|
||||
color: #fff;
|
||||
font-size: 0.88rem;
|
||||
font-family: var(--font-body, sans-serif);
|
||||
cursor: pointer;
|
||||
transition: opacity 0.15s;
|
||||
}
|
||||
.btn-run:disabled { opacity: 0.5; cursor: not-allowed; }
|
||||
.btn-run:not(:disabled):hover { opacity: 0.85; }
|
||||
|
||||
.btn-cancel {
|
||||
padding: 0.45rem 0.9rem;
|
||||
background: transparent;
|
||||
border: 1px solid var(--color-text-secondary, #6b7a99);
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
border-radius: 0.4rem;
|
||||
font-size: 0.85rem;
|
||||
font-weight: 500;
|
||||
cursor: pointer;
|
||||
transition: background 0.15s;
|
||||
}
|
||||
.btn-cancel:hover {
|
||||
background: color-mix(in srgb, var(--color-text-secondary, #6b7a99) 12%, transparent);
|
||||
}
|
||||
|
||||
/* ── Run controls row ───────────────────────────────────── */
|
||||
.run-controls {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.75rem;
|
||||
flex-wrap: wrap;
|
||||
}
|
||||
|
||||
/* ── Run log ────────────────────────────────────────────── */
|
||||
.run-log {
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.5rem;
|
||||
overflow: hidden;
|
||||
font-family: var(--font-mono, monospace);
|
||||
font-size: 0.78rem;
|
||||
}
|
||||
|
||||
.log-lines {
|
||||
max-height: 160px;
|
||||
overflow-y: auto;
|
||||
padding: 0.5rem 0.75rem;
|
||||
background: var(--color-surface, #fff);
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.1rem;
|
||||
}
|
||||
|
||||
.log-line { color: var(--color-text, #1a2338); line-height: 1.5; }
|
||||
|
||||
/* ── Chart title ────────────────────────────────────────── */
|
||||
.chart-title {
|
||||
font-size: 0.95rem;
|
||||
font-weight: 600;
|
||||
color: var(--color-text, #1a2338);
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
/* ── Model Picker ───────────────────────────────────────── */
|
||||
.model-picker {
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.5rem;
|
||||
overflow: hidden;
|
||||
}
|
||||
|
||||
.picker-summary {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.6rem;
|
||||
padding: 0.65rem 0.9rem;
|
||||
cursor: pointer;
|
||||
user-select: none;
|
||||
list-style: none;
|
||||
background: var(--color-surface-raised, #e4ebf5);
|
||||
}
|
||||
.picker-summary::-webkit-details-marker { display: none; }
|
||||
.picker-summary::before { content: '▶ '; font-size: 0.65rem; color: var(--color-text-secondary, #6b7a99); }
|
||||
details[open] .picker-summary::before { content: '▼ '; }
|
||||
|
||||
.picker-title {
|
||||
font-size: 0.9rem;
|
||||
font-weight: 600;
|
||||
color: var(--color-text, #1a2338);
|
||||
}
|
||||
|
||||
.picker-badge {
|
||||
font-size: 0.75rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
background: var(--color-surface, #fff);
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
padding: 0.15rem 0.5rem;
|
||||
border-radius: 1rem;
|
||||
font-family: var(--font-mono, monospace);
|
||||
margin-left: auto;
|
||||
}
|
||||
|
||||
.picker-body {
|
||||
padding: 0.75rem;
|
||||
border-top: 1px solid var(--color-border, #d0d7e8);
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.75rem;
|
||||
}
|
||||
|
||||
.picker-loading, .picker-empty {
|
||||
font-size: 0.85rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
padding: 0.5rem 0;
|
||||
}
|
||||
|
||||
.picker-category {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.3rem;
|
||||
}
|
||||
|
||||
.picker-cat-header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.45rem;
|
||||
font-size: 0.82rem;
|
||||
font-weight: 700;
|
||||
color: var(--color-text, #1a2338);
|
||||
text-transform: uppercase;
|
||||
letter-spacing: 0.04em;
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
.picker-cat-name { /* inherits from cat-header or section */ }
|
||||
|
||||
.picker-cat-section {
|
||||
font-weight: 600;
|
||||
font-size: 0.82rem;
|
||||
padding: 0.35rem 0;
|
||||
display: block;
|
||||
color: var(--color-text, #1a2338);
|
||||
}
|
||||
|
||||
.picker-model-list {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 0.35rem 0.75rem;
|
||||
padding-left: 1.4rem;
|
||||
}
|
||||
|
||||
.picker-model-row {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.35rem;
|
||||
font-size: 0.82rem;
|
||||
cursor: pointer;
|
||||
color: var(--color-text, #1a2338);
|
||||
}
|
||||
|
||||
.picker-model-name {
|
||||
font-family: var(--font-mono, monospace);
|
||||
font-size: 0.78rem;
|
||||
white-space: nowrap;
|
||||
max-width: 18ch;
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
}
|
||||
|
||||
.picker-adapter-type {
|
||||
font-size: 0.68rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
background: var(--color-surface-raised, #e4ebf5);
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.25rem;
|
||||
padding: 0.05rem 0.3rem;
|
||||
font-family: var(--font-mono, monospace);
|
||||
}
|
||||
|
||||
/* ── Prompt editor ──────────────────────────────────────── */
|
||||
.prompt-label {
|
||||
font-size: 0.85rem;
|
||||
font-weight: 600;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
margin-top: 0.5rem;
|
||||
}
|
||||
|
||||
.cmp-prompt-editor {
|
||||
width: 100%;
|
||||
font-family: var(--font-mono, monospace);
|
||||
font-size: 0.85rem;
|
||||
padding: 0.75rem;
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.375rem;
|
||||
background: var(--color-surface, #f0f4fc);
|
||||
color: var(--color-text, #1a2338);
|
||||
resize: vertical;
|
||||
line-height: 1.5;
|
||||
box-sizing: border-box;
|
||||
}
|
||||
.cmp-prompt-editor:focus {
|
||||
outline: 2px solid var(--app-primary, #2A6080);
|
||||
outline-offset: -1px;
|
||||
}
|
||||
|
||||
/* ── Results grid ───────────────────────────────────────── */
|
||||
.cmp-results-grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fill, minmax(280px, 1fr));
|
||||
gap: 1rem;
|
||||
margin-top: 0.5rem;
|
||||
}
|
||||
|
||||
.cmp-result-card {
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.5rem;
|
||||
overflow: hidden;
|
||||
background: var(--color-surface, #f0f4fc);
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
}
|
||||
|
||||
.cmp-result-card.cmp-error {
|
||||
border-color: #fca5a5;
|
||||
}
|
||||
|
||||
.cmp-result-header {
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
align-items: center;
|
||||
padding: 0.5rem 0.75rem;
|
||||
background: var(--color-surface-raised, #e4ebf5);
|
||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
||||
}
|
||||
|
||||
.cmp-model-name {
|
||||
font-size: 0.82rem;
|
||||
font-weight: 600;
|
||||
color: var(--color-text, #1a2338);
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
white-space: nowrap;
|
||||
}
|
||||
|
||||
.cmp-meta {
|
||||
font-size: 0.75rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
flex-shrink: 0;
|
||||
margin-left: 0.5rem;
|
||||
}
|
||||
|
||||
.err-badge {
|
||||
background: #fee2e2;
|
||||
color: #991b1b;
|
||||
border-radius: 9999px;
|
||||
padding: 0.1rem 0.45rem;
|
||||
font-size: 0.7rem;
|
||||
font-weight: 600;
|
||||
}
|
||||
|
||||
.cmp-response, .cmp-error-text {
|
||||
padding: 0.75rem;
|
||||
font-size: 0.82rem;
|
||||
white-space: pre-wrap;
|
||||
word-break: break-word;
|
||||
max-height: 300px;
|
||||
overflow-y: auto;
|
||||
margin: 0;
|
||||
flex: 1;
|
||||
color: var(--color-text, #1a2338);
|
||||
}
|
||||
|
||||
.cmp-error-text { color: #b91c1c; }
|
||||
|
||||
@media (max-width: 600px) {
|
||||
.picker-model-list { padding-left: 0; }
|
||||
.picker-model-name { max-width: 14ch; }
|
||||
}
|
||||
</style>
|
||||
|
|
@ -49,12 +49,30 @@
|
|||
<div v-if="sampleLoading" class="picker-loading">Fetching sample from API…</div>
|
||||
|
||||
<template v-else-if="rawSample">
|
||||
<!-- Fetched text preview -->
|
||||
<details class="sample-preview" open>
|
||||
<!-- Listing image thumbnail (Snipe vision samples) -->
|
||||
<div v-if="imageUrl" class="sample-image-row">
|
||||
<img :src="imageUrl" class="sample-image-thumb" alt="Listing photo" @error="imageUrl = ''" />
|
||||
<span class="image-badge">📷 image will be sent to vision models</span>
|
||||
</div>
|
||||
|
||||
<!-- Fetched text preview (hidden when prompt_template is {input_text} with no text_fields) -->
|
||||
<details v-if="rawSample.text" class="sample-preview" open>
|
||||
<summary class="sample-preview-toggle">Raw sample text</summary>
|
||||
<pre class="sample-text">{{ rawSample.text }}</pre>
|
||||
</details>
|
||||
|
||||
<!-- System context (shown only when the product provides one) -->
|
||||
<template v-if="systemPrompt">
|
||||
<details class="sample-preview">
|
||||
<summary class="sample-preview-toggle">System context <span class="system-badge">sent separately to model</span></summary>
|
||||
<textarea
|
||||
class="prompt-editor system-editor"
|
||||
v-model="systemPrompt"
|
||||
rows="4"
|
||||
/>
|
||||
</details>
|
||||
</template>
|
||||
|
||||
<!-- Prompt editor -->
|
||||
<label class="prompt-label" for="prompt-editor">Prompt sent to models</label>
|
||||
<textarea
|
||||
|
|
@ -112,6 +130,42 @@
|
|||
</div>
|
||||
</details>
|
||||
|
||||
<!-- cf-text model picker (live catalog from cf-orch) -->
|
||||
<details class="model-picker">
|
||||
<summary class="picker-summary">
|
||||
<span class="picker-title">⚡ cf-text Models <span class="cforch-badge">via cf-orch</span></span>
|
||||
<span class="picker-badge">{{ selectedCfTextModels.size }} / {{ cfTextCatalog.length }}</span>
|
||||
</summary>
|
||||
<div class="picker-body">
|
||||
<div v-if="catalogLoading" class="picker-loading">Loading catalog from cf-orch…</div>
|
||||
<div v-else-if="cfTextCatalog.length === 0" class="picker-empty">
|
||||
No cf-text models available — check cf-orch coordinator is running.
|
||||
</div>
|
||||
<template v-else>
|
||||
<label class="picker-cat-header">
|
||||
<input
|
||||
type="checkbox"
|
||||
:checked="selectedCfTextModels.size === cfTextCatalog.length"
|
||||
:indeterminate="selectedCfTextModels.size > 0 && selectedCfTextModels.size < cfTextCatalog.length"
|
||||
@change="toggleAllCfText(($event.target as HTMLInputElement).checked)"
|
||||
/>
|
||||
<span class="picker-cat-name">All cf-text models</span>
|
||||
</label>
|
||||
<div class="picker-model-list">
|
||||
<label v-for="m in cfTextCatalog" :key="m.id" class="picker-model-row">
|
||||
<input
|
||||
type="checkbox"
|
||||
:checked="selectedCfTextModels.has(m.id)"
|
||||
@change="toggleCfText(m.id, ($event.target as HTMLInputElement).checked)"
|
||||
/>
|
||||
<span class="picker-model-name" :title="m.description || m.id">{{ m.id }}</span>
|
||||
<span v-if="m.vram_mb" class="tag">{{ Math.round(m.vram_mb / 1024 * 10) / 10 }}GB</span>
|
||||
</label>
|
||||
</div>
|
||||
</template>
|
||||
</div>
|
||||
</details>
|
||||
|
||||
<!-- Temperature -->
|
||||
<div class="temp-row">
|
||||
<label for="temp-slider" class="temp-label">Temperature: <strong>{{ temperature.toFixed(1) }}</strong></label>
|
||||
|
|
@ -128,7 +182,7 @@
|
|||
<div class="run-row">
|
||||
<button
|
||||
class="btn-run"
|
||||
:disabled="running || selectedModels.size === 0"
|
||||
:disabled="running || (selectedModels.size === 0 && selectedCfTextModels.size === 0)"
|
||||
@click="startRun"
|
||||
>
|
||||
{{ running ? '⏳ Running…' : '▶ Run' }}
|
||||
|
|
@ -204,6 +258,8 @@ interface Sample {
|
|||
sample_index: number
|
||||
text: string
|
||||
prompt: string
|
||||
system_prompt: string
|
||||
image_url: string
|
||||
raw_item: Record<string, unknown>
|
||||
}
|
||||
|
||||
|
|
@ -215,6 +271,12 @@ interface ModelEntry {
|
|||
vram_estimate_mb: number
|
||||
}
|
||||
|
||||
interface CatalogEntry {
|
||||
id: string
|
||||
vram_mb: number
|
||||
description: string
|
||||
}
|
||||
|
||||
interface RunResult {
|
||||
model: string
|
||||
response: string
|
||||
|
|
@ -232,11 +294,17 @@ const sampleLoading = ref(false)
|
|||
const sampleError = ref<string | null>(null)
|
||||
const rawSample = ref<Sample | null>(null)
|
||||
const editedPrompt = ref('')
|
||||
const systemPrompt = ref('')
|
||||
const imageUrl = ref('')
|
||||
|
||||
const modelsLoading = ref(false)
|
||||
const allModels = ref<ModelEntry[]>([])
|
||||
const selectedModels = ref<Set<string>>(new Set())
|
||||
|
||||
const catalogLoading = ref(false)
|
||||
const cfTextCatalog = ref<CatalogEntry[]>([])
|
||||
const selectedCfTextModels = ref<Set<string>>(new Set())
|
||||
|
||||
const temperature = ref(0.7)
|
||||
|
||||
const running = ref(false)
|
||||
|
|
@ -261,7 +329,7 @@ const successfulResults = computed(() =>
|
|||
// ── Lifecycle ─────────────────────────────────────────────────────────────────
|
||||
|
||||
onMounted(async () => {
|
||||
await Promise.all([loadProducts(), loadModels()])
|
||||
await Promise.all([loadProducts(), loadModels(), loadCfTextCatalog()])
|
||||
})
|
||||
|
||||
// ── Methods ────────────────────────────────────────────────────────────────────
|
||||
|
|
@ -298,10 +366,38 @@ async function loadModels() {
|
|||
}
|
||||
}
|
||||
|
||||
async function loadCfTextCatalog() {
|
||||
catalogLoading.value = true
|
||||
try {
|
||||
const resp = await fetch('/api/imitate/catalog')
|
||||
if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
|
||||
const data = await resp.json()
|
||||
cfTextCatalog.value = data.models ?? []
|
||||
} catch {
|
||||
cfTextCatalog.value = []
|
||||
} finally {
|
||||
catalogLoading.value = false
|
||||
}
|
||||
}
|
||||
|
||||
function toggleCfText(id: string, checked: boolean) {
|
||||
const next = new Set(selectedCfTextModels.value)
|
||||
checked ? next.add(id) : next.delete(id)
|
||||
selectedCfTextModels.value = next
|
||||
}
|
||||
|
||||
function toggleAllCfText(checked: boolean) {
|
||||
selectedCfTextModels.value = checked
|
||||
? new Set(cfTextCatalog.value.map(m => m.id))
|
||||
: new Set()
|
||||
}
|
||||
|
||||
async function selectProduct(p: Product) {
|
||||
selectedProduct.value = p
|
||||
rawSample.value = null
|
||||
editedPrompt.value = ''
|
||||
systemPrompt.value = ''
|
||||
imageUrl.value = ''
|
||||
sampleError.value = null
|
||||
results.value = []
|
||||
runLog.value = []
|
||||
|
|
@ -321,6 +417,8 @@ async function fetchSample() {
|
|||
const data: Sample = await resp.json()
|
||||
rawSample.value = data
|
||||
editedPrompt.value = data.prompt
|
||||
systemPrompt.value = data.system_prompt ?? ''
|
||||
imageUrl.value = data.image_url ?? ''
|
||||
} catch (err: unknown) {
|
||||
sampleError.value = err instanceof Error ? err.message : String(err)
|
||||
} finally {
|
||||
|
|
@ -341,7 +439,8 @@ function toggleAllModels(checked: boolean) {
|
|||
}
|
||||
|
||||
function startRun() {
|
||||
if (running.value || !editedPrompt.value.trim() || selectedModels.value.size === 0) return
|
||||
const hasModels = selectedModels.value.size > 0 || selectedCfTextModels.value.size > 0
|
||||
if (running.value || !editedPrompt.value.trim() || !hasModels) return
|
||||
|
||||
running.value = true
|
||||
results.value = []
|
||||
|
|
@ -349,10 +448,13 @@ function startRun() {
|
|||
correctionsPushMsg.value = null
|
||||
|
||||
const params = new URLSearchParams({
|
||||
prompt: editedPrompt.value,
|
||||
model_ids: [...selectedModels.value].join(','),
|
||||
temperature: temperature.value.toString(),
|
||||
product_id: selectedProduct.value?.id ?? '',
|
||||
prompt: editedPrompt.value,
|
||||
model_ids: [...selectedModels.value].join(','),
|
||||
cf_text_model_ids: [...selectedCfTextModels.value].join(','),
|
||||
temperature: temperature.value.toString(),
|
||||
product_id: selectedProduct.value?.id ?? '',
|
||||
system: systemPrompt.value,
|
||||
image_url: imageUrl.value,
|
||||
})
|
||||
|
||||
const es = new EventSource(`/api/imitate/run?${params}`)
|
||||
|
|
@ -362,9 +464,13 @@ function startRun() {
|
|||
try {
|
||||
const msg = JSON.parse(event.data)
|
||||
if (msg.type === 'start') {
|
||||
runLog.value.push(`Running ${msg.total_models} model(s)…`)
|
||||
const imgNote = msg.has_image ? ' (with image)' : ''
|
||||
runLog.value.push(`Running ${msg.total_models} model(s)${imgNote}…`)
|
||||
} else if (msg.type === 'model_start') {
|
||||
runLog.value.push(`→ ${msg.model}…`)
|
||||
const svc = msg.service === 'cf-text' ? ' [cf-text]' : ''
|
||||
runLog.value.push(`→ ${msg.model}${svc}…`)
|
||||
} else if (msg.type === 'model_coldstart') {
|
||||
runLog.value.push(` ⏳ ${msg.model}: cold start — waiting for service to load…`)
|
||||
} else if (msg.type === 'model_done') {
|
||||
const status = msg.error
|
||||
? `✕ error: ${msg.error}`
|
||||
|
|
@ -586,6 +692,46 @@ async function pushCorrections() {
|
|||
color: var(--color-text, #1a2338);
|
||||
}
|
||||
|
||||
.sample-image-row {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.75rem;
|
||||
flex-wrap: wrap;
|
||||
}
|
||||
|
||||
.sample-image-thumb {
|
||||
width: 120px;
|
||||
height: 90px;
|
||||
object-fit: cover;
|
||||
border-radius: 0.375rem;
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
flex-shrink: 0;
|
||||
}
|
||||
|
||||
.image-badge {
|
||||
font-size: 0.78rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
}
|
||||
|
||||
.system-badge {
|
||||
font-size: 0.68rem;
|
||||
background: color-mix(in srgb, var(--app-primary, #2A6080) 15%, transparent);
|
||||
color: var(--app-primary, #2A6080);
|
||||
border-radius: 9999px;
|
||||
padding: 0.1rem 0.5rem;
|
||||
margin-left: 0.4rem;
|
||||
font-weight: 600;
|
||||
vertical-align: middle;
|
||||
}
|
||||
|
||||
.system-editor {
|
||||
border-top: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0;
|
||||
border-left: none;
|
||||
border-right: none;
|
||||
border-bottom: none;
|
||||
}
|
||||
|
||||
.prompt-label {
|
||||
font-size: 0.85rem;
|
||||
font-weight: 600;
|
||||
|
|
@ -895,4 +1041,15 @@ async function pushCorrections() {
|
|||
|
||||
.msg-ok { color: #065f46; }
|
||||
.msg-err { color: #b91c1c; }
|
||||
|
||||
.cforch-badge {
|
||||
font-size: 0.68rem;
|
||||
background: color-mix(in srgb, var(--app-accent, #059669) 18%, transparent);
|
||||
color: var(--app-accent, #059669);
|
||||
border-radius: 9999px;
|
||||
padding: 0.1rem 0.5rem;
|
||||
margin-left: 0.4rem;
|
||||
font-weight: 600;
|
||||
vertical-align: middle;
|
||||
}
|
||||
</style>
|
||||
|
|
|
|||
715
web/src/views/LlmEvalTab.vue
Normal file
715
web/src/views/LlmEvalTab.vue
Normal file
|
|
@ -0,0 +1,715 @@
|
|||
<template>
|
||||
<div class="llm-eval-tab">
|
||||
|
||||
<!-- Task Selection -->
|
||||
<details class="model-picker" open>
|
||||
<summary class="picker-summary">
|
||||
<span class="picker-title">📋 Task Selection</span>
|
||||
<span class="picker-badge">{{ llmTaskBadge }}</span>
|
||||
</summary>
|
||||
<div class="picker-body">
|
||||
<div v-if="llmTasksLoading" class="picker-loading">Loading tasks…</div>
|
||||
<div v-else-if="Object.keys(llmTasksByType).length === 0" class="picker-empty">
|
||||
No tasks found — check API connection.
|
||||
</div>
|
||||
<template v-else>
|
||||
<div v-for="(tasks, type) in llmTasksByType" :key="type" class="picker-category">
|
||||
<label class="picker-cat-header">
|
||||
<input
|
||||
type="checkbox"
|
||||
:checked="isTaskTypeAllSelected(tasks)"
|
||||
:indeterminate="isTaskTypeIndeterminate(tasks)"
|
||||
@change="toggleTaskType(tasks, ($event.target as HTMLInputElement).checked)"
|
||||
/>
|
||||
<span class="picker-cat-name">{{ type }}</span>
|
||||
<span class="picker-cat-count">({{ tasks.length }})</span>
|
||||
</label>
|
||||
<div class="picker-model-list">
|
||||
<label v-for="t in tasks" :key="t.id" class="picker-model-row">
|
||||
<input
|
||||
type="checkbox"
|
||||
:checked="selectedLlmTasks.has(t.id)"
|
||||
@change="toggleLlmTask(t.id, ($event.target as HTMLInputElement).checked)"
|
||||
/>
|
||||
<span class="picker-model-name" :title="t.name">{{ t.name }}</span>
|
||||
</label>
|
||||
</div>
|
||||
</div>
|
||||
</template>
|
||||
</div>
|
||||
</details>
|
||||
|
||||
<!-- Model Selection -->
|
||||
<details class="model-picker" open>
|
||||
<summary class="picker-summary">
|
||||
<span class="picker-title">🎯 Model Selection</span>
|
||||
<span class="picker-badge">{{ llmModelBadge }}</span>
|
||||
</summary>
|
||||
<div class="picker-body">
|
||||
<div v-if="llmModelsLoading" class="picker-loading">Loading models…</div>
|
||||
<div v-else-if="Object.keys(llmModelsByService).length === 0" class="picker-empty">
|
||||
No models found — check cf-orch connection.
|
||||
</div>
|
||||
<template v-else>
|
||||
<div v-for="(models, service) in llmModelsByService" :key="service" class="picker-category">
|
||||
<label class="picker-cat-header">
|
||||
<input
|
||||
type="checkbox"
|
||||
:checked="isServiceAllSelected(models)"
|
||||
:indeterminate="isServiceIndeterminate(models)"
|
||||
@change="toggleService(models, ($event.target as HTMLInputElement).checked)"
|
||||
/>
|
||||
<span class="picker-cat-name">{{ service }}</span>
|
||||
<span class="picker-cat-count">({{ models.length }})</span>
|
||||
</label>
|
||||
<div class="picker-model-list">
|
||||
<label v-for="m in models" :key="m.id" class="picker-model-row">
|
||||
<input
|
||||
type="checkbox"
|
||||
:checked="selectedLlmModels.has(m.id)"
|
||||
@change="toggleLlmModel(m.id, ($event.target as HTMLInputElement).checked)"
|
||||
/>
|
||||
<span class="picker-model-name" :title="m.name">{{ m.name }}</span>
|
||||
<span class="picker-adapter-type" v-if="m.tags.length">{{ m.tags.join(', ') }}</span>
|
||||
</label>
|
||||
</div>
|
||||
</div>
|
||||
</template>
|
||||
</div>
|
||||
</details>
|
||||
|
||||
<!-- Run Controls -->
|
||||
<div class="run-controls">
|
||||
<button
|
||||
class="btn-run"
|
||||
:disabled="llmRunning || selectedLlmTasks.size === 0 || selectedLlmModels.size === 0"
|
||||
@click="startLlmBenchmark"
|
||||
>
|
||||
{{ llmRunning ? '⏳ Running…' : '▶ Run LLM Eval' }}
|
||||
</button>
|
||||
<button v-if="llmRunning" class="btn-cancel" @click="cancelLlmBenchmark">✕ Cancel</button>
|
||||
<span v-if="selectedLlmTasks.size === 0 || selectedLlmModels.size === 0" class="run-hint">
|
||||
Select at least one task and one model to run.
|
||||
</span>
|
||||
</div>
|
||||
|
||||
<!-- Progress log -->
|
||||
<div v-if="llmRunning || llmRunLog.length" class="run-log">
|
||||
<div class="run-log-title">
|
||||
<span>{{ llmRunning ? '⏳ Running LLM eval…' : llmError ? '❌ Failed' : '✅ Done' }}</span>
|
||||
<button class="btn-ghost" @click="llmRunLog = []; llmError = ''">Clear</button>
|
||||
</div>
|
||||
<div class="log-lines" ref="llmLogEl">
|
||||
<div
|
||||
v-for="(line, i) in llmRunLog"
|
||||
:key="i"
|
||||
class="log-line"
|
||||
:class="{ 'log-error': line.startsWith('ERROR') || line.startsWith('[error]') }"
|
||||
>{{ line }}</div>
|
||||
</div>
|
||||
<p v-if="llmError" class="run-error">{{ llmError }}</p>
|
||||
</div>
|
||||
|
||||
<!-- Results table -->
|
||||
<template v-if="llmResults.length > 0">
|
||||
<h2 class="chart-title">LLM Eval Results</h2>
|
||||
<div class="heatmap-scroll">
|
||||
<table class="heatmap llm-results-table">
|
||||
<thead>
|
||||
<tr>
|
||||
<th class="hm-label-col">Model</th>
|
||||
<th class="hm-model-col">overall</th>
|
||||
<th v-for="col in llmTaskTypeCols" :key="col" class="hm-model-col">{{ col }}</th>
|
||||
<th class="hm-model-col">tok/s</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr v-for="row in llmResults" :key="row.model_id">
|
||||
<td class="hm-label-cell llm-model-name-cell" :title="row.model_id">{{ row.model_name }}</td>
|
||||
<td
|
||||
class="hm-value-cell"
|
||||
:class="{ 'bt-best': llmBestByCol['overall'] === row.model_id }"
|
||||
>{{ pct(row.avg_quality_score) }}</td>
|
||||
<td
|
||||
v-for="col in llmTaskTypeCols"
|
||||
:key="col"
|
||||
class="hm-value-cell"
|
||||
:class="{ 'bt-best': llmBestByCol[col] === row.model_id }"
|
||||
>{{ row.quality_by_task_type[col] != null ? pct(row.quality_by_task_type[col]) : '—' }}</td>
|
||||
<td class="hm-value-cell llm-tps-cell">{{ row.avg_tokens_per_sec.toFixed(1) }}</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
<p class="heatmap-hint">Run LLM Eval to refresh. Green = best per column.</p>
|
||||
</template>
|
||||
|
||||
</div>
|
||||
</template>
|
||||
|
||||
<script setup lang="ts">
|
||||
import { ref, computed, onMounted, nextTick } from 'vue'
|
||||
import { useApiFetch } from '../composables/useApi'
|
||||
|
||||
// ── Types ───────────────────────────────────────────────────────────────────
|
||||
interface CfOrchTask {
|
||||
id: string
|
||||
name: string
|
||||
type: string
|
||||
prompt: string
|
||||
system: string
|
||||
}
|
||||
|
||||
interface CfOrchModel {
|
||||
name: string
|
||||
id: string
|
||||
service: string
|
||||
tags: string[]
|
||||
vram_estimate_mb?: number
|
||||
}
|
||||
|
||||
interface LlmModelResult {
|
||||
model_name: string
|
||||
model_id: string
|
||||
node_id: string
|
||||
avg_tokens_per_sec: number
|
||||
avg_completion_ms: number
|
||||
avg_quality_score: number
|
||||
finetune_candidates: number
|
||||
error_count: number
|
||||
quality_by_task_type: Record<string, number>
|
||||
}
|
||||
|
||||
// ── State ───────────────────────────────────────────────────────────────────
|
||||
const llmTasks = ref<CfOrchTask[]>([])
|
||||
const llmTasksLoading = ref(false)
|
||||
const llmModels = ref<CfOrchModel[]>([])
|
||||
const llmModelsLoading = ref(false)
|
||||
|
||||
const selectedLlmTasks = ref<Set<string>>(new Set())
|
||||
const selectedLlmModels = ref<Set<string>>(new Set())
|
||||
|
||||
const llmRunning = ref(false)
|
||||
const llmRunLog = ref<string[]>([])
|
||||
const llmError = ref('')
|
||||
const llmResults = ref<LlmModelResult[]>([])
|
||||
const llmEventSource = ref<EventSource | null>(null)
|
||||
const llmLogEl = ref<HTMLElement | null>(null)
|
||||
|
||||
// ── Computed ────────────────────────────────────────────────────────────────
|
||||
const llmTasksByType = computed((): Record<string, CfOrchTask[]> => {
|
||||
const groups: Record<string, CfOrchTask[]> = {}
|
||||
for (const t of llmTasks.value) {
|
||||
if (!groups[t.type]) groups[t.type] = []
|
||||
groups[t.type].push(t)
|
||||
}
|
||||
return groups
|
||||
})
|
||||
|
||||
const llmModelsByService = computed((): Record<string, CfOrchModel[]> => {
|
||||
const groups: Record<string, CfOrchModel[]> = {}
|
||||
for (const m of llmModels.value) {
|
||||
if (!groups[m.service]) groups[m.service] = []
|
||||
groups[m.service].push(m)
|
||||
}
|
||||
return groups
|
||||
})
|
||||
|
||||
const llmTaskBadge = computed(() => {
|
||||
const total = llmTasks.value.length
|
||||
if (total === 0) return 'No tasks available'
|
||||
const sel = selectedLlmTasks.value.size
|
||||
if (sel === total) return `All tasks (${total})`
|
||||
return `${sel} of ${total} tasks selected`
|
||||
})
|
||||
|
||||
const llmModelBadge = computed(() => {
|
||||
const total = llmModels.value.length
|
||||
if (total === 0) return 'No models available'
|
||||
const sel = selectedLlmModels.value.size
|
||||
if (sel === total) return `All models (${total})`
|
||||
return `${sel} of ${total} selected`
|
||||
})
|
||||
|
||||
const llmTaskTypeCols = computed(() => {
|
||||
const types = new Set<string>()
|
||||
for (const r of llmResults.value) {
|
||||
for (const k of Object.keys(r.quality_by_task_type)) types.add(k)
|
||||
}
|
||||
return [...types].sort()
|
||||
})
|
||||
|
||||
const llmBestByCol = computed((): Record<string, string> => {
|
||||
const best: Record<string, string> = {}
|
||||
if (llmResults.value.length === 0) return best
|
||||
|
||||
let bestId = '', bestVal = -Infinity
|
||||
for (const r of llmResults.value) {
|
||||
if (r.avg_quality_score > bestVal) { bestVal = r.avg_quality_score; bestId = r.model_id }
|
||||
}
|
||||
best['overall'] = bestId
|
||||
|
||||
for (const col of llmTaskTypeCols.value) {
|
||||
bestId = ''; bestVal = -Infinity
|
||||
for (const r of llmResults.value) {
|
||||
const v = r.quality_by_task_type[col]
|
||||
if (v != null && v > bestVal) { bestVal = v; bestId = r.model_id }
|
||||
}
|
||||
best[col] = bestId
|
||||
}
|
||||
return best
|
||||
})
|
||||
|
||||
// ── Helpers ─────────────────────────────────────────────────────────────────
|
||||
function pct(v: number): string {
|
||||
return `${(v * 100).toFixed(1)}%`
|
||||
}
|
||||
|
||||
// Task picker helpers
|
||||
function isTaskTypeAllSelected(tasks: CfOrchTask[]): boolean {
|
||||
return tasks.length > 0 && tasks.every(t => selectedLlmTasks.value.has(t.id))
|
||||
}
|
||||
function isTaskTypeIndeterminate(tasks: CfOrchTask[]): boolean {
|
||||
const some = tasks.some(t => selectedLlmTasks.value.has(t.id))
|
||||
return some && !isTaskTypeAllSelected(tasks)
|
||||
}
|
||||
function toggleLlmTask(id: string, checked: boolean) {
|
||||
const next = new Set(selectedLlmTasks.value)
|
||||
checked ? next.add(id) : next.delete(id)
|
||||
selectedLlmTasks.value = next
|
||||
}
|
||||
function toggleTaskType(tasks: CfOrchTask[], checked: boolean) {
|
||||
const next = new Set(selectedLlmTasks.value)
|
||||
for (const t of tasks) {
|
||||
checked ? next.add(t.id) : next.delete(t.id)
|
||||
}
|
||||
selectedLlmTasks.value = next
|
||||
}
|
||||
|
||||
// Model picker helpers
|
||||
function isServiceAllSelected(models: CfOrchModel[]): boolean {
|
||||
return models.length > 0 && models.every(m => selectedLlmModels.value.has(m.id))
|
||||
}
|
||||
function isServiceIndeterminate(models: CfOrchModel[]): boolean {
|
||||
const some = models.some(m => selectedLlmModels.value.has(m.id))
|
||||
return some && !isServiceAllSelected(models)
|
||||
}
|
||||
function toggleLlmModel(id: string, checked: boolean) {
|
||||
const next = new Set(selectedLlmModels.value)
|
||||
checked ? next.add(id) : next.delete(id)
|
||||
selectedLlmModels.value = next
|
||||
}
|
||||
function toggleService(models: CfOrchModel[], checked: boolean) {
|
||||
const next = new Set(selectedLlmModels.value)
|
||||
for (const m of models) {
|
||||
checked ? next.add(m.id) : next.delete(m.id)
|
||||
}
|
||||
selectedLlmModels.value = next
|
||||
}
|
||||
|
||||
// ── Data loaders ─────────────────────────────────────────────────────────────
|
||||
async function loadLlmTasks() {
|
||||
llmTasksLoading.value = true
|
||||
const { data } = await useApiFetch<{ tasks: CfOrchTask[]; types: string[] }>('/api/cforch/tasks')
|
||||
llmTasksLoading.value = false
|
||||
if (data?.tasks) {
|
||||
llmTasks.value = data.tasks
|
||||
selectedLlmTasks.value = new Set(data.tasks.map(t => t.id))
|
||||
}
|
||||
}
|
||||
|
||||
async function loadLlmModels() {
|
||||
llmModelsLoading.value = true
|
||||
const { data } = await useApiFetch<{ models: CfOrchModel[] }>('/api/cforch/models')
|
||||
llmModelsLoading.value = false
|
||||
if (data?.models) {
|
||||
llmModels.value = data.models
|
||||
selectedLlmModels.value = new Set(data.models.map(m => m.id))
|
||||
}
|
||||
}
|
||||
|
||||
async function loadLlmResults() {
|
||||
const { data } = await useApiFetch<LlmModelResult[]>('/api/cforch/results')
|
||||
if (Array.isArray(data) && data.length > 0) {
|
||||
llmResults.value = data
|
||||
}
|
||||
}
|
||||
|
||||
// ── Run / cancel ──────────────────────────────────────────────────────────────
|
||||
function startLlmBenchmark() {
|
||||
llmRunning.value = true
|
||||
llmRunLog.value = []
|
||||
llmError.value = ''
|
||||
|
||||
const params = new URLSearchParams()
|
||||
const taskIds = [...selectedLlmTasks.value].join(',')
|
||||
if (taskIds) params.set('task_ids', taskIds)
|
||||
|
||||
const es = new EventSource(`/api/cforch/run?${params}`)
|
||||
llmEventSource.value = es
|
||||
|
||||
es.onmessage = async (e: MessageEvent) => {
|
||||
const msg = JSON.parse(e.data)
|
||||
if (msg.type === 'progress' && typeof msg.message === 'string') {
|
||||
llmRunLog.value.push(msg.message)
|
||||
await nextTick()
|
||||
llmLogEl.value?.scrollTo({ top: llmLogEl.value.scrollHeight, behavior: 'smooth' })
|
||||
} else if (msg.type === 'result' && Array.isArray(msg.summary)) {
|
||||
llmResults.value = msg.summary
|
||||
} else if (msg.type === 'complete') {
|
||||
llmRunning.value = false
|
||||
es.close()
|
||||
llmEventSource.value = null
|
||||
} else if (msg.type === 'error' && typeof msg.message === 'string') {
|
||||
llmError.value = msg.message
|
||||
llmRunning.value = false
|
||||
es.close()
|
||||
llmEventSource.value = null
|
||||
}
|
||||
}
|
||||
|
||||
es.onerror = () => {
|
||||
if (llmRunning.value) llmError.value = 'Connection lost'
|
||||
llmRunning.value = false
|
||||
es.close()
|
||||
llmEventSource.value = null
|
||||
}
|
||||
}
|
||||
|
||||
async function cancelLlmBenchmark() {
|
||||
llmEventSource.value?.close()
|
||||
llmEventSource.value = null
|
||||
llmRunning.value = false
|
||||
await fetch('/api/cforch/cancel', { method: 'POST' }).catch(() => {})
|
||||
}
|
||||
|
||||
onMounted(() => {
|
||||
loadLlmTasks()
|
||||
loadLlmModels()
|
||||
loadLlmResults()
|
||||
})
|
||||
</script>
|
||||
|
||||
<style scoped>
|
||||
.llm-eval-tab {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 1.75rem;
|
||||
}
|
||||
|
||||
/* ── Buttons ────────────────────────────────────────────── */
|
||||
.btn-run {
|
||||
padding: 0.45rem 1.1rem;
|
||||
border-radius: 0.375rem;
|
||||
border: none;
|
||||
background: var(--app-primary, #2A6080);
|
||||
color: #fff;
|
||||
font-size: 0.88rem;
|
||||
font-family: var(--font-body, sans-serif);
|
||||
cursor: pointer;
|
||||
transition: opacity 0.15s;
|
||||
}
|
||||
.btn-run:disabled { opacity: 0.5; cursor: not-allowed; }
|
||||
.btn-run:not(:disabled):hover { opacity: 0.85; }
|
||||
|
||||
.btn-cancel {
|
||||
padding: 0.45rem 0.9rem;
|
||||
background: transparent;
|
||||
border: 1px solid var(--color-text-secondary, #6b7a99);
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
border-radius: 0.4rem;
|
||||
font-size: 0.85rem;
|
||||
font-weight: 500;
|
||||
cursor: pointer;
|
||||
transition: background 0.15s;
|
||||
}
|
||||
.btn-cancel:hover {
|
||||
background: color-mix(in srgb, var(--color-text-secondary, #6b7a99) 12%, transparent);
|
||||
}
|
||||
|
||||
.btn-ghost {
|
||||
background: none;
|
||||
border: none;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
cursor: pointer;
|
||||
font-size: 0.78rem;
|
||||
padding: 0.1rem 0.3rem;
|
||||
border-radius: 0.2rem;
|
||||
}
|
||||
.btn-ghost:hover { background: var(--color-border, #d0d7e8); }
|
||||
|
||||
/* ── Run controls row ───────────────────────────────────── */
|
||||
.run-controls {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.75rem;
|
||||
flex-wrap: wrap;
|
||||
}
|
||||
|
||||
.run-hint {
|
||||
font-size: 0.8rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
}
|
||||
|
||||
/* ── Run log ────────────────────────────────────────────── */
|
||||
.run-log {
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.5rem;
|
||||
overflow: hidden;
|
||||
font-family: var(--font-mono, monospace);
|
||||
font-size: 0.78rem;
|
||||
}
|
||||
|
||||
.run-log-title {
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
align-items: center;
|
||||
padding: 0.4rem 0.75rem;
|
||||
background: var(--color-surface-raised, #e4ebf5);
|
||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
||||
font-size: 0.8rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
}
|
||||
|
||||
.log-lines {
|
||||
max-height: 200px;
|
||||
overflow-y: auto;
|
||||
padding: 0.5rem 0.75rem;
|
||||
background: var(--color-surface, #fff);
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.1rem;
|
||||
}
|
||||
|
||||
.log-line { color: var(--color-text, #1a2338); line-height: 1.5; }
|
||||
.log-line.log-error { color: var(--color-error, #ef4444); }
|
||||
|
||||
.run-error {
|
||||
margin: 0;
|
||||
padding: 0.4rem 0.75rem;
|
||||
background: color-mix(in srgb, var(--color-error, #ef4444) 10%, transparent);
|
||||
color: var(--color-error, #ef4444);
|
||||
font-size: 0.82rem;
|
||||
font-family: var(--font-mono, monospace);
|
||||
}
|
||||
|
||||
/* ── Chart title ────────────────────────────────────────── */
|
||||
.chart-title {
|
||||
font-size: 0.95rem;
|
||||
font-weight: 600;
|
||||
color: var(--color-text, #1a2338);
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
/* ── Heatmap ────────────────────────────────────────────── */
|
||||
.heatmap-scroll {
|
||||
overflow-x: auto;
|
||||
border-radius: 0.5rem;
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
}
|
||||
|
||||
.heatmap {
|
||||
border-collapse: collapse;
|
||||
min-width: 100%;
|
||||
font-size: 0.78rem;
|
||||
}
|
||||
|
||||
.hm-label-col {
|
||||
text-align: left;
|
||||
min-width: 11rem;
|
||||
padding: 0.4rem 0.6rem;
|
||||
background: var(--color-surface-raised, #e4ebf5);
|
||||
font-weight: 600;
|
||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
||||
position: sticky;
|
||||
left: 0;
|
||||
}
|
||||
|
||||
.hm-model-col {
|
||||
min-width: 5rem;
|
||||
max-width: 8rem;
|
||||
padding: 0.4rem 0.5rem;
|
||||
background: var(--color-surface-raised, #e4ebf5);
|
||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
||||
font-family: var(--font-mono, monospace);
|
||||
font-size: 0.7rem;
|
||||
text-overflow: ellipsis;
|
||||
overflow: hidden;
|
||||
white-space: nowrap;
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
.hm-label-cell {
|
||||
padding: 0.35rem 0.6rem;
|
||||
background: var(--color-surface, #fff);
|
||||
border-top: 1px solid var(--color-border, #d0d7e8);
|
||||
white-space: nowrap;
|
||||
font-family: var(--font-mono, monospace);
|
||||
font-size: 0.74rem;
|
||||
position: sticky;
|
||||
left: 0;
|
||||
}
|
||||
|
||||
.hm-value-cell {
|
||||
padding: 0.35rem 0.5rem;
|
||||
text-align: center;
|
||||
font-family: var(--font-mono, monospace);
|
||||
font-variant-numeric: tabular-nums;
|
||||
border-top: 1px solid var(--color-border, #d0d7e8);
|
||||
cursor: default;
|
||||
}
|
||||
|
||||
.heatmap-hint {
|
||||
font-size: 0.75rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
/* LLM-specific table styles */
|
||||
.llm-results-table .bt-best {
|
||||
color: var(--color-success, #3a7a32);
|
||||
font-weight: 700;
|
||||
background: color-mix(in srgb, var(--color-success, #3a7a32) 8%, transparent);
|
||||
}
|
||||
|
||||
.llm-model-name-cell {
|
||||
font-family: var(--font-mono, monospace);
|
||||
font-size: 0.75rem;
|
||||
white-space: nowrap;
|
||||
max-width: 16rem;
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
background: var(--color-surface, #fff);
|
||||
border-top: 1px solid var(--color-border, #d0d7e8);
|
||||
padding: 0.35rem 0.6rem;
|
||||
position: sticky;
|
||||
left: 0;
|
||||
}
|
||||
|
||||
.llm-tps-cell {
|
||||
font-family: var(--font-mono, monospace);
|
||||
font-variant-numeric: tabular-nums;
|
||||
white-space: nowrap;
|
||||
}
|
||||
|
||||
/* ── Model Picker ───────────────────────────────────────── */
|
||||
.model-picker {
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.5rem;
|
||||
overflow: hidden;
|
||||
}
|
||||
|
||||
.picker-summary {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.6rem;
|
||||
padding: 0.65rem 0.9rem;
|
||||
cursor: pointer;
|
||||
user-select: none;
|
||||
list-style: none;
|
||||
background: var(--color-surface-raised, #e4ebf5);
|
||||
}
|
||||
.picker-summary::-webkit-details-marker { display: none; }
|
||||
.picker-summary::before { content: '▶ '; font-size: 0.65rem; color: var(--color-text-secondary, #6b7a99); }
|
||||
details[open] .picker-summary::before { content: '▼ '; }
|
||||
|
||||
.picker-title {
|
||||
font-size: 0.9rem;
|
||||
font-weight: 600;
|
||||
color: var(--color-text, #1a2338);
|
||||
}
|
||||
|
||||
.picker-badge {
|
||||
font-size: 0.75rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
background: var(--color-surface, #fff);
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
padding: 0.15rem 0.5rem;
|
||||
border-radius: 1rem;
|
||||
font-family: var(--font-mono, monospace);
|
||||
margin-left: auto;
|
||||
}
|
||||
|
||||
.picker-body {
|
||||
padding: 0.75rem;
|
||||
border-top: 1px solid var(--color-border, #d0d7e8);
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.75rem;
|
||||
}
|
||||
|
||||
.picker-loading, .picker-empty {
|
||||
font-size: 0.85rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
padding: 0.5rem 0;
|
||||
}
|
||||
|
||||
.picker-category {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.3rem;
|
||||
}
|
||||
|
||||
.picker-cat-header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.45rem;
|
||||
font-size: 0.82rem;
|
||||
font-weight: 700;
|
||||
color: var(--color-text, #1a2338);
|
||||
text-transform: uppercase;
|
||||
letter-spacing: 0.04em;
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
.picker-cat-name { /* inherits from cat-header */ }
|
||||
|
||||
.picker-cat-count {
|
||||
font-weight: 400;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
font-family: var(--font-mono, monospace);
|
||||
font-size: 0.75rem;
|
||||
text-transform: none;
|
||||
letter-spacing: 0;
|
||||
}
|
||||
|
||||
.picker-model-list {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 0.35rem 0.75rem;
|
||||
padding-left: 1.4rem;
|
||||
}
|
||||
|
||||
.picker-model-row {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.35rem;
|
||||
font-size: 0.82rem;
|
||||
cursor: pointer;
|
||||
color: var(--color-text, #1a2338);
|
||||
}
|
||||
|
||||
.picker-model-name {
|
||||
font-family: var(--font-mono, monospace);
|
||||
font-size: 0.78rem;
|
||||
white-space: nowrap;
|
||||
max-width: 18ch;
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
}
|
||||
|
||||
.picker-adapter-type {
|
||||
font-size: 0.68rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
background: var(--color-surface-raised, #e4ebf5);
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.25rem;
|
||||
padding: 0.05rem 0.3rem;
|
||||
font-family: var(--font-mono, monospace);
|
||||
}
|
||||
|
||||
@media (max-width: 600px) {
|
||||
.picker-model-list { padding-left: 0; }
|
||||
.picker-model-name { max-width: 14ch; }
|
||||
}
|
||||
</style>
|
||||
|
|
@ -42,6 +42,12 @@
|
|||
<span v-if="lookupResult.pipeline_tag" class="chip chip-pipeline">
|
||||
{{ lookupResult.pipeline_tag }}
|
||||
</span>
|
||||
<span v-if="lookupResult.role" class="chip chip-role">
|
||||
{{ lookupResult.role }}
|
||||
</span>
|
||||
<span v-if="lookupResult.service" class="chip" :class="serviceChipClass(lookupResult.service)">
|
||||
{{ lookupResult.service }}
|
||||
</span>
|
||||
<span v-if="lookupResult.adapter_recommendation" class="chip chip-adapter">
|
||||
{{ lookupResult.adapter_recommendation }}
|
||||
</span>
|
||||
|
|
@ -61,11 +67,10 @@
|
|||
|
||||
<button
|
||||
class="btn-primary btn-add-queue"
|
||||
:class="{ 'btn-add-queue-warn': !lookupResult.compatible }"
|
||||
:disabled="lookupResult.already_installed || lookupResult.already_queued || addingToQueue"
|
||||
@click="addToQueue"
|
||||
>
|
||||
{{ addingToQueue ? 'Adding…' : lookupResult.compatible ? 'Add to queue' : 'Add anyway' }}
|
||||
{{ addingToQueue ? 'Adding…' : 'Add to queue' }}
|
||||
</button>
|
||||
</div>
|
||||
</section>
|
||||
|
|
@ -90,7 +95,9 @@
|
|||
</button>
|
||||
</div>
|
||||
<div class="model-meta">
|
||||
<span v-if="model.pipeline_tag" class="chip chip-pipeline">{{ model.pipeline_tag }}</span>
|
||||
<span v-if="model.pipeline_tag" class="chip chip-pipeline">{{ model.pipeline_tag }}</span>
|
||||
<span v-if="model.role" class="chip chip-role">{{ model.role }}</span>
|
||||
<span v-if="model.service" class="chip" :class="serviceChipClass(model.service)">{{ model.service }}</span>
|
||||
<span v-if="model.adapter_recommendation" class="chip chip-adapter">{{ model.adapter_recommendation }}</span>
|
||||
</div>
|
||||
<div class="model-card-actions">
|
||||
|
|
@ -116,6 +123,8 @@
|
|||
</div>
|
||||
<div class="model-meta">
|
||||
<span v-if="model.pipeline_tag" class="chip chip-pipeline">{{ model.pipeline_tag }}</span>
|
||||
<span v-if="model.role" class="chip chip-role">{{ model.role }}</span>
|
||||
<span v-if="model.service" class="chip" :class="serviceChipClass(model.service)">{{ model.service }}</span>
|
||||
</div>
|
||||
|
||||
<div v-if="downloadErrors[model.id]" class="download-error" role="alert">
|
||||
|
|
@ -124,14 +133,19 @@
|
|||
<div v-else class="progress-wrap" :aria-label="`Download progress for ${model.repo_id}`">
|
||||
<div
|
||||
class="progress-bar"
|
||||
:style="{ width: `${downloadProgress[model.id] ?? 0}%` }"
|
||||
:style="{ width: `${downloadProgress[model.repo_id]?.pct ?? 0}%` }"
|
||||
role="progressbar"
|
||||
:aria-valuenow="downloadProgress[model.id] ?? 0"
|
||||
:aria-valuenow="downloadProgress[model.repo_id]?.pct ?? 0"
|
||||
aria-valuemin="0"
|
||||
aria-valuemax="100"
|
||||
/>
|
||||
<span class="progress-label">
|
||||
{{ downloadProgress[model.id] == null ? 'Preparing…' : `${downloadProgress[model.id]}%` }}
|
||||
{{
|
||||
!downloadProgress[model.repo_id] ? 'Preparing…'
|
||||
: downloadProgress[model.repo_id].pct != null ? `${Math.round(downloadProgress[model.repo_id].pct!)}%`
|
||||
: downloadProgress[model.repo_id].bytes > 0 ? `${(downloadProgress[model.repo_id].bytes / 1024 / 1024).toFixed(0)} MB downloaded…`
|
||||
: 'Preparing…'
|
||||
}}
|
||||
</span>
|
||||
</div>
|
||||
</div>
|
||||
|
|
@ -145,42 +159,90 @@
|
|||
No models installed yet.
|
||||
</div>
|
||||
|
||||
<div v-else class="installed-table-wrap">
|
||||
<table class="installed-table">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Name</th>
|
||||
<th>Type</th>
|
||||
<th>Adapter</th>
|
||||
<th>Size</th>
|
||||
<th></th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr v-for="model in installedModels" :key="model.name">
|
||||
<td class="td-name">{{ model.name }}</td>
|
||||
<td>
|
||||
<span
|
||||
class="badge"
|
||||
:class="model.type === 'finetuned' ? 'badge-accent' : 'badge-info'"
|
||||
>
|
||||
{{ model.type }}
|
||||
</span>
|
||||
</td>
|
||||
<td>{{ model.adapter ?? '—' }}</td>
|
||||
<td>{{ humanBytes(model.size) }}</td>
|
||||
<td>
|
||||
<button
|
||||
class="btn-danger btn-sm"
|
||||
@click="deleteInstalled(model.name)"
|
||||
>
|
||||
Delete
|
||||
</button>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
<template v-else>
|
||||
<div
|
||||
v-for="group in installedByService"
|
||||
:key="group.service"
|
||||
class="installed-group"
|
||||
>
|
||||
<div class="installed-group-header">
|
||||
<span class="chip" :class="serviceChipClass(group.service)">
|
||||
{{ serviceLabel(group.service) }}
|
||||
</span>
|
||||
<span class="installed-group-count">{{ group.models.length }} model{{ group.models.length !== 1 ? 's' : '' }}</span>
|
||||
</div>
|
||||
|
||||
<div class="installed-table-wrap">
|
||||
<table class="installed-table">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Name</th>
|
||||
<th>Type</th>
|
||||
<th>Role</th>
|
||||
<th>Size</th>
|
||||
<th></th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr v-for="model in group.models" :key="model.name">
|
||||
<td class="td-name">{{ model.model_id ?? model.name }}</td>
|
||||
<td>
|
||||
<span
|
||||
class="badge"
|
||||
:class="model.type === 'finetuned' ? 'badge-accent' : 'badge-info'"
|
||||
>
|
||||
{{ model.type }}
|
||||
</span>
|
||||
</td>
|
||||
<td>
|
||||
<span v-if="model.role" class="chip chip-role chip-sm">{{ model.role }}</span>
|
||||
<span v-else>—</span>
|
||||
</td>
|
||||
<td>{{ humanBytes(model.size_bytes) }}</td>
|
||||
<td class="td-actions">
|
||||
<div v-if="!model.service" class="classify-row">
|
||||
<select
|
||||
class="classify-select"
|
||||
:value="classifyDraft[model.name]?.service ?? ''"
|
||||
@change="onServiceChange(model.name, ($event.target as HTMLSelectElement).value)"
|
||||
aria-label="Assign service"
|
||||
>
|
||||
<option value="" disabled>Service…</option>
|
||||
<option v-for="svc in CLASSIFIABLE_SERVICES" :key="svc.value" :value="svc.value">{{ svc.label }}</option>
|
||||
</select>
|
||||
<select
|
||||
class="classify-select"
|
||||
:value="classifyDraft[model.name]?.role ?? ''"
|
||||
:disabled="!classifyDraft[model.name]?.service"
|
||||
@change="(e) => setClassifyRole(model.name, (e.target as HTMLSelectElement).value)"
|
||||
aria-label="Assign role"
|
||||
>
|
||||
<option value="" disabled>Role…</option>
|
||||
<option
|
||||
v-for="role in rolesForService(classifyDraft[model.name]?.service ?? '')"
|
||||
:key="role"
|
||||
:value="role"
|
||||
>{{ role }}</option>
|
||||
</select>
|
||||
<button
|
||||
class="btn-primary btn-sm"
|
||||
:disabled="!classifyDraft[model.name]?.service || !classifyDraft[model.name]?.role"
|
||||
@click="saveClassify(model.name)"
|
||||
>Save</button>
|
||||
</div>
|
||||
<button
|
||||
class="btn-danger btn-sm"
|
||||
@click="deleteInstalled(model.name)"
|
||||
>
|
||||
Delete
|
||||
</button>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
</template>
|
||||
</section>
|
||||
</div>
|
||||
</template>
|
||||
|
|
@ -194,6 +256,8 @@ interface LookupResult {
|
|||
repo_id: string
|
||||
pipeline_tag: string | null
|
||||
adapter_recommendation: string | null
|
||||
role: string | null
|
||||
service: string | null
|
||||
compatible: boolean
|
||||
warning: string | null
|
||||
size: number | null
|
||||
|
|
@ -208,20 +272,27 @@ interface QueuedModel {
|
|||
status: 'pending' | 'downloading' | 'done' | 'error'
|
||||
pipeline_tag: string | null
|
||||
adapter_recommendation: string | null
|
||||
role: string | null
|
||||
service: string | null
|
||||
}
|
||||
|
||||
interface InstalledModel {
|
||||
name: string
|
||||
type: 'finetuned' | 'downloaded'
|
||||
adapter: string | null
|
||||
size: number
|
||||
role: string | null
|
||||
service: string | null
|
||||
size_bytes: number
|
||||
model_id: string | null
|
||||
}
|
||||
|
||||
interface SseProgressEvent {
|
||||
model_id: string
|
||||
pct: number | null
|
||||
status: 'progress' | 'done' | 'error'
|
||||
message?: string
|
||||
type: 'progress' | 'done' | 'error' | 'idle'
|
||||
repo_id?: string
|
||||
pct?: number
|
||||
downloaded_bytes?: number
|
||||
total_bytes?: number
|
||||
error?: string
|
||||
}
|
||||
|
||||
// ── State ─────────────────────────────────────────────
|
||||
|
|
@ -235,7 +306,8 @@ const addingToQueue = ref(false)
|
|||
const queuedModels = ref<QueuedModel[]>([])
|
||||
const installedModels = ref<InstalledModel[]>([])
|
||||
|
||||
const downloadProgress = ref<Record<string, number>>({})
|
||||
const downloadProgress = ref<Record<string, { pct: number | null; bytes: number }>>({})
|
||||
const classifyDraft = ref<Record<string, { service: string; role: string }>>({})
|
||||
const downloadErrors = ref<Record<string, string>>({})
|
||||
|
||||
let pollInterval: ReturnType<typeof setInterval> | null = null
|
||||
|
|
@ -251,8 +323,69 @@ const downloadingModels = computed(() =>
|
|||
queuedModels.value.filter(m => m.status === 'downloading')
|
||||
)
|
||||
|
||||
const SERVICE_ORDER = ['avocet', 'cf-text', 'cf-stt', 'cf-tts', 'cf-vision', 'cf-image', 'cf-core', 'cf-voice', 'other']
|
||||
|
||||
const CLASSIFIABLE_SERVICES = [
|
||||
{ value: 'avocet', label: 'Avocet — Email Classifiers' },
|
||||
{ value: 'cf-text', label: 'cf-text — Language Models' },
|
||||
{ value: 'cf-stt', label: 'cf-stt — Speech Recognition' },
|
||||
{ value: 'cf-tts', label: 'cf-tts — Text to Speech' },
|
||||
{ value: 'cf-vision', label: 'cf-vision — Vision / VLM' },
|
||||
{ value: 'cf-image', label: 'cf-image — Image Generation' },
|
||||
{ value: 'cf-core', label: 'cf-core — Embeddings' },
|
||||
{ value: 'cf-voice', label: 'cf-voice — Audio Classification' },
|
||||
]
|
||||
|
||||
const SERVICE_ROLES: Record<string, string[]> = {
|
||||
'avocet': ['classifier', 'reranker'],
|
||||
'cf-text': ['generator'],
|
||||
'cf-stt': ['stt', 'alm'],
|
||||
'cf-tts': ['tts'],
|
||||
'cf-vision': ['vision', 'vlm', 'embedding'],
|
||||
'cf-image': ['image-gen'],
|
||||
'cf-core': ['embedding'],
|
||||
'cf-voice': ['classifier'],
|
||||
}
|
||||
|
||||
function rolesForService(service: string): string[] {
|
||||
return SERVICE_ROLES[service] ?? []
|
||||
}
|
||||
|
||||
const installedByService = computed(() => {
|
||||
const grouped: Record<string, InstalledModel[]> = {}
|
||||
for (const model of installedModels.value) {
|
||||
const key = model.service ?? 'other'
|
||||
if (!grouped[key]) grouped[key] = []
|
||||
grouped[key].push(model)
|
||||
}
|
||||
// Return ordered sections: known services first, then anything else
|
||||
const keys = [...SERVICE_ORDER.filter(s => grouped[s]), ...Object.keys(grouped).filter(k => !SERVICE_ORDER.includes(k))]
|
||||
return keys.map(key => ({ service: key, models: grouped[key] }))
|
||||
})
|
||||
|
||||
// ── Helpers ───────────────────────────────────────────
|
||||
|
||||
const SERVICE_LABELS: Record<string, string> = {
|
||||
'avocet': 'Avocet — Email Classifiers',
|
||||
'cf-text': 'cf-text — Language Models',
|
||||
'cf-stt': 'cf-stt — Speech Recognition',
|
||||
'cf-tts': 'cf-tts — Text to Speech',
|
||||
'cf-vision': 'cf-vision — Vision / VLM',
|
||||
'cf-image': 'cf-image — Image Generation',
|
||||
'cf-core': 'cf-core — Embeddings',
|
||||
'cf-voice': 'cf-voice — Audio Classification',
|
||||
'other': 'Other — Unclassified',
|
||||
}
|
||||
|
||||
function serviceLabel(service: string): string {
|
||||
return SERVICE_LABELS[service] ?? service
|
||||
}
|
||||
|
||||
function serviceChipClass(service: string | null): string {
|
||||
if (!service) return 'chip-service-other'
|
||||
return `chip-service-${service.replace(/[^a-z0-9]/g, '-')}`
|
||||
}
|
||||
|
||||
function humanBytes(bytes: number | null): string {
|
||||
if (bytes == null) return '—'
|
||||
const units = ['B', 'KB', 'MB', 'GB', 'TB']
|
||||
|
|
@ -305,10 +438,11 @@ async function addToQueue() {
|
|||
if (!lookupResult.value) return
|
||||
addingToQueue.value = true
|
||||
try {
|
||||
const { repo_id, pipeline_tag, adapter_recommendation, role, service } = lookupResult.value
|
||||
const res = await fetch('/api/models/queue', {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({ repo_id: lookupResult.value.repo_id }),
|
||||
body: JSON.stringify({ repo_id, pipeline_tag, adapter_recommendation, role, service }),
|
||||
})
|
||||
if (res.ok) {
|
||||
lookupResult.value = { ...lookupResult.value, already_queued: true }
|
||||
|
|
@ -339,12 +473,50 @@ async function dismissModel(id: string) {
|
|||
} catch { /* ignore */ }
|
||||
}
|
||||
|
||||
function onServiceChange(name: string, service: string) {
|
||||
const roles = SERVICE_ROLES[service] ?? []
|
||||
classifyDraft.value = {
|
||||
...classifyDraft.value,
|
||||
[name]: { service, role: roles.length === 1 ? roles[0] : '' },
|
||||
}
|
||||
}
|
||||
|
||||
function setClassifyRole(name: string, role: string) {
|
||||
classifyDraft.value = {
|
||||
...classifyDraft.value,
|
||||
[name]: { ...classifyDraft.value[name], role },
|
||||
}
|
||||
}
|
||||
|
||||
async function saveClassify(name: string) {
|
||||
const draft = classifyDraft.value[name]
|
||||
if (!draft?.service || !draft?.role) return
|
||||
try {
|
||||
const res = await fetch(`/api/models/installed/${encodeURIComponent(name)}`, {
|
||||
method: 'PATCH',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({ service: draft.service, role: draft.role }),
|
||||
})
|
||||
if (res.ok) {
|
||||
// Update in-place so the model moves to the correct service group
|
||||
installedModels.value = installedModels.value.map(m =>
|
||||
m.name === name ? { ...m, service: draft.service, role: draft.role } : m
|
||||
)
|
||||
const updated = { ...classifyDraft.value }
|
||||
delete updated[name]
|
||||
classifyDraft.value = updated
|
||||
await loadQueue()
|
||||
}
|
||||
} catch { /* non-fatal */ }
|
||||
}
|
||||
|
||||
async function deleteInstalled(name: string) {
|
||||
if (!window.confirm(`Delete installed model "${name}"? This cannot be undone.`)) return
|
||||
try {
|
||||
const res = await fetch(`/api/models/installed/${encodeURIComponent(name)}`, { method: 'DELETE' })
|
||||
if (res.ok) {
|
||||
installedModels.value = installedModels.value.filter(m => m.name !== name)
|
||||
await loadQueue()
|
||||
}
|
||||
} catch { /* ignore */ }
|
||||
}
|
||||
|
|
@ -378,21 +550,28 @@ function startSse() {
|
|||
return
|
||||
}
|
||||
|
||||
const { model_id, pct, status, message } = event
|
||||
const { type, repo_id, pct, downloaded_bytes, error } = event
|
||||
if (!repo_id) return
|
||||
|
||||
if (status === 'progress' && pct != null) {
|
||||
downloadProgress.value = { ...downloadProgress.value, [model_id]: pct }
|
||||
} else if (status === 'done') {
|
||||
if (type === 'progress') {
|
||||
const bytes = downloaded_bytes ?? 0
|
||||
// pct stays null when total_bytes is unknown so we can show "X MB" instead
|
||||
const progress = (pct != null && pct > 0) ? pct : (bytes > 0 ? null : undefined)
|
||||
downloadProgress.value = { ...downloadProgress.value, [repo_id]: { pct: progress ?? null, bytes } }
|
||||
} else if (type === 'done') {
|
||||
const updated = { ...downloadProgress.value }
|
||||
delete updated[model_id]
|
||||
delete updated[repo_id]
|
||||
downloadProgress.value = updated
|
||||
|
||||
queuedModels.value = queuedModels.value.filter(m => m.id !== model_id)
|
||||
queuedModels.value = queuedModels.value.filter(m => m.repo_id !== repo_id)
|
||||
loadInstalled()
|
||||
} else if (status === 'error') {
|
||||
downloadErrors.value = {
|
||||
...downloadErrors.value,
|
||||
[model_id]: message ?? 'Download failed.',
|
||||
} else if (type === 'error') {
|
||||
const entry = queuedModels.value.find(m => m.repo_id === repo_id)
|
||||
if (entry) {
|
||||
downloadErrors.value = {
|
||||
...downloadErrors.value,
|
||||
[entry.id]: error ?? 'Download failed.',
|
||||
}
|
||||
}
|
||||
}
|
||||
})
|
||||
|
|
@ -595,12 +774,6 @@ onUnmounted(() => {
|
|||
align-self: flex-start;
|
||||
}
|
||||
|
||||
.btn-add-queue-warn {
|
||||
background: var(--color-surface-raised, #e4ebf5);
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
}
|
||||
|
||||
/* ── Model cards (queue + downloads) ── */
|
||||
.model-card {
|
||||
border: 1px solid var(--color-border, #a8b8d0);
|
||||
|
|
@ -715,6 +888,35 @@ onUnmounted(() => {
|
|||
word-break: break-all;
|
||||
}
|
||||
|
||||
.td-actions {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.4rem;
|
||||
align-items: flex-start;
|
||||
}
|
||||
|
||||
.classify-row {
|
||||
display: flex;
|
||||
gap: 0.35rem;
|
||||
align-items: center;
|
||||
flex-wrap: wrap;
|
||||
}
|
||||
|
||||
.classify-select {
|
||||
font-size: 0.78rem;
|
||||
padding: 0.2rem 0.4rem;
|
||||
border-radius: 4px;
|
||||
border: 1px solid var(--color-border, #444);
|
||||
background: var(--color-surface, #1e1e2e);
|
||||
color: var(--color-text, #cdd6f4);
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
.classify-select:disabled {
|
||||
opacity: 0.4;
|
||||
cursor: not-allowed;
|
||||
}
|
||||
|
||||
/* ── Badges ── */
|
||||
.badge-group {
|
||||
display: flex;
|
||||
|
|
@ -777,6 +979,76 @@ onUnmounted(() => {
|
|||
background: color-mix(in srgb, var(--color-accent, #c4732a) 12%, var(--color-surface-alt, #dde4f0));
|
||||
}
|
||||
|
||||
.chip-role {
|
||||
color: var(--color-info, #1e6091);
|
||||
background: color-mix(in srgb, var(--color-info, #1e6091) 12%, var(--color-surface-alt, #dde4f0));
|
||||
}
|
||||
|
||||
.chip-sm {
|
||||
font-size: 0.68rem;
|
||||
padding: 0.1rem 0.4rem;
|
||||
}
|
||||
|
||||
/* Service chips — one colour per CF service */
|
||||
.chip-service-avocet {
|
||||
color: var(--color-primary, #2d5a27);
|
||||
background: color-mix(in srgb, var(--color-primary, #2d5a27) 15%, var(--color-surface-alt, #dde4f0));
|
||||
}
|
||||
|
||||
.chip-service-cf-text {
|
||||
color: #c2410c;
|
||||
background: color-mix(in srgb, #c2410c 12%, var(--color-surface-alt, #dde4f0));
|
||||
}
|
||||
|
||||
.chip-service-cf-stt {
|
||||
color: #5e35b1;
|
||||
background: color-mix(in srgb, #5e35b1 12%, var(--color-surface-alt, #dde4f0));
|
||||
}
|
||||
|
||||
.chip-service-cf-tts {
|
||||
color: #0277bd;
|
||||
background: color-mix(in srgb, #0277bd 12%, var(--color-surface-alt, #dde4f0));
|
||||
}
|
||||
|
||||
.chip-service-cf-vision {
|
||||
color: #00695c;
|
||||
background: color-mix(in srgb, #00695c 12%, var(--color-surface-alt, #dde4f0));
|
||||
}
|
||||
|
||||
.chip-service-cf-core {
|
||||
color: #6d4c41;
|
||||
background: color-mix(in srgb, #6d4c41 12%, var(--color-surface-alt, #dde4f0));
|
||||
}
|
||||
|
||||
.chip-service-cf-voice {
|
||||
color: #ad1457;
|
||||
background: color-mix(in srgb, #ad1457 12%, var(--color-surface-alt, #dde4f0));
|
||||
}
|
||||
|
||||
.chip-service-other {
|
||||
color: var(--color-text-muted, #4a5c7a);
|
||||
background: var(--color-surface-alt, #dde4f0);
|
||||
}
|
||||
|
||||
/* ── Installed group ── */
|
||||
.installed-group {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.5rem;
|
||||
}
|
||||
|
||||
.installed-group-header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.5rem;
|
||||
padding: 0.25rem 0;
|
||||
}
|
||||
|
||||
.installed-group-count {
|
||||
font-size: 0.78rem;
|
||||
color: var(--color-text-muted, #4a5c7a);
|
||||
}
|
||||
|
||||
/* ── Buttons ── */
|
||||
.btn-primary, .btn-danger {
|
||||
padding: 0.4rem 0.9rem;
|
||||
|
|
@ -852,7 +1124,7 @@ onUnmounted(() => {
|
|||
|
||||
.installed-table th:nth-child(3),
|
||||
.installed-table td:nth-child(3) {
|
||||
display: none; /* hide Adapter column on very narrow screens */
|
||||
display: none; /* hide Role column on very narrow screens */
|
||||
}
|
||||
}
|
||||
</style>
|
||||
|
|
|
|||
919
web/src/views/StyleTab.vue
Normal file
919
web/src/views/StyleTab.vue
Normal file
|
|
@ -0,0 +1,919 @@
|
|||
<template>
|
||||
<div class="style-tab">
|
||||
|
||||
<!-- ── Controls row ──────────────────────────────────────────────────── -->
|
||||
<div class="style-controls">
|
||||
|
||||
<!-- Model picker -->
|
||||
<details class="model-picker" open>
|
||||
<summary class="picker-summary">
|
||||
<span class="picker-title">✍️ Models</span>
|
||||
<span class="picker-badge">{{ selectedCount }} selected</span>
|
||||
<button class="btn-refresh" :disabled="modelsLoading" @click.stop="loadModels" title="Refresh model list">
|
||||
{{ modelsLoading ? '⏳' : '🔄' }}
|
||||
</button>
|
||||
</summary>
|
||||
<div class="picker-body">
|
||||
<div v-if="modelsLoading" class="picker-loading">Loading models…</div>
|
||||
<div v-else-if="loadError" class="picker-error">{{ loadError }}</div>
|
||||
<template v-else>
|
||||
|
||||
<!-- Ollama group -->
|
||||
<div class="picker-group" v-if="ollamaModels.length">
|
||||
<div class="group-header">
|
||||
<label class="group-check">
|
||||
<input
|
||||
type="checkbox"
|
||||
:checked="isGroupAllSelected('ollama')"
|
||||
:indeterminate="isGroupIndeterminate('ollama')"
|
||||
@change="toggleGroup('ollama', ($event.target as HTMLInputElement).checked)"
|
||||
/>
|
||||
<span class="group-label">Ollama</span>
|
||||
<span class="group-count">({{ ollamaModels.length }})</span>
|
||||
</label>
|
||||
<span class="group-note">auto-synced with Models view</span>
|
||||
</div>
|
||||
<div class="model-list">
|
||||
<label v-for="m in ollamaModels" :key="m.id" class="model-item">
|
||||
<input type="checkbox" :value="m.id" v-model="selectedModels" />
|
||||
<span class="model-name">{{ m.name }}</span>
|
||||
<span v-if="m.size_mb" class="model-meta">{{ formatMb(m.size_mb) }}</span>
|
||||
</label>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- cf-text group -->
|
||||
<div class="picker-group" v-if="cftextModels.length">
|
||||
<div class="group-header">
|
||||
<label class="group-check">
|
||||
<input
|
||||
type="checkbox"
|
||||
:checked="isGroupAllSelected('cf-text')"
|
||||
:indeterminate="isGroupIndeterminate('cf-text')"
|
||||
@change="toggleGroup('cf-text', ($event.target as HTMLInputElement).checked)"
|
||||
/>
|
||||
<span class="group-label">cf-text (cf-orch)</span>
|
||||
<span class="group-count">({{ cftextModels.length }})</span>
|
||||
</label>
|
||||
<span class="group-note">GGUFs via coordinator — enable cf-orch below</span>
|
||||
</div>
|
||||
<div class="model-list">
|
||||
<label v-for="m in cftextModels" :key="m.id" class="model-item">
|
||||
<input type="checkbox" :value="m.id" v-model="selectedModels" />
|
||||
<span class="model-name">{{ m.name }}</span>
|
||||
<span v-if="m.vram_mb" class="model-meta">{{ formatMb(m.vram_mb) }} VRAM</span>
|
||||
</label>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div v-if="!ollamaModels.length && !cftextModels.length" class="picker-empty">
|
||||
No models available — check Ollama and cf-orch connections.
|
||||
</div>
|
||||
|
||||
</template>
|
||||
</div>
|
||||
</details>
|
||||
|
||||
<!-- Options panel -->
|
||||
<details class="options-panel">
|
||||
<summary class="picker-summary">
|
||||
<span class="picker-title">⚙️ Options</span>
|
||||
</summary>
|
||||
<div class="options-body">
|
||||
<label class="option-row">
|
||||
<input type="checkbox" v-model="useCforch" :disabled="running" />
|
||||
<span class="option-label">Use cf-orch backend</span>
|
||||
<span class="option-hint">Routes generation through cf-text instead of ollama</span>
|
||||
</label>
|
||||
<label class="option-row" :class="{ dimmed: !useCforch }">
|
||||
<span class="option-label">Max VRAM (MB)</span>
|
||||
<input
|
||||
type="number"
|
||||
v-model.number="maxVram"
|
||||
:disabled="running || !useCforch"
|
||||
min="1024"
|
||||
max="24576"
|
||||
step="512"
|
||||
class="option-number"
|
||||
/>
|
||||
<span class="option-hint">Skip models exceeding this VRAM limit</span>
|
||||
</label>
|
||||
<label class="option-row">
|
||||
<span class="option-label">Parallel workers</span>
|
||||
<input
|
||||
type="number"
|
||||
v-model.number="workers"
|
||||
:disabled="running"
|
||||
min="1"
|
||||
max="16"
|
||||
step="1"
|
||||
class="option-number"
|
||||
/>
|
||||
<span class="option-hint">Models to score simultaneously (1 = sequential)</span>
|
||||
</label>
|
||||
<label class="option-row">
|
||||
<input type="checkbox" v-model="includeLarge" :disabled="running" />
|
||||
<span class="option-label">Include large models (30B+)</span>
|
||||
<span class="option-hint">Off by default — these take much longer</span>
|
||||
</label>
|
||||
</div>
|
||||
</details>
|
||||
|
||||
</div>
|
||||
|
||||
<!-- ── Run controls ──────────────────────────────────────────────────── -->
|
||||
<div class="run-bar">
|
||||
<button class="btn-run" :disabled="running || selectedCount === 0" @click="startBenchmark">
|
||||
{{ running ? '⏳ Running…' : results.length ? '🔄 Re-run' : '▶ Run Benchmark' }}
|
||||
</button>
|
||||
<button v-if="running" class="btn-cancel" @click="cancelBenchmark">✕ Cancel</button>
|
||||
<span v-if="selectedCount === 0 && !running" class="run-hint">Select at least one model above</span>
|
||||
</div>
|
||||
|
||||
<!-- ── Progress log ──────────────────────────────────────────────────── -->
|
||||
<div v-if="runLog.length" class="run-log">
|
||||
<div class="run-log-header">
|
||||
<span class="run-log-title">Run log</span>
|
||||
<button class="btn-clear" @click="runLog = []">Clear</button>
|
||||
</div>
|
||||
<pre class="run-log-body" ref="logEl">{{ runLog.join('\n') }}</pre>
|
||||
</div>
|
||||
|
||||
<!-- ── Past runs picker ─────────────────────────────────────────────── -->
|
||||
<div class="history-bar" v-if="pastRuns.length">
|
||||
<label class="history-label">📂 Past runs:</label>
|
||||
<select class="history-select" v-model="selectedRun" @change="loadRun(selectedRun)">
|
||||
<option value="">— select a past run —</option>
|
||||
<option v-for="r in pastRuns" :key="r.filename" :value="r.filename">
|
||||
{{ r.date }} · {{ r.model_count }} model{{ r.model_count !== 1 ? 's' : '' }} · top {{ r.top_score }}/100
|
||||
</option>
|
||||
</select>
|
||||
</div>
|
||||
|
||||
<!-- ── Results table ─────────────────────────────────────────────────── -->
|
||||
<div v-if="results.length" class="results-section">
|
||||
<div class="results-header">
|
||||
<h2 class="results-title">Rankings</h2>
|
||||
<button
|
||||
class="btn-corrections"
|
||||
:disabled="sendingCorrections"
|
||||
@click="sendToCorrections"
|
||||
title="Push all outputs from this run into the Corrections review queue"
|
||||
>
|
||||
{{ sendingCorrections ? '⏳ Sending…' : correctionsMsg || '✍️ Send to Corrections' }}
|
||||
</button>
|
||||
</div>
|
||||
<div class="results-table-wrap">
|
||||
<table class="results-table">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Rank</th>
|
||||
<th>Model</th>
|
||||
<th>Score</th>
|
||||
<th>Latency</th>
|
||||
<th title="Em-dash count">—</th>
|
||||
<th title="Filler phrase hits">Fillers</th>
|
||||
<th title="Semicolons">;</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr
|
||||
v-for="(r, i) in results"
|
||||
:key="r.model_id"
|
||||
class="result-row"
|
||||
:class="{ 'top-row': i === 0 }"
|
||||
@click="toggleExpanded(r.model_id)"
|
||||
>
|
||||
<td class="rank-cell">{{ medal(i) }}</td>
|
||||
<td class="model-cell">
|
||||
<span class="model-name-text">{{ r.model_id }}</span>
|
||||
</td>
|
||||
<td class="score-cell">
|
||||
<span class="score-pill" :style="scorePillStyle(r.avg_score)">
|
||||
{{ r.avg_score.toFixed(0) }}
|
||||
</span>
|
||||
</td>
|
||||
<td class="latency-cell">{{ formatLatency(r.avg_latency_ms) }}</td>
|
||||
<td class="violation-cell" :class="{ 'has-violation': r.total_em_dashes > 0 }">
|
||||
{{ r.total_em_dashes }}
|
||||
</td>
|
||||
<td class="violation-cell" :class="{ 'has-violation': r.total_filler_hits > 0 }">
|
||||
{{ r.total_filler_hits }}
|
||||
</td>
|
||||
<td class="violation-cell" :class="{ 'has-violation': r.total_semicolons > 0 }">
|
||||
{{ r.total_semicolons }}
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
|
||||
<!-- Expandable sample outputs -->
|
||||
<div v-for="r in results" :key="'exp-' + r.model_id">
|
||||
<div v-if="expandedModels.has(r.model_id)" class="sample-outputs">
|
||||
<div class="sample-header">
|
||||
<strong>{{ r.model_id }}</strong>
|
||||
<button class="btn-collapse" @click="toggleExpanded(r.model_id)">✕ Close</button>
|
||||
</div>
|
||||
<div v-for="pr in r.prompt_results" :key="pr.tag" class="sample-prompt">
|
||||
<div class="sample-tag">
|
||||
<span class="tag-name">{{ pr.tag }}</span>
|
||||
<span class="tag-score">{{ pr.score.toFixed(0) }}/100</span>
|
||||
<span class="tag-latency">{{ formatLatency(pr.latency_ms) }}</span>
|
||||
</div>
|
||||
<pre class="sample-text">{{ pr.output || '(no output)' }}</pre>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</template>
|
||||
|
||||
<script setup lang="ts">
|
||||
import { ref, computed, onMounted, nextTick, watch } from 'vue'
|
||||
|
||||
// ── Types ───────────────────────────────────────────────────────────────────
|
||||
|
||||
interface StyleModel {
|
||||
id: string
|
||||
name: string
|
||||
source: 'ollama' | 'cf-text'
|
||||
size_mb?: number | null
|
||||
vram_mb?: number | null
|
||||
description?: string
|
||||
}
|
||||
|
||||
interface PromptResult {
|
||||
tag: string
|
||||
output: string
|
||||
score: number
|
||||
latency_ms: number
|
||||
signals: Record<string, unknown>
|
||||
}
|
||||
|
||||
interface ModelResult {
|
||||
model_id: string
|
||||
avg_score: number
|
||||
avg_latency_ms: number
|
||||
total_filler_hits: number
|
||||
total_em_dashes: number
|
||||
total_semicolons: number
|
||||
prompt_results: PromptResult[]
|
||||
}
|
||||
|
||||
interface PastRun {
|
||||
filename: string
|
||||
date: string
|
||||
model_count: number
|
||||
top_score: number
|
||||
}
|
||||
|
||||
// ── State ───────────────────────────────────────────────────────────────────
|
||||
|
||||
const ollamaModels = ref<StyleModel[]>([])
|
||||
const cftextModels = ref<StyleModel[]>([])
|
||||
const selectedModels = ref<string[]>([])
|
||||
const modelsLoading = ref(false)
|
||||
const loadError = ref('')
|
||||
|
||||
const useCforch = ref(false)
|
||||
const maxVram = ref(7200)
|
||||
const workers = ref(1)
|
||||
const includeLarge = ref(false)
|
||||
|
||||
const running = ref(false)
|
||||
const runLog = ref<string[]>([])
|
||||
const logEl = ref<HTMLPreElement | null>(null)
|
||||
|
||||
const results = ref<ModelResult[]>([])
|
||||
const pastRuns = ref<PastRun[]>([])
|
||||
const selectedRun = ref('')
|
||||
const expandedModels = ref(new Set<string>())
|
||||
const sendingCorrections = ref(false)
|
||||
const correctionsMsg = ref('')
|
||||
|
||||
// ── Computed ─────────────────────────────────────────────────────────────────
|
||||
|
||||
const selectedCount = computed(() => selectedModels.value.length)
|
||||
|
||||
function isGroupAllSelected(source: string): boolean {
|
||||
const group = source === 'ollama' ? ollamaModels.value : cftextModels.value
|
||||
return group.length > 0 && group.every(m => selectedModels.value.includes(m.id))
|
||||
}
|
||||
|
||||
function isGroupIndeterminate(source: string): boolean {
|
||||
const group = source === 'ollama' ? ollamaModels.value : cftextModels.value
|
||||
const count = group.filter(m => selectedModels.value.includes(m.id)).length
|
||||
return count > 0 && count < group.length
|
||||
}
|
||||
|
||||
// ── Actions ──────────────────────────────────────────────────────────────────
|
||||
|
||||
async function loadModels() {
|
||||
modelsLoading.value = true
|
||||
loadError.value = ''
|
||||
try {
|
||||
const resp = await fetch('/api/style/models')
|
||||
if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
|
||||
const data = await resp.json()
|
||||
ollamaModels.value = data.ollama ?? []
|
||||
cftextModels.value = data.cf_text ?? []
|
||||
} catch (e: unknown) {
|
||||
loadError.value = `Failed to load models: ${e instanceof Error ? e.message : String(e)}`
|
||||
} finally {
|
||||
modelsLoading.value = false
|
||||
}
|
||||
}
|
||||
|
||||
async function loadPastRuns() {
|
||||
try {
|
||||
const resp = await fetch('/api/style/results')
|
||||
if (resp.ok) pastRuns.value = await resp.json()
|
||||
} catch { /* non-fatal */ }
|
||||
}
|
||||
|
||||
async function loadRun(filename: string) {
|
||||
if (!filename) return
|
||||
try {
|
||||
const resp = await fetch(`/api/style/results/${filename}`)
|
||||
if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
|
||||
results.value = await resp.json()
|
||||
expandedModels.value.clear()
|
||||
} catch (e: unknown) {
|
||||
runLog.value.push(`[error] Failed to load ${filename}: ${e instanceof Error ? e.message : String(e)}`)
|
||||
}
|
||||
}
|
||||
|
||||
function toggleGroup(source: string, checked: boolean) {
|
||||
const group = source === 'ollama' ? ollamaModels.value : cftextModels.value
|
||||
const ids = group.map(m => m.id)
|
||||
if (checked) {
|
||||
const newSet = new Set([...selectedModels.value, ...ids])
|
||||
selectedModels.value = [...newSet]
|
||||
} else {
|
||||
selectedModels.value = selectedModels.value.filter(id => !ids.includes(id))
|
||||
}
|
||||
}
|
||||
|
||||
function toggleExpanded(modelId: string) {
|
||||
if (expandedModels.value.has(modelId)) {
|
||||
expandedModels.value.delete(modelId)
|
||||
} else {
|
||||
expandedModels.value.add(modelId)
|
||||
}
|
||||
expandedModels.value = new Set(expandedModels.value)
|
||||
}
|
||||
|
||||
function startBenchmark() {
|
||||
if (running.value || selectedCount.value === 0) return
|
||||
running.value = true
|
||||
runLog.value = []
|
||||
results.value = []
|
||||
expandedModels.value.clear()
|
||||
|
||||
const params = new URLSearchParams({
|
||||
models: selectedModels.value.join(','),
|
||||
use_cforch: String(useCforch.value),
|
||||
max_vram: String(maxVram.value),
|
||||
workers: String(workers.value),
|
||||
include_large: String(includeLarge.value),
|
||||
})
|
||||
|
||||
const es = new EventSource(`/api/style/run?${params}`)
|
||||
|
||||
es.onmessage = async (ev) => {
|
||||
try {
|
||||
const msg = JSON.parse(ev.data)
|
||||
if (msg.type === 'progress') {
|
||||
runLog.value.push(msg.message)
|
||||
await nextTick()
|
||||
if (logEl.value) logEl.value.scrollTop = logEl.value.scrollHeight
|
||||
} else if (msg.type === 'result') {
|
||||
results.value = msg.results ?? []
|
||||
await loadPastRuns()
|
||||
} else if (msg.type === 'complete') {
|
||||
running.value = false
|
||||
es.close()
|
||||
} else if (msg.type === 'error') {
|
||||
runLog.value.push(`[error] ${msg.message}`)
|
||||
running.value = false
|
||||
es.close()
|
||||
}
|
||||
} catch { /* ignore parse errors */ }
|
||||
}
|
||||
|
||||
es.onerror = () => {
|
||||
if (running.value) {
|
||||
runLog.value.push('[error] Connection lost')
|
||||
running.value = false
|
||||
}
|
||||
es.close()
|
||||
}
|
||||
}
|
||||
|
||||
async function cancelBenchmark() {
|
||||
try {
|
||||
await fetch('/api/style/cancel', { method: 'POST' })
|
||||
} finally {
|
||||
running.value = false
|
||||
runLog.value.push('[cancelled]')
|
||||
}
|
||||
}
|
||||
|
||||
async function sendToCorrections() {
|
||||
if (!selectedRun.value || sendingCorrections.value) return
|
||||
sendingCorrections.value = true
|
||||
correctionsMsg.value = ''
|
||||
try {
|
||||
const resp = await fetch('/api/style/send-to-corrections', {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({ filename: selectedRun.value, model_ids: [] }),
|
||||
})
|
||||
if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
|
||||
const data = await resp.json()
|
||||
correctionsMsg.value = `✓ ${data.imported} added to Corrections`
|
||||
} catch (e: unknown) {
|
||||
correctionsMsg.value = `Error: ${e instanceof Error ? e.message : String(e)}`
|
||||
} finally {
|
||||
sendingCorrections.value = false
|
||||
}
|
||||
}
|
||||
|
||||
// ── Formatting helpers ────────────────────────────────────────────────────────
|
||||
|
||||
function formatMb(mb: number): string {
|
||||
return mb >= 1024 ? `${(mb / 1024).toFixed(1)} GB` : `${mb} MB`
|
||||
}
|
||||
|
||||
function formatLatency(ms: number): string {
|
||||
return ms >= 1000 ? `${(ms / 1000).toFixed(1)}s` : `${Math.round(ms)}ms`
|
||||
}
|
||||
|
||||
function medal(index: number): string {
|
||||
return ['🥇', '🥈', '🥉'][index] ?? `#${index + 1}`
|
||||
}
|
||||
|
||||
function scorePillStyle(score: number): Record<string, string> {
|
||||
const hue = Math.round((score / 100) * 120) // 0=red, 120=green
|
||||
return {
|
||||
background: `hsl(${hue} 60% 88%)`,
|
||||
color: `hsl(${hue} 60% 28%)`,
|
||||
}
|
||||
}
|
||||
|
||||
// ── Lifecycle ─────────────────────────────────────────────────────────────────
|
||||
|
||||
// Auto-enable cf-orch when cf-text models are selected
|
||||
watch(selectedModels, (ids) => {
|
||||
const hasCftext = ids.some(id => cftextModels.value.find(m => m.id === id))
|
||||
if (hasCftext) useCforch.value = true
|
||||
})
|
||||
|
||||
onMounted(async () => {
|
||||
await Promise.all([loadModels(), loadPastRuns()])
|
||||
// Auto-load the latest results if any exist
|
||||
if (pastRuns.value.length) {
|
||||
selectedRun.value = pastRuns.value[0].filename
|
||||
await loadRun(pastRuns.value[0].filename)
|
||||
}
|
||||
})
|
||||
</script>
|
||||
|
||||
<style scoped>
|
||||
.style-tab {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 1rem;
|
||||
padding: 1rem 0;
|
||||
}
|
||||
|
||||
/* ── Controls ─────────────────────────────────────────────────────────────── */
|
||||
|
||||
.style-controls {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 0.75rem;
|
||||
align-items: flex-start;
|
||||
}
|
||||
|
||||
.model-picker,
|
||||
.options-panel {
|
||||
flex: 1;
|
||||
min-width: 280px;
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.5rem;
|
||||
background: var(--color-surface, #f4f7fc);
|
||||
overflow: hidden;
|
||||
}
|
||||
|
||||
.picker-summary {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.5rem;
|
||||
padding: 0.65rem 0.85rem;
|
||||
cursor: pointer;
|
||||
user-select: none;
|
||||
font-size: 0.9rem;
|
||||
font-weight: 600;
|
||||
list-style: none;
|
||||
}
|
||||
|
||||
.picker-summary::-webkit-details-marker { display: none; }
|
||||
|
||||
.picker-title { flex: 1; color: var(--color-text, #1a2338); }
|
||||
.picker-badge {
|
||||
background: var(--app-primary, #2A6080);
|
||||
color: #fff;
|
||||
border-radius: 9999px;
|
||||
padding: 0.1rem 0.5rem;
|
||||
font-size: 0.72rem;
|
||||
font-weight: 700;
|
||||
}
|
||||
|
||||
.btn-refresh {
|
||||
border: none;
|
||||
background: transparent;
|
||||
cursor: pointer;
|
||||
font-size: 0.85rem;
|
||||
padding: 0.1rem 0.25rem;
|
||||
border-radius: 0.25rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
}
|
||||
.btn-refresh:hover { background: var(--color-border, #d0d7e8); }
|
||||
.btn-refresh:disabled { opacity: 0.5; cursor: not-allowed; }
|
||||
|
||||
.picker-body,
|
||||
.options-body {
|
||||
padding: 0.75rem;
|
||||
border-top: 1px solid var(--color-border, #d0d7e8);
|
||||
}
|
||||
|
||||
.picker-loading, .picker-empty {
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
font-size: 0.85rem;
|
||||
padding: 0.25rem 0;
|
||||
}
|
||||
|
||||
.picker-error {
|
||||
color: #b91c1c;
|
||||
font-size: 0.85rem;
|
||||
}
|
||||
|
||||
/* ── Model groups ──────────────────────────────────────────────────────────── */
|
||||
|
||||
.picker-group {
|
||||
margin-bottom: 0.75rem;
|
||||
}
|
||||
.picker-group:last-child { margin-bottom: 0; }
|
||||
|
||||
.group-header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.5rem;
|
||||
margin-bottom: 0.4rem;
|
||||
}
|
||||
|
||||
.group-check {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.35rem;
|
||||
font-size: 0.85rem;
|
||||
font-weight: 600;
|
||||
cursor: pointer;
|
||||
color: var(--color-text, #1a2338);
|
||||
}
|
||||
|
||||
.group-count {
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
font-weight: 400;
|
||||
font-size: 0.8rem;
|
||||
}
|
||||
|
||||
.group-note {
|
||||
margin-left: auto;
|
||||
font-size: 0.72rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
font-style: italic;
|
||||
}
|
||||
|
||||
.model-list {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.2rem;
|
||||
padding-left: 1.25rem;
|
||||
max-height: 220px;
|
||||
overflow-y: auto;
|
||||
}
|
||||
|
||||
.model-item {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.4rem;
|
||||
font-size: 0.82rem;
|
||||
cursor: pointer;
|
||||
padding: 0.15rem 0;
|
||||
}
|
||||
|
||||
.model-name { flex: 1; font-family: var(--font-mono, monospace); }
|
||||
|
||||
.model-meta {
|
||||
font-size: 0.72rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
}
|
||||
|
||||
/* ── Options ──────────────────────────────────────────────────────────────── */
|
||||
|
||||
.option-row {
|
||||
display: flex;
|
||||
align-items: flex-start;
|
||||
gap: 0.5rem;
|
||||
padding: 0.35rem 0;
|
||||
cursor: pointer;
|
||||
font-size: 0.85rem;
|
||||
}
|
||||
|
||||
.option-label { font-weight: 500; white-space: nowrap; }
|
||||
|
||||
.option-hint {
|
||||
flex: 1;
|
||||
font-size: 0.75rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
margin-left: auto;
|
||||
text-align: right;
|
||||
}
|
||||
|
||||
.option-number {
|
||||
width: 90px;
|
||||
padding: 0.2rem 0.4rem;
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.25rem;
|
||||
font-size: 0.85rem;
|
||||
background: var(--color-bg, #fff);
|
||||
color: var(--color-text, #1a2338);
|
||||
}
|
||||
|
||||
.option-row.dimmed { opacity: 0.45; pointer-events: none; }
|
||||
|
||||
/* ── Run bar ──────────────────────────────────────────────────────────────── */
|
||||
|
||||
.run-bar {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.65rem;
|
||||
}
|
||||
|
||||
.btn-run {
|
||||
padding: 0.5rem 1.25rem;
|
||||
border: none;
|
||||
border-radius: 0.375rem;
|
||||
background: var(--app-primary, #2A6080);
|
||||
color: #fff;
|
||||
font-size: 0.9rem;
|
||||
font-weight: 600;
|
||||
cursor: pointer;
|
||||
transition: background 0.15s;
|
||||
}
|
||||
.btn-run:hover:not(:disabled) { background: color-mix(in srgb, var(--app-primary, #2A6080) 80%, #000); }
|
||||
.btn-run:disabled { opacity: 0.5; cursor: not-allowed; }
|
||||
|
||||
.btn-cancel {
|
||||
padding: 0.5rem 0.9rem;
|
||||
border: 1px solid #f85149;
|
||||
border-radius: 0.375rem;
|
||||
background: transparent;
|
||||
color: #b91c1c;
|
||||
font-size: 0.85rem;
|
||||
cursor: pointer;
|
||||
transition: background 0.15s;
|
||||
}
|
||||
.btn-cancel:hover { background: #fee2e2; }
|
||||
|
||||
.run-hint {
|
||||
font-size: 0.8rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
}
|
||||
|
||||
/* ── Run log ──────────────────────────────────────────────────────────────── */
|
||||
|
||||
.run-log {
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.5rem;
|
||||
overflow: hidden;
|
||||
}
|
||||
|
||||
.run-log-header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
padding: 0.4rem 0.75rem;
|
||||
background: var(--color-surface, #f4f7fc);
|
||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
||||
font-size: 0.8rem;
|
||||
font-weight: 600;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
}
|
||||
|
||||
.run-log-title { text-transform: uppercase; letter-spacing: 0.05em; }
|
||||
|
||||
.btn-clear {
|
||||
border: none;
|
||||
background: transparent;
|
||||
font-size: 0.75rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
cursor: pointer;
|
||||
padding: 0.1rem 0.3rem;
|
||||
border-radius: 0.25rem;
|
||||
}
|
||||
.btn-clear:hover { background: var(--color-border, #d0d7e8); }
|
||||
|
||||
.run-log-body {
|
||||
margin: 0;
|
||||
padding: 0.65rem 0.85rem;
|
||||
font-size: 0.78rem;
|
||||
font-family: var(--font-mono, monospace);
|
||||
white-space: pre-wrap;
|
||||
word-break: break-all;
|
||||
max-height: 260px;
|
||||
overflow-y: auto;
|
||||
background: var(--color-bg, #fff);
|
||||
color: var(--color-text, #1a2338);
|
||||
}
|
||||
|
||||
/* ── History bar ──────────────────────────────────────────────────────────── */
|
||||
|
||||
.history-bar {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.6rem;
|
||||
font-size: 0.85rem;
|
||||
}
|
||||
|
||||
.history-label { font-weight: 500; white-space: nowrap; }
|
||||
|
||||
.history-select {
|
||||
flex: 1;
|
||||
padding: 0.3rem 0.5rem;
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.375rem;
|
||||
background: var(--color-surface, #f4f7fc);
|
||||
color: var(--color-text, #1a2338);
|
||||
font-size: 0.85rem;
|
||||
}
|
||||
|
||||
/* ── Results table ────────────────────────────────────────────────────────── */
|
||||
|
||||
.results-section { display: flex; flex-direction: column; gap: 0.75rem; }
|
||||
|
||||
.results-header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
gap: 0.75rem;
|
||||
}
|
||||
|
||||
.results-title {
|
||||
font-size: 1rem;
|
||||
font-weight: 700;
|
||||
color: var(--color-text, #1a2338);
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
.btn-corrections {
|
||||
padding: 0.4rem 0.9rem;
|
||||
border: 1px solid var(--app-primary, #2A6080);
|
||||
border-radius: 0.375rem;
|
||||
background: transparent;
|
||||
color: var(--app-primary, #2A6080);
|
||||
font-size: 0.83rem;
|
||||
font-weight: 600;
|
||||
cursor: pointer;
|
||||
white-space: nowrap;
|
||||
transition: background 0.15s, color 0.15s;
|
||||
}
|
||||
.btn-corrections:hover:not(:disabled) {
|
||||
background: var(--app-primary, #2A6080);
|
||||
color: #fff;
|
||||
}
|
||||
.btn-corrections:disabled { opacity: 0.55; cursor: not-allowed; }
|
||||
|
||||
.results-table-wrap {
|
||||
overflow-x: auto;
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.5rem;
|
||||
}
|
||||
|
||||
.results-table {
|
||||
width: 100%;
|
||||
border-collapse: collapse;
|
||||
font-size: 0.85rem;
|
||||
}
|
||||
|
||||
.results-table th {
|
||||
padding: 0.5rem 0.75rem;
|
||||
text-align: left;
|
||||
background: var(--color-surface, #f4f7fc);
|
||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
||||
font-size: 0.78rem;
|
||||
font-weight: 700;
|
||||
text-transform: uppercase;
|
||||
letter-spacing: 0.04em;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
white-space: nowrap;
|
||||
}
|
||||
|
||||
.result-row {
|
||||
cursor: pointer;
|
||||
transition: background 0.1s;
|
||||
}
|
||||
.result-row:hover { background: color-mix(in srgb, var(--app-primary, #2A6080) 6%, transparent); }
|
||||
.result-row.top-row { font-weight: 600; }
|
||||
|
||||
.result-row td {
|
||||
padding: 0.5rem 0.75rem;
|
||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
||||
}
|
||||
.result-row:last-child td { border-bottom: none; }
|
||||
|
||||
.rank-cell { width: 2.5rem; text-align: center; font-size: 1.1rem; }
|
||||
.model-cell { font-family: var(--font-mono, monospace); word-break: break-all; }
|
||||
.score-cell { width: 5rem; text-align: center; }
|
||||
.latency-cell { width: 5rem; text-align: right; color: var(--color-text-secondary, #6b7a99); }
|
||||
.violation-cell { width: 4rem; text-align: center; color: var(--color-text-secondary, #6b7a99); }
|
||||
.violation-cell.has-violation { color: #b91c1c; font-weight: 700; }
|
||||
|
||||
.score-pill {
|
||||
display: inline-block;
|
||||
padding: 0.15rem 0.55rem;
|
||||
border-radius: 9999px;
|
||||
font-weight: 700;
|
||||
font-size: 0.82rem;
|
||||
}
|
||||
|
||||
/* ── Sample outputs ───────────────────────────────────────────────────────── */
|
||||
|
||||
.sample-outputs {
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.5rem;
|
||||
overflow: hidden;
|
||||
}
|
||||
|
||||
.sample-header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
padding: 0.5rem 0.85rem;
|
||||
background: var(--color-surface, #f4f7fc);
|
||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
||||
font-size: 0.85rem;
|
||||
}
|
||||
|
||||
.btn-collapse {
|
||||
border: none;
|
||||
background: transparent;
|
||||
font-size: 0.78rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
.sample-prompt {
|
||||
padding: 0.65rem 0.85rem;
|
||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
||||
}
|
||||
.sample-prompt:last-child { border-bottom: none; }
|
||||
|
||||
.sample-tag {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.5rem;
|
||||
margin-bottom: 0.35rem;
|
||||
font-size: 0.8rem;
|
||||
}
|
||||
|
||||
.tag-name { font-weight: 600; color: var(--color-text, #1a2338); }
|
||||
.tag-score { color: var(--app-primary, #2A6080); font-weight: 700; }
|
||||
.tag-latency { color: var(--color-text-secondary, #6b7a99); margin-left: auto; }
|
||||
|
||||
.sample-text {
|
||||
margin: 0;
|
||||
font-size: 0.82rem;
|
||||
white-space: pre-wrap;
|
||||
word-break: break-word;
|
||||
max-height: 200px;
|
||||
overflow-y: auto;
|
||||
background: var(--color-bg, #fff);
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.35rem;
|
||||
padding: 0.5rem 0.65rem;
|
||||
color: var(--color-text, #1a2338);
|
||||
font-family: inherit;
|
||||
}
|
||||
|
||||
@media (max-width: 640px) {
|
||||
.style-controls { flex-direction: column; }
|
||||
.model-picker, .options-panel { min-width: 0; }
|
||||
.option-hint { display: none; }
|
||||
.group-note { display: none; }
|
||||
}
|
||||
</style>
|
||||
919
web/src/views/VoiceTab.vue
Normal file
919
web/src/views/VoiceTab.vue
Normal file
|
|
@ -0,0 +1,919 @@
|
|||
<template>
|
||||
<div class="voice-tab">
|
||||
|
||||
<!-- ── Controls row ──────────────────────────────────────────────────── -->
|
||||
<div class="voice-controls">
|
||||
|
||||
<!-- Model picker -->
|
||||
<details class="model-picker" open>
|
||||
<summary class="picker-summary">
|
||||
<span class="picker-title">🎙 Models</span>
|
||||
<span class="picker-badge">{{ selectedCount }} selected</span>
|
||||
<button class="btn-refresh" :disabled="modelsLoading" @click.stop="loadModels" title="Refresh model list">
|
||||
{{ modelsLoading ? '⏳' : '🔄' }}
|
||||
</button>
|
||||
</summary>
|
||||
<div class="picker-body">
|
||||
<div v-if="modelsLoading" class="picker-loading">Loading models…</div>
|
||||
<div v-else-if="loadError" class="picker-error">{{ loadError }}</div>
|
||||
<template v-else>
|
||||
|
||||
<!-- Ollama group -->
|
||||
<div class="picker-group" v-if="ollamaModels.length">
|
||||
<div class="group-header">
|
||||
<label class="group-check">
|
||||
<input
|
||||
type="checkbox"
|
||||
:checked="isGroupAllSelected('ollama')"
|
||||
:indeterminate="isGroupIndeterminate('ollama')"
|
||||
@change="toggleGroup('ollama', ($event.target as HTMLInputElement).checked)"
|
||||
/>
|
||||
<span class="group-label">Ollama</span>
|
||||
<span class="group-count">({{ ollamaModels.length }})</span>
|
||||
</label>
|
||||
<span class="group-note">auto-synced with Models view</span>
|
||||
</div>
|
||||
<div class="model-list">
|
||||
<label v-for="m in ollamaModels" :key="m.id" class="model-item">
|
||||
<input type="checkbox" :value="m.id" v-model="selectedModels" />
|
||||
<span class="model-name">{{ m.name }}</span>
|
||||
<span v-if="m.size_mb" class="model-meta">{{ formatMb(m.size_mb) }}</span>
|
||||
</label>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- cf-text group -->
|
||||
<div class="picker-group" v-if="cftextModels.length">
|
||||
<div class="group-header">
|
||||
<label class="group-check">
|
||||
<input
|
||||
type="checkbox"
|
||||
:checked="isGroupAllSelected('cf-text')"
|
||||
:indeterminate="isGroupIndeterminate('cf-text')"
|
||||
@change="toggleGroup('cf-text', ($event.target as HTMLInputElement).checked)"
|
||||
/>
|
||||
<span class="group-label">cf-text (cf-orch)</span>
|
||||
<span class="group-count">({{ cftextModels.length }})</span>
|
||||
</label>
|
||||
<span class="group-note">GGUFs via coordinator — enable cf-orch below</span>
|
||||
</div>
|
||||
<div class="model-list">
|
||||
<label v-for="m in cftextModels" :key="m.id" class="model-item">
|
||||
<input type="checkbox" :value="m.id" v-model="selectedModels" />
|
||||
<span class="model-name">{{ m.name }}</span>
|
||||
<span v-if="m.vram_mb" class="model-meta">{{ formatMb(m.vram_mb) }} VRAM</span>
|
||||
</label>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div v-if="!ollamaModels.length && !cftextModels.length" class="picker-empty">
|
||||
No models available — check Ollama and cf-orch connections.
|
||||
</div>
|
||||
|
||||
</template>
|
||||
</div>
|
||||
</details>
|
||||
|
||||
<!-- Options panel -->
|
||||
<details class="options-panel">
|
||||
<summary class="picker-summary">
|
||||
<span class="picker-title">⚙️ Options</span>
|
||||
</summary>
|
||||
<div class="options-body">
|
||||
<label class="option-row">
|
||||
<input type="checkbox" v-model="useCforch" :disabled="running" />
|
||||
<span class="option-label">Use cf-orch backend</span>
|
||||
<span class="option-hint">Routes generation through cf-text instead of ollama</span>
|
||||
</label>
|
||||
<label class="option-row" :class="{ dimmed: !useCforch }">
|
||||
<span class="option-label">Max VRAM (MB)</span>
|
||||
<input
|
||||
type="number"
|
||||
v-model.number="maxVram"
|
||||
:disabled="running || !useCforch"
|
||||
min="1024"
|
||||
max="24576"
|
||||
step="512"
|
||||
class="option-number"
|
||||
/>
|
||||
<span class="option-hint">Skip models exceeding this VRAM limit</span>
|
||||
</label>
|
||||
<label class="option-row">
|
||||
<span class="option-label">Parallel workers</span>
|
||||
<input
|
||||
type="number"
|
||||
v-model.number="workers"
|
||||
:disabled="running"
|
||||
min="1"
|
||||
max="16"
|
||||
step="1"
|
||||
class="option-number"
|
||||
/>
|
||||
<span class="option-hint">Models to score simultaneously (1 = sequential)</span>
|
||||
</label>
|
||||
<label class="option-row">
|
||||
<input type="checkbox" v-model="includeLarge" :disabled="running" />
|
||||
<span class="option-label">Include large models (30B+)</span>
|
||||
<span class="option-hint">Off by default — these take much longer</span>
|
||||
</label>
|
||||
</div>
|
||||
</details>
|
||||
|
||||
</div>
|
||||
|
||||
<!-- ── Run controls ──────────────────────────────────────────────────── -->
|
||||
<div class="run-bar">
|
||||
<button class="btn-run" :disabled="running || selectedCount === 0" @click="startBenchmark">
|
||||
{{ running ? '⏳ Running…' : results.length ? '🔄 Re-run' : '▶ Run Benchmark' }}
|
||||
</button>
|
||||
<button v-if="running" class="btn-cancel" @click="cancelBenchmark">✕ Cancel</button>
|
||||
<span v-if="selectedCount === 0 && !running" class="run-hint">Select at least one model above</span>
|
||||
</div>
|
||||
|
||||
<!-- ── Progress log ──────────────────────────────────────────────────── -->
|
||||
<div v-if="runLog.length" class="run-log">
|
||||
<div class="run-log-header">
|
||||
<span class="run-log-title">Run log</span>
|
||||
<button class="btn-clear" @click="runLog = []">Clear</button>
|
||||
</div>
|
||||
<pre class="run-log-body" ref="logEl">{{ runLog.join('\n') }}</pre>
|
||||
</div>
|
||||
|
||||
<!-- ── Past runs picker ─────────────────────────────────────────────── -->
|
||||
<div class="history-bar" v-if="pastRuns.length">
|
||||
<label class="history-label">📂 Past runs:</label>
|
||||
<select class="history-select" v-model="selectedRun" @change="loadRun(selectedRun)">
|
||||
<option value="">— select a past run —</option>
|
||||
<option v-for="r in pastRuns" :key="r.filename" :value="r.filename">
|
||||
{{ r.date }} · {{ r.model_count }} model{{ r.model_count !== 1 ? 's' : '' }} · top {{ r.top_score }}/100
|
||||
</option>
|
||||
</select>
|
||||
</div>
|
||||
|
||||
<!-- ── Results table ─────────────────────────────────────────────────── -->
|
||||
<div v-if="results.length" class="results-section">
|
||||
<div class="results-header">
|
||||
<h2 class="results-title">Rankings</h2>
|
||||
<button
|
||||
class="btn-corrections"
|
||||
:disabled="sendingCorrections"
|
||||
@click="sendToCorrections"
|
||||
title="Push all outputs from this run into the Corrections review queue"
|
||||
>
|
||||
{{ sendingCorrections ? '⏳ Sending…' : correctionsMsg || '✍️ Send to Corrections' }}
|
||||
</button>
|
||||
</div>
|
||||
<div class="results-table-wrap">
|
||||
<table class="results-table">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Rank</th>
|
||||
<th>Model</th>
|
||||
<th>Score</th>
|
||||
<th>Latency</th>
|
||||
<th title="Em-dash count">—</th>
|
||||
<th title="Filler phrase hits">Fillers</th>
|
||||
<th title="Semicolons">;</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr
|
||||
v-for="(r, i) in results"
|
||||
:key="r.model_id"
|
||||
class="result-row"
|
||||
:class="{ 'top-row': i === 0 }"
|
||||
@click="toggleExpanded(r.model_id)"
|
||||
>
|
||||
<td class="rank-cell">{{ medal(i) }}</td>
|
||||
<td class="model-cell">
|
||||
<span class="model-name-text">{{ r.model_id }}</span>
|
||||
</td>
|
||||
<td class="score-cell">
|
||||
<span class="score-pill" :style="scorePillStyle(r.avg_score)">
|
||||
{{ r.avg_score.toFixed(0) }}
|
||||
</span>
|
||||
</td>
|
||||
<td class="latency-cell">{{ formatLatency(r.avg_latency_ms) }}</td>
|
||||
<td class="violation-cell" :class="{ 'has-violation': r.total_em_dashes > 0 }">
|
||||
{{ r.total_em_dashes }}
|
||||
</td>
|
||||
<td class="violation-cell" :class="{ 'has-violation': r.total_filler_hits > 0 }">
|
||||
{{ r.total_filler_hits }}
|
||||
</td>
|
||||
<td class="violation-cell" :class="{ 'has-violation': r.total_semicolons > 0 }">
|
||||
{{ r.total_semicolons }}
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
|
||||
<!-- Expandable sample outputs -->
|
||||
<div v-for="r in results" :key="'exp-' + r.model_id">
|
||||
<div v-if="expandedModels.has(r.model_id)" class="sample-outputs">
|
||||
<div class="sample-header">
|
||||
<strong>{{ r.model_id }}</strong>
|
||||
<button class="btn-collapse" @click="toggleExpanded(r.model_id)">✕ Close</button>
|
||||
</div>
|
||||
<div v-for="pr in r.prompt_results" :key="pr.tag" class="sample-prompt">
|
||||
<div class="sample-tag">
|
||||
<span class="tag-name">{{ pr.tag }}</span>
|
||||
<span class="tag-score">{{ pr.score.toFixed(0) }}/100</span>
|
||||
<span class="tag-latency">{{ formatLatency(pr.latency_ms) }}</span>
|
||||
</div>
|
||||
<pre class="sample-text">{{ pr.output || '(no output)' }}</pre>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</template>
|
||||
|
||||
<script setup lang="ts">
|
||||
import { ref, computed, onMounted, nextTick, watch } from 'vue'
|
||||
|
||||
// ── Types ───────────────────────────────────────────────────────────────────
|
||||
|
||||
interface VoiceModel {
|
||||
id: string
|
||||
name: string
|
||||
source: 'ollama' | 'cf-text'
|
||||
size_mb?: number | null
|
||||
vram_mb?: number | null
|
||||
description?: string
|
||||
}
|
||||
|
||||
interface PromptResult {
|
||||
tag: string
|
||||
output: string
|
||||
score: number
|
||||
latency_ms: number
|
||||
signals: Record<string, unknown>
|
||||
}
|
||||
|
||||
interface ModelResult {
|
||||
model_id: string
|
||||
avg_score: number
|
||||
avg_latency_ms: number
|
||||
total_filler_hits: number
|
||||
total_em_dashes: number
|
||||
total_semicolons: number
|
||||
prompt_results: PromptResult[]
|
||||
}
|
||||
|
||||
interface PastRun {
|
||||
filename: string
|
||||
date: string
|
||||
model_count: number
|
||||
top_score: number
|
||||
}
|
||||
|
||||
// ── State ───────────────────────────────────────────────────────────────────
|
||||
|
||||
const ollamaModels = ref<VoiceModel[]>([])
|
||||
const cftextModels = ref<VoiceModel[]>([])
|
||||
const selectedModels = ref<string[]>([])
|
||||
const modelsLoading = ref(false)
|
||||
const loadError = ref('')
|
||||
|
||||
const useCforch = ref(false)
|
||||
const maxVram = ref(7200)
|
||||
const workers = ref(1)
|
||||
const includeLarge = ref(false)
|
||||
|
||||
const running = ref(false)
|
||||
const runLog = ref<string[]>([])
|
||||
const logEl = ref<HTMLPreElement | null>(null)
|
||||
|
||||
const results = ref<ModelResult[]>([])
|
||||
const pastRuns = ref<PastRun[]>([])
|
||||
const selectedRun = ref('')
|
||||
const expandedModels = ref(new Set<string>())
|
||||
const sendingCorrections = ref(false)
|
||||
const correctionsMsg = ref('')
|
||||
|
||||
// ── Computed ─────────────────────────────────────────────────────────────────
|
||||
|
||||
const selectedCount = computed(() => selectedModels.value.length)
|
||||
|
||||
function isGroupAllSelected(source: string): boolean {
|
||||
const group = source === 'ollama' ? ollamaModels.value : cftextModels.value
|
||||
return group.length > 0 && group.every(m => selectedModels.value.includes(m.id))
|
||||
}
|
||||
|
||||
function isGroupIndeterminate(source: string): boolean {
|
||||
const group = source === 'ollama' ? ollamaModels.value : cftextModels.value
|
||||
const count = group.filter(m => selectedModels.value.includes(m.id)).length
|
||||
return count > 0 && count < group.length
|
||||
}
|
||||
|
||||
// ── Actions ──────────────────────────────────────────────────────────────────
|
||||
|
||||
async function loadModels() {
|
||||
modelsLoading.value = true
|
||||
loadError.value = ''
|
||||
try {
|
||||
const resp = await fetch('/api/voice/models')
|
||||
if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
|
||||
const data = await resp.json()
|
||||
ollamaModels.value = data.ollama ?? []
|
||||
cftextModels.value = data.cf_text ?? []
|
||||
} catch (e: unknown) {
|
||||
loadError.value = `Failed to load models: ${e instanceof Error ? e.message : String(e)}`
|
||||
} finally {
|
||||
modelsLoading.value = false
|
||||
}
|
||||
}
|
||||
|
||||
async function loadPastRuns() {
|
||||
try {
|
||||
const resp = await fetch('/api/voice/results')
|
||||
if (resp.ok) pastRuns.value = await resp.json()
|
||||
} catch { /* non-fatal */ }
|
||||
}
|
||||
|
||||
async function loadRun(filename: string) {
|
||||
if (!filename) return
|
||||
try {
|
||||
const resp = await fetch(`/api/voice/results/${filename}`)
|
||||
if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
|
||||
results.value = await resp.json()
|
||||
expandedModels.value.clear()
|
||||
} catch (e: unknown) {
|
||||
runLog.value.push(`[error] Failed to load ${filename}: ${e instanceof Error ? e.message : String(e)}`)
|
||||
}
|
||||
}
|
||||
|
||||
function toggleGroup(source: string, checked: boolean) {
|
||||
const group = source === 'ollama' ? ollamaModels.value : cftextModels.value
|
||||
const ids = group.map(m => m.id)
|
||||
if (checked) {
|
||||
const newSet = new Set([...selectedModels.value, ...ids])
|
||||
selectedModels.value = [...newSet]
|
||||
} else {
|
||||
selectedModels.value = selectedModels.value.filter(id => !ids.includes(id))
|
||||
}
|
||||
}
|
||||
|
||||
function toggleExpanded(modelId: string) {
|
||||
if (expandedModels.value.has(modelId)) {
|
||||
expandedModels.value.delete(modelId)
|
||||
} else {
|
||||
expandedModels.value.add(modelId)
|
||||
}
|
||||
expandedModels.value = new Set(expandedModels.value)
|
||||
}
|
||||
|
||||
function startBenchmark() {
|
||||
if (running.value || selectedCount.value === 0) return
|
||||
running.value = true
|
||||
runLog.value = []
|
||||
results.value = []
|
||||
expandedModels.value.clear()
|
||||
|
||||
const params = new URLSearchParams({
|
||||
models: selectedModels.value.join(','),
|
||||
use_cforch: String(useCforch.value),
|
||||
max_vram: String(maxVram.value),
|
||||
workers: String(workers.value),
|
||||
include_large: String(includeLarge.value),
|
||||
})
|
||||
|
||||
const es = new EventSource(`/api/voice/run?${params}`)
|
||||
|
||||
es.onmessage = async (ev) => {
|
||||
try {
|
||||
const msg = JSON.parse(ev.data)
|
||||
if (msg.type === 'progress') {
|
||||
runLog.value.push(msg.message)
|
||||
await nextTick()
|
||||
if (logEl.value) logEl.value.scrollTop = logEl.value.scrollHeight
|
||||
} else if (msg.type === 'result') {
|
||||
results.value = msg.results ?? []
|
||||
await loadPastRuns()
|
||||
} else if (msg.type === 'complete') {
|
||||
running.value = false
|
||||
es.close()
|
||||
} else if (msg.type === 'error') {
|
||||
runLog.value.push(`[error] ${msg.message}`)
|
||||
running.value = false
|
||||
es.close()
|
||||
}
|
||||
} catch { /* ignore parse errors */ }
|
||||
}
|
||||
|
||||
es.onerror = () => {
|
||||
if (running.value) {
|
||||
runLog.value.push('[error] Connection lost')
|
||||
running.value = false
|
||||
}
|
||||
es.close()
|
||||
}
|
||||
}
|
||||
|
||||
async function cancelBenchmark() {
|
||||
try {
|
||||
await fetch('/api/voice/cancel', { method: 'POST' })
|
||||
} finally {
|
||||
running.value = false
|
||||
runLog.value.push('[cancelled]')
|
||||
}
|
||||
}
|
||||
|
||||
async function sendToCorrections() {
|
||||
if (!selectedRun.value || sendingCorrections.value) return
|
||||
sendingCorrections.value = true
|
||||
correctionsMsg.value = ''
|
||||
try {
|
||||
const resp = await fetch('/api/voice/send-to-corrections', {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({ filename: selectedRun.value, model_ids: [] }),
|
||||
})
|
||||
if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
|
||||
const data = await resp.json()
|
||||
correctionsMsg.value = `✓ ${data.imported} added to Corrections`
|
||||
} catch (e: unknown) {
|
||||
correctionsMsg.value = `Error: ${e instanceof Error ? e.message : String(e)}`
|
||||
} finally {
|
||||
sendingCorrections.value = false
|
||||
}
|
||||
}
|
||||
|
||||
// ── Formatting helpers ────────────────────────────────────────────────────────
|
||||
|
||||
function formatMb(mb: number): string {
|
||||
return mb >= 1024 ? `${(mb / 1024).toFixed(1)} GB` : `${mb} MB`
|
||||
}
|
||||
|
||||
function formatLatency(ms: number): string {
|
||||
return ms >= 1000 ? `${(ms / 1000).toFixed(1)}s` : `${Math.round(ms)}ms`
|
||||
}
|
||||
|
||||
function medal(index: number): string {
|
||||
return ['🥇', '🥈', '🥉'][index] ?? `#${index + 1}`
|
||||
}
|
||||
|
||||
function scorePillStyle(score: number): Record<string, string> {
|
||||
const hue = Math.round((score / 100) * 120) // 0=red, 120=green
|
||||
return {
|
||||
background: `hsl(${hue} 60% 88%)`,
|
||||
color: `hsl(${hue} 60% 28%)`,
|
||||
}
|
||||
}
|
||||
|
||||
// ── Lifecycle ─────────────────────────────────────────────────────────────────
|
||||
|
||||
// Auto-enable cf-orch when cf-text models are selected
|
||||
watch(selectedModels, (ids) => {
|
||||
const hasCftext = ids.some(id => cftextModels.value.find(m => m.id === id))
|
||||
if (hasCftext) useCforch.value = true
|
||||
})
|
||||
|
||||
onMounted(async () => {
|
||||
await Promise.all([loadModels(), loadPastRuns()])
|
||||
// Auto-load the latest results if any exist
|
||||
if (pastRuns.value.length) {
|
||||
selectedRun.value = pastRuns.value[0].filename
|
||||
await loadRun(pastRuns.value[0].filename)
|
||||
}
|
||||
})
|
||||
</script>
|
||||
|
||||
<style scoped>
|
||||
.voice-tab {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 1rem;
|
||||
padding: 1rem 0;
|
||||
}
|
||||
|
||||
/* ── Controls ─────────────────────────────────────────────────────────────── */
|
||||
|
||||
.voice-controls {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 0.75rem;
|
||||
align-items: flex-start;
|
||||
}
|
||||
|
||||
.model-picker,
|
||||
.options-panel {
|
||||
flex: 1;
|
||||
min-width: 280px;
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.5rem;
|
||||
background: var(--color-surface, #f4f7fc);
|
||||
overflow: hidden;
|
||||
}
|
||||
|
||||
.picker-summary {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.5rem;
|
||||
padding: 0.65rem 0.85rem;
|
||||
cursor: pointer;
|
||||
user-select: none;
|
||||
font-size: 0.9rem;
|
||||
font-weight: 600;
|
||||
list-style: none;
|
||||
}
|
||||
|
||||
.picker-summary::-webkit-details-marker { display: none; }
|
||||
|
||||
.picker-title { flex: 1; color: var(--color-text, #1a2338); }
|
||||
.picker-badge {
|
||||
background: var(--app-primary, #2A6080);
|
||||
color: #fff;
|
||||
border-radius: 9999px;
|
||||
padding: 0.1rem 0.5rem;
|
||||
font-size: 0.72rem;
|
||||
font-weight: 700;
|
||||
}
|
||||
|
||||
.btn-refresh {
|
||||
border: none;
|
||||
background: transparent;
|
||||
cursor: pointer;
|
||||
font-size: 0.85rem;
|
||||
padding: 0.1rem 0.25rem;
|
||||
border-radius: 0.25rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
}
|
||||
.btn-refresh:hover { background: var(--color-border, #d0d7e8); }
|
||||
.btn-refresh:disabled { opacity: 0.5; cursor: not-allowed; }
|
||||
|
||||
.picker-body,
|
||||
.options-body {
|
||||
padding: 0.75rem;
|
||||
border-top: 1px solid var(--color-border, #d0d7e8);
|
||||
}
|
||||
|
||||
.picker-loading, .picker-empty {
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
font-size: 0.85rem;
|
||||
padding: 0.25rem 0;
|
||||
}
|
||||
|
||||
.picker-error {
|
||||
color: #b91c1c;
|
||||
font-size: 0.85rem;
|
||||
}
|
||||
|
||||
/* ── Model groups ──────────────────────────────────────────────────────────── */
|
||||
|
||||
.picker-group {
|
||||
margin-bottom: 0.75rem;
|
||||
}
|
||||
.picker-group:last-child { margin-bottom: 0; }
|
||||
|
||||
.group-header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.5rem;
|
||||
margin-bottom: 0.4rem;
|
||||
}
|
||||
|
||||
.group-check {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.35rem;
|
||||
font-size: 0.85rem;
|
||||
font-weight: 600;
|
||||
cursor: pointer;
|
||||
color: var(--color-text, #1a2338);
|
||||
}
|
||||
|
||||
.group-count {
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
font-weight: 400;
|
||||
font-size: 0.8rem;
|
||||
}
|
||||
|
||||
.group-note {
|
||||
margin-left: auto;
|
||||
font-size: 0.72rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
font-style: italic;
|
||||
}
|
||||
|
||||
.model-list {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.2rem;
|
||||
padding-left: 1.25rem;
|
||||
max-height: 220px;
|
||||
overflow-y: auto;
|
||||
}
|
||||
|
||||
.model-item {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.4rem;
|
||||
font-size: 0.82rem;
|
||||
cursor: pointer;
|
||||
padding: 0.15rem 0;
|
||||
}
|
||||
|
||||
.model-name { flex: 1; font-family: var(--font-mono, monospace); }
|
||||
|
||||
.model-meta {
|
||||
font-size: 0.72rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
}
|
||||
|
||||
/* ── Options ──────────────────────────────────────────────────────────────── */
|
||||
|
||||
.option-row {
|
||||
display: flex;
|
||||
align-items: flex-start;
|
||||
gap: 0.5rem;
|
||||
padding: 0.35rem 0;
|
||||
cursor: pointer;
|
||||
font-size: 0.85rem;
|
||||
}
|
||||
|
||||
.option-label { font-weight: 500; white-space: nowrap; }
|
||||
|
||||
.option-hint {
|
||||
flex: 1;
|
||||
font-size: 0.75rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
margin-left: auto;
|
||||
text-align: right;
|
||||
}
|
||||
|
||||
.option-number {
|
||||
width: 90px;
|
||||
padding: 0.2rem 0.4rem;
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.25rem;
|
||||
font-size: 0.85rem;
|
||||
background: var(--color-bg, #fff);
|
||||
color: var(--color-text, #1a2338);
|
||||
}
|
||||
|
||||
.option-row.dimmed { opacity: 0.45; pointer-events: none; }
|
||||
|
||||
/* ── Run bar ──────────────────────────────────────────────────────────────── */
|
||||
|
||||
.run-bar {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.65rem;
|
||||
}
|
||||
|
||||
.btn-run {
|
||||
padding: 0.5rem 1.25rem;
|
||||
border: none;
|
||||
border-radius: 0.375rem;
|
||||
background: var(--app-primary, #2A6080);
|
||||
color: #fff;
|
||||
font-size: 0.9rem;
|
||||
font-weight: 600;
|
||||
cursor: pointer;
|
||||
transition: background 0.15s;
|
||||
}
|
||||
.btn-run:hover:not(:disabled) { background: color-mix(in srgb, var(--app-primary, #2A6080) 80%, #000); }
|
||||
.btn-run:disabled { opacity: 0.5; cursor: not-allowed; }
|
||||
|
||||
.btn-cancel {
|
||||
padding: 0.5rem 0.9rem;
|
||||
border: 1px solid #f85149;
|
||||
border-radius: 0.375rem;
|
||||
background: transparent;
|
||||
color: #b91c1c;
|
||||
font-size: 0.85rem;
|
||||
cursor: pointer;
|
||||
transition: background 0.15s;
|
||||
}
|
||||
.btn-cancel:hover { background: #fee2e2; }
|
||||
|
||||
.run-hint {
|
||||
font-size: 0.8rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
}
|
||||
|
||||
/* ── Run log ──────────────────────────────────────────────────────────────── */
|
||||
|
||||
.run-log {
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.5rem;
|
||||
overflow: hidden;
|
||||
}
|
||||
|
||||
.run-log-header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
padding: 0.4rem 0.75rem;
|
||||
background: var(--color-surface, #f4f7fc);
|
||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
||||
font-size: 0.8rem;
|
||||
font-weight: 600;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
}
|
||||
|
||||
.run-log-title { text-transform: uppercase; letter-spacing: 0.05em; }
|
||||
|
||||
.btn-clear {
|
||||
border: none;
|
||||
background: transparent;
|
||||
font-size: 0.75rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
cursor: pointer;
|
||||
padding: 0.1rem 0.3rem;
|
||||
border-radius: 0.25rem;
|
||||
}
|
||||
.btn-clear:hover { background: var(--color-border, #d0d7e8); }
|
||||
|
||||
.run-log-body {
|
||||
margin: 0;
|
||||
padding: 0.65rem 0.85rem;
|
||||
font-size: 0.78rem;
|
||||
font-family: var(--font-mono, monospace);
|
||||
white-space: pre-wrap;
|
||||
word-break: break-all;
|
||||
max-height: 260px;
|
||||
overflow-y: auto;
|
||||
background: var(--color-bg, #fff);
|
||||
color: var(--color-text, #1a2338);
|
||||
}
|
||||
|
||||
/* ── History bar ──────────────────────────────────────────────────────────── */
|
||||
|
||||
.history-bar {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.6rem;
|
||||
font-size: 0.85rem;
|
||||
}
|
||||
|
||||
.history-label { font-weight: 500; white-space: nowrap; }
|
||||
|
||||
.history-select {
|
||||
flex: 1;
|
||||
padding: 0.3rem 0.5rem;
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.375rem;
|
||||
background: var(--color-surface, #f4f7fc);
|
||||
color: var(--color-text, #1a2338);
|
||||
font-size: 0.85rem;
|
||||
}
|
||||
|
||||
/* ── Results table ────────────────────────────────────────────────────────── */
|
||||
|
||||
.results-section { display: flex; flex-direction: column; gap: 0.75rem; }
|
||||
|
||||
.results-header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
gap: 0.75rem;
|
||||
}
|
||||
|
||||
.results-title {
|
||||
font-size: 1rem;
|
||||
font-weight: 700;
|
||||
color: var(--color-text, #1a2338);
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
.btn-corrections {
|
||||
padding: 0.4rem 0.9rem;
|
||||
border: 1px solid var(--app-primary, #2A6080);
|
||||
border-radius: 0.375rem;
|
||||
background: transparent;
|
||||
color: var(--app-primary, #2A6080);
|
||||
font-size: 0.83rem;
|
||||
font-weight: 600;
|
||||
cursor: pointer;
|
||||
white-space: nowrap;
|
||||
transition: background 0.15s, color 0.15s;
|
||||
}
|
||||
.btn-corrections:hover:not(:disabled) {
|
||||
background: var(--app-primary, #2A6080);
|
||||
color: #fff;
|
||||
}
|
||||
.btn-corrections:disabled { opacity: 0.55; cursor: not-allowed; }
|
||||
|
||||
.results-table-wrap {
|
||||
overflow-x: auto;
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.5rem;
|
||||
}
|
||||
|
||||
.results-table {
|
||||
width: 100%;
|
||||
border-collapse: collapse;
|
||||
font-size: 0.85rem;
|
||||
}
|
||||
|
||||
.results-table th {
|
||||
padding: 0.5rem 0.75rem;
|
||||
text-align: left;
|
||||
background: var(--color-surface, #f4f7fc);
|
||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
||||
font-size: 0.78rem;
|
||||
font-weight: 700;
|
||||
text-transform: uppercase;
|
||||
letter-spacing: 0.04em;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
white-space: nowrap;
|
||||
}
|
||||
|
||||
.result-row {
|
||||
cursor: pointer;
|
||||
transition: background 0.1s;
|
||||
}
|
||||
.result-row:hover { background: color-mix(in srgb, var(--app-primary, #2A6080) 6%, transparent); }
|
||||
.result-row.top-row { font-weight: 600; }
|
||||
|
||||
.result-row td {
|
||||
padding: 0.5rem 0.75rem;
|
||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
||||
}
|
||||
.result-row:last-child td { border-bottom: none; }
|
||||
|
||||
.rank-cell { width: 2.5rem; text-align: center; font-size: 1.1rem; }
|
||||
.model-cell { font-family: var(--font-mono, monospace); word-break: break-all; }
|
||||
.score-cell { width: 5rem; text-align: center; }
|
||||
.latency-cell { width: 5rem; text-align: right; color: var(--color-text-secondary, #6b7a99); }
|
||||
.violation-cell { width: 4rem; text-align: center; color: var(--color-text-secondary, #6b7a99); }
|
||||
.violation-cell.has-violation { color: #b91c1c; font-weight: 700; }
|
||||
|
||||
.score-pill {
|
||||
display: inline-block;
|
||||
padding: 0.15rem 0.55rem;
|
||||
border-radius: 9999px;
|
||||
font-weight: 700;
|
||||
font-size: 0.82rem;
|
||||
}
|
||||
|
||||
/* ── Sample outputs ───────────────────────────────────────────────────────── */
|
||||
|
||||
.sample-outputs {
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.5rem;
|
||||
overflow: hidden;
|
||||
}
|
||||
|
||||
.sample-header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
padding: 0.5rem 0.85rem;
|
||||
background: var(--color-surface, #f4f7fc);
|
||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
||||
font-size: 0.85rem;
|
||||
}
|
||||
|
||||
.btn-collapse {
|
||||
border: none;
|
||||
background: transparent;
|
||||
font-size: 0.78rem;
|
||||
color: var(--color-text-secondary, #6b7a99);
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
.sample-prompt {
|
||||
padding: 0.65rem 0.85rem;
|
||||
border-bottom: 1px solid var(--color-border, #d0d7e8);
|
||||
}
|
||||
.sample-prompt:last-child { border-bottom: none; }
|
||||
|
||||
.sample-tag {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.5rem;
|
||||
margin-bottom: 0.35rem;
|
||||
font-size: 0.8rem;
|
||||
}
|
||||
|
||||
.tag-name { font-weight: 600; color: var(--color-text, #1a2338); }
|
||||
.tag-score { color: var(--app-primary, #2A6080); font-weight: 700; }
|
||||
.tag-latency { color: var(--color-text-secondary, #6b7a99); margin-left: auto; }
|
||||
|
||||
.sample-text {
|
||||
margin: 0;
|
||||
font-size: 0.82rem;
|
||||
white-space: pre-wrap;
|
||||
word-break: break-word;
|
||||
max-height: 200px;
|
||||
overflow-y: auto;
|
||||
background: var(--color-bg, #fff);
|
||||
border: 1px solid var(--color-border, #d0d7e8);
|
||||
border-radius: 0.35rem;
|
||||
padding: 0.5rem 0.65rem;
|
||||
color: var(--color-text, #1a2338);
|
||||
font-family: inherit;
|
||||
}
|
||||
|
||||
@media (max-width: 640px) {
|
||||
.voice-controls { flex-direction: column; }
|
||||
.model-picker, .options-panel { min-width: 0; }
|
||||
.option-hint { display: none; }
|
||||
.group-note { display: none; }
|
||||
}
|
||||
</style>
|
||||
Loading…
Reference in a new issue