Replace 149-line api.py (with inline helpers, JSONL utilities, and ad-hoc
router registrations) with a 57-line pure factory. All business logic was
already extracted to domain modules in B1-B7; this removes the dead code
and adds the /api/corrections/* prefix alongside the /api/sft/* backward-
compat alias. Smoke tests updated to cover the new /api/corrections/ingest
and /api/dashboard routes.
Replaces the ad-hoc _running_procs dict in api.py with a persistent,
inspectable SQLite job queue. Removes old /api/finetune/* routes and
_best_cuda_device from api.py. Adds /api/train/* routes (list, create,
get, cancel, run SSE, results). 16 new tests all passing.
- except clause in cancel_benchmark/cancel_finetune narrowed from Exception
to _subprocess.TimeoutExpired (C1)
- _cancelled_jobs.discard() called after registering new proc to prevent
a stale flag from a prior run masking errors (I2)
- local `import subprocess` removed from run_benchmark and
run_finetune_endpoint; all Popen calls updated to _subprocess.Popen (I1)
- test patch targets updated from subprocess.Popen to app.api._subprocess.Popen;
cancelled-event tests updated to set flag in proc.wait() side-effect so
the discard-on-new-run logic is exercised correctly
Adds POST /api/benchmark/cancel and POST /api/finetune/cancel endpoints
that terminate the running subprocess (kill on 3s timeout), and updates
the run generators to emit a cancelled SSE event instead of error when
the job was intentionally stopped.