circuitforge_core.vision.router now re-exports VisionRouter from the
standalone cf-vision repo. Existing imports unchanged; falls back to
a helpful ImportError stub if cf-vision is not installed.
Closes cf-core#36
Instead of splitting SQL on semicolons (fragile — semicolons appear inside
comments and string literals), use executescript() for correct tokenization.
On 'duplicate column name' error (caused by a prior partial run that
auto-committed some ALTER TABLE statements before crashing), strip the
already-applied ADD COLUMN statement from the script and retry. Limit
to 20 attempts to prevent infinite loops on genuinely broken SQL.
This replaces the earlier per-statement split approach which broke on
migration 004 comment text containing a semicolon inside a -- comment,
causing the remainder ('one row per...') to be treated as raw SQL.
SQLite's executescript() auto-commits each DDL statement individually.
If a migration crashes mid-run, prior ALTER TABLE statements are already
committed but the migration is never recorded as applied. On restart,
the runner re-runs the same file and hits 'duplicate column name' on
already-applied statements, breaking subsequent startups permanently.
Replace executescript() with per-statement execute() calls. 'Duplicate
column name' OperationalErrors are caught and logged as warnings so the
migration can complete and be marked as done. All other errors still
propagate normally.
Introduces circuitforge_core.config.license with validate_license() and
get_license_tier(). Both functions are safe to call when CF_LICENSE_KEY
is absent, returning free tier gracefully. Results are cached 30 min per
(key, product) pair. CF_LICENSE_URL env var overrides the default
Heimdall endpoint. Re-exports added to config.__init__. Existing
test_config.py moved into tests/test_config/ package to co-locate with
new test_license.py (10 tests; 204 total passing).
Adds make_feedback_router(repo, product, demo_mode_fn) which returns a
FastAPI APIRouter with GET /status and POST / endpoints. Handles Forgejo
label creation/reuse, issue body assembly (including repro steps for bugs),
demo mode gating, and FORGEJO_API_TOKEN presence checks. 12 tests covering
all status/submit paths, mock Forgejo interaction, and body content assertions.
Also adds fastapi>=0.110 and httpx>=0.27 to [dev] optional deps.
BREAKING CHANGE: circuitforge_core.resources is no longer available.
Import CFOrchClient from circuitforge_orch.client instead.
cf-orch CLI entry point is now in the circuitforge-orch package.
LLMRouter env-var auto-config:
- No llm.yaml required — auto-configures from ANTHROPIC_API_KEY,
OPENAI_API_KEY, or OLLAMA_HOST on first use
- Bare-metal self-hosters can run any CF product with just env vars
- Falls back to FileNotFoundError with actionable message only when
no env vars are set either
CFOrchClient auth:
- Reads CF_LICENSE_KEY env var (or explicit api_key param)
- Sends Authorization: Bearer <key> on all allocation/release requests
- Required for the hosted public coordinator; no-op for local deployments
HeimdallAuthMiddleware (new):
- FastAPI middleware for cf-orch coordinator
- Enabled by HEIMDALL_URL env var; self-hosted deployments skip it
- 5-min TTL cache (matching Kiwi cloud session) keeps Heimdall off the
per-allocation hot path
- /api/health exempt; free-tier keys rejected with 403 + reason
- 13 tests covering cache TTL, tier ranking, and middleware gating
- ProcessSpec: adopt (bool) and health_path (str, default /health) fields
- ServiceManager: adopt=True probes health_path before spawning; is_running()
uses health probe for adopt services rather than proc table + socket check
- _probe_health() helper: urllib GET on localhost:port+path, returns bool
- Agent /services/{service}/start: returns adopted=True when service was
already running; coordinator sets state=running immediately (no probe wait)
- ServiceInstance: health_path field (default /health)
- service_registry.upsert_instance(): health_path kwarg
- Probe loop uses inst.health_path instead of hardcoded /health
- coordinator allocate_service: looks up health_path from profile spec via
_get_health_path() and stores on ServiceInstance
- All GPU profiles (2/4/6/8/16/24 GB + cpu-16/32): ollama managed block
with adopt=true, health_path=/api/tags, port 11434
- 11 new tests
closes#15
- NodeStore: SQLite persistence for known agent nodes
(~/.local/share/circuitforge/cf-orch-nodes.db)
- upsert on every register(); prune_stale() for 30-day cleanup
- survives coordinator restarts — data readable by next process
- AgentSupervisor.restore_from_store(): reload known nodes on startup,
mark all offline; heartbeat loop brings back any that respond
- AgentSupervisor.register(): persists to NodeStore on every call
- cli.py coordinator: NodeStore wired in; restore_from_store() called
before uvicorn starts
- cli.py agent: one-shot registration replaced with persistent reconnect
loop (daemon thread, 30 s interval) — coordinator restart → nodes
reappear within one cycle with no manual intervention on agent hosts
- 16 new tests: NodeStore (8) + AgentSupervisor watchdog (8)
Coordinator now polls all 'starting' instances every 5 s via GET /health.
On 200: state → running. After 300 s without a healthy response: state →
stopped. Closes#10.
max_mb // 2 was too loose — Qwen2.5-3B needs ~5.9 GB on an 8 GB card
but the threshold only required 3.25 GB free, allowing Ollama to hold
4.5 GB while a load attempt was still dispatched (causing OOM crash).
- node_selector: can_fit = free_mb >= service_max_mb (was // 2)
- coordinator /start: same threshold fix + updated error message
- tests: two new node_selector tests pin the full-ceiling semantics;
updated stale docstring in coordinator app test
- apply_chat_template() returns BatchEncoding in transformers 5.x (not bare tensor);
extract .input_ids explicitly with fallback for 4.x compat
- Switch from deprecated torch_dtype= to dtype= in from_pretrained()
- ServiceRegistry: add sweep_expired_allocations() to remove stale TTL
allocations and transition instances to idle; add get_allocation() helper
- AgentSupervisor._run_idle_sweep: call sweep_expired_allocations() before
idle-timeout check so crashed-caller leaks are cleaned up each sweep tick
- schema._parse_managed: copy raw dict before extracting 'type' key instead
of mutating caller's dict with pop()
- app.release_allocation: validate allocation belongs to the given service
path param before releasing; return 404 if mismatch
- router._try_cf_orch_alloc: replace print() with logger.warning(); add
module-level logger = logging.getLogger(__name__)
- tests: add test_sweep_expired_allocations covering TTL expiry and idle
state transition
- CRITICAL: idle sweep now calls mark_stopped() after successful HTTP stop,
preventing repeated stop POSTs on every 3rd tick for the same instance
- CRITICAL: active_allocations() now filters by gpu_id to avoid marking wrong
instance idle on multi-GPU nodes when an allocation is released
- CRITICAL: VRAM pre-flight guard in ensure_service was dead code — added the
actual HTTPException(503) before the candidate loop
- IMPORTANT: register() now updates agent_url on re-registration if it changed,
so relocated agents are tracked correctly
- IMPORTANT: updated test_service_registry.py callers of active_allocations()
to pass the now-required gpu_id argument
Add idle_stop_after_s to ServiceProfile (default 0 = never stop).
Set 600s (10 min) timeout on vllm slot in all single-GPU profiles.
Backward compatible; non-vllm services inherit default 0 (no auto-stop).
Implements CFOrchClient with allocate() (sync contextmanager) and
allocate_async() (async contextmanager) for cf-orch GPU resource
allocation. Releases allocation on exit; ignores 404 on release;
raises RuntimeError on non-2xx allocation response. Exports
CFOrchClient and Allocation from circuitforge_core.resources.
Note: async test uses unittest.mock rather than httpretty — httpretty
only patches stdlib sockets and does not intercept httpx async (anyio)
transport.
Previously shutdown() only joined the scheduler loop thread. Batch
worker threads (which decrement _reserved_vram in their finally block)
could still be running when shutdown returned, leaving stale VRAM
accounting. Now snapshots active workers under lock and joins them all.
Snapshot-then-join pattern avoids holding the lock across blocking join
calls (which would deadlock since workers acquire the same lock on exit).
Before running a batch of tasks, the scheduler now requests a VRAM lease
from the cf-orch coordinator (POST /api/leases). The lease is held for the
full batch and released in the finally block so it's always cleaned up even
on error. Falls back gracefully if the coordinator is unreachable.
Adds coordinator_url and service_name params to TaskScheduler.__init__
and get_scheduler() so callers can override the default localhost:7700.
coordinator/app.py:
- Add POST /api/nodes — agents POST {node_id, agent_url} to self-register;
coordinator immediately polls the new agent for GPU info
- Add lifespan context manager that starts/stops AgentSupervisor heartbeat
loop (previously the loop was never started)
cli.py start:
- Add --node-id flag (default 'local')
- Pre-register the local agent URL (http://127.0.0.1:{agent_port}) so the
heartbeat loop can poll it immediately on startup
- Drop redundant lease_manager.register_gpu() call — supervisor.poll_agent()
now does this via the heartbeat after the agent responds
cli.py agent:
- Add --advertise-host flag for NATted/multi-homed nodes
- Fire registration POST to coordinator in a daemon thread (2s delay) so
uvicorn.run() can start binding immediately; no double uvicorn.run()
- dashboard.html: node-centric layout — GPU cards with VRAM bars and
sparklines, active leases table with TTL progress bars, service health
pill, auto-refreshes every 5s via fetch() against the local JSON API
- All dynamic content set via DOM textContent / createElementNS — no
innerHTML with user-sourced strings
- coordinator/app.py: serves dashboard.html at GET / (HTMLResponse,
excluded from OpenAPI schema); HTML read at import time from package dir
- test_dashboard_serves_html: verifies 200, content-type text/html,
and key route markers present