Commit graph

77 commits

Author SHA1 Message Date
67493048e2 feat(stt): add cf-stt module — FasterWhisperBackend + managed FastAPI app
- STTBackend Protocol + STTResult/STTSegment frozen dataclasses (base.py)
- MockSTTBackend for CI/tests (no GPU needed, CF_STT_MOCK=1)
- FasterWhisperBackend: loads model once, thread-safe, VRAM estimate by model size
- app.py: FastAPI service runnable as managed process by cf-orch
  POST /transcribe (multipart audio) → STTTranscribeResponse-compatible JSON
  GET  /health → {status, model, vram_mb}
- __init__.py: process-level singleton + transcribe() convenience fn
- pyproject.toml: stt-faster-whisper + stt-service optional dep groups
2026-04-08 22:14:46 -07:00
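A minimal sketch of the shape the 67493048e2 bullets describe: a Protocol, frozen result dataclasses, and a mock backend selected by CF_STT_MOCK=1. The field names, the transcribe() signature, and the get_backend() helper are assumptions for illustration, not the contents of base.py.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class STTSegment:
    # Field names are assumed for illustration.
    start_s: float
    end_s: float
    text: str


@dataclass(frozen=True)
class STTResult:
    text: str
    segments: tuple[STTSegment, ...] = ()


class STTBackend(Protocol):
    def transcribe(self, audio: bytes) -> STTResult: ...


class MockSTTBackend:
    """Deterministic backend for CI: no model load, no GPU."""

    def transcribe(self, audio: bytes) -> STTResult:
        seg = STTSegment(0.0, 1.0, "mock transcript")
        return STTResult(text=seg.text, segments=(seg,))


def get_backend() -> STTBackend:
    # Hypothetical selector: only the mock path is sketched here.
    if os.environ.get("CF_STT_MOCK") == "1":
        return MockSTTBackend()
    raise RuntimeError("real backend not sketched; set CF_STT_MOCK=1")
```

Because MockSTTBackend satisfies the Protocol structurally, tests can swap it in without importing any GPU dependency.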
5766fa82ab refactor: replace vision stub with cf-vision shim (cf-core#36)
circuitforge_core.vision.router now re-exports VisionRouter from the
standalone cf-vision repo. Existing imports unchanged; falls back to
a helpful ImportError stub if cf-vision is not installed.

Closes cf-core#36
2026-04-06 17:59:05 -07:00
48d33a78ef fix: migration runner resilient to partial-failure via retry-with-removal
Instead of splitting SQL on semicolons (fragile — semicolons appear inside
comments and string literals), use executescript() for correct tokenization.
On 'duplicate column name' error (caused by a prior partial run that
auto-committed some ALTER TABLE statements before crashing), strip the
already-applied ADD COLUMN statement from the script and retry.  Limit
to 20 attempts to prevent infinite loops on genuinely broken SQL.

This replaces the earlier per-statement split approach, which broke on
migration 004: a semicolon inside a -- comment caused the remainder of
the comment ('one row per...') to be treated as raw SQL.
2026-04-05 22:39:12 -07:00
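The retry-with-removal loop described in 48d33a78ef could look roughly like the sketch below. The function name, the regex, and the assumed statement shape (ALTER TABLE t ADD COLUMN name ...;) are illustrative, not the runner's actual code. If two tables add a column with the same name, this simplified version strips the first match only.

```python
import re
import sqlite3

MAX_ATTEMPTS = 20  # bound from the commit message


def run_migration(conn: sqlite3.Connection, script: str) -> None:
    """Run `script`, dropping already-applied ADD COLUMN statements on retry."""
    for _ in range(MAX_ATTEMPTS):
        try:
            conn.executescript(script)
            return
        except sqlite3.OperationalError as exc:
            match = re.search(r"duplicate column name: (\w+)", str(exc))
            if match is None:
                raise  # any other error propagates unchanged
            column = re.escape(match.group(1))
            # Assumed statement shape: ALTER TABLE t ADD COLUMN <name> ...;
            pattern = re.compile(
                rf"ALTER\s+TABLE\s+\w+\s+ADD\s+COLUMN\s+{column}\b[^;]*;",
                re.IGNORECASE,
            )
            script, removed = pattern.subn("", script, count=1)
            if removed == 0:
                raise  # nothing to strip, so retrying cannot help
    raise RuntimeError(f"migration still failing after {MAX_ATTEMPTS} attempts")
```

Only the offending statement is removed with a targeted pattern; the rest of the script, comments included, is still handed to executescript() untouched.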
c9c4828387 fix: make migration runner resilient to partial-failure recovery
SQLite's executescript() auto-commits each DDL statement individually.
If a migration crashes mid-run, prior ALTER TABLE statements are already
committed but the migration is never recorded as applied.  On restart,
the runner re-runs the same file and hits 'duplicate column name' on
already-applied statements, breaking subsequent startups permanently.

Replace executescript() with per-statement execute() calls.  'Duplicate
column name' OperationalErrors are caught and logged as warnings so the
migration can complete and be marked as done.  All other errors still
propagate normally.
2026-04-05 22:23:29 -07:00
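A rough sketch of the per-statement approach from c9c4828387, assuming the statements arrive already split (how the file is split is out of scope here). apply_statements() is a hypothetical name.

```python
import logging
import sqlite3
from typing import Iterable

logger = logging.getLogger(__name__)


def apply_statements(conn: sqlite3.Connection, statements: Iterable[str]) -> int:
    """Execute statements one by one; return how many were skipped as duplicates."""
    skipped = 0
    for stmt in statements:
        try:
            conn.execute(stmt)
        except sqlite3.OperationalError as exc:
            if "duplicate column name" not in str(exc).lower():
                raise  # all other errors still propagate
            logger.warning("skipping already-applied statement: %s", exc)
            skipped += 1
    conn.commit()
    return skipped
```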
19a26e02a0 Merge pull request 'feat: re-export make_feedback_router from circuitforge_core.api (closes #30)' (#32) from feature/api-exports into main
2026-04-05 21:37:33 -07:00
3c9c765668 feat: re-export make_feedback_router from circuitforge_core.api (closes #30)
2026-04-05 21:21:44 -07:00
bb2ed3e992 fix: parameterize bare dict type annotations in license module
2026-04-05 21:19:10 -07:00
f3bc4ac605 feat: add CF_LICENSE_KEY validation via Heimdall (closes #26)
Introduces circuitforge_core.config.license with validate_license() and
get_license_tier(). Both functions are safe to call when CF_LICENSE_KEY
is absent, returning free tier gracefully. Results are cached 30 min per
(key, product) pair. CF_LICENSE_URL env var overrides the default
Heimdall endpoint. Re-exports added to config.__init__. Existing
test_config.py moved into tests/test_config/ package to co-locate with
new test_license.py (10 tests; 204 total passing).
2026-04-05 21:16:57 -07:00
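The caching behaviour in f3bc4ac605 (free tier when the key is absent, results cached 30 min per (key, product) pair) can be sketched as follows. lookup_tier(), the injected fetch callable, and the tier strings are assumptions; the real validate_license() and get_license_tier() signatures are not shown in the commit.

```python
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL_S = 30 * 60  # 30 minutes, per the commit message

_cache: Dict[Tuple[str, str], Tuple[float, str]] = {}


def lookup_tier(
    key: Optional[str],
    product: str,
    fetch: Callable[[str, str], str],
    now: Callable[[], float] = time.monotonic,
) -> str:
    """Return the cached tier for (key, product), calling `fetch` on a miss."""
    if not key:
        return "free"  # absent key: no network call at all
    cached = _cache.get((key, product))
    if cached is not None and now() - cached[0] < CACHE_TTL_S:
        return cached[1]
    tier = fetch(key, product)  # stands in for the Heimdall request
    _cache[(key, product)] = (now(), tier)
    return tier
```

Injecting the clock keeps the TTL testable without sleeping.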
f0a9ec5c37 fix: raise 502 on label creation failure; narrow subprocess exception scope 2026-04-05 17:36:52 -07:00
0a15ad9522 feat: add circuitforge_core.api.feedback — shared feedback router factory (closes #23)
Adds make_feedback_router(repo, product, demo_mode_fn) which returns a
FastAPI APIRouter with GET /status and POST / endpoints. Handles Forgejo
label creation/reuse, issue body assembly (including repro steps for bugs),
demo mode gating, and FORGEJO_API_TOKEN presence checks. 12 tests covering
all status/submit paths, mock Forgejo interaction, and body content assertions.
Also adds fastapi>=0.110 and httpx>=0.27 to [dev] optional deps.
2026-04-05 17:31:02 -07:00
c244260d1c feat!: strip resources/ from MIT core — moves to circuitforge-orch (v0.8.0)
BREAKING CHANGE: circuitforge_core.resources is no longer available.
Import CFOrchClient from circuitforge_orch.client instead.
cf-orch CLI entry point is now in the circuitforge-orch package.
2026-04-04 22:34:27 -07:00
2259382d0b refactor: replace coordinator-aware TaskScheduler with Protocol + LocalScheduler (MIT); update LLMRouter import path 2026-04-04 22:26:06 -07:00
090a86ce1b refactor: update LLMRouter lazy import — circuitforge_core.resources.client → circuitforge_orch.client 2026-04-04 22:16:17 -07:00
d16bc569cf chore: bump version to 0.7.0 — affiliates + preferences modules 2026-04-04 18:28:52 -07:00
fe19de3d9a feat: affiliates public API surface (__init__.py) 2026-04-04 18:27:45 -07:00
7837fbcad2 feat: affiliates router — wrap_url() with opt-out, BYOK, and CF env-var resolution 2026-04-04 18:20:21 -07:00
73cec07bd2 feat: affiliates disclosure — per-retailer tooltip copy + first-encounter banner constants 2026-04-04 18:14:58 -07:00
4c3f3a95a5 feat: affiliates programs — AffiliateProgram, registry, eBay EPN + Amazon Associates builders 2026-04-04 18:12:45 -07:00
d719ea2309 feat: preferences public helpers — get_user_preference / set_user_preference (closes #22 self-hosted) 2026-04-04 18:10:24 -07:00
0d9d030320 feat: preferences LocalFileStore — YAML-backed single-user preference store 2026-04-04 18:07:35 -07:00
9ee31a09c1 feat: preferences dot-path utilities (get_path, set_path) 2026-04-04 18:04:44 -07:00
e6cd3a2e96 chore: sync __version__ to 0.6.0 (matches pyproject.toml) 2026-04-03 16:48:11 -07:00
3deae056de feat: local-first LLM config + hosted coordinator auth
LLMRouter env-var auto-config:
- No llm.yaml required — auto-configures from ANTHROPIC_API_KEY,
  OPENAI_API_KEY, or OLLAMA_HOST on first use
- Bare-metal self-hosters can run any CF product with just env vars
- Raises FileNotFoundError with an actionable message only when llm.yaml
  is missing and none of these env vars is set

CFOrchClient auth:
- Reads CF_LICENSE_KEY env var (or explicit api_key param)
- Sends Authorization: Bearer <key> on all allocation/release requests
- Required for the hosted public coordinator; no-op for local deployments

HeimdallAuthMiddleware (new):
- FastAPI middleware for cf-orch coordinator
- Enabled by HEIMDALL_URL env var; self-hosted deployments skip it
- 5-min TTL cache (matching Kiwi cloud session) keeps Heimdall off the
  per-allocation hot path
- /api/health exempt; free-tier keys rejected with 403 + reason
- 13 tests covering cache TTL, tier ranking, and middleware gating
2026-04-03 08:32:15 -07:00
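The env-var auto-configuration in 3deae056de might be sketched like this. The priority order (taken from the order the variables are listed), the backend labels, and the returned dict shape are assumptions, not LLMRouter's real behaviour.

```python
import os
from typing import Mapping, Optional

# Checked in the order the commit message lists them; the backend labels
# on the right are assumed names, not LLMRouter's real config keys.
_ENV_BACKENDS = (
    ("ANTHROPIC_API_KEY", "anthropic"),
    ("OPENAI_API_KEY", "openai"),
    ("OLLAMA_HOST", "ollama"),
)


def autoconfigure(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick a backend from env vars when no llm.yaml is present."""
    env = os.environ if env is None else env
    for var, backend in _ENV_BACKENDS:
        if env.get(var):
            return {"backend": backend, "source": var}
    raise FileNotFoundError(
        "No llm.yaml found and none of ANTHROPIC_API_KEY, OPENAI_API_KEY, "
        "or OLLAMA_HOST is set; set one of them or create llm.yaml."
    )
```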
8d87ed4c9f feat: manage.py cross-platform product manager (closes #6)
- circuitforge_core.manage module — replaces bash-only manage.sh
  - config.py: ManageConfig from manage.toml (TOML via tomllib/tomli)
    app name, default_url, docker compose_file/project, native services
    Falls back to directory name when no manage.toml present
  - docker_mode.py: DockerManager wrapping 'docker compose' (v2 plugin)
    or 'docker-compose' (v1 fallback); docker_available() probe
    Commands: start, stop, restart, status, logs, build
  - native_mode.py: NativeManager with PID file process management
    platformdirs for platform-appropriate PID/log paths
    Windows-compatible log tailing (polling, no tail -f)
    Cross-platform kill: SIGTERM→SIGKILL on Unix, taskkill /F on Windows
  - cli.py: typer CLI — start/stop/restart/status/logs/build/open/install-shims
    Mode auto-detection: Docker available + compose file → docker; else native
    --mode docker|native|auto override
  - templates/manage.sh: bash shim (conda, venv, python3 detection)
  - templates/manage.ps1: PowerShell shim (same detection, Windows)
  - templates/manage.toml.example: annotated config template
  - __main__.py: python -m circuitforge_core.manage entry point

- pyproject.toml: manage extras group (platformdirs, typer)
  cf-manage console script; version bumped to 0.5.0

- 36 tests: config (6), docker_mode (9), native_mode (21)
2026-04-02 23:04:35 -07:00
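The mode auto-detection rule in 8d87ed4c9f (Docker available + compose file means docker, otherwise native) reduces to a small decision function. resolve_mode() and the PATH-only docker_available() probe below are assumptions about how cli.py might implement it.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def docker_available(which: Callable[[str], Optional[str]] = shutil.which) -> bool:
    # Only checks that a docker binary is on PATH; a real probe might also
    # verify that the daemon responds.
    return which("docker") is not None


def resolve_mode(requested: str, compose_file: Path, docker_ok: bool) -> str:
    """Map --mode docker|native|auto to a concrete mode."""
    if requested in ("docker", "native"):
        return requested  # explicit override wins
    if requested != "auto":
        raise ValueError(f"unknown mode: {requested!r}")
    return "docker" if docker_ok and compose_file.is_file() else "native"
```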
7bb6b76bd5 feat: ollama adopt-if-running + health_path in ProcessSpec (#16)
- ProcessSpec: adopt (bool) and health_path (str, default /health) fields
- ServiceManager: adopt=True probes health_path before spawning; is_running()
  uses health probe for adopt services rather than proc table + socket check
- _probe_health() helper: urllib GET on localhost:port+path, returns bool
- Agent /services/{service}/start: returns adopted=True when service was
  already running; coordinator sets state=running immediately (no probe wait)
- ServiceInstance: health_path field (default /health)
- service_registry.upsert_instance(): health_path kwarg
- Probe loop uses inst.health_path instead of hardcoded /health
- coordinator allocate_service: looks up health_path from profile spec via
  _get_health_path() and stores on ServiceInstance
- All GPU profiles (2/4/6/8/16/24 GB + cpu-16/32): ollama managed block
  with adopt=true, health_path=/api/tags, port 11434
- 11 new tests
2026-04-02 22:09:42 -07:00
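A sketch of a health probe in the spirit of the _probe_health() helper from 7bb6b76bd5: a urllib GET against localhost that returns a bool. The timeout default, the 2xx acceptance range, and the proxy-free opener are assumptions.

```python
import http.client
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy: the probe only ever targets localhost.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_health(port: int, path: str = "/health", timeout_s: float = 2.0) -> bool:
    """Return True if GET http://localhost:<port><path> answers with a 2xx."""
    url = f"http://localhost:{port}{path}"
    try:
        with _opener.open(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, or a non-2xx status: not healthy.
        return False
```

With adopt=True, a service manager could call probe_health(11434, "/api/tags") before spawning and skip the spawn when it returns True.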
a54a530493 feat: agent watchdog — persist known nodes + auto-reconnect after coordinator restart
closes #15

- NodeStore: SQLite persistence for known agent nodes
  (~/.local/share/circuitforge/cf-orch-nodes.db)
  - upsert on every register(); prune_stale() for 30-day cleanup
  - survives coordinator restarts — data readable by next process

- AgentSupervisor.restore_from_store(): reload known nodes on startup,
  mark all offline; heartbeat loop brings back any that respond

- AgentSupervisor.register(): persists to NodeStore on every call

- cli.py coordinator: NodeStore wired in; restore_from_store() called
  before uvicorn starts

- cli.py agent: one-shot registration replaced with persistent reconnect
  loop (daemon thread, 30 s interval) — coordinator restart → nodes
  reappear within one cycle with no manual intervention on agent hosts

- 16 new tests: NodeStore (8) + AgentSupervisor watchdog (8)
2026-04-02 22:01:55 -07:00
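The NodeStore behaviour in a54a530493 (upsert on every register, prune_stale() for 30-day cleanup) might look like the sketch below. The table schema, the all() helper, and the in-memory default path are assumptions; the ON CONFLICT upsert needs SQLite 3.24 or newer.

```python
import sqlite3
import time
from typing import List, Optional, Tuple

STALE_AFTER_S = 30 * 24 * 3600  # 30-day cleanup window from the commit message


class NodeStore:
    """Tiny SQLite-backed registry of known agent nodes (schema is assumed)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS nodes ("
            "node_id TEXT PRIMARY KEY, agent_url TEXT NOT NULL, last_seen REAL NOT NULL)"
        )
        self._conn.commit()

    def upsert(self, node_id: str, agent_url: str, now: Optional[float] = None) -> None:
        ts = time.time() if now is None else now
        self._conn.execute(
            "INSERT INTO nodes (node_id, agent_url, last_seen) VALUES (?, ?, ?) "
            "ON CONFLICT(node_id) DO UPDATE SET "
            "agent_url = excluded.agent_url, last_seen = excluded.last_seen",
            (node_id, agent_url, ts),
        )
        self._conn.commit()

    def prune_stale(self, now: Optional[float] = None) -> int:
        """Delete nodes not seen within the window; return how many were removed."""
        cutoff = (time.time() if now is None else now) - STALE_AFTER_S
        cur = self._conn.execute("DELETE FROM nodes WHERE last_seen < ?", (cutoff,))
        self._conn.commit()
        return cur.rowcount

    def all(self) -> List[Tuple[str, str]]:
        rows = self._conn.execute("SELECT node_id, agent_url FROM nodes ORDER BY node_id")
        return [(r[0], r[1]) for r in rows]
```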
cd9864b5e8 feat: hardware detection, cf-docuvision service, documents ingestion pipeline
Closes #5, #7, #8, #13

## hardware module (closes #5)
- HardwareSpec, LLMBackendConfig, LLMConfig dataclasses
- VramTier ladder (CPU / 2 / 4 / 6 / 8 / 16 / 24 GB) with select_tier()
- generate_profile() maps HardwareSpec → LLMConfig for llm.yaml generation
- detect_hardware() with nvidia-smi / rocm-smi / system_profiler / cpu fallback
- 31 tests across tiers, generator, and detect

## cf-docuvision service (closes #8)
- FastAPI service wrapping ByteDance/Dolphin-v2 (Qwen2.5-VL backbone)
- POST /extract: image_b64 or image_path + hint → ExtractResponse
- Lazy model loading; JSON-structured output with plain-text fallback
- ProcessSpec managed blocks added to all four GPU profiles (6/8/16/24 GB)
- 14 tests

## documents module (closes #7)
- StructuredDocument, Element, ParsedTable dataclasses (frozen, composable)
- DocuvisionClient: thin HTTP client for cf-docuvision POST /extract
- ingest(): primary cf-docuvision path → LLMRouter vision fallback → empty doc
- CF_DOCUVISION_URL env var for URL override
- 22 tests

## coordinator probe loop (closes #13)
- _run_instance_probe_loop: starting → running on 200; starting → stopped on timeout
- 4 async tests with CancelledError-based tick control
2026-04-02 18:53:25 -07:00
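One plausible reading of the VramTier ladder and select_tier() from cd9864b5e8 is "largest tier that fits". The round-down rule and the use of 0 for the CPU tier are assumptions; the commit only names the ladder values.

```python
from typing import Optional

# Ladder from the commit message; 0 stands in for the CPU tier.
VRAM_TIERS_GB = (0, 2, 4, 6, 8, 16, 24)


def select_tier(vram_gb: Optional[float]) -> int:
    """Return the largest tier that fits in vram_gb (0 means CPU)."""
    if not vram_gb or vram_gb <= 0:
        return 0
    return max(t for t in VRAM_TIERS_GB if t <= vram_gb)
```

Under this rule a 12 GB card maps to the 8 GB tier, and anything above 24 GB is capped at the top tier.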
a7290c1240 feat(orch): background health probe loop — starting → running transition
Coordinator now polls all 'starting' instances every 5 s via GET /health.
On 200: state → running. After 300 s without a healthy response: state →
stopped. Closes #10.
2026-04-02 17:18:16 -07:00
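The state transition in a7290c1240 can be written as a pure function of one probe tick. next_state() and its argument shapes are assumptions; the 5 s interval and 300 s timeout come from the commit message.

```python
from typing import Optional

POLL_INTERVAL_S = 5    # from the commit message
START_TIMEOUT_S = 300  # from the commit message


def next_state(state: str, status_code: Optional[int], elapsed_s: float) -> str:
    """One probe tick for a single instance; status_code None means no response."""
    if state != "starting":
        return state  # the loop only touches 'starting' instances
    if status_code == 200:
        return "running"
    if elapsed_s >= START_TIMEOUT_S:
        return "stopped"
    return "starting"
```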
bd132851ec fix(orch): tighten VRAM pre-flight to require full max_mb free (not half)
max_mb // 2 was too loose — Qwen2.5-3B needs ~5.9 GB on an 8 GB card
but the threshold only required 3.25 GB free, allowing Ollama to hold
4.5 GB while a load attempt was still dispatched (causing OOM crash).

- node_selector: can_fit = free_mb >= service_max_mb (was // 2)
- coordinator /start: same threshold fix + updated error message
- tests: two new node_selector tests pin the full-ceiling semantics;
  updated stale docstring in coordinator app test
2026-04-02 16:44:36 -07:00
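The arithmetic behind bd132851ec, worked through under assumed figures: an 8 GB card reporting 8192 MB total, Ollama holding 4.5 GB (4608 MB), and max_mb 6500 as the service ceiling. The two helper names are illustrative.

```python
def can_fit_old(free_mb: int, service_max_mb: int) -> bool:
    return free_mb >= service_max_mb // 2  # pre-fix threshold


def can_fit(free_mb: int, service_max_mb: int) -> bool:
    return free_mb >= service_max_mb  # post-fix threshold


# Assumed figures: 8192 MB total, 4608 MB already held by Ollama.
free_mb = 8192 - 4608          # 3584 MB free
service_max_mb = 6500          # half of this is 3250 MB (3.25 GB)

old_decision = can_fit_old(free_mb, service_max_mb)  # True: load dispatched, OOM risk
new_decision = can_fit(free_mb, service_max_mb)      # False: load refused up front
```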
2d095f0090 fix(llm-server): handle transformers 5.x BatchEncoding; use dtype kwarg
- apply_chat_template() returns BatchEncoding in transformers 5.x (not bare tensor);
  extract .input_ids explicitly with fallback for 4.x compat
- Switch from deprecated torch_dtype= to dtype= in from_pretrained()
2026-04-02 16:36:07 -07:00
c78341fc6f feat(orch): replace Ouro/vllm-Docker with generic HF inference server; add ProcessSpec
- Add circuitforge_core/resources/inference/llm_server.py: generic OpenAI-compatible
  FastAPI server for any HuggingFace causal LM (Phi-4-mini-instruct, Qwen2.5-3B-Instruct)
- Add service_manager.py + service_probe.py: ProcessSpec start/stop/is_running support
  (Popen-based; socket probe confirms readiness before marking running)
- Update all 4 public GPU profiles to use ProcessSpec→llm_server instead of Docker vllm:
  6gb (max_mb 5500), 8gb (max_mb 6500), 16gb/24gb (max_mb 9000)
- Model candidates: Phi-4-mini-instruct first (7.2GB), Qwen2.5-3B-Instruct fallback (5.8GB)
- Remove ouro_server.py (Ouro incompatible with transformers 5.x; vllm Docker also incompatible)
- Add 17 tests for ServiceManager ProcessSpec (start/stop/is_running/list/get_url)
2026-04-02 15:33:08 -07:00
27999925cf fix(orch): seed ServiceInstance on first allocate start 2026-04-02 14:22:55 -07:00
e58c3aea23 fix: TTL sweep, immutability, service-scoped release, logger in orch alloc
- ServiceRegistry: add sweep_expired_allocations() to remove stale TTL
  allocations and transition instances to idle; add get_allocation() helper
- AgentSupervisor._run_idle_sweep: call sweep_expired_allocations() before
  idle-timeout check so crashed-caller leaks are cleaned up each sweep tick
- schema._parse_managed: copy raw dict before extracting 'type' key instead
  of mutating caller's dict with pop()
- app.release_allocation: validate allocation belongs to the given service
  path param before releasing; return 404 if mismatch
- router._try_cf_orch_alloc: replace print() with logger.warning(); add
  module-level logger = logging.getLogger(__name__)
- tests: add test_sweep_expired_allocations covering TTL expiry and idle
  state transition
2026-04-02 12:55:38 -07:00
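A simplified sketch of the sweep_expired_allocations() idea from e58c3aea23: drop allocations whose TTL has passed, then move instances with no remaining allocations to idle. The Registry class and both dict shapes are assumptions, not ServiceRegistry's real data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Registry:
    # allocation_id -> (service, expires_at); shape is assumed.
    allocations: Dict[str, Tuple[str, float]] = field(default_factory=dict)
    # service -> state; shape is assumed.
    instances: Dict[str, str] = field(default_factory=dict)

    def sweep_expired_allocations(self, now: float) -> List[str]:
        """Drop allocations past their TTL; idle any instance left with none."""
        expired = [aid for aid, (_, exp) in self.allocations.items() if exp <= now]
        for aid in expired:
            del self.allocations[aid]
        still_used = {svc for svc, _ in self.allocations.values()}
        for svc, state in list(self.instances.items()):
            if state == "running" and svc not in still_used:
                self.instances[svc] = "idle"
        return expired
```

Running this before the idle-timeout check means an allocation leaked by a crashed caller stops pinning its instance after one sweep tick.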
02806359af feat: add Services table to coordinator dashboard 2026-04-02 12:47:27 -07:00
a4ccaaf3e2 fix: address coordinator/idle-sweep quality issues from review
- CRITICAL: idle sweep now calls mark_stopped() after successful HTTP stop,
  preventing repeated stop POSTs on every 3rd tick for the same instance
- CRITICAL: active_allocations() now filters by gpu_id to avoid marking wrong
  instance idle on multi-GPU nodes when an allocation is released
- CRITICAL: VRAM pre-flight guard in ensure_service was dead code — added the
  actual HTTPException(503) before the candidate loop
- IMPORTANT: register() now updates agent_url on re-registration if it changed,
  so relocated agents are tracked correctly
- IMPORTANT: updated test_service_registry.py callers of active_allocations()
  to pass the now-required gpu_id argument
2026-04-02 12:45:31 -07:00
49ab9e4e88 feat: wire ServiceRegistry into coordinator allocate endpoints 2026-04-02 12:30:58 -07:00
c299482e0d feat: add idle sweep to AgentSupervisor 2026-04-02 12:30:28 -07:00
1e168ac636 feat(profiles): add idle_stop_after_s field; set 600s for vllm slot
Add idle_stop_after_s to ServiceProfile (default 0 = never stop).
Set 600s (10 min) timeout on vllm slot in all single-GPU profiles.
Backward compatible; non-vllm services inherit default 0 (no auto-stop).
2026-04-02 12:24:19 -07:00
9754f522d9 feat(orch): add ServiceRegistry — allocation tracking + idle state machine 2026-04-02 12:22:46 -07:00
17a24173f7 feat(llm): add cf_orch allocation support to LLMRouter backends 2026-04-02 12:19:17 -07:00
f741e6a80b fix(orch): hoist service-known check; capture resident_keys once in allocate 2026-04-02 11:45:48 -07:00
defaf39883 feat(core): add CFOrchClient sync+async context manager
Implements CFOrchClient with allocate() (sync contextmanager) and
allocate_async() (async contextmanager) for cf-orch GPU resource
allocation. Releases allocation on exit; ignores 404 on release;
raises RuntimeError on non-2xx allocation response. Exports
CFOrchClient and Allocation from circuitforge_core.resources.

Note: async test uses unittest.mock rather than httpretty — httpretty
only patches stdlib sockets and does not intercept httpx async (anyio)
transport.
2026-04-02 11:44:35 -07:00
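The allocate-then-release contract in defaf39883 maps naturally onto contextlib.contextmanager. In this sketch the HTTP client is replaced by an injected send callable, and the release path and the allocation_id key are hypothetical; only the sync variant is shown.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Tuple

# send(method, path) -> (status_code, json_body); stands in for an HTTP client.
Send = Callable[[str, str], Tuple[int, dict]]


@contextmanager
def allocate(send: Send, service: str) -> Iterator[dict]:
    status, body = send("POST", f"/api/services/{service}/allocate")
    if not 200 <= status < 300:
        raise RuntimeError(f"allocation failed: HTTP {status}")
    try:
        yield body
    finally:
        # Hypothetical release path. A 404 means the allocation is already
        # gone and is ignored; other release errors are not handled here.
        send("DELETE", f"/api/services/{service}/allocations/{body['allocation_id']}")
```

Putting the release in a finally block means it runs whether the with-body returns normally or raises.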
8201f6b3e9 feat(orch): add /api/services/{service}/allocate with auto node selection 2026-04-02 11:25:38 -07:00
52d2c5cf38 feat(orch): expose online_agents() and resident_keys() helpers 2026-04-02 11:22:29 -07:00
d600fb6651 refactor(orch): hoist service_max_mb lookup; clarify warm-fallback comments 2026-04-02 11:21:20 -07:00
13eb0c85f1 feat(orch): add NodeSelector — warm-first GPU scoring 2026-04-02 11:18:44 -07:00
aa51794f45 fix(scheduler): join batch worker threads in shutdown()
Previously shutdown() only joined the scheduler loop thread. Batch
worker threads (which decrement _reserved_vram in their finally block)
could still be running when shutdown returned, leaving stale VRAM
accounting. Now snapshots active workers under lock and joins them all.

Snapshot-then-join pattern avoids holding the lock across blocking join
calls (which would deadlock since workers acquire the same lock on exit).
2026-04-01 11:21:30 -07:00
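The snapshot-then-join pattern from aa51794f45, reduced to a toy pool. BatchPool and its fields are illustrative names; the point is that shutdown() copies the worker list under the lock and joins only after releasing it, since each worker takes the same lock on exit.

```python
import threading
from typing import List


class BatchPool:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._workers: List[threading.Thread] = []
        self.reserved_vram = 0

    def submit(self, vram_mb: int) -> None:
        def run() -> None:
            try:
                pass  # batch work would happen here
            finally:
                with self._lock:  # workers take the lock on exit
                    self.reserved_vram -= vram_mb

        t = threading.Thread(target=run)
        with self._lock:
            self.reserved_vram += vram_mb
            self._workers.append(t)
            t.start()  # started under the lock so shutdown never sees an unstarted thread

    def shutdown(self) -> None:
        with self._lock:
            snapshot = list(self._workers)  # copy under the lock
        for t in snapshot:                  # join with the lock released
            t.join()
```

Joining while still holding the lock would block forever as soon as one worker reached its finally block.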
6b8e421eb2 feat(scheduler): acquire/release cf-orch VRAM lease per batch worker
Before running a batch of tasks, the scheduler now requests a VRAM lease
from the cf-orch coordinator (POST /api/leases). The lease is held for the
full batch and released in the finally block so it's always cleaned up even
on error. Falls back gracefully if the coordinator is unreachable.

Adds coordinator_url and service_name params to TaskScheduler.__init__
and get_scheduler() so callers can override the default localhost:7700.
2026-04-01 11:06:16 -07:00
67701f0d29 feat(orch): agent self-registration and coordinator heartbeat loop
coordinator/app.py:
- Add POST /api/nodes — agents POST {node_id, agent_url} to self-register;
  coordinator immediately polls the new agent for GPU info
- Add lifespan context manager that starts/stops AgentSupervisor heartbeat
  loop (previously the loop was never started)

cli.py start:
- Add --node-id flag (default 'local')
- Pre-register the local agent URL (http://127.0.0.1:{agent_port}) so the
  heartbeat loop can poll it immediately on startup
- Drop redundant lease_manager.register_gpu() call — supervisor.poll_agent()
  now does this via the heartbeat after the agent responds

cli.py agent:
- Add --advertise-host flag for NATted/multi-homed nodes
- Fire registration POST to coordinator in a daemon thread (2s delay) so
  uvicorn.run() can start binding immediately; no double uvicorn.run()
2026-03-31 19:20:35 -07:00
7aa0ad7a51 feat(dashboard): add self-hosted coordinator dashboard at GET /
- dashboard.html: node-centric layout — GPU cards with VRAM bars and
  sparklines, active leases table with TTL progress bars, service health
  pill, auto-refreshes every 5s via fetch() against the local JSON API
- All dynamic content set via DOM textContent / createElementNS — no
  innerHTML with user-sourced strings
- coordinator/app.py: serves dashboard.html at GET / (HTMLResponse,
  excluded from OpenAPI schema); HTML read at import time from package dir
- test_dashboard_serves_html: verifies 200, content-type text/html,
  and key route markers present
2026-03-31 18:57:25 -07:00