Commit graph

98 commits

Author SHA1 Message Date
bb2ed3e992 fix: parameterize bare dict type annotations in license module
Some checks failed
CI / test (pull_request) Has been cancelled
2026-04-05 21:19:10 -07:00
f3bc4ac605 feat: add CF_LICENSE_KEY validation via Heimdall (closes #26)
Introduces circuitforge_core.config.license with validate_license() and
get_license_tier(). Both functions are safe to call when CF_LICENSE_KEY
is absent, returning free tier gracefully. Results are cached 30 min per
(key, product) pair. CF_LICENSE_URL env var overrides the default
Heimdall endpoint. Re-exports added to config.__init__. Existing
test_config.py moved into tests/test_config/ package to co-locate with
new test_license.py (10 tests; 204 total passing).
2026-04-05 21:16:57 -07:00
d98d27be3d chore: remove misplaced cf-orch docker workflow (belongs in circuitforge-orch)
Some checks are pending
CI / test (push) Waiting to run
Mirror / mirror (push) Waiting to run
2026-04-05 20:53:13 -07:00
4d858af4d1 Merge pull request 'ci: Forgejo Actions — CI, PyPI release, mirrors (closes #27)' (#29) from feature/ci-cd into main
Some checks are pending
CI / test (push) Waiting to run
Mirror / mirror (push) Waiting to run
2026-04-05 20:51:31 -07:00
874354f235 fix: continue-on-error for mirror steps; guard duplicate Forgejo release creation
Some checks failed
CI / test (pull_request) Has been cancelled
2026-04-05 20:51:18 -07:00
3050179b2f fix: use jq for safe JSON in release step; remove redundant dev deps in ci 2026-04-05 20:51:18 -07:00
378d125ba6 ci: add Forgejo Actions workflows — CI, PyPI release, mirrors, cliff.toml (closes #27) 2026-04-05 20:51:18 -07:00
1cbea29817 Merge pull request 'feat: shared feedback router factory (closes #23)' (#28) from feature/api-feedback into main 2026-04-05 20:50:24 -07:00
f0a9ec5c37 fix: raise 502 on label creation failure; narrow subprocess exception scope 2026-04-05 17:36:52 -07:00
0a15ad9522 feat: add circuitforge_core.api.feedback — shared feedback router factory (closes #23)
Adds make_feedback_router(repo, product, demo_mode_fn) which returns a
FastAPI APIRouter with GET /status and POST / endpoints. Handles Forgejo
label creation/reuse, issue body assembly (including repro steps for bugs),
demo mode gating, and FORGEJO_API_TOKEN presence checks. 12 tests covering
all status/submit paths, mock Forgejo interaction, and body content assertions.
Also adds fastapi>=0.110 and httpx>=0.27 to [dev] optional deps.
2026-04-05 17:31:02 -07:00
c244260d1c feat!: strip resources/ from MIT core — moves to circuitforge-orch (v0.8.0)
BREAKING CHANGE: circuitforge_core.resources is no longer available.
Import CFOrchClient from circuitforge_orch.client instead.
cf-orch CLI entry point is now in the circuitforge-orch package.
2026-04-04 22:34:27 -07:00
2259382d0b refactor: replace coordinator-aware TaskScheduler with Protocol + LocalScheduler (MIT); update LLMRouter import path 2026-04-04 22:26:06 -07:00
090a86ce1b refactor: update LLMRouter lazy import — circuitforge_core.resources.client → circuitforge_orch.client 2026-04-04 22:16:17 -07:00
c1e825c06a Merge pull request 'feat: affiliates + preferences modules v0.7.0 (closes #21, #22)' (#25) from feature/affiliates-module into main 2026-04-04 19:14:24 -07:00
d16bc569cf chore: bump version to 0.7.0 — affiliates + preferences modules 2026-04-04 18:28:52 -07:00
ccd2a35deb test: affiliates integration tests — full wrap_url round-trip 2026-04-04 18:28:27 -07:00
fe19de3d9a feat: affiliates public API surface (__init__.py) 2026-04-04 18:27:45 -07:00
7837fbcad2 feat: affiliates router — wrap_url() with opt-out, BYOK, and CF env-var resolution 2026-04-04 18:20:21 -07:00
73cec07bd2 feat: affiliates disclosure — per-retailer tooltip copy + first-encounter banner constants 2026-04-04 18:14:58 -07:00
4c3f3a95a5 feat: affiliates programs — AffiliateProgram, registry, eBay EPN + Amazon Associates builders 2026-04-04 18:12:45 -07:00
d719ea2309 feat: preferences public helpers — get_user_preference / set_user_preference (closes #22 self-hosted) 2026-04-04 18:10:24 -07:00
0d9d030320 feat: preferences LocalFileStore — YAML-backed single-user preference store 2026-04-04 18:07:35 -07:00
9ee31a09c1 feat: preferences dot-path utilities (get_path, set_path) 2026-04-04 18:04:44 -07:00
e6cd3a2e96 chore: sync __version__ to 0.6.0 (matches pyproject.toml) 2026-04-03 16:48:11 -07:00
cb51ba72bc feat: cf-orch Docker image + Forgejo CI pipeline
Dockerfile.orch — multi-mode image (coordinator | agent):
- coordinator: runs cf-orch coordinator on $CF_ORCH_PORT (default 7700)
- agent: connects to $CF_COORDINATOR_URL, serves $CF_AGENT_GPU_IDS

.forgejo/workflows/docker.yml — publishes on every vN.N.N tag:
- ghcr.io/circuit-forge/cf-orch:latest
- ghcr.io/circuit-forge/cf-orch:vX.Y.Z
- Layer cache via GHA cache backend

Closes #19. Bumps to v0.6.0.
2026-04-03 09:10:29 -07:00
3deae056de feat: local-first LLM config + hosted coordinator auth
LLMRouter env-var auto-config:
- No llm.yaml required — auto-configures from ANTHROPIC_API_KEY,
  OPENAI_API_KEY, or OLLAMA_HOST on first use
- Bare-metal self-hosters can run any CF product with just env vars
- Falls back to FileNotFoundError with actionable message only when
  no env vars are set either

CFOrchClient auth:
- Reads CF_LICENSE_KEY env var (or explicit api_key param)
- Sends Authorization: Bearer <key> on all allocation/release requests
- Required for the hosted public coordinator; no-op for local deployments

HeimdallAuthMiddleware (new):
- FastAPI middleware for cf-orch coordinator
- Enabled by HEIMDALL_URL env var; self-hosted deployments skip it
- 5-min TTL cache (matching Kiwi cloud session) keeps Heimdall off the
  per-allocation hot path
- /api/health exempt; free-tier keys rejected with 403 + reason
- 13 tests covering cache TTL, tier ranking, and middleware gating
2026-04-03 08:32:15 -07:00
9544f695e6 chore: CHANGELOG for v0.5.0 2026-04-02 23:05:22 -07:00
7397e227e2 Merge pull request 'feat: manage.py cross-platform product manager' (#18) from feature/manage-py into main 2026-04-02 23:04:58 -07:00
8d87ed4c9f feat: manage.py cross-platform product manager (closes #6)
- circuitforge_core.manage module — replaces bash-only manage.sh
  - config.py: ManageConfig from manage.toml (TOML via tomllib/tomli)
    app name, default_url, docker compose_file/project, native services
    Falls back to directory name when no manage.toml present
  - docker_mode.py: DockerManager wrapping 'docker compose' (v2 plugin)
    or 'docker-compose' (v1 fallback); docker_available() probe
    Commands: start, stop, restart, status, logs, build
  - native_mode.py: NativeManager with PID file process management
    platformdirs for platform-appropriate PID/log paths
    Windows-compatible log tailing (polling, no tail -f)
    Cross-platform kill: SIGTERM→SIGKILL on Unix, taskkill /F on Windows
  - cli.py: typer CLI — start/stop/restart/status/logs/build/open/install-shims
    Mode auto-detection: Docker available + compose file → docker; else native
    --mode docker|native|auto override
  - templates/manage.sh: bash shim (conda, venv, python3 detection)
  - templates/manage.ps1: PowerShell shim (same detection, Windows)
  - templates/manage.toml.example: annotated config template
  - __main__.py: python -m circuitforge_core.manage entry point

- pyproject.toml: manage extras group (platformdirs, typer)
  cf-manage console script; version bumped to 0.5.0

- 36 tests: config (6), docker_mode (9), native_mode (21)
2026-04-02 23:04:35 -07:00
6e3474b97b chore: CHANGELOG for v0.4.0 2026-04-02 22:13:01 -07:00
d45d4e1de6 Merge pull request 'feat: agent watchdog + Ollama adopt-if-running' (#17) from feature/agent-watchdog into main 2026-04-02 22:12:32 -07:00
7bb6b76bd5 feat: ollama adopt-if-running + health_path in ProcessSpec (#16)
- ProcessSpec: adopt (bool) and health_path (str, default /health) fields
- ServiceManager: adopt=True probes health_path before spawning; is_running()
  uses health probe for adopt services rather than proc table + socket check
- _probe_health() helper: urllib GET on localhost:port+path, returns bool
- Agent /services/{service}/start: returns adopted=True when service was
  already running; coordinator sets state=running immediately (no probe wait)
- ServiceInstance: health_path field (default /health)
- service_registry.upsert_instance(): health_path kwarg
- Probe loop uses inst.health_path instead of hardcoded /health
- coordinator allocate_service: looks up health_path from profile spec via
  _get_health_path() and stores on ServiceInstance
- All GPU profiles (2/4/6/8/16/24 GB + cpu-16/32): ollama managed block
  with adopt=true, health_path=/api/tags, port 11434
- 11 new tests
2026-04-02 22:09:42 -07:00
a54a530493 feat: agent watchdog — persist known nodes + auto-reconnect after coordinator restart
closes #15

- NodeStore: SQLite persistence for known agent nodes
  (~/.local/share/circuitforge/cf-orch-nodes.db)
  - upsert on every register(); prune_stale() for 30-day cleanup
  - survives coordinator restarts — data readable by next process

- AgentSupervisor.restore_from_store(): reload known nodes on startup,
  mark all offline; heartbeat loop brings back any that respond

- AgentSupervisor.register(): persists to NodeStore on every call

- cli.py coordinator: NodeStore wired in; restore_from_store() called
  before uvicorn starts

- cli.py agent: one-shot registration replaced with persistent reconnect
  loop (daemon thread, 30 s interval) — coordinator restart → nodes
  reappear within one cycle with no manual intervention on agent hosts

- 16 new tests: NodeStore (8) + AgentSupervisor watchdog (8)
2026-04-02 22:01:55 -07:00
a36f469d60 chore: CHANGELOG for v0.3.0 2026-04-02 18:56:49 -07:00
1de5ec767c Merge pull request 'feat: hardware detection, cf-docuvision service, documents ingestion pipeline' (#14) from feature/hardware-docuvision into main 2026-04-02 18:55:50 -07:00
cd9864b5e8 feat: hardware detection, cf-docuvision service, documents ingestion pipeline
Closes #5, #7, #8, #13

## hardware module (closes #5)
- HardwareSpec, LLMBackendConfig, LLMConfig dataclasses
- VramTier ladder (CPU / 2 / 4 / 6 / 8 / 16 / 24 GB) with select_tier()
- generate_profile() maps HardwareSpec → LLMConfig for llm.yaml generation
- detect_hardware() with nvidia-smi / rocm-smi / system_profiler / cpu fallback
- 31 tests across tiers, generator, and detect

## cf-docuvision service (closes #8)
- FastAPI service wrapping ByteDance/Dolphin-v2 (Qwen2.5-VL backbone)
- POST /extract: image_b64 or image_path + hint → ExtractResponse
- Lazy model loading; JSON-structured output with plain-text fallback
- ProcessSpec managed blocks added to all four GPU profiles (6/8/16/24 GB)
- 14 tests

## documents module (closes #7)
- StructuredDocument, Element, ParsedTable dataclasses (frozen, composable)
- DocuvisionClient: thin HTTP client for cf-docuvision POST /extract
- ingest(): primary cf-docuvision path → LLMRouter vision fallback → empty doc
- CF_DOCUVISION_URL env var for URL override
- 22 tests

## coordinator probe loop (closes #13)
- _run_instance_probe_loop: starting → running on 200; starting → stopped on timeout
- 4 async tests with CancelledError-based tick control
2026-04-02 18:53:25 -07:00
482c430cdb docs: add CHANGELOG for v0.1.0 and v0.2.0 2026-04-02 17:25:06 -07:00
749e51ccca Merge pull request 'feat(orch): health probe loop + VRAM pre-flight fix' (#12) from feature/orch-llm-server into main 2026-04-02 17:24:09 -07:00
a7290c1240 feat(orch): background health probe loop — starting → running transition
Coordinator now polls all 'starting' instances every 5 s via GET /health.
On 200: state → running. After 300 s without a healthy response: state →
stopped. Closes #10.
2026-04-02 17:18:16 -07:00
bd132851ec fix(orch): tighten VRAM pre-flight to require full max_mb free (not half)
max_mb // 2 was too loose — Qwen2.5-3B needs ~5.9 GB on an 8 GB card
but the threshold only required 3.25 GB free, allowing Ollama to hold
4.5 GB while a load attempt was still dispatched (causing OOM crash).

- node_selector: can_fit = free_mb >= service_max_mb (was // 2)
- coordinator /start: same threshold fix + updated error message
- tests: two new node_selector tests pin the full-ceiling semantics;
  updated stale docstring in coordinator app test
2026-04-02 16:44:36 -07:00
2d095f0090 fix(llm-server): handle transformers 5.x BatchEncoding; use dtype kwarg
- apply_chat_template() returns BatchEncoding in transformers 5.x (not bare tensor);
  extract .input_ids explicitly with fallback for 4.x compat
- Switch from deprecated torch_dtype= to dtype= in from_pretrained()
2026-04-02 16:36:07 -07:00
c78341fc6f feat(orch): replace Ouro/vllm-Docker with generic HF inference server; add ProcessSpec
- Add circuitforge_core/resources/inference/llm_server.py: generic OpenAI-compatible
  FastAPI server for any HuggingFace causal LM (Phi-4-mini-instruct, Qwen2.5-3B-Instruct)
- Add service_manager.py + service_probe.py: ProcessSpec start/stop/is_running support
  (Popen-based; socket probe confirms readiness before marking running)
- Update all 4 public GPU profiles to use ProcessSpec→llm_server instead of Docker vllm:
  6gb (max_mb 5500), 8gb (max_mb 6500), 16gb/24gb (max_mb 9000)
- Model candidates: Phi-4-mini-instruct first (7.2GB), Qwen2.5-3B-Instruct fallback (5.8GB)
- Remove ouro_server.py (Ouro incompatible with transformers 5.x; vllm Docker also incompatible)
- Add 17 tests for ServiceManager ProcessSpec (start/stop/is_running/list/get_url)
2026-04-02 15:33:08 -07:00
27999925cf fix(orch): seed ServiceInstance on first allocate start 2026-04-02 14:22:55 -07:00
c5e12b74f2 Merge pull request 'feat: auto service lifecycle — /allocate, NodeSelector, idle sweep, CFOrchClient' (#9) from feature/orch-auto-lifecycle into main 2026-04-02 14:11:36 -07:00
e58c3aea23 fix: TTL sweep, immutability, service-scoped release, logger in orch alloc
- ServiceRegistry: add sweep_expired_allocations() to remove stale TTL
  allocations and transition instances to idle; add get_allocation() helper
- AgentSupervisor._run_idle_sweep: call sweep_expired_allocations() before
  idle-timeout check so crashed-caller leaks are cleaned up each sweep tick
- schema._parse_managed: copy raw dict before extracting 'type' key instead
  of mutating caller's dict with pop()
- app.release_allocation: validate allocation belongs to the given service
  path param before releasing; return 404 if mismatch
- router._try_cf_orch_alloc: replace print() with logger.warning(); add
  module-level logger = logging.getLogger(__name__)
- tests: add test_sweep_expired_allocations covering TTL expiry and idle
  state transition
2026-04-02 12:55:38 -07:00
1a20b80a50 test: add VRAM pre-flight 503 test for ensure_service 2026-04-02 12:49:50 -07:00
02806359af feat: add Services table to coordinator dashboard 2026-04-02 12:47:27 -07:00
a4ccaaf3e2 fix: address coordinator/idle-sweep quality issues from review
- CRITICAL: idle sweep now calls mark_stopped() after successful HTTP stop,
  preventing repeated stop POSTs on every 3rd tick for the same instance
- CRITICAL: active_allocations() now filters by gpu_id to avoid marking wrong
  instance idle on multi-GPU nodes when an allocation is released
- CRITICAL: VRAM pre-flight guard in ensure_service was dead code — added the
  actual HTTPException(503) before the candidate loop
- IMPORTANT: register() now updates agent_url on re-registration if it changed,
  so relocated agents are tracked correctly
- IMPORTANT: updated test_service_registry.py callers of active_allocations()
  to pass the now-required gpu_id argument
2026-04-02 12:45:31 -07:00
49ab9e4e88 feat: wire ServiceRegistry into coordinator allocate endpoints 2026-04-02 12:30:58 -07:00
c299482e0d feat: add idle sweep to AgentSupervisor 2026-04-02 12:30:28 -07:00