Commit graph

5 commits

Author SHA1 Message Date
e2658f743f feat(scheduler): OrchestratedScheduler for cloud/multi-GPU, configurable via env
Switches to OrchestratedScheduler in cloud mode so concurrent recipe_llm
jobs fan out across all registered cf-orch GPU nodes instead of serializing
on one. Under load, this eliminates the poll timeouts caused by queue backup.

USE_ORCH_SCHEDULER env var gives explicit control independent of CLOUD_MODE:
  unset        follow CLOUD_MODE (cloud=orch, local=local)
  true         OrchestratedScheduler always (e.g. multi-GPU local rig)
  false        LocalScheduler always (e.g. cloud single-GPU dev instance)

ImportError fallback: if circuitforge_orch is not installed and orch is
requested, logs a warning and falls back to LocalScheduler gracefully.
2026-04-19 22:11:34 -07:00
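
The selection rule described in e2658f743f can be sketched roughly as follows. This is a minimal illustration, not the project's code: `wants_orch`, `select_scheduler`, and the `import_orch` hook are hypothetical names, the return values are plain strings standing in for the two scheduler classes, and treating any value other than `true`/`false` as "unset" is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


def wants_orch(env, cloud_mode):
    """Resolve USE_ORCH_SCHEDULER against CLOUD_MODE (hypothetical helper)."""
    raw = env.get("USE_ORCH_SCHEDULER", "").strip().lower()
    if raw == "true":
        return True          # OrchestratedScheduler always
    if raw == "false":
        return False         # LocalScheduler always
    # Assumption: unset or unrecognized values follow CLOUD_MODE.
    return cloud_mode


def _import_orch():
    # Raises ImportError when the optional package is not installed.
    import circuitforge_orch  # noqa: F401


def select_scheduler(env, cloud_mode, import_orch=_import_orch):
    """Return "orch" or "local"; fall back to "local" on ImportError."""
    if not wants_orch(env, cloud_mode):
        return "local"
    try:
        import_orch()
    except ImportError:
        logger.warning("circuitforge_orch not installed; using LocalScheduler")
        return "local"
    return "orch"
```

Passing the importer as a parameter is only there so the fallback path can be exercised without uninstalling the package.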
ed4595d960 feat(recipes): async L3/L4 recipe job queue with poll endpoint
Adds the recipe_jobs table and background task pipeline for level 3/4
recipe generation. POST ?async=true returns 202 with job_id; clients
poll GET /recipes/jobs/{job_id} until status=done.

Key fix: _enqueue_recipe_job now calls scheduler.enqueue() after
insert_task() to wake the in-memory work queue immediately. Without
this, tasks sat in 'queued' until the scheduler's 30s idle cycle or
an API restart triggered _load_queued_tasks().

- Migration 034: recipe_jobs table (job_id, user_id, status, request,
  result, error) with indexes on job_id and user_id/created_at
- Store: create/get/update_running/complete/fail recipe job methods
- runner.py: recipe_llm task type + _run_recipe_llm handler; MUST
  call fail_recipe_job() before re-raising so status stays consistent
- CLOUD_MODE guard: in cloud mode, falls back to sync generation because
  the scheduler only polls the shared settings DB, not per-user DBs
- L4 wildcard is covered by the same req.level in (3, 4) dispatch
2026-04-19 21:44:27 -07:00
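
The "fail before re-raising" rule from ed4595d960 might look like the sketch below. It uses an in-memory stand-in for the Store; `InMemoryJobStore` and `run_recipe_job` are hypothetical names, and the `running` and `failed` status strings are assumptions (only `queued` and `done` appear in the commit message).

```python
class InMemoryJobStore:
    """Hypothetical stand-in for the recipe job methods on the real Store."""

    def __init__(self):
        self.jobs = {}

    def create(self, job_id):
        self.jobs[job_id] = {"status": "queued", "result": None, "error": None}

    def update_running(self, job_id):
        self.jobs[job_id]["status"] = "running"

    def complete(self, job_id, result):
        self.jobs[job_id].update(status="done", result=result)

    def fail(self, job_id, error):
        self.jobs[job_id].update(status="failed", error=error)


def run_recipe_job(store, job_id, generate):
    """Run one job; record the failure before the exception propagates."""
    store.update_running(job_id)
    try:
        result = generate()
    except Exception as exc:
        # Mark the job failed first so pollers never see a stuck "running".
        store.fail(job_id, str(exc))
        raise
    store.complete(job_id, result)
    return result
```

With this ordering, a client polling the job sees a terminal status even when the task runner itself also logs or retries the raised exception.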
dda8be48c9 feat: wire cf-orch agent sidecar and scheduler coordinator integration (closes #7)
- compose.override.yml: cf-orch agent sidecar (port 7702) self-registers with
  coordinator at COORDINATOR_URL; advertise-host configurable via CF_ORCH_ADVERTISE_HOST
- scheduler.py: pass coordinator_url=settings.COORDINATOR_URL and service_name="kiwi"
  so VRAM leases appear as "kiwi" on the orchestrator dashboard
- environment.yml: add psutil>=5.9 (required by cf-orch agent eviction executor)
- .env.example: document CF_ORCH_ADVERTISE_HOST
2026-04-02 22:57:21 -07:00
33a5cdec37 feat: cloud auth bypass, VRAM leasing, barcode EXIF fix, pipeline improvements
- cloud_session.py: CLOUD_AUTH_BYPASS_IPS with CIDR support; X-Real-IP for
  Docker bridge NAT-aware client IP resolution; local-dev DB path under
  CLOUD_DATA_ROOT for bypass sessions
- compose.cloud.yml: thread CLOUD_AUTH_BYPASS_IPS from shell env; document
  Docker bridge CIDR requirement in .env.example
- nginx.cloud.conf + nginx.conf: client_max_body_size 20m for barcode uploads
- barcode_scanner.py: EXIF orientation correction (PIL ImageOps.exif_transpose)
  before cv2 decode; rotation coverage extended to [90, 180, 270, 45, 135]
  to catch sideways barcodes that were missed without the 270° case
- llm_recipe.py: CF-core VRAM lease acquire/release wrapping LLMRouter calls
- tasks/runner.py + config.py: COORDINATOR_URL + recipe_llm VRAM budget (4GB)
- recipes.py: per-request Store creation inside asyncio.to_thread worker to
  avoid SQLite check_same_thread violations
- download_datasets.py: HF_PARQUET_FILES strategy for repos without dataset
  builders (lishuyang/recipepairs direct parquet download)
- derive_substitutions.py: use recipepairs_recipes.parquet for ingredient
  lookup; numpy array detection; JSON category parsing
- test_build_flavorgraph_index.py: rewritten for CSV-based index format
- pyproject.toml: add Pillow>=10.0 for EXIF rotation support
2026-04-01 16:06:23 -07:00
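
For the CLOUD_AUTH_BYPASS_IPS change in 33a5cdec37, a CIDR-aware check can be sketched with the standard `ipaddress` module. This is an illustration under assumptions: the comma-separated format and the helper names `parse_bypass_networks` and `is_bypassed` are hypothetical, and the actual logic in cloud_session.py may differ.

```python
import ipaddress


def parse_bypass_networks(raw):
    """Parse a comma-separated list of IPs/CIDRs (format is an assumption)."""
    networks = []
    for part in raw.split(","):
        part = part.strip()
        if part:
            # strict=False accepts host bits, e.g. "172.17.0.1/16".
            networks.append(ipaddress.ip_network(part, strict=False))
    return networks


def is_bypassed(client_ip, networks):
    """True when client_ip (e.g. taken from X-Real-IP) is in any network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed header value never bypasses auth
    return any(addr in net for net in networks)
```

Behind a Docker bridge the client address usually arrives as the bridge gateway unless the proxy forwards X-Real-IP, which is why the commit documents the bridge CIDR requirement.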
636bffda5a feat(tasks): add background task scheduler for LLM expiry fallback
Uses circuitforge_core.tasks.scheduler. VRAM detection via cf-orch when
available, falling back to unlimited. Adds expiry_llm_fallback task type
to background-predict expiry dates for items the LUT doesn't cover.
2026-03-31 09:25:48 -07:00