- platforms/: eBay platform adapter (snipe integration layer)
- docs/: developer guide, module reference, getting-started docs
- scripts/: utility scripts for development and deployment
resources
VRAM allocation engine and GPU profile registry. Works alongside the tasks module to prevent GPU OOM errors across concurrent LLM workloads.
from circuitforge_core.resources import ResourceCoordinator, VRAMSlot
Architecture
The resource coordinator runs as a sidecar alongside each product (via compose.override.yml) and registers with the cf-orch coordinator at http://10.1.10.71:7700. The coordinator maintains a global view of VRAM allocation across all products and all GPUs.
Product A (kiwi)      ─┐
Product B (peregrine) ─┤ → cf-orch coordinator → GPU 0 (24GB)
Product C (snipe)     ─┘                       → GPU 1 (8GB)
VRAM allocation
VRAMSlot represents a lease on a fixed VRAM budget:
slot = VRAMSlot(service="kiwi", task_type="recipe_llm", vram_gb=4.0)
async with coordinator.lease(slot):
    result = await run_inference(prompt)
# VRAM released automatically on context exit
If the requested VRAM is not available, the coordinator queues the request. Tasks are executed in FIFO order within each priority class.
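The queueing rule can be illustrated with a small standalone sketch. This is not the circuitforge_core implementation: VramQueue, its method names, and the convention that a lower priority number is served first are all assumptions made for illustration. A sequence counter breaks ties so that requests in the same priority class come out in arrival order.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class _QueuedRequest:
    priority: int                        # assumption: lower number is served first
    seq: int                             # arrival counter, gives FIFO within a class
    service: str = field(compare=False)
    vram_gb: float = field(compare=False)


class VramQueue:
    """Toy admission queue: grant at once if VRAM is free, otherwise queue."""

    def __init__(self, total_gb):
        self.free_gb = total_gb
        self._heap = []
        self._seq = itertools.count()

    def request(self, service, vram_gb, priority=1):
        """Return True if granted now, False if the request was queued."""
        # Never jump ahead of requests that are already waiting.
        if not self._heap and vram_gb <= self.free_gb:
            self.free_gb -= vram_gb
            return True
        heapq.heappush(
            self._heap, _QueuedRequest(priority, next(self._seq), service, vram_gb)
        )
        return False

    def release(self, vram_gb):
        """Return VRAM to the pool and admit queued requests while the head fits."""
        self.free_gb += vram_gb
        admitted = []
        while self._heap and self._heap[0].vram_gb <= self.free_gb:
            req = heapq.heappop(self._heap)
            self.free_gb -= req.vram_gb
            admitted.append(req.service)
        return admitted


queue = VramQueue(total_gb=8.0)
queue.request("kiwi", 6.0)                    # granted immediately
queue.request("peregrine", 4.0, priority=1)   # queued
queue.request("snipe", 4.0, priority=1)       # queued behind peregrine
queue.request("urgent", 4.0, priority=0)      # queued, but in a higher class
print(queue.release(6.0))                     # ['urgent', 'peregrine']
```

In this sketch the head of the queue blocks everything behind it, which keeps ordering strict at the cost of some idle VRAM. Whether the real coordinator lets smaller requests skip ahead is not stated here.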
Eviction engine
When a high-priority task needs VRAM that is held by a lower-priority task, the eviction engine signals the lower-priority task to checkpoint and pause. Eviction is cooperative, not forced: tasks must implement the checkpoint() callback.
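A minimal sketch of what cooperative eviction could look like from the task's side. Only the checkpoint() callback name comes from the text above; PausableTask, the evict_requested event, and the step counter are hypothetical. The point is that the engine only sets a signal, and the task decides when to save state and stop.

```python
import asyncio


class PausableTask:
    """Hypothetical low-priority task that cooperates with eviction."""

    def __init__(self, total_steps):
        self.total_steps = total_steps
        self.step = 0
        self.checkpoints = []
        self.evict_requested = asyncio.Event()

    def checkpoint(self):
        """Save enough state to resume later (here: only the step counter)."""
        self.checkpoints.append(self.step)

    async def run(self):
        while self.step < self.total_steps:
            # Check the signal between units of work, never in the middle of one.
            if self.evict_requested.is_set():
                self.checkpoint()
                return "paused"
            await asyncio.sleep(0)       # stand-in for one unit of GPU work
            self.step += 1
        return "done"


async def demo():
    task = PausableTask(total_steps=50)
    runner = asyncio.create_task(task.run())
    await asyncio.sleep(0)               # let the low-priority task start
    task.evict_requested.set()           # signal only; the task is never cancelled
    first = await runner                 # the task checkpoints and returns by itself
    paused_at = task.step

    task.evict_requested.clear()         # VRAM is available again
    second = await task.run()            # continues from the step kept on the object
    return first, paused_at, task.checkpoints, second, task.step


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A task that never checks the signal cannot be paused this way, which is the trade-off of a cooperative design: no work is lost mid-step, but a misbehaving task can hold its VRAM.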
GPU profile registry
The registry maps GPU models to capability profiles:
from circuitforge_core.resources import get_gpu_profile
profile = get_gpu_profile("RTX 4090")
# GpuProfile(vram_gb=24.0, fp16=True, int8=True, int4=True, max_batch=32)
Profiles are used by the LLM router to determine which model quantizations a GPU can run.
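To make the router's decision concrete, here is a rough sketch of how a profile might be turned into a list of usable quantizations. The GpuProfile dataclass below is a stand-in that copies the field names from the example output above; runnable_quantizations, the bytes-per-parameter table, and the 2 GB headroom are assumptions, not part of the registry API. The estimate covers weights only and ignores KV cache and activations beyond the fixed headroom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuProfile:
    # Stand-in with the same field names as the registry example above.
    vram_gb: float
    fp16: bool
    int8: bool
    int4: bool
    max_batch: int


# Approximate weight storage cost per parameter, in bytes (assumed values).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def runnable_quantizations(profile, params_billion, headroom_gb=2.0):
    """List supported quantizations whose weights fit, highest precision first."""
    fits = []
    for quant, bytes_per_param in BYTES_PER_PARAM.items():
        if not getattr(profile, quant):
            continue                     # the GPU lacks support for this format
        # 1e9 parameters times bytes per parameter is roughly this many GB.
        weights_gb = params_billion * bytes_per_param
        if weights_gb + headroom_gb <= profile.vram_gb:
            fits.append(quant)
    return fits


rtx_4090 = GpuProfile(vram_gb=24.0, fp16=True, int8=True, int4=True, max_batch=32)
print(runnable_quantizations(rtx_4090, params_billion=7))    # ['fp16', 'int8', 'int4']
print(runnable_quantizations(rtx_4090, params_billion=13))   # ['int8', 'int4']
```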
Local fallback
When the cf-orch coordinator is not reachable (local dev without the sidecar), the resource coordinator falls back to a local-only mode: tasks run sequentially with no cross-product coordination. This is safe for development but should not be used in production if multiple products are running concurrently on the same GPU.
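The sequential behaviour of local-only mode can be approximated with a single lock. The sketch below is an illustration under assumptions: LocalOnlyCoordinator is a hypothetical class, its lease() only mirrors the shape of the call shown earlier, and detecting that cf-orch is unreachable is left out entirely.

```python
import asyncio
from contextlib import asynccontextmanager


class LocalOnlyCoordinator:
    """Hypothetical stand-in for local-only mode: one lease at a time."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self.active = 0
        self.max_active = 0              # highest number of concurrent leases seen

    @asynccontextmanager
    async def lease(self, slot_name):
        # slot_name is accepted only to mirror the call shape; with no global
        # VRAM view there is nothing to budget against, so a single lock
        # serialises all work in this process.
        async with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            try:
                yield
            finally:
                self.active -= 1


async def demo():
    coordinator = LocalOnlyCoordinator()  # created inside the running event loop
    order = []

    async def job(name):
        async with coordinator.lease(name):
            order.append(f"{name}:start")
            await asyncio.sleep(0.01)     # stand-in for inference
            order.append(f"{name}:end")

    await asyncio.gather(job("kiwi"), job("peregrine"), job("snipe"))
    return order, coordinator.max_active


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The lock only protects tasks inside one process. Two products each running their own fallback would not see each other, which matches the warning above about using this mode when several products share a GPU.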