- Add circuitforge_core/resources/inference/llm_server.py: generic OpenAI-compatible FastAPI server for any HuggingFace causal LM (Phi-4-mini-instruct, Qwen2.5-3B-Instruct)
- Add service_manager.py + service_probe.py: ProcessSpec start/stop/is_running support (Popen-based; a socket probe confirms readiness before the service is marked running)
- Update all 4 public GPU profiles to use ProcessSpec→llm_server instead of Docker vllm: 6gb (max_mb 5500), 8gb (max_mb 6500), 16gb/24gb (max_mb 9000)
- Model candidates: Phi-4-mini-instruct first (7.2GB), Qwen2.5-3B-Instruct fallback (5.8GB)
- Remove ouro_server.py (Ouro is incompatible with transformers 5.x; the vllm Docker image is also incompatible)
- Add 17 tests for the ServiceManager ProcessSpec path (start/stop/is_running/list/get_url)