Switches recipe generation service type from 'cf-text' to 'vllm' so the coordinator can route to quantized small models (Qwen2.5-3B, Phi-4-mini) rather than the full text backend. Passes CF_APP_NAME for per-product VRAM/request analytics in the coordinator dashboard.

- llm_recipe.py: _SERVICE_TYPE = 'vllm'; _MODEL_CANDIDATES list; passes model_candidates and pipeline= to CFOrchClient.allocate()
- compose.cloud.yml: CF_APP_NAME=kiwi env var for coordinator attribution

| Name | | |
|---|---|---|
| api | ||
| core | ||
| db | ||
| models | ||
| services | ||
| staples | ||
| static | ||
| styles | ||
| tasks | ||
| utils | ||
| __init__.py | ||
| cloud_session.py | ||
| main.py | ||
| tiers.py | ||
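
The allocation change described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in llm_recipe.py: `StubOrchClient` is a hypothetical stand-in for `CFOrchClient` (whose real signature is not shown here), `request_recipe_backend` is an invented helper, and both the exact model identifier strings and the `"recipe"` pipeline value are assumptions.

```python
from dataclasses import dataclass, field

# Value named in the commit message.
_SERVICE_TYPE = "vllm"

# The commit names these models; the exact identifier strings are assumed.
_MODEL_CANDIDATES = ["Qwen2.5-3B", "Phi-4-mini"]


@dataclass
class StubOrchClient:
    """Hypothetical stand-in for CFOrchClient; it only records requests."""

    calls: list = field(default_factory=list)

    def allocate(self, service_type, *, model_candidates, pipeline):
        # Assumed keyword names, mirroring the commit message wording.
        request = {
            "service_type": service_type,
            "model_candidates": list(model_candidates),
            "pipeline": pipeline,
        }
        self.calls.append(request)
        return request


def request_recipe_backend(client, pipeline="recipe"):
    """Ask the coordinator for a vllm-type service with small-model candidates.

    The default pipeline name is a placeholder, not taken from the source.
    """
    return client.allocate(
        _SERVICE_TYPE,
        model_candidates=_MODEL_CANDIDATES,
        pipeline=pipeline,
    )
```

Calling `request_recipe_backend(StubOrchClient())` returns the recorded request, which makes it easy to check in a unit test that the service type and candidate list are passed through unchanged.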