Evaluate and integrate task-model assignment routing via cf-orch /api/inference/task #54
Reference: Circuit-Forge/snipe#54
Background
cf-orch #60 shipped a three-layer task-model assignment system:
- assignments.yaml: product.task → model_id
- model_registry.yaml: model_id → {service_type, vram_mb, alias, ...}
This exposes POST /api/inference/task for task-based LLM/model routing without hardcoded model IDs in product code.
Spec: circuitforge-plans/circuitforge-orch/superpowers/specs/2026-05-13-task-model-assignments-design.md
Evaluation needed for Snipe
Snipe's GPU/LLM integration points are:
- mcp/server.py: MCP server, may have inference calls
- mcp/gpu_scoring.py: GPU scoring, may select models for trust scoring
Action item: Review these files to determine:
- … trust_score, listing_analysis, seller_research)
Proposed tasks if LLM is used
Acceptance Criteria
- … mcp/server.py and mcp/gpu_scoring.py for cf-orch inference calls
- … assignments.yaml entries and migrate call sites
Related
- circuitforge-plans/circuitforge-orch/superpowers/specs/2026-05-13-task-model-assignments-design.md
Audit findings
Files specified in the issue
- app/mcp/server.py: no LLM calls. Pure HTTP proxy to the FastAPI backend (/api/search, /api/enrich, /api/saved-searches). No cf-orch integration.
- app/mcp/gpu_scoring.py: entirely deterministic. Regex pattern matching plus weighted math for VRAM/arch scoring. No LLM, no network calls. No migration needed.
Actual LLM call sites (different files)
The issue pointed at the wrong files. The real LLM calls are:
- app/tasks/runner.py:140: the trust_photo_analysis task uses LLMRouter().complete(images=[...]) (vision LLM, moondream2, 2 GB VRAM budget). This is a migration candidate.
- app/llm/query_translator.py:157: QueryTranslator.translate() calls LLMRouter.complete() to convert natural-language queries to eBay SearchFilters. This is also a migration candidate.
Both use the Snipe LLMRouter shim (app/llm/router.py), which wraps circuitforge_core.llm.LLMRouter with tri-level config resolution.
Proposed task assignments
If we proceed with cf-orch task routing:
Overlap with #43
trust_photo_analysis is also tracked under #43 ("Wire Snipe photo analysis to cf-orch"). That issue should be the implementation home for the vision task migration; this issue can cover query_translation separately, or both can be addressed together.
Recommendation
Update this issue scope to cover the two actual call sites above. GPU scoring and the MCP server layer need no changes.
Implemented
Migrated app/llm/query_translator.py to dual-backend routing:
- … POST /api/inference/task (product=snipe, task=query_translation), calls the assigned cf-text service, and releases the allocation in a finally block to guarantee the VRAM lease is freed regardless of errors.
assignments.yaml in cf-orch updated with:
Also moved httpx from dev-only to main dependencies (it was already used in mcp/server.py).
Commits: snipe 1bf95bb, cf-orch 8909ae3. 213 tests passing.
Vision/photo-analysis piece remains in #43 (depends on cf-docuvision service wiring).
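For readers unfamiliar with the allocate/call/release pattern described in the migration note, here is a minimal sketch of how a finally block can guarantee that the lease is released. It is not the code from the commits above. The Lease class, the translate_via_task_route function, and the three injected callables are hypothetical names chosen for illustration; the real request and response shapes of POST /api/inference/task are not shown in this issue, so the HTTP layer is left out and replaced with in-memory fakes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Lease:
    """Hypothetical allocation record; the real response fields are an assumption."""
    lease_id: str
    service_url: str


def translate_via_task_route(
    query: str,
    allocate: Callable[[str, str], Lease],
    call_service: Callable[[Lease, str], str],
    release: Callable[[Lease], None],
) -> str:
    """Allocate by (product, task), call the assigned service, always release."""
    lease = allocate("snipe", "query_translation")
    try:
        return call_service(lease, query)
    finally:
        # Runs on success and on error, so the lease is never left held.
        release(lease)


# In-memory stand-ins for the orchestrator and the text service (illustrative only).
released: List[str] = []


def fake_allocate(product: str, task: str) -> Lease:
    return Lease(lease_id=f"{product}.{task}", service_url="http://cf-text.invalid")


def failing_service(lease: Lease, query: str) -> str:
    raise RuntimeError("service down")


def fake_release(lease: Lease) -> None:
    released.append(lease.lease_id)


try:
    translate_via_task_route("rtx 3090 under $600", fake_allocate, failing_service, fake_release)
except RuntimeError:
    pass

print(released)  # ['snipe.query_translation']
```

Passing the allocate, call, and release steps in as callables keeps the sketch testable without a running cf-orch instance; in a real client these would presumably be thin wrappers around httpx calls.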