Circuit-Forge/snipe


Evaluate and integrate task-model assignment routing via cf-orch /api/inference/task #54


Closed

opened 2026-05-13 07:38:33 -07:00 by pyr0ball · 2 comments

pyr0ball commented

2026-05-13 07:38:33 -07:00

Owner

Background

cf-orch #60 shipped a three-layer task-model assignment system:

Layer 1 (assignments.yaml): product.task → model_id
Layer 2 (model_registry.yaml): model_id → {service_type, vram_mb, alias, ...}
Layer 3: per-node catalogs (existing, unchanged)

This exposes POST /api/inference/task for task-based LLM/model routing without hardcoded model IDs in product code.
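The layered lookup described above can be sketched as two dictionary hops. This is only an illustration under stated assumptions: the in-memory dicts stand in for assignments.yaml and model_registry.yaml, the `resolve_task` helper and `ResolvedModel` type are hypothetical names, and the `vram_mb` and `alias` values are made up. It is not cf-orch's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical stand-in for Layer 1 (assignments.yaml): product.task -> model_id
ASSIGNMENTS = {
    "snipe": {"query_translation": "granite-4.1-8b"},
}

# Hypothetical stand-in for Layer 2 (model_registry.yaml): model_id -> metadata.
# The vram_mb and alias values are illustrative, not taken from the real registry.
MODEL_REGISTRY = {
    "granite-4.1-8b": {"service_type": "cf-text", "vram_mb": 8192, "alias": "granite"},
}


@dataclass(frozen=True)
class ResolvedModel:
    model_id: str
    service_type: str
    vram_mb: int


def resolve_task(product, task, assignments=ASSIGNMENTS, registry=MODEL_REGISTRY):
    """Resolve product.task to a registry entry without hardcoding a model ID."""
    try:
        model_id = assignments[product][task]
    except KeyError:
        raise LookupError(f"no assignment for {product}.{task}") from None
    entry = registry.get(model_id)
    if entry is None:
        raise LookupError(f"model {model_id!r} is assigned but not registered")
    return ResolvedModel(model_id, entry["service_type"], entry["vram_mb"])
```

Product code only names the task, e.g. `resolve_task("snipe", "query_translation")`, so swapping the model becomes a config change rather than a code change. Layer 3 (per-node catalogs) is left out of the sketch.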

Spec: circuitforge-plans/circuitforge-orch/superpowers/specs/2026-05-13-task-model-assignments-design.md

Evaluation needed for Snipe

Snipe's GPU/LLM integration points are:

mcp/server.py — MCP server, may have inference calls
mcp/gpu_scoring.py — GPU scoring, may select models for trust scoring

Action item: Review these files to determine:

Does Snipe currently make LLM inference calls via cf-orch?
If yes, what tasks map to those calls? (e.g., trust_score, listing_analysis, seller_research)
Should GPU scoring be task-assigned, or is it deterministic (no LLM needed)?

Proposed tasks if LLM is used

snipe:
  trust_score:
    model_id: ibm-granite--granite-4.1-8b
    description: Seller/listing trust score inference
  listing_analysis:
    model_id: ibm-granite--granite-4.1-8b
    description: Listing text analysis

Acceptance Criteria

Audit mcp/server.py and mcp/gpu_scoring.py for cf-orch inference calls
If LLM calls found: add assignments.yaml entries and migrate call sites
If no LLM calls: close as N/A with a comment confirming

Related

cf-orch #60 (task-model assignment layer — already shipped)
circuitforge-plans/circuitforge-orch/superpowers/specs/2026-05-13-task-model-assignments-design.md


pyr0ball commented

2026-05-13 15:12:35 -07:00

Author

Owner

Audit findings

Files specified in the issue

app/mcp/server.py — no LLM calls. Pure HTTP proxy to the FastAPI backend (/api/search, /api/enrich, /api/saved-searches). No cf-orch integration.

app/mcp/gpu_scoring.py — entirely deterministic. Regex pattern matching + weighted math for VRAM/arch scoring. No LLM, no network calls. No migration needed.
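For readers unfamiliar with this style of scoring, here is a rough sketch of what "regex pattern matching + weighted math" can look like. It is not the code in gpu_scoring.py: the function name, regex, architecture table, and weights are all hypothetical and chosen only to show why no LLM or network call is involved.

```python
import re

# Hypothetical pattern: pull a VRAM size such as "24GB" or "8 GB" out of a title.
VRAM_RE = re.compile(r"(\d+)\s*GB", re.IGNORECASE)

# Hypothetical architecture weights; real values would live in the scoring module.
ARCH_WEIGHTS = {"ada": 1.0, "ampere": 0.8, "turing": 0.6}


def score_listing(title, vram_weight=0.6, arch_weight=0.4, max_vram_gb=24):
    """Return a score in [0, 1] from regex matches and fixed weights only."""
    match = VRAM_RE.search(title)
    vram_gb = int(match.group(1)) if match else 0
    vram_score = min(vram_gb, max_vram_gb) / max_vram_gb

    lowered = title.lower()
    arch_score = max(
        (w for name, w in ARCH_WEIGHTS.items() if re.search(rf"\b{name}\b", lowered)),
        default=0.0,
    )
    return vram_weight * vram_score + arch_weight * arch_score
```

The same title always yields the same score, which is why this path does not need a task assignment.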

Actual LLM call sites (different files)

The issue pointed at the wrong files. The real LLM calls are:

app/tasks/runner.py:140 — trust_photo_analysis task uses LLMRouter().complete(images=[...]) (vision LLM, moondream2, 2 GB VRAM budget). This is a migration candidate.
app/llm/query_translator.py:157 — QueryTranslator.translate() calls LLMRouter.complete() to convert natural-language queries to eBay SearchFilters. This is also a migration candidate.

Both go through the Snipe LLMRouter shim (app/llm/router.py), which wraps circuitforge_core.llm.LLMRouter with tri-level config resolution.

Proposed task assignments

If we proceed with cf-orch task routing:

snipe:
  trust_photo_analysis:
    model_id: moondream2  # or whichever vision model is in registry
    description: Vision analysis of eBay listing photos
  query_translation:
    model_id: ibm-granite--granite-4.1-8b
    description: NL → eBay search params translation

Overlap with #43

trust_photo_analysis is also tracked under #43 ("Wire Snipe photo analysis to cf-orch"). That issue should be the implementation home for the vision task migration; this issue can cover query_translation separately or both can be addressed together.

Recommendation

Update this issue's scope to cover the two actual call sites above. GPU scoring and the MCP server layer need no changes.


pyr0ball commented

2026-05-13 15:22:33 -07:00

Author

Owner

Implemented

Migrated app/llm/query_translator.py to dual-backend routing:

CF_ORCH_URL set (cloud/premium): allocates via POST /api/inference/task (product=snipe, task=query_translation), calls the assigned cf-text service, and releases the allocation in a finally block to guarantee the VRAM lease is freed regardless of errors.
CF_ORCH_URL absent (local installs): existing LLMRouter path unchanged — ollama/vllm/API key config continues to work.
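The control flow of the two branches above can be sketched as follows. This is a simplified illustration, not the code in query_translator.py: `translate_query` and the injected `allocate`, `call_service`, `release`, and `local_complete` callables are hypothetical stand-ins for the real HTTP calls and the LLMRouter path, and the shape of the allocation object is assumed.

```python
import os


def translate_query(prompt, *, allocate, call_service, release, local_complete,
                    orch_url=None):
    """Route through cf-orch when a URL is configured, else use the local path.

    All four callables are hypothetical injection points:
      allocate(orch_url, product, task) -> allocation
      call_service(allocation, prompt)  -> completion text
      release(orch_url, allocation)     -> None
      local_complete(prompt)            -> completion text
    """
    if orch_url is None:
        orch_url = os.environ.get("CF_ORCH_URL")

    if not orch_url:
        # Local installs: existing router path, unchanged.
        return local_complete(prompt)

    # Cloud/premium: allocate by task name, never by hardcoded model ID.
    allocation = allocate(orch_url, "snipe", "query_translation")
    try:
        return call_service(allocation, prompt)
    finally:
        # Runs on success and on error, so the lease is always handed back.
        release(orch_url, allocation)
```

The allocation happens outside the `try` on purpose: if allocation itself fails there is nothing to release, and once it succeeds the `finally` guarantees the release runs even when the service call raises.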

assignments.yaml in cf-orch updated with:

- product: snipe
  task: query_translation
  model_id: granite-4.1-8b
  description: Natural-language query to structured eBay search params

Also moved httpx from dev-only to main dependencies (it was already used at runtime in app/mcp/server.py).

Commits: snipe 1bf95bb, cf-orch 8909ae3. 213 tests passing.

Vision/photo-analysis piece remains in #43 (depends on cf-docuvision service wiring).
