feat: add VLM vision model to NAS and bench_models.yaml (moondream2 or SmolVLM) #44
## What

cf-vision currently supports only SigLIP (classify + embed). To enable caption-quality benchmarking and real VQA routing, we need a generative VLM on the NAS.
## Candidates

- `vikhyatk/moondream2` — ~2 GB fp16, fast, good for documents; `VLMBackend` already supports it
- `HuggingFaceTB/SmolVLM-Instruct` (256M) — ~500 MB, extremely fast, good routing baseline

## Tasks

- [ ] Download the model weights to `/Library/Assets/LLM/vision/` on the NAS
- [ ] Add a `moondream2` (or SmolVLM) entry to `bench_models.yaml` with `service: cf-vision` (a hedged sketch follows this list)
- [ ] Add a `vision-caption` task to `bench_tasks.yaml` with `quality: pattern_match`, checking that the response contains image-description vocabulary
- [ ] Update the `navi.yaml` profile's `cf-vision` managed block to support a `--backend vlm` variant (likely a second service entry on port 8007; see the sketch under Notes)
- [ ] Add an entry to `_VRAM_TABLE` in `vlm.py` if not already present
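For concreteness, a minimal sketch of what the two bench entries could look like. Only `service: cf-vision`, `quality: pattern_match`, and the task name `vision-caption` come from this issue; every other key and value is an assumed schema, to be adapted to the real files:

```yaml
# bench_models.yaml -- hypothetical entry. Only `service: cf-vision` is
# confirmed by this issue; the other keys are assumed schema.
- name: moondream2
  service: cf-vision
  backend: vlm                  # assumed key mirroring the `--backend vlm` flag
  path: /Library/Assets/LLM/vision/moondream2

# bench_tasks.yaml -- hypothetical task. `quality: pattern_match` is from
# this issue; the pattern list is an illustrative guess at
# image-description vocabulary.
- name: vision-caption
  service: cf-vision
  quality: pattern_match
  patterns:
    - "image"
    - "photo"
    - "shows"
```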
## Notes

SigLIP (port 8006) handles classify + embed; the VLM backend (port 8007) would handle caption + VQA. Run them as separate services so they don't compete for VRAM.
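A possible shape for the second service entry in the `navi.yaml` managed block, assuming it mirrors the existing SigLIP entry; the key names and command line are illustrative, not the actual schema:

```yaml
# navi.yaml -- hypothetical second cf-vision entry. Port 8007 and
# `--backend vlm` come from this issue; everything else is assumed.
cf-vision-vlm:
  command: cf-vision serve --backend vlm --port 8007
  port: 8007
  # Kept separate from the SigLIP service on port 8006 so the two
  # backends don't compete for VRAM.
```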