feat: cf-text — direct text generation service module #41

Closed
opened 2026-04-08 21:56:06 -07:00 by pyr0ball · 0 comments
Owner

Summary

Scaffold cf-text as a shared circuitforge-core service module that provides direct access to text generation models without routing through ollama or vllm.

Motivation

Products like Peregrine (interview prep, cover letters, AI suggestions) and Kiwi (recipe suggestions, expiry advice) need text generation that:

  • Bypasses ollama/vllm overhead for lighter inference tasks
  • Works with local models directly (llama.cpp, transformers, ctransformers)
  • Can be registered with cf-orch as a cf-text service type with its own VRAM budget and max_concurrent limit
  • Supports streaming responses for UI responsiveness

Proposed interface

```python
# circuitforge_core/resources/text/client.py
cf_text.generate(prompt, model=None, stream=False, max_tokens=512)
cf_text.chat(messages, model=None, stream=False)
```

Service profile fields (cf-orch)

  • max_mb: per-model (3B Q4 ≈ 2048, 7B Q4 ≈ 4096)
  • preferred_compute_cap: 7.5 minimum (INT8 tensor cores)
  • max_concurrent: 2–3 depending on node
  • shared: true — multiple products can share a running instance
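The fields above might be expressed like this when registering with cf-orch. The field names come from this issue; the dict layout and the admission check are assumptions sketched for illustration:

```python
# Hypothetical cf-orch service profile for cf-text. Field names are from the
# issue; the surrounding structure is illustrative only.
CF_TEXT_PROFILE = {
    "service": "cf-text",
    "max_mb": {                    # per-model VRAM budget
        "3b-q4": 2048,
        "7b-q4": 4096,
    },
    "preferred_compute_cap": 7.5,  # minimum for INT8 tensor cores
    "max_concurrent": 2,           # 2-3 depending on node
    "shared": True,                # multiple products share one instance
}


def fits_node(profile: dict, free_vram_mb: int,
              compute_cap: float, model: str) -> bool:
    """Illustrative admission check a scheduler might perform."""
    return (compute_cap >= profile["preferred_compute_cap"]
            and free_vram_mb >= profile["max_mb"][model])
```

Keying `max_mb` per model lets the orchestrator budget a 3B and a 7B variant differently on the same node.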

Consumers

  • peregrine — interview prep responses, cover letter generation, job match suggestions
  • kiwi — recipe suggestions, ingredient substitutions, expiry advice

Notes

  • Should follow the same pattern as cf-stt / cf-vision (managed process, health endpoint)
  • Add node profile entries to heimdall/strahl/navi/huginn yamls once scaffolded
  • Wire into cf-orch service registry alongside existing voice/vision services
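For the managed-process / health-endpoint pattern mentioned above, a minimal stdlib sketch. The `/health` path and payload shape are assumptions; the cf-stt / cf-vision services may expose a different contract:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Minimal health endpoint in the style of the other cf-* services.
    Path and payload are illustrative assumptions."""

    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        body = json.dumps({"status": "ok", "service": "cf-text"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request stderr logging
        pass


def serve(port: int = 0) -> HTTPServer:
    """Run the health server on a daemon thread; port 0 = ephemeral."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = serve()
    url = f"http://127.0.0.1:{srv.server_port}/health"
    print(json.loads(urllib.request.urlopen(url).read()))
    srv.shutdown()
```

cf-orch (or any supervisor) can poll this endpoint to decide whether the managed process is alive before routing requests to it.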
Reference: Circuit-Forge/circuitforge-core#41