feat: cf-text — direct text generation service module #41

Closed
opened 2026-04-08 21:56:06 -07:00 by pyr0ball · 0 comments
Owner

Summary

Scaffold cf-text as a shared circuitforge-core service module that provides direct access to text generation models without routing through ollama or vllm.

Motivation

Products like Peregrine (interview prep, cover letters, AI suggestions) and Kiwi (recipe suggestions, expiry advice) need text generation that:

  • Bypasses ollama/vllm overhead for lighter inference tasks
  • Works with local models directly (llama.cpp, transformers, ctransformers)
  • Can be registered with cf-orch as a cf-text service type with its own VRAM budget and max_concurrent limit
  • Supports streaming responses for UI responsiveness

Proposed interface

```python
# circuitforge_core/resources/text/client.py
cf_text.generate(prompt, model=None, stream=False, max_tokens=512)
cf_text.chat(messages, model=None, stream=False)
```

Service profile fields (cf-orch)

  • max_mb: per-model (3B Q4 ≈ 2048, 7B Q4 ≈ 4096)
  • preferred_compute_cap: 7.5 minimum (INT8 tensor cores)
  • max_concurrent: 2–3 depending on node
  • shared: true — multiple products can share a running instance
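The fields above might be expressed like this when registering with cf-orch. The field names come from this issue; the dict layout and the admission check are assumptions sketched for illustration:

```python
# Hypothetical cf-orch service profile for cf-text. Field names are from the
# issue; the surrounding structure is illustrative only.
CF_TEXT_PROFILE = {
    "service": "cf-text",
    "max_mb": {                    # per-model VRAM budget
        "3b-q4": 2048,
        "7b-q4": 4096,
    },
    "preferred_compute_cap": 7.5,  # minimum for INT8 tensor cores
    "max_concurrent": 2,           # 2-3 depending on node
    "shared": True,                # multiple products share one instance
}


def fits_node(profile: dict, free_vram_mb: int,
              compute_cap: float, model: str) -> bool:
    """Illustrative admission check a scheduler might perform."""
    return (compute_cap >= profile["preferred_compute_cap"]
            and free_vram_mb >= profile["max_mb"][model])
```

Keying `max_mb` per model lets the orchestrator budget a 3B and a 7B variant differently on the same node.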

Consumers

  • peregrine — interview prep responses, cover letter generation, job match suggestions
  • kiwi — recipe suggestions, ingredient substitutions, expiry advice

Notes

  • Should follow the same pattern as cf-stt / cf-vision (managed process, health endpoint)
  • Add node profile entries to heimdall/strahl/navi/huginn yamls once scaffolded
  • Wire into cf-orch service registry alongside existing voice/vision services
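For the managed-process / health-endpoint pattern mentioned above, a minimal stdlib sketch. The `/health` path and payload shape are assumptions; the cf-stt / cf-vision services may expose a different contract:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Minimal health endpoint in the style of the other cf-* services.
    Path and payload are illustrative assumptions."""

    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        body = json.dumps({"status": "ok", "service": "cf-text"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request stderr logging
        pass


def serve(port: int = 0) -> HTTPServer:
    """Run the health server on a daemon thread; port 0 = ephemeral."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = serve()
    url = f"http://127.0.0.1:{srv.server_port}/health"
    print(json.loads(urllib.request.urlopen(url).read()))
    srv.shutdown()
```

cf-orch (or any supervisor) can poll this endpoint to decide whether the managed process is alive before routing requests to it.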
Reference: Circuit-Forge/circuitforge-core#41