feat: cf-orch dispatch for training jobs (Phase 3) #54

Open
opened 2026-05-01 12:13:46 -07:00 by pyr0ball · 0 comments
Owner

Context

Currently, all training jobs run locally. For large models (7B+ SFT, voice finetuning), the job queue should be able to dispatch to cf-orch nodes.

Work

  • Extend app/train/train.py job dispatch to support remote targets
  • Job config: {target: "local" | "cf-orch", node_filter: {...}}
  • Progress streaming from remote cf-orch node via coordinator SSE proxy
  • Node selection: cf-orch allocates a GPU with sufficient VRAM for the job type
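
A minimal sketch of how the job config and target routing above might look, under stated assumptions. The issue only specifies the `target` values and that `node_filter` exists, so `JobConfig`, `select_node`, `dispatch`, the `min_vram_gb` filter key, and the node dict fields (`name`, `free_vram_gb`) are all hypothetical names. `select_node` is a local stand-in for the allocation that cf-orch itself would perform, and no real dispatch or cf-orch API call happens here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Literal, Optional

Target = Literal["local", "cf-orch"]


@dataclass
class JobConfig:
    """Hypothetical config mirroring {target, node_filter} from the issue."""

    target: Target = "local"
    # The issue leaves node_filter's keys unspecified.
    # "min_vram_gb" is an assumed example key, not a confirmed field.
    node_filter: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.target not in ("local", "cf-orch"):
            raise ValueError(f"unknown target: {self.target!r}")
        if self.target == "local" and self.node_filter:
            raise ValueError("node_filter only applies to cf-orch targets")


def select_node(
    nodes: List[Dict[str, Any]], config: JobConfig
) -> Optional[Dict[str, Any]]:
    """Stand-in for cf-orch allocation: pick the smallest node that fits.

    Assumes each node dict has "name" and "free_vram_gb" keys.
    """
    required = config.node_filter.get("min_vram_gb", 0)
    candidates = [n for n in nodes if n["free_vram_gb"] >= required]
    # Prefer the tightest fit so larger GPUs stay free for larger jobs.
    return min(candidates, key=lambda n: n["free_vram_gb"], default=None)


def dispatch(config: JobConfig, nodes: List[Dict[str, Any]]) -> str:
    """Return a label for where the job would run; nothing is launched."""
    if config.target == "local":
        return "local"
    node = select_node(nodes, config)
    if node is None:
        raise RuntimeError("no cf-orch node satisfies node_filter")
    return f"cf-orch:{node['name']}"
```

With two example nodes of 24 GB and 80 GB free VRAM, a job that asks for `min_vram_gb: 40` would route to the 80 GB node, while a job with the default config stays local. Choosing the tightest fit is one possible policy; the actual allocation policy belongs to cf-orch.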

Depends on

  • #43 (train job queue)
  • cf-orch multi-GPU allocation support
pyr0ball added this to the v2 — Pipeline Architecture milestone 2026-05-01 12:13:46 -07:00
pyr0ball added the phase-3 and backend labels 2026-05-01 12:13:46 -07:00
Reference: Circuit-Forge/avocet#54