voice benchmark: parallel model scoring to fan out across cluster nodes #39

Open
opened 2026-04-22 10:44:31 -07:00 by pyr0ball · 0 comments

Problem

The voice benchmark (scripts/benchmark_voice.py run --cforch) runs models sequentially: allocate, score all 6 prompts, release, repeat. Even with 3 nodes online (Heimdall, Navi, Strahl), only one GPU is ever busy at a time.

Goal

Fan out model scoring across available cluster nodes in parallel so all GPUs are utilized simultaneously.

Proposed approach

  • Launch N concurrent workers (e.g. via concurrent.futures.ThreadPoolExecutor or asyncio.gather)
  • Each worker: allocate cf-text for one model on any available node, score all prompts, release
  • Workers run concurrently; the coordinator picks the best available node for each allocation
  • Collect results and merge into the same report format
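A minimal sketch of the per-worker lifecycle described above, using `ThreadPoolExecutor`. `FakeCluster`, `allocate_cf_text`, `release`, and `score_prompt` are hypothetical stand-ins; the issue does not show the actual cf-text client API, so the real calls will differ.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeCluster:
    """Hypothetical stand-in for the cf-text allocation client."""

    def __init__(self) -> None:
        self.released: list[str] = []
        self._lock = threading.Lock()

    def allocate_cf_text(self, model: str) -> str:
        # The real coordinator would pick the best available node here.
        return f"lease-{model}"

    def release(self, lease: str) -> None:
        with self._lock:
            self.released.append(lease)


def score_prompt(lease: str, prompt: str) -> float:
    # Placeholder scorer; raises on an empty prompt so the finally path can be exercised.
    if not prompt:
        raise ValueError("empty prompt")
    return float(len(prompt))


def score_model(cluster: FakeCluster, model: str, prompts: list[str]) -> dict:
    """One worker: allocate for one model, score every prompt, always release."""
    lease = cluster.allocate_cf_text(model)
    try:
        return {"model": model, "scores": [score_prompt(lease, p) for p in prompts]}
    finally:
        # Runs even if scoring raises, so a failed model never leaks its lease.
        cluster.release(lease)


def run_parallel(cluster: FakeCluster, models: list[str], prompts: list[str],
                 max_workers: int) -> list[dict]:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(lambda m: score_model(cluster, m, prompts), models))


cluster = FakeCluster()
results = run_parallel(cluster, ["a", "b", "c"], ["hi", "there"], max_workers=2)
# results[0] == {"model": "a", "scores": [2.0, 5.0]}
```

Because `Executor.map` re-raises a worker's exception when its result is reached, a real run would likely want to catch per-model errors inside `score_model` so one failing model does not abort the whole report; the `finally` clause releases the lease either way.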

Constraints

  • Must still respect --max-vram filtering before queuing
  • Worker count should be bounded (suggest: min(len(models), num_online_gpus) or a --parallel N flag)
  • Results order in the report should match the ranked catalog order, not arrival order
  • try/finally lease release must be preserved per worker
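The three ordering and sizing constraints can be sketched as a small coordinator: filter by VRAM first, size the pool, then re-sort whatever arrives back into catalog rank. The catalog entries, `worker`, and `plan_and_run` are hypothetical, and the `--max-vram` semantics (keep a model if its estimate is at most the limit) are an assumption.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Optional

# Hypothetical ranked catalog: (model name, estimated VRAM in GB).
# Names and sizes are illustrative, not the real avocet catalog.
CATALOG = [("model-a", 4.0), ("model-b", 7.5), ("model-c", 10.0), ("model-d", 5.0)]


def worker(model: str) -> dict:
    # Stand-in for allocate -> score -> release (with try/finally in a real worker);
    # sleeps a random amount so results arrive out of catalog order.
    time.sleep(random.uniform(0.0, 0.05))
    return {"model": model}


def plan_and_run(catalog, max_vram_gb: float, num_online_gpus: int,
                 parallel: Optional[int] = None) -> list[dict]:
    # 1. Apply the --max-vram filter before anything is queued
    #    (assumed semantics: keep models whose estimate fits).
    eligible = [name for name, vram in catalog if vram <= max_vram_gb]
    if not eligible:
        return []
    # 2. Bound concurrency: an explicit --parallel N wins,
    #    otherwise min(len(models), num_online_gpus), never below 1.
    n_workers = parallel or max(1, min(len(eligible), num_online_gpus))
    rank = {name: i for i, name in enumerate(eligible)}
    results = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(worker, name) for name in eligible]
        for fut in as_completed(futures):  # arrival (completion) order
            results.append(fut.result())
    # 3. Re-sort into ranked catalog order for the report.
    results.sort(key=lambda r: rank[r["model"]])
    return results


print([r["model"] for r in plan_and_run(CATALOG, max_vram_gb=8.0, num_online_gpus=4)])
# ['model-a', 'model-b', 'model-d']
```

`Executor.map` would also preserve input order on its own; `as_completed` plus an explicit sort is shown because a coordinator may want to log or checkpoint each model as soon as it finishes while still writing the report in ranked order.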

Context

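Cluster currently: Heimdall (2x RTX 4000 8 GB), Navi (RTX 4000 8 GB), Strahl (RTX 2060 6 GB). With 4 GPUs available, an 8-model run could complete in roughly 2x the single-model time (two rounds of four models) instead of 8x, assuming each model takes about the same time and fits on any of the GPUs.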
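The ~2x estimate can be sanity-checked with a small greedy list-scheduling model in which each model is handed to whichever GPU frees up first. This is a sketch under simplifying assumptions not stated in the issue: every model fits on every GPU (Strahl's 6 GB card may not hold the larger ones), allocation overhead is ignored, durations are in units of one single-model run, and `estimated_makespan` is a hypothetical helper, not part of `benchmark_voice.py`.

```python
import heapq


def estimated_makespan(durations: list[float], num_gpus: int) -> float:
    """Greedy list scheduling: each model goes to the GPU that frees up first."""
    gpu_free_at = [0.0] * num_gpus  # time at which each GPU becomes idle
    heapq.heapify(gpu_free_at)
    for d in durations:  # models dispatched in catalog order
        start = heapq.heappop(gpu_free_at)
        heapq.heappush(gpu_free_at, start + d)
    return max(gpu_free_at)


print(estimated_makespan([1.0] * 8, 4))  # 2.0  (two rounds of four)
print(estimated_makespan([1.0] * 8, 1))  # 8.0  (today's sequential run)
```

The speedup degrades when durations are uneven: with one model taking 3 units and seven taking 1, `estimated_makespan([3.0] + [1.0] * 7, 4)` is 3.0, so the slowest model, not the GPU count, bounds the run.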

Reference: Circuit-Forge/avocet#39