voice benchmark: parallel model scoring to fan out across cluster nodes #39
Problem
The voice benchmark (scripts/benchmark_voice.py run --cforch) runs models sequentially: allocate, score all 6 prompts, release, repeat. Even with 3 nodes online (Heimdall, Navi, Strahl), only one GPU is ever busy at a time.
Goal
Fan out model scoring across available cluster nodes in parallel so all GPUs are utilized simultaneously.
Proposed approach
… (concurrent.futures.ThreadPoolExecutor or asyncio.gather)
Constraints
… --max-vram filtering before queuing
… (--parallel N flag)
try/finally lease release must be preserved per worker
Context
Cluster currently: Heimdall (2x RTX 4000 8 GB), Navi (RTX 4000 8 GB), Strahl (RTX 2060 6 GB). With 4 GPUs available, an 8-model run could complete in ~2x the single-model time (two rounds of four models) instead of 8x.
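A minimal sketch of how the fan-out could look with concurrent.futures.ThreadPoolExecutor, honoring the three constraints listed above (VRAM filtering before queuing, a parallelism cap, try/finally lease release per worker). Everything here is an assumption for illustration: FakeCluster, Model, score_model, run_one and run_parallel are hypothetical names, not the actual interface of benchmark_voice.py or the lease API, and the slot names are made up from the node list above.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
import queue
import threading


@dataclass(frozen=True)
class Model:
    name: str
    vram_gb: float


# Hypothetical slot names: one entry per GPU listed in the Context section.
GPU_SLOTS = ["heimdall:0", "heimdall:1", "navi:0", "strahl:0"]


class FakeCluster:
    """Stand-in for the real lease API (assumed, not the project's interface)."""

    def __init__(self, slots):
        self._free = queue.Queue()
        for slot in slots:
            self._free.put(slot)
        self._lock = threading.Lock()
        self.released = []  # record of every release, for inspection

    def allocate(self):
        # Blocks until some GPU slot is free.
        return self._free.get()

    def release(self, slot):
        with self._lock:
            self.released.append(slot)
        self._free.put(slot)


def score_model(model, slot):
    # Placeholder for scoring all 6 prompts on the leased GPU.
    if model.name == "broken":
        raise RuntimeError("scoring failed")
    return {"model": model.name, "gpu": slot, "prompts_scored": 6}


def run_one(cluster, model):
    slot = cluster.allocate()
    try:
        return score_model(model, slot)
    finally:
        # The lease is released even if scoring raises.
        cluster.release(slot)


def run_parallel(cluster, models, parallel, max_vram_gb):
    # Apply the VRAM filter before anything is queued.
    eligible = [m for m in models if m.vram_gb <= max_vram_gb]
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {pool.submit(run_one, cluster, m): m for m in eligible}
        for future, model in futures.items():
            try:
                results.append(future.result())
            except Exception as exc:  # one failed model must not abort the run
                errors.append((model.name, exc))
    return results, errors
```

With parallel=4 and the four slots above, an 8-model run proceeds in two rounds of four. The sketch treats all GPUs as interchangeable; since Strahl has 6 GB rather than 8 GB, a real scheduler would probably need to match each model to a slot with enough VRAM rather than taking the first free one.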