Migrate voice server from Flask to FastAPI + uvicorn #2

Open

opened 2026-04-06 22:25:18 -07:00 by pyr0ball (Owner) · 0 comments

Summary

Both voice_server.py and voice_server_enhanced.py use Flask. CF convention is FastAPI + uvicorn. Flask's built-in development server is intended for local development only and is not production-safe for concurrent audio uploads.

Specific issues with current Flask code

app.run(debug=False) still starts Flask's built-in development server, which is not production-safe
BadRequest exceptions are raised manually; FastAPI rejects invalid input with a 422 response through Pydantic validation
Whisper inference runs synchronously and blocks the request handler thread
The /wake-word/detections polling endpoint would be better served as a WebSocket or SSE stream, which is awkward in Flask
Global mutable state (global whisper_model, global ha_client) should become dependency-injected instances, which are easier to wire in FastAPI

Migration approach

Replace the Flask app with a FastAPI app
Move the synchronous Whisper inference into run_in_executor so it does not block the event loop
Replace the polling endpoint with a WebSocket or SSE stream for wake word events
Replace manual BadRequest checks with Pydantic request models
Wire uvicorn in as the runner (in manage.sh)
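
The executor and streaming steps above could look roughly like the following standard-library sketch. `blocking_transcribe` is a hypothetical stand-in that only sleeps (it is not the Whisper API), and the pool size, queue-based event source, and payload fields are assumptions made for illustration.

```python
import asyncio
import json
import time
from concurrent.futures import ThreadPoolExecutor
from typing import AsyncIterator


def blocking_transcribe(audio: bytes) -> str:
    # Hypothetical stand-in for the real Whisper call; it only simulates blocking work.
    time.sleep(0.2)
    return f"transcribed {len(audio)} bytes"


# A dedicated pool keeps inference off the event loop thread (size is an assumption).
_executor = ThreadPoolExecutor(max_workers=1)


async def transcribe(audio: bytes) -> str:
    """Await the blocking call without stalling other coroutines."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, blocking_transcribe, audio)


def format_sse(payload: dict) -> str:
    """Encode one event in the text/event-stream wire format."""
    return f"data: {json.dumps(payload)}\n\n"


async def wake_word_stream(queue: asyncio.Queue) -> AsyncIterator[str]:
    """Yield SSE frames as detections arrive; a None item ends the stream."""
    while True:
        detection = await queue.get()
        if detection is None:
            break
        yield format_sse(detection)


async def demo() -> tuple:
    ticks = 0

    async def heartbeat() -> None:
        # Counts loop iterations to show the event loop stays responsive.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    text = await transcribe(b"\x00" * 1600)
    beat.cancel()

    queue: asyncio.Queue = asyncio.Queue()
    await queue.put({"wake_word": "hey_assistant", "score": 0.93})
    await queue.put(None)
    frames = [frame async for frame in wake_word_stream(queue)]
    return text, ticks, frames


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a FastAPI route, a generator like `wake_word_stream` could be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, replacing the poll loop on the client with a single long-lived connection. The heartbeat counter in `demo` is only there to show that other coroutines keep running while the simulated inference sleeps in the worker thread.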

References

cf-dev review finding: Priority 2
Peregrine FastAPI patterns as reference
