Two-phase streaming architecture:
Phase 1 (sync thread): IngredientClassifier builds element profiles +
gap list from SQLite — thread-safe, no async context needed
Phase 2 (async): LLMRecipeGenerator.stream_generate() yields tokens via
the cf-orch warm vLLM instance (existing /stream-token path), or via
AsyncOpenAI against Ollama if the coordinator is unavailable
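
A minimal sketch of the two-phase flow described above, assuming Python 3.9+ for asyncio.to_thread. Every name here (build_profiles, coordinator_tokens, fallback_tokens, stream_recipe) is a hypothetical stand-in, not the project's code: the real Phase 1 reads SQLite through IngredientClassifier, and the real Phase 2 talks to cf-orch or Ollama.

```python
import asyncio
from typing import AsyncIterator


def build_profiles(ingredients: list[str]) -> dict:
    """Phase 1 stand-in: blocking work (the real code queries SQLite)."""
    return {"profiles": sorted(ingredients), "gaps": []}


async def coordinator_tokens(prompt: str) -> AsyncIterator[str]:
    """Hypothetical primary backend; always fails here to simulate an
    unavailable coordinator."""
    raise ConnectionError("coordinator unavailable")
    yield ""  # unreachable, but makes this function an async generator


async def fallback_tokens(prompt: str) -> AsyncIterator[str]:
    """Hypothetical fallback backend; echoes the prompt word by word."""
    for word in prompt.split():
        await asyncio.sleep(0)  # hand control back to the event loop
        yield word


async def stream_recipe(ingredients: list[str]) -> AsyncIterator[str]:
    # Phase 1: run the blocking step in a worker thread so the event
    # loop stays responsive.
    context = await asyncio.to_thread(build_profiles, ingredients)
    prompt = "use " + " ".join(context["profiles"])

    # Phase 2: stream from the primary backend, switching to the
    # fallback if the primary cannot be reached.
    try:
        async for token in coordinator_tokens(prompt):
            yield token
    except ConnectionError:
        async for token in fallback_tokens(prompt):
            yield token


async def collect(ingredients: list[str]) -> list[str]:
    """Drain the stream into a list (handy for tests)."""
    return [token async for token in stream_recipe(ingredients)]


if __name__ == "__main__":
    print(asyncio.run(collect(["egg", "basil"])))
```

The try/except here only covers a failure before the first token; a backend that dies mid-stream would need extra handling to avoid repeating output. Deciding on the backend up front, as the commit does with a sync allocation helper, sidesteps that case.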
Backend (app/services/recipe/llm_recipe.py):
- stream_generate() async generator; _try_alloc_for_stream() sync helper
- _stream_openai_compat() static method handles __auto__ model resolution
- LLMRecipeGenerator(None) is safe for streaming (store not used)
Endpoint (app/api/endpoints/recipes.py):
- ?stream=true on POST /recipes/suggest returns StreamingResponse
- X-Accel-Buffering: no prevents nginx buffering without nginx.conf edits
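
For illustration, the streaming response could be framed roughly as follows. This is a stdlib-only sketch: the JSON payload shape ({"token": ...} and a final {"done": true}) and the helper names are assumptions, not taken from the commit. Only the X-Accel-Buffering header and the use of SSE come from the notes above.

```python
import json
from typing import Iterable, Iterator

# Response headers for a server-sent-event stream. "X-Accel-Buffering: no"
# asks nginx not to buffer this particular response.
STREAM_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "X-Accel-Buffering": "no",
}


def sse_frame(payload: dict) -> str:
    """Encode one SSE event: a 'data:' line terminated by a blank line."""
    return f"data: {json.dumps(payload)}\n\n"


def sse_body(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one frame per token, then a terminal 'done' frame
    (the payload keys are hypothetical)."""
    for token in tokens:
        yield sse_frame({"token": token})
    yield sse_frame({"done": True})
```

In a FastAPI or Starlette handler, a generator like sse_body and the headers dict would typically be passed to StreamingResponse with media_type="text/event-stream".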
Frontend (api.ts, recipes.ts, RecipesView.vue):
- suggestRecipeStream() uses fetch + ReadableStream (POST; EventSource
only supports GET)
- streamSuggest() action in recipes store builds request internally
- RecipesView.streamRecipe() silently falls back to native SSE when
cf-orch token fetch fails rather than surfacing an error

| Name |
|---|
| api |
| core |
| db |
| mcp |
| models |
| services |
| staples |
| static |
| styles |
| tasks |
| utils |
| __init__.py |
| cloud_session.py |
| main.py |
| tiers.py |