Multiple concurrent users browsing the 3.2M-recipe corpus would contend for the
FTS5 page cache and slow per-request count queries. Solution: pre-compute counts
for all category/subcategory keyword sets into a small SQLite cache.
- browse_counts_cache.py: refresh(), load_into_memory(), is_stale() helpers
- config.py: BROWSE_COUNTS_PATH setting (default DATA_DIR/browse_counts.db)
- main.py: warms in-memory cache on startup; schedules a refresh task every 24h
- infer_recipe_tags.py: auto-refreshes cache after a successful tag run so the
  app picks up updated FTS counts without a restart
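
A minimal sketch of how the three helpers could fit together. This is not the
actual browse_counts_cache.py: the table layout, KEYWORD_SETS, MAX_AGE_SECONDS
and the count_fn parameter are all assumptions made for illustration. The
expensive FTS5 count is passed in as a callable so the sketch runs without the
recipe corpus.

```python
import sqlite3
import time
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical keyword sets; the real category/subcategory mapping lives in the app.
KEYWORD_SETS: Dict[str, List[str]] = {
    "dessert/cake": ["cake", "cupcake"],
    "soup/stew": ["stew", "chili"],
}
MAX_AGE_SECONDS = 24 * 60 * 60  # assumed to match the 24h refresh interval

# Assumed schema: one row per keyword set, stamped with its refresh time.
SCHEMA = """
CREATE TABLE IF NOT EXISTS browse_counts (
    name TEXT PRIMARY KEY,
    n INTEGER NOT NULL,
    refreshed_at REAL NOT NULL
)
"""


def refresh(cache_path: Path, count_fn: Callable[[List[str]], int]) -> None:
    """Run the expensive count once per keyword set and persist the results."""
    conn = sqlite3.connect(cache_path)
    try:
        with conn:  # single transaction: readers see the old or the new counts
            conn.execute(SCHEMA)
            now = time.time()
            for name, keywords in KEYWORD_SETS.items():
                conn.execute(
                    "INSERT OR REPLACE INTO browse_counts VALUES (?, ?, ?)",
                    (name, count_fn(keywords), now),
                )
    finally:
        conn.close()


def load_into_memory(cache_path: Path) -> Dict[str, int]:
    """Read every cached count into a plain dict for per-request lookups."""
    if not cache_path.exists():
        return {}
    conn = sqlite3.connect(cache_path)
    try:
        conn.execute(SCHEMA)
        return dict(conn.execute("SELECT name, n FROM browse_counts"))
    finally:
        conn.close()


def is_stale(cache_path: Path, max_age: float = MAX_AGE_SECONDS) -> bool:
    """True when the cache is missing, empty, or older than max_age seconds."""
    if not cache_path.exists():
        return True
    conn = sqlite3.connect(cache_path)
    try:
        conn.execute(SCHEMA)
        (latest,) = conn.execute(
            "SELECT max(refreshed_at) FROM browse_counts"
        ).fetchone()
    finally:
        conn.close()
    return latest is None or time.time() - latest > max_age
```

In the app, count_fn would presumably wrap an FTS5 MATCH count against the
recipe index. Request handlers then read only the dict returned by
load_into_memory() and never touch FTS5.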
Switches to OrchestratedScheduler in cloud mode so concurrent recipe_llm
jobs fan out across all registered cf-orch GPU nodes instead of serializing
on one. Under load this eliminates poll timeouts from queue backup.
USE_ORCH_SCHEDULER env var gives explicit control independent of CLOUD_MODE:
  unset  follow CLOUD_MODE (cloud=orch, local=local)
  true   OrchestratedScheduler always (e.g. multi-GPU local rig)
  false  LocalScheduler always (e.g. cloud single-GPU dev instance)
ImportError fallback: if circuitforge_orch is not installed and orch is
requested, logs a warning and falls back to LocalScheduler gracefully.
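
The selection rule above could be sketched roughly as follows. The helper
names want_orch() and make_scheduler(), the parsing of CLOUD_MODE as the
string "true", the stand-in LocalScheduler class, and the no-argument
OrchestratedScheduler constructor are all assumptions; only the env var names
and the ImportError fallback come from this change.

```python
import importlib
import logging
import os
from typing import Mapping, Optional

log = logging.getLogger(__name__)


class LocalScheduler:
    """Stand-in for the app's local scheduler (hypothetical placeholder)."""


def want_orch(env: Mapping[str, str]) -> bool:
    """Resolve USE_ORCH_SCHEDULER, following CLOUD_MODE when it is unset."""
    raw = env.get("USE_ORCH_SCHEDULER", "").strip().lower()
    if raw == "true":
        return True
    if raw == "false":
        return False
    # Unset (or unrecognized): cloud=orch, local=local.
    return env.get("CLOUD_MODE", "").strip().lower() == "true"


def make_scheduler(env: Optional[Mapping[str, str]] = None):
    """Return an orchestrated scheduler when requested and importable."""
    env = os.environ if env is None else env
    if want_orch(env):
        try:
            orch = importlib.import_module("circuitforge_orch")
            # Assumed attribute name and no-argument constructor.
            return orch.OrchestratedScheduler()
        except ImportError:
            log.warning(
                "circuitforge_orch not installed; falling back to LocalScheduler"
            )
    return LocalScheduler()
```

Passing the environment in as a mapping keeps the decision testable without
mutating os.environ.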
Add E2E_TEST_USER_ID setting (opt-in via env); session bootstrap logs
at DEBUG instead of INFO for the known test user so test runs don't
inflate session counts. Still visible with DEBUG=true.
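
One way the log-level switch might look, as a sketch only: the logger name,
the function names and the message format are hypothetical, and the setting is
treated as opt-in by letting it be None.

```python
import logging
from typing import Optional

log = logging.getLogger("kiwi.session")  # hypothetical logger name


def bootstrap_log_level(user_id: str, e2e_test_user_id: Optional[str]) -> int:
    """DEBUG for the configured test user, INFO for everyone else."""
    if e2e_test_user_id and user_id == e2e_test_user_id:
        return logging.DEBUG
    return logging.INFO


def log_session_bootstrap(user_id: str, e2e_test_user_id: Optional[str] = None) -> None:
    """Emit the session bootstrap record at the level chosen above."""
    level = bootstrap_log_level(user_id, e2e_test_user_id)
    log.log(level, "session bootstrap user=%s", user_id)
```

With E2E_TEST_USER_ID unset, every user logs at INFO, so the default behavior
is unchanged.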
- .env.example: document ANTHROPIC_API_KEY, OPENAI_API_KEY, OLLAMA_HOST,
  OLLAMA_MODEL, CF_ORCH_URL, CF_LICENSE_KEY with usage comments
- config.py: expose CF_LICENSE_KEY in Settings for startup visibility
- pyproject.toml: require circuitforge-core >= 0.6.0 (env-var auto-config +
  CFOrchClient bearer auth land in 0.6.0)
Bare-metal self-hosters can now run Kiwi with only OLLAMA_HOST set and
zero yaml config. Paid+ users set CF_ORCH_URL + CF_LICENSE_KEY for
managed cloud GPU inference.