Corpus DB hot-reload: pick up weekly Purple Carrot additions without container restart #144

Open
opened 2026-05-22 09:08:58 -07:00 by pyr0ball · 0 comments
Owner

Context

The weekly Purple Carrot harvest pipeline (cron: Sunday 23:00) scrapes the current menu and upserts ~23 new recipes into /Library/Assets/kiwi/kiwi.db via ingest_purplecarrot.py. The cloud container currently mounts the corpus DB read-only at startup, so new rows are invisible until the container restarts.

Problem

  • compose.cloud.yml mounts /Library/Assets/kiwi as a read-only bind mount
  • SQLite is opened once at startup; the in-process connection does not see external writes
  • Weekly harvests silently add data that no cloud user ever sees

Proposed solution

  1. Make the corpus mount read-write — change ro to rw in compose.cloud.yml for the corpus volume. The harvest script already writes to this path from the host; the container just needs to read the updated DB.

  2. Add a /api/admin/reload-corpus endpoint (local-dev / bypass-IP only, never exposed to cloud users) that either runs DETACH DATABASE followed by ATTACH DATABASE for the corpus, or closes and reopens the corpus connection. Either way the attached DB is hot-swapped without restarting the container.

    Alternative: a lighter-weight option is a weekly docker restart kiwi-cloud-api-1 scheduled 15 min after the harvest (23:15 Sunday), which is simpler but causes a brief outage window.

  3. Wire into harvest script — add step 4 to weekly_harvest.sh that either calls the reload endpoint or triggers the container restart.
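
A minimal sketch of what the reload handler in step 2 might do, assuming the app keeps one long-lived sqlite3 connection and attaches the corpus under a schema alias. The alias `corpus`, the `recipes` table, and both function names are hypothetical, and treating "bypass-IP" as loopback-only is an assumption made for illustration. SQLite refuses to ATTACH an alias that is already in use, so the sketch detaches first.

```python
import ipaddress
import sqlite3

CORPUS_SCHEMA = "corpus"  # hypothetical alias for the attached corpus DB


def is_admin_client(client_ip: str) -> bool:
    """Assumption: only loopback callers count as local-dev / bypass-IP."""
    try:
        return ipaddress.ip_address(client_ip).is_loopback
    except ValueError:
        # Malformed address: treat as not allowed.
        return False


def reload_corpus(conn: sqlite3.Connection, corpus_path: str) -> None:
    """Detach the corpus DB if it is attached, then attach it again."""
    # PRAGMA database_list yields (seq, name, file) for each attached DB.
    attached = {row[1] for row in conn.execute("PRAGMA database_list").fetchall()}
    if CORPUS_SCHEMA in attached:
        # DETACH fails inside an open transaction, so call this between requests.
        conn.execute(f"DETACH DATABASE {CORPUS_SCHEMA}")
    conn.execute(f"ATTACH DATABASE ? AS {CORPUS_SCHEMA}", (corpus_path,))
```

An endpoint handler would check `is_admin_client(request_ip)` first and only then call `reload_corpus(conn, "/Library/Assets/kiwi/kiwi.db")`. How the handler obtains the client IP and the connection depends on the web framework, which this issue does not specify.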

Acceptance criteria

  • Corpus DB mount is read-write in compose.cloud.yml
  • New recipes ingested by ingest_purplecarrot.py are visible to cloud users within 15 min of the harvest run, without a manual restart
  • Reload mechanism is not accessible to unauthenticated cloud users
  • weekly_harvest.sh step 4 handles the reload automatically
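
One way step 4 could satisfy the last criterion is a small helper that weekly_harvest.sh invokes: try the reload endpoint first, and fall back to the container restart if that fails. This is only a sketch. The URL host and port, the use of POST, and all function names are assumptions; only the endpoint path and the container name come from this issue.

```python
import subprocess
import urllib.request

# Assumed address; the issue only specifies the path /api/admin/reload-corpus.
RELOAD_URL = "http://127.0.0.1:8000/api/admin/reload-corpus"
CONTAINER = "kiwi-cloud-api-1"


def post_reload(url: str = RELOAD_URL, timeout: float = 10.0) -> bool:
    """POST to the reload endpoint; True only on a 2xx response."""
    req = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return False


def restart_container(name: str = CONTAINER) -> bool:
    """Fallback: restart the container through the docker CLI."""
    try:
        result = subprocess.run(["docker", "restart", name], capture_output=True)
    except FileNotFoundError:
        # docker CLI is not installed on this host.
        return False
    return result.returncode == 0


def refresh_corpus(reload=post_reload, restart=restart_container) -> str:
    """Prefer the hot reload; restart only if the reload did not succeed."""
    if reload():
        return "reloaded"
    if restart():
        return "restarted"
    return "failed"
```

Passing the two actions in as parameters keeps the fallback order testable without a running container, and the script can log the returned string so a "failed" harvest does not go unnoticed.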

Notes

  • The corpus DB is read-only from the app perspective (no writes from the API) — only the pipeline writes to it. Making the mount rw is safe.
  • If going the restart route, add a restart: unless-stopped policy so the container comes back automatically.
  • Related: if we add other data sources to the corpus (e.g. Open Food Facts recipes, Ottolenghi scraper), the same reload mechanism covers them all.
pyr0ball added the enhancement label 2026-06-01 12:11:30 -07:00
pyr0ball added this to the Post-Launch milestone 2026-06-01 12:11:30 -07:00
Reference: Circuit-Forge/kiwi#144