Design: per-user database isolation for cloud instances #4

Closed
opened 2026-05-05 21:35:55 -07:00 by pyr0ball · 0 comments
Owner

Context

The current cloud instance uses a single shared SQLite database (pagepiper.db) and a single shared vec DB (pagepiper_vecs.db) for all users. This is fine for single-user self-hosting, but a multi-user cloud deployment needs per-user isolation so one user cannot read another's documents.

Design questions to resolve

Isolation strategy:

  • Option A: Per-user DB files. Each user gets their own pagepiper.db and pagepiper_vecs.db inside a per-user data directory, PAGEPIPER_DATA_DIR/{user_id}/ (the layout listed under Scope). Simplest; maps well to the existing single-user design.
  • Option B: Shared DB with user_id foreign key on all tables — all queries filtered by user_id. Simpler ops (one file to back up), but requires careful query auditing to prevent cross-user leaks.
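  • Option C: Schema-level isolation inside one SQLite file (ATTACH or a separate schema per user). Not well supported: SQLite has no per-user schemas within a single file, and ATTACH joins separate database files to a connection with a default limit of 10 attached databases (125 at most, set at compile time), so it does not scale to many users.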
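
To make the auditing risk of Option B concrete, here is a minimal sketch of how one forgotten filter leaks rows across users in a shared database. The `documents` table, its columns, and both function names are hypothetical and are not taken from the pagepiper schema.

```python
import sqlite3

# Hypothetical shared-DB schema for Option B; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, user_id TEXT NOT NULL, title TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO documents (user_id, title) VALUES (?, ?)",
    [("alice", "Alice's notes"), ("bob", "Bob's manual")],
)


def list_documents_filtered(conn, user_id):
    # Correct under Option B: the query carries the user_id filter.
    rows = conn.execute(
        "SELECT title FROM documents WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [row[0] for row in rows]


def list_documents_unfiltered(conn, user_id):
    # Bug: the filter was forgotten, so every user's rows come back.
    rows = conn.execute("SELECT title FROM documents ORDER BY id")
    return [row[0] for row in rows]
```

With a file per user (Option A), the unfiltered query can only ever see the rows of the user whose file was opened.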

Recommendation: Option A. SQLite file-per-user fits the existing architecture and eliminates the possibility of cross-user data leaks from missing WHERE clauses. PAGEPIPER_DATA_DIR becomes {base}/{user_id}/ created on first login.
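
A minimal sketch of how the per-user paths could be derived, assuming the file names from the storage layout under Scope. `UserPaths`, `resolve_user_paths`, and the allowed user ID pattern are hypothetical; the real format of the Heimdall `user_id` claim may require a different check. Because the user ID becomes a path component, the sketch rejects anything that could escape the base directory.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Assumption: user IDs are short opaque tokens. Adjust to the real claim format.
_SAFE_USER_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


@dataclass(frozen=True)
class UserPaths:
    root: Path
    db_path: Path
    vec_db_path: Path
    uploads_dir: Path
    books_dir: Path


def resolve_user_paths(base_dir, user_id: str) -> UserPaths:
    """Map an authenticated user_id to its data directory, creating it if needed."""
    # Reject separators, dots, and empty IDs so the path cannot leave base_dir.
    if not _SAFE_USER_ID.fullmatch(user_id):
        raise ValueError(f"unsafe user_id: {user_id!r}")
    root = Path(base_dir) / user_id
    paths = UserPaths(
        root=root,
        db_path=root / "pagepiper.db",
        vec_db_path=root / "pagepiper_vecs.db",
        uploads_dir=root / "uploads",
        books_dir=root / "books",
    )
    # Idempotent, so it is safe to call on every login or request.
    paths.uploads_dir.mkdir(parents=True, exist_ok=True)
    paths.books_dir.mkdir(parents=True, exist_ok=True)
    return paths
```

Calling this once per request from the auth context would replace the module-level path constants.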

Scope

  • app/config.py: make DB_PATH and VEC_DB_PATH per-request (derived from authenticated user ID), not module-level constants
  • app/api/library.py, app/api/chat.py, app/api/search.py: thread DB path through from auth context
  • app/services/bm25_index.py: BM25Index must be per-user (currently a shared singleton)
  • app/services/retriever.py: the retriever must likewise be per-user rather than a shared singleton
  • Auth: Heimdall JWT provides user_id — wire it through as the isolation key
  • Storage layout: PAGEPIPER_DATA_DIR/{user_id}/pagepiper.db, pagepiper_vecs.db, uploads/, books/
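
One way the shared singletons could become per-user is a small registry keyed by `user_id` that builds an instance on first use and evicts the least recently used one. This is a sketch under assumptions: `PerUserCache` is a hypothetical helper, and the factory stands in for whatever constructs a BM25Index or retriever from a user's DB path, since the real constructors are not shown in this issue.

```python
from collections import OrderedDict
from threading import Lock
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PerUserCache(Generic[T]):
    """Holds one instance per user_id, evicting the least recently used."""

    def __init__(self, factory: Callable[[str], T], max_users: int = 64) -> None:
        self._factory = factory
        self._max_users = max_users
        self._items: "OrderedDict[str, T]" = OrderedDict()
        self._lock = Lock()

    def get(self, user_id: str) -> T:
        with self._lock:
            if user_id in self._items:
                # Mark as most recently used.
                self._items.move_to_end(user_id)
                return self._items[user_id]
            # The factory runs under the lock for simplicity; a slow index
            # build would block other users until it finishes.
            item = self._factory(user_id)
            self._items[user_id] = item
            if len(self._items) > self._max_users:
                # Drop the least recently used entry to bound memory.
                self._items.popitem(last=False)
            return item
```

Bounding the cache matters because each index lives in memory; an evicted user's index is simply rebuilt from their own database on the next request.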

Dependencies

  • Heimdall JWT validation middleware (already in cf-core)
  • This design naturally enables per-user quotas (max docs, max storage) at the Heimdall tier level
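
As an illustration of why the per-user directory makes storage quotas straightforward, the check below sums file sizes under one user's directory. Both function names and the `max_bytes` parameter are hypothetical; how Heimdall tiers expose their limits is not specified here.

```python
from pathlib import Path


def storage_used_bytes(user_root) -> int:
    """Total size of all regular files under one user's data directory."""
    return sum(p.stat().st_size for p in Path(user_root).rglob("*") if p.is_file())


def within_storage_quota(user_root, max_bytes: int) -> bool:
    # max_bytes would come from the user's tier; the lookup is out of scope here.
    return storage_used_bytes(user_root) <= max_bytes
```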
pyr0ball added this to the Beta milestone 2026-05-06 09:03:30 -07:00
Reference: Circuit-Forge/pagepiper#4