Design: encryption at rest for cloud user data #5

Closed
opened 2026-05-05 21:35:55 -07:00 by pyr0ball · 0 comments
Owner

Context

Cloud user documents (PDFs, EPUBs) and extracted text chunks stored in SQLite are currently unencrypted on the host filesystem. For users storing proprietary or sensitive documents (e.g. internal rulebooks, private manuscripts), encryption at rest is a reasonable expectation.

Scope

What needs encrypting:

  • pagepiper.db — document metadata and extracted text chunks (the most sensitive: full extracted text)
  • pagepiper_vecs.db — embedding vectors (lower sensitivity, but linked to user content)
  • Uploaded files in uploads/ — raw PDFs and EPUBs
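Once Option A is in place, you can check that these files are unreadable on disk by looking at their leading bytes. An unencrypted SQLite database starts with the 16-byte string "SQLite format 3\0", a PDF starts with "%PDF-", and an EPUB (a ZIP container) starts with "PK\x03\x04". The helper below is a minimal audit sketch. The name find_plaintext_files and the directory layout are hypothetical, not existing pagepiper code. It assumes SQLCipher's default behaviour of encrypting the whole file, header included. The check says little under fscrypt, because files read back as plaintext whenever the directory is unlocked.

```python
from pathlib import Path

# Leading bytes that indicate plaintext content on disk. SQLCipher (by
# default) encrypts the whole file, header included, so an encrypted
# database should not begin with the SQLite magic string.
PLAINTEXT_MAGIC = {
    b"SQLite format 3\x00": "unencrypted SQLite database",
    b"%PDF-": "unencrypted PDF",
    b"PK\x03\x04": "unencrypted ZIP container (EPUB)",
}


def find_plaintext_files(data_dir: str) -> list[tuple[str, str]]:
    """Return (path, reason) for each file under data_dir whose first
    bytes match a known plaintext signature."""
    longest = max(len(magic) for magic in PLAINTEXT_MAGIC)
    hits = []
    for path in sorted(Path(data_dir).rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            head = fh.read(longest)
        for magic, reason in PLAINTEXT_MAGIC.items():
            if head.startswith(magic):
                hits.append((str(path), reason))
                break
    return hits
```

A non-empty result after enabling encryption means some file in scope was written outside the encrypted path.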

What does NOT need encrypting:

  • The BM25 index (in-memory, rebuilt from DB on startup)
  • Logs

Options

Option A: SQLCipher for DB files

  • Drop-in SQLite replacement with AES-256 encryption
  • Key derived from user passphrase or a server-held per-user key
  • Pro: largely transparent to existing SQLite code via the sqlcipher3 Python binding; the main change is issuing PRAGMA key when each connection is opened
  • Con: adds a native dependency; key management complexity; compatibility with sqlite-vec virtual tables is unverified (needs investigation)
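The sketch below shows what "transparent to existing SQLite code" amounts to in Option A: apart from the key pragma, the connection behaves like an ordinary sqlite3 connection. It covers the server-held per-user key variant, passing a 32-byte key in SQLCipher's raw-key form. The function name open_encrypted_db is hypothetical. The code also assumes the sqlcipher3 package exposes the standard DB-API connect(). The driver parameter exists only so the call path can be exercised with the stdlib sqlite3 module. In that case nothing is encrypted, because plain SQLite silently ignores the unknown key pragma.

```python
import importlib


def open_encrypted_db(path: str, raw_key: bytes, driver=None):
    """Open (or create) a SQLCipher database keyed with a 32-byte raw key.

    driver defaults to the sqlcipher3 package (assumed installed). Passing
    the stdlib sqlite3 module runs the same code WITHOUT encryption.
    """
    if len(raw_key) != 32:
        raise ValueError("raw key must be 32 bytes for AES-256")
    if driver is None:
        driver = importlib.import_module("sqlcipher3")
    conn = driver.connect(path)
    try:
        # The key must be applied before any other statement touches the
        # file. The x'<hex>' form passes a raw key, bypassing SQLCipher's
        # own passphrase-based key derivation.
        key_literal = f"x'{raw_key.hex().upper()}'"
        conn.execute(f'PRAGMA key = "{key_literal}"')
        # A wrong key only surfaces on first real access, so touch the
        # schema now rather than failing later in application code.
        conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
    except Exception:
        conn.close()
        raise
    return conn
```

For the passphrase variant, SQLCipher also accepts a plain string key and derives the encryption key itself. The raw-key form fits a key the server already holds per user.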

Option B: Filesystem-level encryption

  • Per-user data directory encrypted with fscrypt or eCryptfs (both Linux)
  • Transparent to all app code — no DB changes needed
  • Pro: covers all files including uploads; no app-level changes
  • Con: requires root to set up per-user directories; key management at OS level
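fscrypt can only encrypt an empty directory. Per-user encryption therefore has to happen when the user's directory is created, before any database or upload is written to it. The sketch below shows that provisioning order. The users/<user_id> layout, the separate key directory, the protector name and the function name provision_user_dir are all hypothetical. It assumes the filesystem was prepared once with fscrypt setup, run as root. The function generates a 32-byte raw key and returns, rather than runs, an fscrypt encrypt command that uses a raw_key protector. Check the flags against the fscrypt documentation for the installed version.

```python
import os
import re
import secrets
from pathlib import Path

# Hypothetical constraint: user IDs become path components, so restrict them.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def provision_user_dir(data_root: str, key_root: str, user_id: str):
    """Create an empty per-user data dir and a 32-byte raw key file, and
    return the (not executed) fscrypt command that would encrypt the dir."""
    if not _SAFE_ID.fullmatch(user_id):
        raise ValueError(f"unsafe user id: {user_id!r}")

    user_dir = Path(data_root) / "users" / user_id
    # fscrypt only encrypts empty directories, so this must happen before
    # any DB or upload is written; refuse to reuse an existing directory.
    user_dir.mkdir(parents=True, exist_ok=False)

    key_path = Path(key_root) / f"{user_id}.key"
    key_path.parent.mkdir(parents=True, exist_ok=True)
    # O_EXCL refuses to overwrite an existing key; mode 0600 keeps it owner-only.
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(secrets.token_bytes(32))  # raw_key protectors take 32 raw bytes

    cmd = [
        "fscrypt", "encrypt", str(user_dir),
        "--source=raw_key",
        f"--name=pagepiper-{user_id}",
        f"--key={key_path}",
    ]
    return user_dir, key_path, cmd
```

Storing the raw key files on the same disk as the data would defeat the purpose. Where they live is the key-management question raised under Notes.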

Option C: Application-level field encryption

  • Encrypt the text column in page_chunks using a per-user key before insert, decrypt on read
  • Pro: no native deps, works with existing SQLite
  • Con: BM25 full-text search breaks (can't index encrypted text); significant code changes

Recommendation: Option B (fscrypt) for cloud instances behind the managed operator model. Option A (SQLCipher) as a stretch goal for self-hosters who want encryption without OS-level setup. Option C is not viable due to BM25 incompatibility.

Notes

  • Key management is the hard part: where does the per-user encryption key live? Options: derived from the Heimdall JWT secret (server-held), derived from a user passphrase (zero-knowledge, but the server can no longer recover the data), or held in an HSM.
  • Evaluate sqlite-vec compatibility with SQLCipher before committing to Option A
  • File as a follow-on to per-user DB isolation (#N) — isolation must land first
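The two non-HSM key options above can be sketched with the Python standard library. The server-held option expands one master secret into per-user keys with HKDF-SHA256 (RFC 5869). The zero-knowledge option stretches a user passphrase with scrypt. The function names, salt and info strings, and scrypt cost parameters are illustrative assumptions, not settled choices. Both functions return 32 bytes, the size of an AES-256 key, which works as a SQLCipher raw key or an fscrypt raw_key protector. If the master secret is the Heimdall JWT secret itself, rotating the JWT secret would change every derived key. A separate long-lived master secret may therefore be easier to operate.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with SHA-256 (RFC 5869): extract a PRK, then expand it."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(n) = HMAC(PRK, T(n-1) | info | n)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def server_held_key(master_secret: bytes, user_id: str) -> bytes:
    """Server-held option: one master secret, a distinct key per user."""
    return hkdf_sha256(master_secret, salt=b"pagepiper-at-rest-v1",
                       info=b"user:" + user_id.encode())


def passphrase_key(passphrase: str, salt: bytes) -> bytes:
    """Zero-knowledge option: only someone with the passphrase can
    re-derive this key. The salt is per user, stored, and not secret."""
    return hashlib.scrypt(passphrase.encode(), salt=salt,
                          n=2**14, r=8, p=1, dklen=32)
```

The passphrase path shows the recoverability trade-off directly: if the user forgets the passphrase, nothing on the server can regenerate the key.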
pyr0ball added this to the Beta milestone 2026-05-06 09:03:30 -07:00
Reference: Circuit-Forge/pagepiper#5