turnstone/app/glean/doc_upload.py
pyr0ball aa80f307fe refactor: rename ingest → glean throughout codebase
Renames the app/ingest/ package to app/glean/ and updates all
references across Python modules, shell scripts, Vue components,
tests, and documentation.

Intentionally preserved:
- SQLite column name ingest_time (avoids schema migration)
- RetrievedEntry.ingest_time field (maps to the column above)
- Any public-facing JSON keys that reference ingest_time

Changes by category:
- app/ingest/ → app/glean/ (full package move, all parsers)
- app/tasks/ingest_scheduler.py → app/tasks/glean_scheduler.py
- scripts/ingest_corpus.py → scripts/glean_corpus.py
- tests/test_ingest_*.py → tests/test_glean_*.py
- Docstrings, log messages, comments: ingest → glean
- Env var: TURNSTONE_INGEST_INTERVAL → TURNSTONE_GLEAN_INTERVAL
- Shell scripts: glean.log, glean_corpus.py references
- README.md: multi-source ingest → multi-source glean
- .env.example: updated env var name
- patterns/: new diagnostic patterns from 2026-05-20 SSH incident
  (service_crash_loop, pkg_daemon_restart, ssh_forward_conflict)
- SourcesView.vue: pipeline label updated
- All test import paths updated to app.glean.*

285 tests passing.
2026-05-20 23:02:55 -07:00

43 lines
1.3 KiB
Python

"""Upload adapter: processes file bytes and writes to context store — MIT licensed."""
from __future__ import annotations
import sqlite3
import uuid
from pathlib import Path
from typing import Any
from app.context.chunker import process_upload
from app.context.store import add_document, add_fact
def glean_upload(db_path: Path, filename: str, content: bytes) -> dict[str, Any]:
"""Process an uploaded file and write to context store. Returns result summary."""
doc_type, facts, chunks = process_upload(filename, content)
doc = add_document(
db_path,
filename=filename,
doc_type=doc_type,
full_text=content.decode("utf-8", errors="replace"),
file_size=len(content),
)
for fact in facts:
add_fact(db_path, fact.category, fact.key, fact.value, source="upload")
conn = sqlite3.connect(str(db_path))
conn.execute("PRAGMA journal_mode=WAL")
for i, chunk_text in enumerate(chunks):
conn.execute(
"INSERT INTO context_chunks(id, document_id, chunk_index, text) VALUES (?,?,?,?)",
(str(uuid.uuid4()), doc.id, i, chunk_text),
)
conn.commit()
conn.close()
return {
"document_id": doc.id,
"doc_type": doc_type,
"facts_written": len(facts),
"chunks_written": len(chunks),
}