feat: GET /api/library/sample-chunks — corpus sampling endpoint for Avocet embed bench #6
Context
Avocet's embedding model comparison harness (avocet#59) needs to pull a representative sample of text chunks from Pagepiper to use as a comparison corpus. There is currently no endpoint that returns raw page-level chunks without requiring a search query.
Endpoint
Returns up to N page-level chunks sampled from the database (random or ROWID order is fine; results are not search-ranked).
Response shape
Notes
limit defaults to 50, max 200.
Acceptance criteria
GET /api/library/sample-chunks?limit=20 returns up to 20 chunks with chunk_id, doc_id, page_number, text fields.
Returns [] (not 500) if the library is empty.
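A rough sketch of the query behind this endpoint might look like the following. It assumes a SQLite store (suggested by the ROWID mention) and a hypothetical `chunks` table whose columns match the response fields; the real Pagepiper schema may differ. The helper names `clamp_limit` and `sample_chunks` are illustrative, and clamping an out-of-range `limit` rather than rejecting it is also an assumption.

```python
import sqlite3

DEFAULT_LIMIT = 50   # default from the Notes section
MAX_LIMIT = 200      # cap from the Notes section


def clamp_limit(limit=None):
    """Apply the default and the cap. Clamping instead of rejecting is an assumption."""
    if limit is None:
        return DEFAULT_LIMIT
    return max(1, min(int(limit), MAX_LIMIT))


def sample_chunks(conn, limit=None):
    """Return up to `limit` page-level chunks as a list of dicts, in random order."""
    rows = conn.execute(
        # Hypothetical table and column names; adjust to the actual schema.
        "SELECT chunk_id, doc_id, page_number, text "
        "FROM chunks ORDER BY RANDOM() LIMIT ?",
        (clamp_limit(limit),),
    ).fetchall()
    keys = ("chunk_id", "doc_id", "page_number", "text")
    # An empty table yields an empty list, so the handler can return [] as JSON.
    return [dict(zip(keys, row)) for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chunks ("
        "chunk_id INTEGER PRIMARY KEY, doc_id TEXT, page_number INTEGER, text TEXT)"
    )
    print(sample_chunks(conn, limit=20))  # empty library: prints []
```

`ORDER BY RANDOM()` scans the whole table, which is probably acceptable for a benchmark helper; plain ROWID order with `LIMIT` would be cheaper if randomness turns out not to matter.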