Community subcategory tagging — two-layer system for user-contributed recipe categorization #118

Closed
opened 2026-04-22 11:41:13 -07:00 by pyr0ball · 1 comment
Owner

Problem

FTS5 subcategory keyword matching fails for recipes where the corpus (food.com) does not use regional/technical terminology. Example: "naples|neapolitan|pizza napoletana|ragù" returns 0 hits even though Neapolitan-style recipes exist — they just use generic titles.

Solution: Two-Layer System

Layer A — Per-recipe community tags (immediate)

Users can tag individual recipes with a domain/category/subcategory from the existing taxonomy. Tags are stored in community PostgreSQL under a pseudonym. Browse counts for a subcategory merge FTS results + accepted community tags. Even a handful of community tags makes a zero-count subcategory browseable.

Flow: "Categorize this" button on recipe cards → pick subcategory from taxonomy picker → stored in community DB → reflected in browse counts on next refresh.
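The merge step could be sketched roughly as follows. This is a minimal illustration, not the project's actual cache code: the function name `merge_browse_counts`, the `(domain, category, subcategory)` key tuple, and the input shapes are all assumptions made for the example. It merges recipe IDs rather than raw counts, so a recipe matched by both FTS and a community tag is counted once.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical key: (domain, category, subcategory)
SubcategoryKey = Tuple[str, str, str]


def merge_browse_counts(
    fts_hits: Dict[SubcategoryKey, Set[int]],
    accepted_tags: Iterable[Tuple[SubcategoryKey, int]],
) -> Dict[SubcategoryKey, int]:
    """Combine FTS matches with accepted community tags per subcategory.

    fts_hits maps a subcategory key to the recipe IDs matched by keyword
    search; accepted_tags yields (key, recipe_id) pairs for tags that have
    already passed the vote threshold.
    """
    merged: Dict[SubcategoryKey, Set[int]] = defaultdict(set)
    for key, recipe_ids in fts_hits.items():
        merged[key] |= set(recipe_ids)
    for key, recipe_id in accepted_tags:
        merged[key].add(recipe_id)  # set membership de-duplicates overlaps
    return {key: len(recipe_ids) for key, recipe_ids in merged.items()}
```

With this shape, a subcategory whose FTS set is empty becomes browseable as soon as one accepted tag points at it.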

Schema (community Postgres):

CREATE TABLE recipe_tags (
    id           BIGSERIAL PRIMARY KEY,
    recipe_id    INTEGER NOT NULL,
    domain       TEXT NOT NULL,
    category     TEXT NOT NULL,
    subcategory  TEXT,
    pseudonym    TEXT NOT NULL,
    upvotes      INTEGER NOT NULL DEFAULT 0,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

Acceptance: Tag shows immediately on submitter's view; appears in public browse counts after reaching a vote threshold (e.g. 2 independent users).
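One way to read the acceptance rule is sketched below. The schema above stores only an `upvotes` counter, so the `voters` set here is a hypothetical addition used to keep votes independent. Whether the submitter's own vote should count is not specified in the issue; this sketch assumes it does not.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

VOTE_THRESHOLD = 2  # starting value from the issue; expected to be tuned


@dataclass
class CommunityTag:
    recipe_id: int
    domain: str
    category: str
    subcategory: Optional[str]
    pseudonym: str  # submitter
    voters: Set[str] = field(default_factory=set)  # hypothetical, not in schema

    def upvote(self, voter: str) -> bool:
        """Record a vote; reject self-votes and repeat votes (assumption)."""
        if voter == self.pseudonym or voter in self.voters:
            return False
        self.voters.add(voter)
        return True

    @property
    def upvotes(self) -> int:
        return len(self.voters)

    def is_accepted(self, threshold: int = VOTE_THRESHOLD) -> bool:
        return self.upvotes >= threshold


def visible_to(tag: CommunityTag, viewer: str,
               threshold: int = VOTE_THRESHOLD) -> bool:
    """Submitter sees the tag immediately; others only once it is accepted."""
    return tag.pseudonym == viewer or tag.is_accepted(threshold)
```

Only tags for which `is_accepted()` is true would feed into public browse counts.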

Layer C — Taxonomy keyword proposals (power user, longer term)

Trusted users can propose new keyword strings to add to browser_domains.py subcategory lists. One accepted proposal fixes thousands of recipes in the next tag run rather than tagging one at a time.

Flow: "These keywords are missing" → propose keyword additions → admin reviews → merged into browser_domains.py → infer_recipe_tags.py run → browse_counts refreshed.
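The merge step of this flow might look something like the sketch below. The real structure of `browser_domains.py` is not shown in this issue, so the `dict` of subcategory name to keyword list, the proposal dictionaries, and the `"accepted"` status value are all assumptions for illustration.

```python
from typing import Dict, Iterable, List, Mapping


def apply_accepted_proposals(
    keywords: Mapping[str, List[str]],
    proposals: Iterable[Mapping[str, object]],
) -> Dict[str, List[str]]:
    """Return a copy of the keyword lists with admin-accepted proposals merged.

    Each proposal is assumed to look like:
        {"subcategory": "neapolitan", "keywords": ["margherita"], "status": "accepted"}
    """
    updated = {sub: list(kws) for sub, kws in keywords.items()}
    for proposal in proposals:
        if proposal["status"] != "accepted":
            continue  # pending or rejected proposals never reach the taxonomy
        existing = updated.setdefault(str(proposal["subcategory"]), [])
        seen = {kw.casefold() for kw in existing}
        for raw in proposal["keywords"]:
            kw = str(raw).strip()
            if kw and kw.casefold() not in seen:  # skip blanks and duplicates
                existing.append(kw)
                seen.add(kw.casefold())
    return updated
```

The input mapping is left untouched, so an admin tool could diff the before and after lists during review.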

Implementation Order

  1. Layer A schema + community Postgres migration
  2. GET /api/v1/recipes/community-tags/{recipe_id} — fetch tags for a recipe
  3. POST /api/v1/recipes/community-tags — submit a tag
  4. POST /api/v1/recipes/community-tags/{id}/upvote — vote on a tag
  5. Browse counts cache merges FTS + accepted community tags on refresh
  6. UI: "Categorize this" button on browser recipe cards
  7. Layer C: keyword proposal endpoint + admin review flow (separate PR)
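For step 3, the submit endpoint would presumably need to check that the chosen domain/category/subcategory exists in the taxonomy before writing to the community DB. A minimal sketch of that check follows. The nested-dict taxonomy shape and the function name `validate_tag_submission` are hypothetical, and no web framework is assumed.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy shape: domain -> category -> list of subcategories.
Taxonomy = Dict[str, Dict[str, List[str]]]


def validate_tag_submission(
    taxonomy: Taxonomy,
    recipe_id: object,
    domain: str,
    category: str,
    subcategory: Optional[str] = None,
) -> List[str]:
    """Return a list of validation errors; an empty list means the tag is OK."""
    errors: List[str] = []
    # Tags are keyed by integer corpus ID, never by recipe title.
    if isinstance(recipe_id, bool) or not isinstance(recipe_id, int):
        errors.append("recipe_id must be an integer corpus ID")
    categories = taxonomy.get(domain)
    if categories is None:
        errors.append(f"unknown domain: {domain!r}")
        return errors
    subcategories = categories.get(category)
    if subcategories is None:
        errors.append(f"unknown category: {category!r}")
        return errors
    # subcategory is nullable in the schema, so None is allowed here.
    if subcategory is not None and subcategory not in subcategories:
        errors.append(f"unknown subcategory: {subcategory!r}")
    return errors
```

A handler could reject the request when the returned list is non-empty and insert the row otherwise.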

Notes

  • Tags are keyed by recipe_id (integer corpus ID), not recipe title
  • Pseudonym system already exists — reuse for attribution
  • All tiers can submit tags (community contribution is a core value)
  • Upvote threshold prevents single-user spam; start at 2, tune from data
  • Layer C is a power-user / admin flow, not MVP
Author
Owner

Partial blocker: kiwi#119 (community recipe submission + dedup/clustering) blocks the "create if not exists" sub-feature only. Corpus-recipe tagging (the main feature) can ship independently.

pyr0ball added this to the Public Launch milestone 2026-04-24 16:09:32 -07:00
pyr0ball added the enhancement label 2026-04-24 16:12:30 -07:00
Reference: Circuit-Forge/kiwi#118