Community recipe submission: dedup detection and variation clustering #119
Reference: Circuit-Forge/kiwi#119
Context
Blocks the "create if not exists" sub-feature of kiwi#118 (community subcategory tagging). Before a user can submit a new recipe to the community pool, we need to prevent near-duplicate proliferation and model the variation relationship between recipes.
Problem
Without dedup/clustering, the community pool accumulates 40 versions of "chocolate chip cookies" and the tagging system becomes noise. The corpus already has 3.2M recipes — most dishes a user wants to contribute probably exist already under a different title.
Proposed Approach
Layer 1 — FTS title search (instant, at submission time)
Before accepting a submission, search corpus + community pool by title. Show top 5 matches: "These recipes look similar — is yours different?" The user can tag an existing recipe instead of creating a duplicate.
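A minimal sketch of the title lookup, assuming the titles are indexed in a SQLite FTS5 virtual table. The issue does not say which FTS engine is used, so the table name recipe_titles, its columns, and the helper names are all hypothetical. The title is tokenized and each token is quoted so that user input cannot be parsed as FTS5 query syntax.

```python
import re
import sqlite3


def create_title_index(conn: sqlite3.Connection) -> None:
    """Create a hypothetical FTS5 index over recipe titles (assumed schema)."""
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS recipe_titles "
        "USING fts5(title, recipe_id UNINDEXED)"
    )


def build_match_query(title: str) -> str:
    """Turn a free-text title into a safe FTS5 MATCH expression."""
    # Keep only alphanumeric tokens and quote each one, so punctuation or
    # FTS5 operators in the submitted title are never treated as syntax.
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return " OR ".join(f'"{token}"' for token in tokens)


def find_similar_titles(conn: sqlite3.Connection, title: str, limit: int = 5):
    """Return up to `limit` (recipe_id, title) rows, best match first."""
    query = build_match_query(title)
    if not query:
        return []  # nothing searchable in the title
    return conn.execute(
        "SELECT recipe_id, title FROM recipe_titles "
        "WHERE recipe_titles MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

Joining tokens with OR favors recall: a submission titled "Grandma's chocolate chip cookies" still surfaces "Chocolate Chip Cookies", and the relevance ordering pushes titles that share more distinctive words toward the top. Whether the corpus and the community pool share one index or are queried separately is left open here.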
Layer 2 — Ingredient Jaccard overlap (cheap, in-process)
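As a rough illustration of this layer, the sketch below scores two ingredient lists by Jaccard overlap and maps the score to a UI tier. The 0.7 cutoff for "very similar" comes from this issue; the 0.4 cutoff for "somewhat similar", the normalization step, and the function names are assumptions made for the example.

```python
import json


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace (assumed normalization)."""
    return " ".join(name.lower().split())


def jaccard(ingredients_a, ingredients_b) -> float:
    """Return |intersection| / |union| of two ingredient name lists."""
    set_a = {normalize(name) for name in ingredients_a}
    set_b = {normalize(name) for name in ingredients_b}
    union = set_a | set_b
    if not union:
        return 0.0  # two empty lists: treat as no evidence of similarity
    return len(set_a & set_b) / len(union)


def similarity_tier(score: float,
                    very_similar: float = 0.7,      # threshold from the issue
                    somewhat_similar: float = 0.4   # assumed, not specified
                    ) -> str:
    """Map a Jaccard score to the tier shown to the submitter."""
    if score >= very_similar:
        return "very similar"
    if score >= somewhat_similar:
        return "somewhat similar"
    return "different"


# ingredient_names is stored as a JSON array, so decode before comparing.
existing = json.loads('["flour", "sugar", "butter", "eggs"]')
submitted = json.loads('["Flour", "sugar", "butter", "eggs", "chocolate chips"]')
score = jaccard(existing, submitted)  # 4 shared / 5 total = 0.8
print(score, similarity_tier(score))
```

Exact-match Jaccard treats "butter" and "unsalted butter" as different ingredients, so scores on real data will run lower than intuition suggests unless names are canonicalized first.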
For top FTS title hits, compute |intersection| / |union| on ingredient_names JSON arrays. Jaccard ≥ 0.7 → flag as "very similar". Present similarity tier in UI (very similar / somewhat similar / different) to help the user decide.
Layer 3 — Variation clustering (schema)
Some recipes are legitimately different but belong to the same dish family (NY Style vs Neapolitan Pizza). Community-submitted recipes should be able to declare themselves a variation of a corpus or community recipe via a similar_to_ref FK. Browse can then surface or group variations.
Schema Sketch
Acceptance Criteria
similar_to_ref variation link
Blocks
kiwi#118 "create if not exists" sub-feature only. The corpus-recipe tagging path in kiwi#118 can ship independently.