kiwi/tests/pipeline/test_build_flavorgraph_index.py
pyr0ball 33a5cdec37 feat: cloud auth bypass, VRAM leasing, barcode EXIF fix, pipeline improvements
- cloud_session.py: CLOUD_AUTH_BYPASS_IPS with CIDR support; X-Real-IP for
  Docker bridge NAT-aware client IP resolution; local-dev DB path under
  CLOUD_DATA_ROOT for bypass sessions
- compose.cloud.yml: thread CLOUD_AUTH_BYPASS_IPS from shell env; document
  Docker bridge CIDR requirement in .env.example
- nginx.cloud.conf + nginx.conf: client_max_body_size 20m for barcode uploads
- barcode_scanner.py: EXIF orientation correction (PIL ImageOps.exif_transpose)
  before cv2 decode; rotation coverage extended to [90, 180, 270, 45, 135]
  to catch sideways barcodes that the 270° case was missing
- llm_recipe.py: CF-core VRAM lease acquire/release wrapping LLMRouter calls
- tasks/runner.py + config.py: COORDINATOR_URL + recipe_llm VRAM budget (4GB)
- recipes.py: per-request Store creation inside asyncio.to_thread worker to
  avoid SQLite check_same_thread violations
- download_datasets.py: HF_PARQUET_FILES strategy for repos without dataset
  builders (lishuyang/recipepairs direct parquet download)
- derive_substitutions.py: use recipepairs_recipes.parquet for ingredient
  lookup; numpy array detection; JSON category parsing
- test_build_flavorgraph_index.py: rewritten for CSV-based index format
- pyproject.toml: add Pillow>=10.0 for EXIF rotation support
2026-04-01 16:06:23 -07:00


import csv
import tempfile
from pathlib import Path


def _write_csv(path: Path, rows: list[dict], fieldnames: list[str]) -> None:
    """Write ``rows`` to ``path`` as a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=fieldnames)
        w.writeheader()
        w.writerows(rows)


def test_parse_flavorgraph_node():
    from scripts.pipeline.build_flavorgraph_index import parse_ingredient_nodes

    with tempfile.TemporaryDirectory() as tmp:
        nodes_path = Path(tmp) / "nodes.csv"
        edges_path = Path(tmp) / "edges.csv"

        # Two ingredient nodes and one compound node.
        _write_csv(nodes_path, [
            {"node_id": "1", "name": "beef", "node_type": "ingredient"},
            {"node_id": "2", "name": "pyrazine", "node_type": "compound"},
            {"node_id": "3", "name": "mushroom", "node_type": "ingredient"},
        ], ["node_id", "name", "node_type"])
        # Both ingredients are linked to the same compound (node_id "2").
        _write_csv(edges_path, [
            {"id_1": "1", "id_2": "2", "score": "0.8"},
            {"id_1": "3", "id_2": "2", "score": "0.7"},
        ], ["id_1", "id_2", "score"])

        ingredient_to_compounds, compound_names = parse_ingredient_nodes(nodes_path, edges_path)

        assert "beef" in ingredient_to_compounds
        assert "mushroom" in ingredient_to_compounds

        # compound node_id "2" maps to name "pyrazine"
        beef_compounds = ingredient_to_compounds["beef"]
        assert any(compound_names.get(c) == "pyrazine" for c in beef_compounds)
        mushroom_compounds = ingredient_to_compounds["mushroom"]
        assert any(compound_names.get(c) == "pyrazine" for c in mushroom_compounds)
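

# Illustrative sketch only: a hypothetical helper, NOT the real
# scripts.pipeline.build_flavorgraph_index code. It shows one way a parser
# could satisfy the contract test_parse_flavorgraph_node checks, assuming
# nodes.csv has node_id/name/node_type columns, edges.csv has
# id_1/id_2/score columns, and compounds are referenced by node_id with
# their display names kept in a separate id -> name mapping.
import csv
from collections import defaultdict
from pathlib import Path


def _sketch_parse_ingredient_nodes(
    nodes_path: Path, edges_path: Path
) -> tuple[dict[str, list[str]], dict[str, str]]:
    """Return ({ingredient name: [compound node_id, ...]}, {compound node_id: name})."""
    ingredient_names: dict[str, str] = {}  # ingredient node_id -> name
    compound_names: dict[str, str] = {}  # compound node_id -> name
    with open(nodes_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["node_type"] == "ingredient":
                ingredient_names[row["node_id"]] = row["name"]
            elif row["node_type"] == "compound":
                compound_names[row["node_id"]] = row["name"]

    ingredient_to_compounds: defaultdict[str, list[str]] = defaultdict(list)
    with open(edges_path, newline="") as f:
        for row in csv.DictReader(f):
            a, b = row["id_1"], row["id_2"]
            # Assumption: an edge may list the ingredient on either side,
            # so swap to make `a` the ingredient and `b` the compound.
            if b in ingredient_names and a in compound_names:
                a, b = b, a
            # Ingredient-ingredient and compound-compound edges are skipped.
            if a in ingredient_names and b in compound_names:
                ingredient_to_compounds[ingredient_names[a]].append(b)
    return dict(ingredient_to_compounds), compound_names


# With the fixture rows used in the test above, this sketch returns
# {"beef": ["2"], "mushroom": ["2"]} and {"2": "pyrazine"}. Keying compounds
# by node_id rather than by name keeps them distinct even if two compounds
# share a display name, which is why the test resolves names through
# compound_names instead of comparing ids directly.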