# Email Sync — Testing Checklist Generated from audit of `scripts/imap_sync.py`. ## Bugs fixed (2026-02-23) - [x] Gmail label with spaces not quoted for IMAP SELECT → `_quote_folder()` added - [x] `_quote_folder` didn't escape internal double-quotes → RFC 3501 escaping added - [x] `signal is None` in `_scan_unmatched_leads` allowed classifier failures through → now skips - [x] Email with no Message-ID re-inserted on every sync → `_parse_message` returns `None` when ID missing - [x] `todo_attached` missing from early-return dict in `sync_all` → added - [x] Body phrase check truncated at 800 chars (rejection footers missed) → bumped to 1500 - [x] `_DONT_FORGET_VARIANTS` missing left single quotation mark `\u2018` → added --- ## Unit tests — phrase filter - [ ] `_has_rejection_or_ats_signal` — rejection phrase at char 1501 (boundary) - [ ] `_has_rejection_or_ats_signal` — right single quote `\u2019` in "don't forget" - [ ] `_has_rejection_or_ats_signal` — left single quote `\u2018` in "don't forget" - [ ] `_has_rejection_or_ats_signal` — ATS subject phrase only checked against subject, not body - [ ] `_has_rejection_or_ats_signal` — spam subject prefix `@` match - [ ] `_has_rejection_or_ats_signal` — `"UNFORTUNATELY"` (uppercase → lowercased correctly) - [ ] `_has_rejection_or_ats_signal` — phrase in body quoted thread (beyond 1500 chars) is not blocked ## Unit tests — folder quoting - [ ] `_quote_folder("TO DO JOBS")` → `'"TO DO JOBS"'` - [ ] `_quote_folder("INBOX")` → `"INBOX"` (no spaces, no quotes added) - [ ] `_quote_folder('My "Jobs"')` → `'"My \\"Jobs\\""'` - [ ] `_search_folder` — folder doesn't exist → returns `[]`, no exception - [ ] `_search_folder` — special folder `"[Gmail]/All Mail"` (brackets + slash) ## Unit tests — message-ID dedup - [ ] `_get_existing_message_ids` — NULL message_id in DB excluded from set - [ ] `_get_existing_message_ids` — empty string `""` excluded from set - [ ] `_get_existing_message_ids` — job with no contacts returns empty set - [ ] `_parse_message` — email with no Message-ID header returns `None` - [ ] `_parse_message` — email with RFC2047-encoded subject decodes correctly - [ ] No email is inserted twice across two sync runs (integration) ## Unit tests — classifier & signal - [ ] `classify_stage_signal` — returns one of 5 labels or `None` - [ ] `classify_stage_signal` — returns `None` on LLM error - [ ] `classify_stage_signal` — returns `"neutral"` when no label matched in LLM output - [ ] `classify_stage_signal` — strips `` blocks - [ ] `_scan_unmatched_leads` — skips when `signal is None` - [ ] `_scan_unmatched_leads` — skips when `signal == "rejected"` - [ ] `_scan_unmatched_leads` — proceeds when `signal == "neutral"` - [ ] `extract_lead_info` — returns `(None, None)` on bad JSON - [ ] `extract_lead_info` — returns `(None, None)` on LLM error ## Integration tests — TODO label scan - [ ] `_scan_todo_label` — `todo_label` empty string → returns 0 - [ ] `_scan_todo_label` — `todo_label` missing from config → returns 0 - [ ] `_scan_todo_label` — folder doesn't exist on IMAP server → returns 0, no crash - [ ] `_scan_todo_label` — email matches company + action keyword → contact attached - [ ] `_scan_todo_label` — email matches company but no action keyword → skipped - [ ] `_scan_todo_label` — email matches no company term → skipped - [ ] `_scan_todo_label` — duplicate message-ID → not re-inserted - [ ] `_scan_todo_label` — stage_signal set when classifier returns non-neutral - [ ] `_scan_todo_label` — body fallback (company only in body[:300]) → still matches - [ ] `_scan_todo_label` — email handled by `sync_job_emails` first not re-added by label scan ## Integration tests — unmatched leads - [ ] `_scan_unmatched_leads` — genuine lead inserted with synthetic URL `email://domain/hash` - [ ] `_scan_unmatched_leads` — same email not re-inserted on second sync run - [ ] `_scan_unmatched_leads` — duplicate synthetic URL skipped - [ ] `_scan_unmatched_leads` — `extract_lead_info` returns `(None, None)` → no insertion - [ ] `_scan_unmatched_leads` — rejection phrase in body → blocked before LLM - [ ] `_scan_unmatched_leads` — rejection phrase in quoted thread > 1500 chars → passes filter (acceptable) ## Integration tests — full sync - [ ] `sync_all` with no active jobs → returns dict with all 6 keys incl. `todo_attached: 0` - [ ] `sync_all` return dict shape identical on all code paths - [ ] `sync_all` with `job_ids` filter → only syncs those jobs - [ ] `sync_all` `dry_run=True` → no DB writes - [ ] `sync_all` `on_stage` callback fires: "connecting", "job N/M", "scanning todo label", "scanning leads" - [ ] `sync_all` IMAP connection error → caught, returned in `errors` list - [ ] `sync_all` per-job exception → other jobs still sync ## Config / UI - [ ] Settings UI field for `todo_label` (currently YAML-only) - [ ] Warn in sync summary when `todo_label` folder not found on server - [ ] Clear error message when `config/email.yaml` is missing - [ ] `test_email_classify.py --verbose` shows correct blocking phrase for each BLOCK ## Backlog — Known issues - [ ] **The Ladders emails confuse the classifier** — promotional/job alert emails from `@theladders.com` are matching the recruitment keyword filter and being treated as leads. Fix: add a sender-based skip rule in `_scan_unmatched_leads` for known job board senders (similar to how LinkedIn Alert emails are short-circuited before the LLM classifier). Senders to exclude: `@theladders.com`, and audit for others (Glassdoor alerts, Indeed digest, ZipRecruiter, etc.). --- ## Performance & edge cases - [ ] Email with 10 000-char body → truncated to 4000 chars, no crash - [ ] Email with binary attachment → `_parse_message` returns valid dict, no crash - [ ] Email with multiple `text/plain` MIME parts → first part taken - [ ] `get_all_message_ids` with 100 000 rows → completes in < 1s