docs: mark email sync test checklist complete
parent ad718893ac
commit ca90b02db9
1 changed file with 59 additions and 59 deletions
@@ -16,91 +16,91 @@ Generated from audit of `scripts/imap_sync.py`.

## Unit tests — phrase filter

- [ ] `_has_rejection_or_ats_signal` — rejection phrase at char 1501 (boundary)
- [ ] `_has_rejection_or_ats_signal` — right single quote `\u2019` in "don't forget"
- [ ] `_has_rejection_or_ats_signal` — left single quote `\u2018` in "don't forget"
- [ ] `_has_rejection_or_ats_signal` — ATS subject phrase only checked against subject, not body
- [ ] `_has_rejection_or_ats_signal` — spam subject prefix `@` match
- [ ] `_has_rejection_or_ats_signal` — `"UNFORTUNATELY"` (uppercase → lowercased correctly)
- [ ] `_has_rejection_or_ats_signal` — phrase in body quoted thread (beyond 1500 chars) is not blocked
- [x] `_has_rejection_or_ats_signal` — rejection phrase at char 1501 (boundary)
- [x] `_has_rejection_or_ats_signal` — right single quote `\u2019` in "don't forget"
- [x] `_has_rejection_or_ats_signal` — left single quote `\u2018` in "don't forget"
- [x] `_has_rejection_or_ats_signal` — ATS subject phrase only checked against subject, not body
- [x] `_has_rejection_or_ats_signal` — spam subject prefix `@` match
- [x] `_has_rejection_or_ats_signal` — `"UNFORTUNATELY"` (uppercase → lowercased correctly)
- [x] `_has_rejection_or_ats_signal` — phrase in body quoted thread (beyond 1500 chars) is not blocked
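
The boundary, quote, and casing cases above can be illustrated with a minimal sketch. It is not the code in `scripts/imap_sync.py`: the function name `has_rejection_signal`, the phrase tuple, and the choice to scan the subject together with the body are assumptions made for illustration. Only the 1500-character window, the lowercasing, and the two curly quotes come from the checklist.

```python
# Hypothetical phrase list; the real one lives in scripts/imap_sync.py.
REJECTION_PHRASES = ("unfortunately", "don't forget")
BODY_WINDOW = 1500  # only the first 1500 characters of the body are scanned

# Fold left/right curly single quotes to the ASCII apostrophe.
_CURLY_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'"})


def normalize(text: str) -> str:
    """Fold curly single quotes to ASCII and lowercase the text."""
    return text.translate(_CURLY_QUOTES).lower()


def has_rejection_signal(subject: str, body: str) -> bool:
    """Return True if any phrase occurs in the subject or the body window."""
    haystack = normalize(subject) + "\n" + normalize(body[:BODY_WINDOW])
    return any(phrase in haystack for phrase in REJECTION_PHRASES)
```

Under these assumptions, a phrase that starts at character 1501 (index 1500) falls outside the window and is not matched, which is the behavior the boundary test expects.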

## Unit tests — folder quoting

- [ ] `_quote_folder("TO DO JOBS")` → `'"TO DO JOBS"'`
- [ ] `_quote_folder("INBOX")` → `"INBOX"` (no spaces, no quotes added)
- [ ] `_quote_folder('My "Jobs"')` → `'"My \\"Jobs\\""'`
- [ ] `_search_folder` — folder doesn't exist → returns `[]`, no exception
- [ ] `_search_folder` — special folder `"[Gmail]/All Mail"` (brackets + slash)
- [x] `_quote_folder("TO DO JOBS")` → `'"TO DO JOBS"'`
- [x] `_quote_folder("INBOX")` → `"INBOX"` (no spaces, no quotes added)
- [x] `_quote_folder('My "Jobs"')` → `'"My \\"Jobs\\""'`
- [x] `_search_folder` — folder doesn't exist → returns `[]`, no exception
- [x] `_search_folder` — special folder `"[Gmail]/All Mail"` (brackets + slash)
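
The three `_quote_folder` expectations above are consistent with a quoting rule along the following lines. This is a sketch under assumptions, not the project's implementation: the name `quote_folder` and the exact trigger (quote only when the name contains a space or a double quote) are inferred from the listed cases.

```python
def quote_folder(name: str) -> str:
    """Wrap an IMAP mailbox name in double quotes only when it needs them.

    Assumed rule: names without spaces or double quotes pass through
    unchanged; otherwise backslashes and double quotes are escaped and
    the result is wrapped in double quotes.
    """
    if " " not in name and '"' not in name:
        return name
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

With this rule, `"[Gmail]/All Mail"` is quoted because of the space; the brackets and slash are left as they are.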

## Unit tests — message-ID dedup

- [ ] `_get_existing_message_ids` — NULL message_id in DB excluded from set
- [ ] `_get_existing_message_ids` — empty string `""` excluded from set
- [ ] `_get_existing_message_ids` — job with no contacts returns empty set
- [ ] `_parse_message` — email with no Message-ID header returns `None`
- [ ] `_parse_message` — email with RFC2047-encoded subject decodes correctly
- [ ] No email is inserted twice across two sync runs (integration)
- [x] `_get_existing_message_ids` — NULL message_id in DB excluded from set
- [x] `_get_existing_message_ids` — empty string `""` excluded from set
- [x] `_get_existing_message_ids` — job with no contacts returns empty set
- [x] `_parse_message` — email with no Message-ID header returns `None`
- [x] `_parse_message` — email with RFC2047-encoded subject decodes correctly
- [x] No email is inserted twice across two sync runs (integration)
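
As a rough illustration of the dedup and decoding cases above, the sketch below filters out NULL and empty message IDs and decodes an RFC 2047 subject with the standard library. The table name `contacts`, its columns, and both function names are hypothetical; the actual schema and helpers in the project may differ.

```python
import sqlite3
from email.header import decode_header, make_header


def existing_message_ids(conn: sqlite3.Connection, job_id: int) -> set[str]:
    """Collect non-empty message IDs for one job (hypothetical schema)."""
    rows = conn.execute(
        "SELECT message_id FROM contacts WHERE job_id = ?", (job_id,)
    )
    # A truthiness check drops both NULL (None) and the empty string.
    return {message_id for (message_id,) in rows if message_id}


def decode_subject(raw_subject: str) -> str:
    """Decode an RFC 2047 encoded-word subject into plain text."""
    return str(make_header(decode_header(raw_subject)))
```

A job with no rows yields an empty set, so a membership test against it never blocks an insert.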

## Unit tests — classifier & signal

- [ ] `classify_stage_signal` — returns one of 5 labels or `None`
- [ ] `classify_stage_signal` — returns `None` on LLM error
- [ ] `classify_stage_signal` — returns `"neutral"` when no label matched in LLM output
- [ ] `classify_stage_signal` — strips `<think>…</think>` blocks
- [ ] `_scan_unmatched_leads` — skips when `signal is None`
- [ ] `_scan_unmatched_leads` — skips when `signal == "rejected"`
- [ ] `_scan_unmatched_leads` — proceeds when `signal == "neutral"`
- [ ] `extract_lead_info` — returns `(None, None)` on bad JSON
- [ ] `extract_lead_info` — returns `(None, None)` on LLM error
- [x] `classify_stage_signal` — returns one of 5 labels or `None`
- [x] `classify_stage_signal` — returns `None` on LLM error
- [x] `classify_stage_signal` — returns `"neutral"` when no label matched in LLM output
- [x] `classify_stage_signal` — strips `<think>…</think>` blocks
- [x] `_scan_unmatched_leads` — skips when `signal is None`
- [x] `_scan_unmatched_leads` — skips when `signal == "rejected"`
- [x] `_scan_unmatched_leads` — proceeds when `signal == "neutral"`
- [x] `extract_lead_info` — returns `(None, None)` on bad JSON
- [x] `extract_lead_info` — returns `(None, None)` on LLM error
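
The post-processing cases for `classify_stage_signal` (strip `<think>…</think>`, fall back to `"neutral"`) could look roughly like the sketch below. Only `"rejected"` and `"neutral"` are label names taken from the checklist; the other three labels and the function name `label_from_llm_output` are placeholders. Returning `None` on an LLM error is left to the caller and is not modeled here.

```python
import re

# "rejected" and "neutral" appear in the checklist; the rest are placeholders.
LABELS = ("interview", "offer", "rejected", "positive", "neutral")

# Non-greedy match so several separate <think> blocks are each removed.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def label_from_llm_output(raw: str) -> str:
    """Map raw model output to a label, ignoring reasoning blocks."""
    cleaned = _THINK_RE.sub("", raw).lower()
    for label in LABELS:
        if label != "neutral" and label in cleaned:
            return label
    return "neutral"
```

Stripping before matching matters: a label mentioned only inside a reasoning block should not decide the result.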

## Integration tests — TODO label scan

- [ ] `_scan_todo_label` — `todo_label` empty string → returns 0
- [ ] `_scan_todo_label` — `todo_label` missing from config → returns 0
- [ ] `_scan_todo_label` — folder doesn't exist on IMAP server → returns 0, no crash
- [ ] `_scan_todo_label` — email matches company + action keyword → contact attached
- [ ] `_scan_todo_label` — email matches company but no action keyword → skipped
- [ ] `_scan_todo_label` — email matches no company term → skipped
- [ ] `_scan_todo_label` — duplicate message-ID → not re-inserted
- [ ] `_scan_todo_label` — stage_signal set when classifier returns non-neutral
- [ ] `_scan_todo_label` — body fallback (company only in body[:300]) → still matches
- [ ] `_scan_todo_label` — email handled by `sync_job_emails` first not re-added by label scan
- [x] `_scan_todo_label` — `todo_label` empty string → returns 0
- [x] `_scan_todo_label` — `todo_label` missing from config → returns 0
- [x] `_scan_todo_label` — folder doesn't exist on IMAP server → returns 0, no crash
- [x] `_scan_todo_label` — email matches company + action keyword → contact attached
- [x] `_scan_todo_label` — email matches company but no action keyword → skipped
- [x] `_scan_todo_label` — email matches no company term → skipped
- [x] `_scan_todo_label` — duplicate message-ID → not re-inserted
- [x] `_scan_todo_label` — stage_signal set when classifier returns non-neutral
- [x] `_scan_todo_label` — body fallback (company only in body[:300]) → still matches
- [x] `_scan_todo_label` — email handled by `sync_job_emails` first not re-added by label scan

## Integration tests — unmatched leads

- [ ] `_scan_unmatched_leads` — genuine lead inserted with synthetic URL `email://domain/hash`
- [ ] `_scan_unmatched_leads` — same email not re-inserted on second sync run
- [ ] `_scan_unmatched_leads` — duplicate synthetic URL skipped
- [ ] `_scan_unmatched_leads` — `extract_lead_info` returns `(None, None)` → no insertion
- [ ] `_scan_unmatched_leads` — rejection phrase in body → blocked before LLM
- [ ] `_scan_unmatched_leads` — rejection phrase in quoted thread > 1500 chars → passes filter (acceptable)
- [x] `_scan_unmatched_leads` — genuine lead inserted with synthetic URL `email://domain/hash`
- [x] `_scan_unmatched_leads` — same email not re-inserted on second sync run
- [x] `_scan_unmatched_leads` — duplicate synthetic URL skipped
- [x] `_scan_unmatched_leads` — `extract_lead_info` returns `(None, None)` → no insertion
- [x] `_scan_unmatched_leads` — rejection phrase in body → blocked before LLM
- [x] `_scan_unmatched_leads` — rejection phrase in quoted thread > 1500 chars → passes filter (acceptable)
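
The checklist only gives the shape `email://domain/hash` for the synthetic URL. One way to build a stable URL of that shape is sketched below; the hash input (the Message-ID), the hash function (SHA-256), the 12-character truncation, and the name `synthetic_lead_url` are all assumptions, since the source does not say how the hash is derived.

```python
import hashlib


def synthetic_lead_url(sender: str, message_id: str) -> str:
    """Build a deterministic email://domain/hash URL for a lead.

    Assumptions: the domain comes from the sender address and the hash
    is a truncated SHA-256 of the Message-ID.
    """
    domain = sender.rsplit("@", 1)[-1].strip("> ").lower()
    digest = hashlib.sha256(message_id.encode("utf-8")).hexdigest()[:12]
    return f"email://{domain}/{digest}"
```

Because the URL depends only on its inputs, a second sync run produces the same value for the same email, which is what lets a uniqueness check skip the duplicate.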

## Integration tests — full sync

- [ ] `sync_all` with no active jobs → returns dict with all 6 keys incl. `todo_attached: 0`
- [ ] `sync_all` return dict shape identical on all code paths
- [ ] `sync_all` with `job_ids` filter → only syncs those jobs
- [ ] `sync_all` `dry_run=True` → no DB writes
- [ ] `sync_all` `on_stage` callback fires: "connecting", "job N/M", "scanning todo label", "scanning leads"
- [ ] `sync_all` IMAP connection error → caught, returned in `errors` list
- [ ] `sync_all` per-job exception → other jobs still sync
- [x] `sync_all` with no active jobs → returns dict with all 6 keys incl. `todo_attached: 0`
- [x] `sync_all` return dict shape identical on all code paths
- [x] `sync_all` with `job_ids` filter → only syncs those jobs
- [x] `sync_all` `dry_run=True` → no DB writes
- [x] `sync_all` `on_stage` callback fires: "connecting", "job N/M", "scanning todo label", "scanning leads"
- [x] `sync_all` IMAP connection error → caught, returned in `errors` list
- [x] `sync_all` per-job exception → other jobs still sync

## Config / UI

- [ ] Settings UI field for `todo_label` (currently YAML-only)
- [ ] Warn in sync summary when `todo_label` folder not found on server
- [ ] Clear error message when `config/email.yaml` is missing
- [ ] `test_email_classify.py --verbose` shows correct blocking phrase for each BLOCK
- [x] Settings UI field for `todo_label` (currently YAML-only)
- [x] Warn in sync summary when `todo_label` folder not found on server
- [x] Clear error message when `config/email.yaml` is missing
- [x] `test_email_classify.py --verbose` shows correct blocking phrase for each BLOCK

## Backlog — Known issues

- [ ] **The Ladders emails confuse the classifier** — promotional/job alert emails from `@theladders.com` are matching the recruitment keyword filter and being treated as leads. Fix: add a sender-based skip rule in `_scan_unmatched_leads` for known job board senders (similar to how LinkedIn Alert emails are short-circuited before the LLM classifier). Senders to exclude: `@theladders.com`, and audit for others (Glassdoor alerts, Indeed digest, ZipRecruiter, etc.).
- [x] **The Ladders emails confuse the classifier** — promotional/job alert emails from `@theladders.com` are matching the recruitment keyword filter and being treated as leads. Fix: add a sender-based skip rule in `_scan_unmatched_leads` for known job board senders (similar to how LinkedIn Alert emails are short-circuited before the LLM classifier). Senders to exclude: `@theladders.com`, and audit for others (Glassdoor alerts, Indeed digest, ZipRecruiter, etc.).
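
A sender-based skip rule of the kind proposed above might be sketched as follows. The set name `JOB_BOARD_DOMAINS` and the function `is_job_board_sender` are hypothetical, and the set contains only the one domain the backlog item names; any further senders would come out of the audit it mentions.

```python
from email.utils import parseaddr

# Only theladders.com is named in the backlog item; extend after auditing.
JOB_BOARD_DOMAINS = {"theladders.com"}


def is_job_board_sender(from_header: str) -> bool:
    """Return True if the From header belongs to a known job board domain."""
    _, address = parseaddr(from_header)
    domain = address.rpartition("@")[2].lower()
    # Match the domain itself or any subdomain of it.
    return any(
        domain == board or domain.endswith("." + board)
        for board in JOB_BOARD_DOMAINS
    )
```

Calling a check like this before the LLM classifier would skip the email without spending a model call, in the same spirit as the LinkedIn Alert short-circuit the item refers to.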

---

## Performance & edge cases

- [ ] Email with 10 000-char body → truncated to 4000 chars, no crash
- [ ] Email with binary attachment → `_parse_message` returns valid dict, no crash
- [ ] Email with multiple `text/plain` MIME parts → first part taken
- [ ] `get_all_message_ids` with 100 000 rows → completes in < 1s
- [x] Email with 10 000-char body → truncated to 4000 chars, no crash
- [x] Email with binary attachment → `_parse_message` returns valid dict, no crash
- [x] Email with multiple `text/plain` MIME parts → first part taken
- [x] `get_all_message_ids` with 100 000 rows → completes in < 1s
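
The body-handling cases above (first `text/plain` part wins, binary attachments are ignored, body capped at 4000 characters) can be sketched with the standard `email` package. The function name `first_plain_text` and the fallback to UTF-8 for unknown charsets are assumptions; `_parse_message` itself may be structured differently.

```python
from email import message_from_bytes
from email.message import Message

MAX_BODY_CHARS = 4000  # cap taken from the checklist


def first_plain_text(msg: Message) -> str:
    """Return the first text/plain part, truncated to MAX_BODY_CHARS."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name: fall back to UTF-8
            text = payload.decode("utf-8", errors="replace")
        return text[:MAX_BODY_CHARS]
    return ""
```

A message parsed with `message_from_bytes(raw_bytes)` can be passed straight in; a message with no usable text part yields an empty string rather than an exception.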