App: Peregrine
Company: Circuit Forge LLC
Source: github.com/pyr0ball/job-seeker (personal fork, not linked)
Email Sync — Testing Checklist
Generated from an audit of `scripts/imap_sync.py`.
Bugs fixed (2026-02-23)
- Gmail label with spaces not quoted for IMAP SELECT → `_quote_folder()` added
- `_quote_folder` didn't escape internal double quotes → RFC 3501 escaping added
- `signal is None` in `_scan_unmatched_leads` allowed classifier failures through → now skips
- Email with no Message-ID re-inserted on every sync → `_parse_message` returns `None` when ID missing
- `todo_attached` missing from early-return dict in `sync_all` → added
- Body phrase check truncated at 800 chars (rejection footers missed) → bumped to 1500
- `_DONT_FORGET_VARIANTS` missing left single quotation mark `\u2018` → added
Unit tests — phrase filter
- `_has_rejection_or_ats_signal` — rejection phrase at char 1501 (boundary)
- `_has_rejection_or_ats_signal` — right single quote `\u2019` in "don't forget"
- `_has_rejection_or_ats_signal` — left single quote `\u2018` in "don't forget"
- `_has_rejection_or_ats_signal` — ATS subject phrase only checked against subject, not body
- `_has_rejection_or_ats_signal` — spam subject prefix `@` match
- `_has_rejection_or_ats_signal` — `"UNFORTUNATELY"` (uppercase → lowercased correctly)
- `_has_rejection_or_ats_signal` — phrase in body quoted thread (beyond 1500 chars) is not blocked
Unit tests — folder quoting
- `_quote_folder("TO DO JOBS")` → `'"TO DO JOBS"'`
- `_quote_folder("INBOX")` → `"INBOX"` (no spaces, no quotes added)
- `_quote_folder('My "Jobs"')` → `'"My \\"Jobs\\""'`
- `_search_folder` — folder doesn't exist → returns `[]`, no exception
- `_search_folder` — special folder `"[Gmail]/All Mail"` (brackets + slash)
Unit tests — message-ID dedup
- `_get_existing_message_ids` — NULL `message_id` in DB excluded from set
- `_get_existing_message_ids` — empty string `""` excluded from set
- `_get_existing_message_ids` — job with no contacts returns empty set
- `_parse_message` — email with no Message-ID header returns `None`
- `_parse_message` — email with RFC 2047-encoded subject decodes correctly
- No email is inserted twice across two sync runs (integration)
Unit tests — classifier & signal
- `classify_stage_signal` — returns one of 5 labels or `None`
- `classify_stage_signal` — returns `None` on LLM error
- `classify_stage_signal` — returns `"neutral"` when no label matched in LLM output
- `classify_stage_signal` — strips `<think>…</think>` blocks
- `_scan_unmatched_leads` — skips when `signal is None`
- `_scan_unmatched_leads` — skips when `signal == "rejected"`
- `_scan_unmatched_leads` — proceeds when `signal == "neutral"`
- `extract_lead_info` — returns `(None, None)` on bad JSON
- `extract_lead_info` — returns `(None, None)` on LLM error
Integration tests — TODO label scan
- `_scan_todo_label` — `todo_label` empty string → returns 0
- `_scan_todo_label` — `todo_label` missing from config → returns 0
- `_scan_todo_label` — folder doesn't exist on IMAP server → returns 0, no crash
- `_scan_todo_label` — email matches company + action keyword → contact attached
- `_scan_todo_label` — email matches company but no action keyword → skipped
- `_scan_todo_label` — email matches no company term → skipped
- `_scan_todo_label` — duplicate message-ID → not re-inserted
- `_scan_todo_label` — `stage_signal` set when classifier returns non-neutral
- `_scan_todo_label` — body fallback (company only in `body[:300]`) → still matches
- `_scan_todo_label` — email handled by `sync_job_emails` first not re-added by label scan
Integration tests — unmatched leads
- `_scan_unmatched_leads` — genuine lead inserted with synthetic URL `email://domain/hash`
- `_scan_unmatched_leads` — same email not re-inserted on second sync run
- `_scan_unmatched_leads` — duplicate synthetic URL skipped
- `_scan_unmatched_leads` — `extract_lead_info` returns `(None, None)` → no insertion
- `_scan_unmatched_leads` — rejection phrase in body → blocked before LLM
- `_scan_unmatched_leads` — rejection phrase in quoted thread > 1500 chars → passes filter (acceptable)
Integration tests — full sync
- `sync_all` with no active jobs → returns dict with all 6 keys incl. `todo_attached: 0`
- `sync_all` return dict shape identical on all code paths
- `sync_all` with `job_ids` filter → only syncs those jobs
- `sync_all` with `dry_run=True` → no DB writes
- `sync_all` `on_stage` callback fires: "connecting", "job N/M", "scanning todo label", "scanning leads"
- `sync_all` IMAP connection error → caught, returned in `errors` list
- `sync_all` per-job exception → other jobs still sync
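One way to make "identical shape on all code paths" hold by construction is to build every return value from a single factory. The `todo_attached` early-return bug came from hand-written dicts drifting apart. The sketch below is hypothetical. It uses only the two keys this checklist names (the real dict has six), injects `connect` and `sync_one` so it runs without IMAP, and omits the todo-label and lead-scan stages.

```python
from typing import Any, Callable, Optional


def _empty_result() -> dict:
    # Only the two keys this checklist names; the real dict has six.
    return {"todo_attached": 0, "errors": []}


def _stage(on_stage: Optional[Callable[[str], None]], message: str) -> None:
    if on_stage is not None:
        on_stage(message)


def sync_all_sketch(
    job_ids: list[int],
    connect: Callable[[], Any],
    sync_one: Callable[[Any, int], None],
    on_stage: Optional[Callable[[str], None]] = None,
) -> dict:
    result = _empty_result()  # every return path starts from the same shape
    if not job_ids:
        return result
    _stage(on_stage, "connecting")
    try:
        conn = connect()
    except Exception as exc:  # IMAP connection error: report, don't raise
        result["errors"].append(f"IMAP connection failed: {exc}")
        return result
    for i, job_id in enumerate(job_ids, start=1):
        _stage(on_stage, f"job {i}/{len(job_ids)}")
        try:
            sync_one(conn, job_id)
        except Exception as exc:  # one bad job must not stop the others
            result["errors"].append(f"job {job_id}: {exc}")
    return result
```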
Config / UI
- Settings UI field for `todo_label` (currently YAML-only)
- Warn in sync summary when the `todo_label` folder is not found on the server
- Clear error message when `config/email.yaml` is missing
- `test_email_classify.py --verbose` shows the correct blocking phrase for each BLOCK
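For the missing-config item, a minimal sketch is to check for the file before parsing and raise an error that names the path. `EmailConfigError` and `load_email_config` are hypothetical names. PyYAML is an assumed dependency, imported lazily so the missing-file path needs no third-party package. Defaulting `todo_label` to `""` matches the "missing from config → returns 0" case.

```python
from pathlib import Path


class EmailConfigError(RuntimeError):
    """Raised when the email sync config can't be used."""


def load_email_config(path: Path = Path("config/email.yaml")) -> dict:
    if not path.is_file():
        raise EmailConfigError(
            f"Email sync is not configured: {path} was not found. "
            "Create it before running sync."
        )
    import yaml  # assumption: PyYAML; only needed once the file exists

    with path.open(encoding="utf-8") as fh:
        cfg = yaml.safe_load(fh) or {}
    cfg.setdefault("todo_label", "")  # missing label -> TODO scan returns 0
    return cfg
```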
Backlog — Known issues
- The Ladders emails confuse the classifier — promotional/job alert emails from `@theladders.com` are matching the recruitment keyword filter and being treated as leads. Fix: add a sender-based skip rule in `_scan_unmatched_leads` for known job board senders (similar to how LinkedIn Alert emails are short-circuited before the LLM classifier). Senders to exclude: `@theladders.com`, and audit for others (Glassdoor alerts, Indeed digest, ZipRecruiter, etc.).
Performance & edge cases
- Email with 10 000-char body → truncated to 4000 chars, no crash
- Email with binary attachment → `_parse_message` returns valid dict, no crash
- Email with multiple `text/plain` MIME parts → first part taken
- `get_all_message_ids` with 100 000 rows → completes in < 1s