Compare commits

3 commits

Author SHA1 Message Date
d7c8a8bca6 docs(readme): landing page rewrite — corrected tagline, hero screenshot, platform table, sniping engine roadmap, split license
2026-05-06 08:51:37 -07:00
108f63b4f2 fix(browser-pool): replace queue with thread-local storage to fix Playwright cross-thread crash (#53)
Playwright's sync API binds its greenlet event loop to the creating thread.
Sharing pre-warmed slots across threads caused "cannot switch to a different
thread" panics under uvicorn. New design: each worker thread owns its own
Playwright instance created lazily on first fetch_html() call. A registry
dict keyed by thread-id lets stop() close all slots at shutdown. Removes
ThreadPoolExecutor warmup and idle-cleanup daemon thread entirely.
2026-05-04 09:27:20 -07:00
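The thread-local design described in this commit message can be sketched without Playwright at all. The example below is a minimal illustration, not the module's actual code: `FakeSlot` and `ThreadLocalPool` are hypothetical names, and the slot is a plain object standing in for the Xvfb, Playwright, browser, and context handles.

```python
import threading


class FakeSlot:
    """Hypothetical stand-in for a per-thread browser slot (no real Playwright)."""

    def __init__(self) -> None:
        self.owner = threading.get_ident()  # thread that created the slot
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ThreadLocalPool:
    """Each calling thread lazily creates, and then reuses, its own slot."""

    def __init__(self) -> None:
        self._local = threading.local()
        self._lock = threading.Lock()
        # Registry keyed by thread id so stop() can reach every slot.
        self._registry: dict[int, FakeSlot] = {}

    def get_slot(self) -> FakeSlot:
        slot = getattr(self._local, "slot", None)
        if slot is None:
            slot = FakeSlot()  # created on the thread that will use it
            self._local.slot = slot
            with self._lock:
                self._registry[threading.get_ident()] = slot
        return slot

    def stop(self) -> None:
        # Snapshot under the lock, close outside it.
        with self._lock:
            slots = list(self._registry.values())
            self._registry.clear()
        for slot in slots:
            slot.close()
```

Because a slot is only ever read back through `threading.local()`, no thread can receive another thread's slot. The registry exists solely so shutdown can find the slots.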
bccedb1fe5 fix(trust): treat feedback_ratio=0.0 as missing data for buyer-only/returning sellers (#52)
eBay omits the 12-month positive percentage for returning sellers and
buyer-only accounts with no recent sales. Previously ratio=0.0 with
count>0 triggered established_bad_actor; now it returns None from the
scorer (score_is_partial=True) and emits a soft no_recent_seller_data
flag instead. ratio=0.0 with count=0 is still treated as no-history.
2026-05-04 09:24:27 -07:00
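The rule in this commit message can be written as a small classifier. This is a sketch under stated assumptions rather than the scorer itself: the function name `classify_feedback` and the `"no_history"` and `"ok"` labels are hypothetical, `ratio` is assumed to be a fraction between 0.0 and 1.0, and the bad-actor threshold (below 80% with 20 or more reviews) is taken from the README's red-flag list.

```python
def classify_feedback(ratio: float, count: int) -> str:
    """Classify a seller's feedback data (illustrative labels only)."""
    if ratio == 0.0 and count == 0:
        # No feedback at all: still treated as a seller with no history.
        return "no_history"
    if ratio == 0.0:
        # Feedback exists but the 12-month percentage is missing:
        # treat as missing data, not as a 0% positive rating.
        return "no_recent_seller_data"
    if ratio < 0.80 and count >= 20:
        return "established_bad_actor"
    return "ok"
```

The order of the checks is the point: the missing-data case is tested before the bad-actor threshold, so a ratio of 0.0 can no longer trip the hard flag.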
7 changed files with 403 additions and 543 deletions

358
README.md

@ -1,29 +1,81 @@
# Snipe — Auction Sniping & Listing Intelligence
<!-- Logo coming soon — replace docs/snipe-logo.svg when final icon ships -->
<div align="center">
<img src="docs/snipe-logo.svg" alt="Snipe logo" width="120" />
> *Part of the Circuit Forge LLC "AI for the tasks you hate most" suite.*
# Snipe
**Status:** Active — eBay listing intelligence MVP complete; Mercari search + trust scoring live. Auction sniping engine and additional platforms are next.
**Auction intelligence and sniping for people who don't trust the platform.**
**[Documentation](https://docs.circuitforge.tech/snipe/)** · [circuitforge.tech](https://circuitforge.tech)
[![License: MIT / BSL 1.1](https://img.shields.io/badge/license-MIT%20%2F%20BSL%201.1-blue)](LICENSE)
[![Status: Beta](https://img.shields.io/badge/status-beta-yellow)]()
[![Forgejo](https://img.shields.io/badge/primary%20repo-Forgejo-orange)](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe)
[![Docs](https://img.shields.io/badge/docs-docs.circuitforge.tech%2Fsnipe-green)](https://docs.circuitforge.tech/snipe)
## Quick install (self-hosted)
*Part of the Circuit Forge LLC suite — "AI for the tasks the system made hard on purpose."*
</div>
**Requirements:** Docker with Compose plugin, Git. No API keys needed to get started.
---
![Snipe hero screenshot — search results with trust score badges, STEAL price flags, and red flag indicators](docs/screenshots/hero.png)
---
## Why Snipe?
Auction platforms are designed to make you act fast and trust blindly. The closing countdown, the hidden price history, the new-account seller with one feedback — all of it is structured against the buyer.
Snipe inverts that. Before you place a bid, you get a trust score built from five independently sourced signals: seller account age, feedback volume, feedback ratio, price versus recent completed sales, and category history. A hard-coded red flag for new accounts or bad actors overrides the composite. Soft flags surface buried damage disclosures, duplicate photos, and listings that have been sitting unsold for weeks. When the listing is priced well below market, you see a STEAL badge — sourced from eBay Marketplace Insights, not from the seller's description.
The sniping engine — precise last-second bid submission with NTP (network time protocol) synchronization and soft-close handling — is next on the roadmap. The intelligence layer is live now.
---
## Features
### Listing intelligence (live)
- **Trust scoring** — five-signal composite score (0–100) per listing: account age, feedback count, feedback ratio, price vs. market, category history
- **Red flag detection** — hard flags for new accounts and established bad actors; soft flags for damage keywords, evasive language, duplicate photos, long-on-market listings, and significant price drops
- **Price vs. market** — listing price compared against completed-sale medians via eBay Marketplace Insights API (Browse API fallback)
- **Keyword filtering** — must-include (AND / ANY / OR-groups), must-exclude, category, price range; OR-groups expand into multiple targeted queries so eBay relevance doesn't silently drop variants
- **Saved searches** — one-click re-run that restores all filter settings
- **Background enrichment** — seller account age scraped via Playwright + Xvfb (Kasada/Cloudflare-safe headed Chromium); on-demand re-score per listing without re-searching
- **LLM query builder** — describe what you want in plain language; an LLM builds the search terms (paid tier)
- **Vision photo assessment** — condition scoring from listing photos via moondream2 locally or Claude vision (paid/cloud); VRAM-aware scheduling via circuitforge-core task scheduler
- **Affiliate link builder** — eBay Partner Network wrapping with user BYOK support and per-retailer disclosure
### Platforms
| Platform | Search | Trust scoring | Completed-sale comps |
|----------|--------|---------------|----------------------|
| **eBay** | Browse API + Playwright fallback | All 5 signals | Marketplace Insights + Browse fallback |
| **Mercari** | Playwright scraper | 3/5 signals (partial) | Phase 3 |
| CT Bids, HiBid, AuctionZip, Invaluable, GovPlanet, Bidsquare, Proxibid | Planned | Planned | Planned |
### Auction sniping engine (roadmap)
- NTP-synchronized last-second bid submission
- Soft-close detection and strategy adjustment
- Proxy bid ladder with configurable max
- Human approval gate before any bid executes
- Post-win workflow: payment routing, shipping coordination, provenance documentation
---
## Quick Start
**Requirements:** Docker with Compose plugin, Git. No API keys required to get started.
```bash
# One-line install — clones to ~/snipe by default
bash <(curl -fsSL https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/raw/branch/main/install.sh)
# Or clone manually and run the script:
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/snipe.git
bash snipe/install.sh
```
Then open **http://localhost:8509**.
### Manual setup (if you prefer)
### Manual setup
Snipe's API image is built from a parent context that includes `circuitforge-core`. Both repos must sit as siblings in the same directory:
Snipe's API image builds from a parent context that includes `circuitforge-core`. Both repos must sit as siblings:
```
workspace/
@ -36,286 +88,86 @@ mkdir snipe-workspace && cd snipe-workspace
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/snipe.git
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/circuitforge-core.git
cd snipe
cp .env.example .env # edit if you have eBay API credentials (optional)
cp .env.example .env # add eBay API credentials if you have them (optional)
./manage.sh start
```
### Optional: eBay API credentials
Snipe works without any credentials using its Playwright scraper fallback. Adding eBay API credentials unlocks faster searches and inline seller account age (no extra scrape needed):
Snipe works without credentials using its Playwright scraper fallback. Adding credentials unlocks faster searches and inline seller account age without an extra scrape:
1. Register at [developer.ebay.com](https://developer.ebay.com/my/keys)
2. Copy your Production **App ID** and **Cert ID** into `.env`
3. Restart: `./manage.sh restart`
3. `./manage.sh restart`
---
## What it does
## Tiers
Snipe has two layers that work together:
| Tier | What you get |
|------|-------------|
| **Free** | eBay + Mercari search, full trust scoring, keyword filtering, saved searches — local LLM only |
| **Paid** | LLM query builder, background saved-search monitoring with alerts, cloud LLM option |
| **Premium** | Vision photo condition assessment, fine-tuned trust models, multi-user |
| **Ultra** | Human-in-the-loop operator — handles CAPTCHAs, phone calls, anything automation can't |
**Layer 1 — Listing intelligence (MVP, implemented)**
Before you bid, Snipe tells you whether a listing is worth your time. It fetches eBay listings, scores each seller's trustworthiness across five signals, flags suspicious pricing relative to completed sales, and surfaces red flags like new accounts, cosmetic damage buried in titles, and listings that have been sitting unsold for weeks.
**Layer 2 — Auction sniping (roadmap)**
Snipe manages the bid itself: monitors listings across platforms, schedules last-second bids, handles soft-close extensions, and guides you through the post-win logistics (payment routing, shipping coordination, provenance documentation for antiques).
The name traces back to the origin of the word "sniping": common snipes are notoriously elusive birds, secretive and camouflaged, that flush suddenly from cover. Shooting one required extreme patience, stillness, and a precise last-second shot. That's the auction strategy.
License key format: `CFG-SNPE-XXXX-XXXX-XXXX`
---
## Screenshots
**Landing page — no account required**
![Snipe landing hero showing search bar and three feature tiles: Seller trust score, Price vs. market, Red flag detection](docs/screenshots/01-hero.png)
**Search results with trust scores**
![Search results for vintage film camera listings, each card showing a trust score badge, seller feedback, price, and market comparison](docs/screenshots/02-results.png)
**STEAL badge — price significantly below market**
![Listing cards with STEAL badge highlighting listings priced well below completed sales median](docs/screenshots/03-steal-badge.png)
> Red flag and Triple Red screenshots coming — captured opportunistically from real scammy listings.
---
## Implemented: Listing Intelligence
### Supported platforms
| Platform | Search | Trust scoring | Completed-sales comps |
|----------|--------|---------------|-----------------------|
| **eBay** | ✅ Browse API + Playwright fallback | ✅ All 5 signals | ✅ Marketplace Insights + Browse fallback |
| **Mercari** | ✅ Playwright scraper | ✅ Partial (3/5 signals) | ⏳ Phase 3 |
Switch between platforms via the tab picker in the search UI. All platforms share the same Playwright + Xvfb scraping stack (Cloudflare/Kasada-safe headed Chromium).
### eBay Listing Intelligence
### Search & filtering
- Full-text eBay search via Browse API (with Playwright scraper fallback when no API credentials configured)
- Price range, must-include keywords (AND / ANY / OR-groups mode), must-exclude terms, eBay category filter
- OR-group mode expands keyword combinations into multiple targeted queries and deduplicates results — eBay relevance won't silently drop variants
- Pages-to-fetch control: each Browse API page returns up to 200 listings
- Saved searches with one-click re-run that restores all filter settings
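OR-group expansion and result deduplication, as described in the list above, might look roughly like this. The function names and the `"id"` key on each listing are assumptions made for illustration and are not taken from Snipe's code.

```python
from itertools import product


def expand_or_groups(base_terms: list[str], or_groups: list[list[str]]) -> list[str]:
    """Build one query string per combination of OR-group alternatives."""
    queries = []
    for combo in product(*or_groups):
        queries.append(" ".join([*base_terms, *combo]))
    return queries


def dedupe_listings(result_pages: list[list[dict]]) -> list[dict]:
    """Keep the first copy of each listing, assuming every listing has an 'id'."""
    seen: dict[str, dict] = {}
    for page in result_pages:
        for listing in page:
            seen.setdefault(listing["id"], listing)
    return list(seen.values())
```

For example, `expand_or_groups(["thinkpad"], [["t480", "t490"], ["i5", "i7"]])` yields four queries, one for each model and CPU pairing. Each variant is then searched explicitly instead of relying on a single broad query.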
### Seller trust scoring
Five signals, each scored 0–20, composited to 0–100:
| Signal | What it measures |
|--------|-----------------|
| `account_age` | Days since eBay account registration |
| `feedback_count` | Total feedback received |
| `feedback_ratio` | Positive feedback percentage |
| `price_vs_market` | Listing price vs. median of recent completed sales |
| `category_history` | Whether seller has history selling in this category |
Scores are marked **partial** when signals are unavailable (e.g. account age not yet enriched). Partial scores are displayed with a visual indicator rather than penalizing the seller for missing data.
### Red flags
Hard filters that override the composite score:
- `new_account` — account registered within 7 days
- `established_bad_actor` — feedback ratio < 80% with 20+ reviews
Soft flags surfaced as warnings:
- `account_under_30_days` — account under 30 days old
- `low_feedback_count` — fewer than 10 reviews
- `suspicious_price` — listing price below 50% of market median *(suppressed automatically when the search returns a heterogeneous price distribution — e.g. mixed laptop generations — to prevent false positives)*
- `duplicate_photo` — same image found on another listing (perceptual hash)
- `scratch_dent_mentioned` — title keywords indicating cosmetic damage, functional problems, or evasive language (see below)
- `long_on_market` — listing has been seen 5+ times over 14+ days without selling
- `significant_price_drop` — current price more than 20% below first-seen price
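The numeric thresholds above translate directly into rule checks. The sketch below is illustrative only. The `ListingFacts` container and its field names are hypothetical, `feedback_ratio` is assumed to be a fraction (0.0 to 1.0), and the photo-hash check, title-keyword check, and heterogeneous-price suppression are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ListingFacts:
    """Hypothetical inputs for the flag rules; None means 'not known yet'."""
    account_age_days: Optional[int]
    feedback_count: int
    feedback_ratio: Optional[float]
    price: float
    market_median: Optional[float]
    first_seen_price: Optional[float]
    times_seen: int
    days_on_market: int


def hard_flags(f: ListingFacts) -> list[str]:
    flags = []
    if f.account_age_days is not None and f.account_age_days <= 7:
        flags.append("new_account")
    if f.feedback_ratio is not None and f.feedback_ratio < 0.80 and f.feedback_count >= 20:
        flags.append("established_bad_actor")
    return flags


def soft_flags(f: ListingFacts) -> list[str]:
    flags = []
    if f.account_age_days is not None and f.account_age_days < 30:
        flags.append("account_under_30_days")
    if f.feedback_count < 10:
        flags.append("low_feedback_count")
    if f.market_median and f.price < 0.5 * f.market_median:
        flags.append("suspicious_price")
    if f.times_seen >= 5 and f.days_on_market >= 14:
        flags.append("long_on_market")
    if f.first_seen_price and f.price < 0.8 * f.first_seen_price:
        flags.append("significant_price_drop")
    return flags
```

Unknown values (`None`) never raise a flag here, which matches the stated intent of not penalizing sellers for missing data.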
### Scratch & dent title detection
Scans listing titles for signals the item may have undisclosed damage or problems:
- **Explicit damage**: scratch, scuff, dent, crack, chip, blemish, worn
- **Condition catch-alls**: as is, for parts, parts only, spares or repair
- **Evasive redirects**: "see description", "read description", "see photos for" (seller hiding damage detail in listing body)
- **Functional problems**: "not working", "stopped working", "no power", "dead on arrival", "powers on but", "faulty", "broken screen/hinge/port"
- **DIY/repair listings**: "needs repair", "needs tlc", "project laptop", "for repair", "sold as is"
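A title scan over the phrase groups above could be sketched as follows. The category keys, the helper names, and the matching rules (whole-word matches, simple plural and past-tense endings, hyphen or space between words) are assumptions for illustration. The actual detector may match differently.

```python
import re

# Phrase groups copied from the list above; the category keys are illustrative.
_TITLE_SIGNALS = {
    "explicit_damage": ["scratch", "scuff", "dent", "crack", "chip", "blemish", "worn"],
    "condition_catch_all": ["as is", "for parts", "parts only", "spares or repair"],
    "evasive_redirect": ["see description", "read description", "see photos for"],
    "functional_problem": [
        "not working", "stopped working", "no power", "dead on arrival",
        "powers on but", "faulty", "broken screen", "broken hinge", "broken port",
    ],
    "diy_repair": ["needs repair", "needs tlc", "project laptop", "for repair", "sold as is"],
}


def _compile(phrase: str) -> re.Pattern:
    # Words may be separated by spaces or hyphens ("as is" also matches "as-is").
    words = r"[\s-]+".join(re.escape(word) for word in phrase.split())
    # Optional "s", "es", or "ed" ending; \b keeps "chip" from matching "chipset".
    return re.compile(rf"\b{words}(?:e?s|ed)?\b", re.IGNORECASE)


_PATTERNS = {
    category: [_compile(phrase) for phrase in phrases]
    for category, phrases in _TITLE_SIGNALS.items()
}


def title_signal_categories(title: str) -> list[str]:
    """Return the categories whose phrases appear in the title, in table order."""
    return [
        category
        for category, patterns in _PATTERNS.items()
        if any(pattern.search(title) for pattern in patterns)
    ]
```

Word boundaries matter here: a plain substring test for "chip" or "dent" would flag titles containing "chipset" or "student".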
### Seller enrichment
- **Inline (API adapter)**: account age filled from Browse API `registrationDate` field
- **Background (scraper)**: `/itm/` listing pages scraped for seller "Joined" date via Playwright + Xvfb (Kasada-safe headed Chromium)
- **On-demand**: ↻ button on any listing card triggers `POST /api/enrich` — runs enrichment and re-scores without waiting for a second search
- **Category history**: derived from the seller's accumulated listing data (Browse API `categories` field); improves with every search, no extra API calls
### Affiliate link builder
Listing cards surface eBay affiliate-wrapped URLs. Uses `circuitforge_core.affiliates.wrap_url` — resolution order: user opted out → plain URL; user has BYOK affiliate ID → their ID; CF env var set (`EBAY_AFFILIATE_ID`) → CF's ID; otherwise plain URL. Users can configure their own eBay Partner Network ID or opt out entirely in Settings.
Disclosure tooltip appears on first encounter per-session and on each wrapped link (per-retailer copy from `get_disclosure_text`).
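The resolution order above can be illustrated with a standalone function. This sketch does not reproduce the signature of `circuitforge_core.affiliates.wrap_url`. The names `resolve_affiliate_id` and `wrap`, and the default `campid` query parameter, are assumptions chosen for the example.

```python
import os
from typing import Mapping, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def resolve_affiliate_id(
    opted_out: bool,
    user_affiliate_id: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the affiliate ID to use, or None for a plain URL."""
    env = os.environ if env is None else env
    if opted_out:
        return None                      # 1. user opted out
    if user_affiliate_id:
        return user_affiliate_id         # 2. user's own (BYOK) ID
    return env.get("EBAY_AFFILIATE_ID") or None  # 3. CF's ID, else plain URL


def wrap(url: str, affiliate_id: Optional[str], param: str = "campid") -> str:
    """Append the affiliate ID as a query parameter (parameter name assumed)."""
    if affiliate_id is None:
        return url
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, affiliate_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The opt-out check comes first so that a configured ID, whether the user's or CF's, can never override an explicit opt-out.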
### Feedback FAB
In-app feedback button (bottom-right FAB) opens a modal: title, description, optional screenshot. Posts to the CF feedback endpoint. Status probed on load; FAB hidden if endpoint unreachable.
### Vision task scheduling
Photo condition assessment tasks queued through `circuitforge_core.tasks.TaskScheduler` — VRAM-aware slot management shared with any other LLM workloads on the same host. Runs moondream2 locally (free tier) or Claude vision (paid/cloud). Results stored per-listing and update the trust score card.
### Market price comparison
Completed sales fetched via eBay Marketplace Insights API (with Browse API fallback for app tiers that don't have Insights access). Median stored per query hash, used to score `price_vs_market` across all listings in a search.
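Storing a median per query hash could be sketched like this. The hashing scheme, the `MedianCache` class, and the in-memory dict are all assumptions for illustration. The README does not say how the hash is computed or where the median is stored.

```python
import hashlib
import json
from statistics import median
from typing import Optional


def query_hash(query: str, filters: dict) -> str:
    """Hash a normalized query plus its filters (normalization is assumed)."""
    canonical = json.dumps(
        {"q": query.strip().lower(), "filters": filters}, sort_keys=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class MedianCache:
    """In-memory stand-in for per-query median storage."""

    def __init__(self) -> None:
        self._medians: dict[str, float] = {}

    def store(self, key: str, sold_prices: list[float]) -> None:
        if sold_prices:
            self._medians[key] = median(sold_prices)

    def price_ratio(self, key: str, price: float) -> Optional[float]:
        """Listing price divided by the stored median, or None if unknown."""
        market = self._medians.get(key)
        if not market:
            return None
        return price / market
```

One median per query then serves every listing in that search, so scoring `price_vs_market` needs no additional completed-sales lookups per listing.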
### Adapters
| Adapter | When used | Signals available |
|---------|-----------|-------------------|
| Browse API (`api`) | eBay API credentials configured | All signals; account age inline |
| Playwright scraper (`scraper`) | No credentials / forced | All signals except account age (async BTF enrichment) |
| `auto` (default) | — | API if credentials present, scraper otherwise |
### Mercari Listing Intelligence
Search Mercari US via headed Chromium + playwright-stealth, bypassing Cloudflare Turnstile. Uses the same `BrowserPool` as the eBay scraper.
**Trust signal coverage:**
| Signal | Source | Available |
|--------|--------|-----------|
| `feedback_count` | `NumSales` on listing page | ✅ |
| `feedback_ratio` | `ReviewStarsWrapper[data-stars]` ÷ 5 | ✅ |
| `price_vs_market` | Computed from comps (Phase 3) | ⏳ |
| `account_age_days` | Seller profile page (not yet fetched) | ❌ |
| `category_history` | Not exposed in Mercari HTML | ❌ |
All Mercari scores are marked **partial** (`score_is_partial=True`) because account age and category history are unavailable. The trust scorer handles partial scores correctly — missing signals don't penalise the seller.
**Design note:** `seller_platform_id` stores the Mercari `product_id` (e.g. `m86032668393`) rather than the seller username, because seller identity isn't available from search results HTML. `get_seller()` resolves the product ID by fetching the individual listing page.
---
## Stack
| Layer | Tech | Port |
|-------|------|------|
| Frontend | Vue 3 + Pinia + UnoCSS + Vite (nginx) | 8509 |
| API | FastAPI (uvicorn) | 8510 |
| Scraper | Playwright + playwright-stealth + Xvfb | — |
| DB | SQLite (`data/snipe.db`) | — |
| Core | circuitforge-core (editable install) | — |
## Running
```bash
./manage.sh start # start all services
./manage.sh stop # stop
./manage.sh restart # restart
./manage.sh logs # tail logs
./manage.sh open # open in browser
```
Cloud stack (shared DB, multi-user):
```bash
docker compose -f compose.cloud.yml -p snipe-cloud up -d
docker compose -f compose.cloud.yml -p snipe-cloud build api # after Python changes
```
---
## Stack
| Layer | Technology | Port |
|-------|-----------|------|
| Frontend | Vue 3 + Pinia + UnoCSS + Vite (served via nginx) | 8509 |
| API | FastAPI (uvicorn) | 8510 |
| Scraper | Playwright + playwright-stealth + Xvfb (Kasada/Cloudflare-safe headed Chromium) | — |
| Database | SQLite (`data/snipe.db`) | — |
| Core | circuitforge-core (editable install) | — |
The scraper stack uses headed Chromium via Xvfb (X virtual framebuffer) with playwright-stealth for all platform access. Headless and `requests`-based approaches are blocked by eBay and Mercari.
---
## Roadmap
## Documentation
### Intelligence features
| Issue | Feature |
|-------|---------|
| [#5](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/5) | UPC/product lookup → LLM-crafted search terms (paid tier) |
| [#12](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/12) | Background saved-search monitoring with configurable alerts |
| [#21](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/21) | Vision classification pipeline — condition scoring, listing quality, fraud signals |
| [#43](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/43) | Wire photo analysis task to cf-orch (VRAM-aware scheduling) |
| [#51](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/51) | Reranker: semantic filter before trust scoring |
| [#52](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/52) | Trust score fix: exclude buyer-only feedback from `feedback_count` |
| [#41](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/41) | Additional theme variants — solarized, high-contrast, colorblind-safe |
### Platform expansion
| Issue | Feature |
|-------|---------|
| ✅ shipped | Mercari US — search + partial trust scoring |
| [#53](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/53) | BrowserPool thread-safety — eliminate per-request cold-start (~10s) |
| [#10](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/10) | CT Bids, HiBid, AuctionZip, Invaluable, GovPlanet, Bidsquare, Proxibid |
| [#46](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/46) | Broadcast trust score verdicts to Fediverse communities via ActivityPub |
### Cloud / infrastructure
| Issue | Feature |
|-------|---------|
| [#7](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/7) | Shared image hash DB — requires explicit opt-in consent (CF privacy-by-architecture) |
| [#45](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/45) | Migrate shared seller/comps DB from SQLite to Postgres |
### Auction sniping engine
| Issue | Feature |
|-------|---------|
| [#9](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/9) | Bid scheduling + snipe execution (NTP-synchronized, soft-close handling, human approval gate) |
| [#13](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/13) | Post-win workflow: payment routing, shipping coordination, provenance documentation |
### Already shipped
| Issue | Feature |
|-------|---------|
| [#1](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/1) | SSE live score push — enriched data appears without re-search |
| [#2](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/2) | eBay OAuth for full trust score access via Trading API |
| [#4](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/4) | Community blocklist + batch eBay Trust & Safety reporting |
| [#6](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/6) | Shared seller/scammer/comps DB across cloud users |
| [#8](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/8) | "Triple Red" easter egg |
| [#11](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/11) | Vision-based photo condition assessment — moondream2 / Claude vision |
| [#27](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/27) | MCP server for Snipe search and scoring |
| [#29](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/29) | LLM query builder — describe what to find, AI builds the search |
| [#47](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/47) | Browser pool — pre-warm Chromium to cut scrape cold-start |
| [#48](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/48) | Search result caching — skip redundant scrapes for repeated queries |
| [#49](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/49) | Async search endpoint — return job ID immediately, scrape in background |
| [#50](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe/issues/50) | Currency preference — display prices in user's preferred currency |
Full documentation at **[docs.circuitforge.tech/snipe](https://docs.circuitforge.tech/snipe)** — setup guide, trust scoring algorithm, platform adapter reference, API docs, and self-hosting notes.
---
## Primary platforms (full vision)
## Forgejo-primary
- **eBay** — general + collectibles *(search + trust scoring: implemented)*
- **Mercari** — US resale marketplace *(search + partial trust scoring: implemented; comps Phase 3)*
- **CT Bids** — Connecticut state surplus and municipal auctions
- **GovPlanet / IronPlanet** — government surplus equipment
- **AuctionZip** — antique auction house aggregator (1,000+ houses)
- **Invaluable / LiveAuctioneers** — fine art and antiques
- **Bidsquare** — antiques and collectibles
- **HiBid** — estate auctions
- **Proxibid** — industrial and collector auctions
Snipe is developed and maintained on Forgejo at [git.opensourcesolarpunk.com/Circuit-Forge/snipe](https://git.opensourcesolarpunk.com/Circuit-Forge/snipe). GitHub and Codeberg are read-only mirrors. File issues and submit pull requests on Forgejo.
## Why auctions are hard
---
Online auctions are frustrating because:
- Winning requires being present at the exact closing moment — sometimes 2 AM
- Platforms vary wildly: some allow proxy bids, some don't; closing times extend on activity
- Scammers exploit auction urgency — new accounts, stolen photos, pressure to pay outside platform
- Price history is hidden — you don't know if an item is underpriced or a trap
- Sellers hide damage in descriptions rather than titles to avoid automated filters
- Shipping logistics for large / fragile antiques require coordination with the auction house
- Provenance documentation is inconsistent across auction houses
## Contributing
## Bidding strategy engine (planned)
Bug reports and feature requests: open an issue on Forgejo. The discovery pipeline (scrapers, adapters, signal extraction) is MIT-licensed — pull requests welcome. AI trust-scoring features are BSL 1.1 — contributions are accepted but the license terms apply.
- **Hard snipe**: submit bid N seconds before close (default: 8s)
- **Soft-close handling**: detect if platform extends on last-minute bids; adjust strategy
- **Proxy ladder**: set max and let the engine bid in increments, reserve snipe for final window
- **Reserve detection**: identify likely reserve price from bid history patterns
- **Comparable sales**: pull recent auction results for same/similar items across platforms
---
## Post-win workflow (planned)
## License
1. Payment method routing (platform-specific: CC, wire, check)
2. Shipping quote requests to approved carriers (freight / large items via uShip; parcel via FedEx/UPS)
3. Condition report request from auction house
4. Provenance packet generation (for antiques / fine art resale or insurance)
5. Add to inventory (for dealers / collectors tracking portfolio value)
Snipe uses a dual license:
## Product code (license key)
| Component | License |
|-----------|---------|
| Discovery pipeline — scrapers, platform adapters, search, keyword filtering | [MIT](LICENSE-MIT) |
| LLM trust-scoring, query builder, vision assessment, AI features | [BSL 1.1](LICENSE-BSL) — free for personal non-commercial self-hosting; commercial use requires a paid license; converts to MIT after 4 years |
`CFG-SNPE-XXXX-XXXX-XXXX`
Privacy · Safety · Accessibility — co-equal, non-negotiable.
## Tech notes
- Shared `circuitforge-core` scaffold (DB, LLM router, tier system, config)
- Platform adapters: eBay (Browse API + scraper) and Mercari (scraper); AuctionZip, Invaluable, HiBid, CT Bids planned (Playwright + API where available)
- Bid execution: Playwright automation with precise timing (NTP-synchronized)
- Soft-close detection: platform-specific rules engine
- Comparable sales: eBay completed listings via Marketplace Insights API + Browse API fallback
- Vision module: condition assessment from listing photos — moondream2 / Claude vision (paid tier stub in `app/trust/photo.py`)
- **Kasada/Cloudflare bypass**: headed Chromium via Xvfb with playwright-stealth; all scraping uses this path — headless and `requests`-based approaches are blocked by eBay and Mercari. Xvfb started with `-ac` (no X11 auth required in Docker), display range `:200+` to avoid host socket conflicts.
[circuitforge.tech](https://circuitforge.tech)

@ -1,60 +1,58 @@
"""Pre-warmed Chromium browser pool for the eBay scraper.
"""Thread-local Playwright browser manager for the eBay scraper.
Eliminates cold-start latency (5-10s per call) by keeping a small pool of
long-lived Playwright browser instances with fresh contexts ready to serve.
Each uvicorn worker thread that calls fetch_html() gets its own Playwright
instance, browser, and context created lazily on first use. This avoids
the "cannot switch to a different thread" error that arises when Playwright
sync API instances are shared across threads (they bind their greenlet event
loop to the creating thread).
Key design:
- Pool slots: ``(xvfb_proc, pw_instance, browser, context, display_num, last_used_ts)``
One headed Chromium browser per slot keeps the Kasada fingerprint clean.
- Display numbering: :200-:399 (avoids host :0 and low-numbered kernel socket conflicts).
- Thread safety: ``queue.Queue`` with blocking get (timeout=3s before fresh fallback).
- Replenishment: after each use, the dirty context is closed and a new context is
opened on the *same* browser, then returned to the queue. Browser launch overhead
is only paid at startup and during idle-cleanup replenishment.
- Idle cleanup: daemon thread closes slots idle for >5 minutes to avoid memory leaks
when the service is quiet.
- Graceful degradation: if Playwright / Xvfb is unavailable (host-side test env),
``fetch_html`` falls back to launching a fresh browser per call, the same behavior
as before this module existed.
- Thread-local: _thread_local.slot holds the _PooledBrowser for the current
thread. No slot is ever handed to another thread.
- Lazy creation: slots are created on first fetch_html() call per thread, not
at startup. start() is a lightweight lifecycle marker only.
- Registry: _slot_registry (keyed by thread-id) lets stop() close every active
slot across all threads without walking thread-local storage.
- Replenishment: after each use the dirty context is closed and a fresh one
opened on the same browser. Browser launch overhead is paid at most once
per worker thread lifetime.
- Graceful degradation: if Playwright / Xvfb is unavailable, fetch_html falls
back to _fetch_fresh (identical behavior to before this module existed).
Pool size is controlled via ``BROWSER_POOL_SIZE`` env var (default: 2).
Pool size is read from BROWSER_POOL_SIZE env var (default: 2) but is now a
soft limit used only for documentation; actual concurrency is bounded by
uvicorn's thread count.
"""
from __future__ import annotations
import itertools
import logging
import os
import queue
import subprocess
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from typing import Optional
log = logging.getLogger(__name__)
# Display counter shared by pool warmup and _fetch_fresh fallback.
# Range :200-:399 avoids low-numbered displays that may be pre-occupied by
# the host X server or lingering kernel sockets from previous runs.
_pool_display_counter = itertools.cycle(range(200, 400))
_IDLE_TIMEOUT_SECS = 300 # 5 minutes
_CLEANUP_INTERVAL_SECS = 60
_QUEUE_TIMEOUT_SECS = 3.0
_CHROMIUM_ARGS = ["--no-sandbox", "--disable-dev-shm-usage"]
_XVFB_ARGS = ["-screen", "0", "1280x800x24", "-ac"] # -ac: disable X auth (safe in isolated Docker)
_XVFB_ARGS = ["-screen", "0", "1280x800x24", "-ac"]
_USER_AGENT = (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
)
_VIEWPORT = {"width": 1280, "height": 800}
# Thread-local storage: each thread gets its own _PooledBrowser slot.
_thread_local = threading.local()
@dataclass
class _PooledBrowser:
"""One slot in the browser pool."""
"""One browser slot, bound to a single thread."""
xvfb: subprocess.Popen
pw: object # playwright instance (sync_playwright().__enter__())
browser: object # playwright Browser
@ -63,13 +61,13 @@ class _PooledBrowser:
last_used_ts: float = field(default_factory=time.time)
def _launch_slot() -> "_PooledBrowser":
def _launch_slot() -> _PooledBrowser:
"""Launch a new Xvfb display + headed Chromium browser + fresh context.
Raises on failure; callers must catch and handle gracefully.
Must be called from the thread that will use the slot.
"""
from playwright.sync_api import sync_playwright
from playwright_stealth import Stealth # noqa: F401 — imported here to confirm availability
from playwright_stealth import Stealth # noqa: F401
display_num = next(_pool_display_counter)
display = f":{display_num}"
@ -81,7 +79,6 @@ def _launch_slot() -> "_PooledBrowser":
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
)
# Small grace period for Xvfb to bind the display socket.
time.sleep(0.3)
pw = sync_playwright().start()
@ -112,7 +109,7 @@ def _launch_slot() -> "_PooledBrowser":
def _close_slot(slot: _PooledBrowser) -> None:
"""Cleanly close a pool slot: context → browser → Playwright → Xvfb."""
"""Cleanly close a slot: context → browser → Playwright → Xvfb."""
try:
slot.ctx.close()
except Exception:
@ -133,11 +130,7 @@ def _close_slot(slot: _PooledBrowser) -> None:
def _replenish_slot(slot: _PooledBrowser) -> _PooledBrowser:
"""Close the used context and open a fresh one on the same browser.
Returns a new _PooledBrowser sharing the same xvfb/pw/browser but with a
clean context, which avoids paying browser launch overhead on every fetch.
"""
"""Close the used context and open a fresh one on the same browser."""
try:
slot.ctx.close()
except Exception:
@ -158,26 +151,27 @@ def _replenish_slot(slot: _PooledBrowser) -> _PooledBrowser:
class BrowserPool:
"""Thread-safe pool of pre-warmed Playwright browser contexts."""
"""Thread-local Playwright browser manager.
Each thread that calls fetch_html() owns its own browser instance.
No slots are shared between threads.
"""
def __init__(self, size: int = 2) -> None:
self._size = size
self._q: queue.Queue[_PooledBrowser] = queue.Queue()
self._lock = threading.Lock()
self._started = False
self._stopped = False
self._playwright_available: Optional[bool] = None # cached after first check
self._playwright_available: Optional[bool] = None
# Registry of all active slots keyed by thread id — used only by stop().
self._slot_registry: dict[int, _PooledBrowser] = {}
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
def start(self) -> None:
"""Pre-warm N browser slots in background threads.
Non-blocking: returns immediately; slots appear in the queue as they
finish launching. Safe to call multiple times (no-op after first).
"""
"""Mark the pool as started. Slots are created lazily per thread."""
with self._lock:
if self._started:
return
@ -190,43 +184,19 @@ class BrowserPool:
)
return
def _warm_one(_: int) -> None:
try:
slot = _launch_slot()
self._q.put(slot)
log.debug("BrowserPool: slot :%d ready", slot.display_num)
except Exception as exc:
log.warning("BrowserPool: pre-warm failed: %s", exc)
with ThreadPoolExecutor(max_workers=self._size) as ex:
futures = [ex.submit(_warm_one, i) for i in range(self._size)]
# Don't wait — executor exits after submitting, threads continue.
# Actually ThreadPoolExecutor.__exit__ waits for completion, which
# is fine: pre-warming completes in background relative to FastAPI
# startup because this whole method is called from a thread.
for f in as_completed(futures):
pass # propagate exceptions via logging, not raises
_idle_cleaner = threading.Thread(
target=self._idle_cleanup_loop, daemon=True, name="browser-pool-idle-cleaner"
)
_idle_cleaner.start()
log.info("BrowserPool: started with %d slots", self._q.qsize())
log.info("BrowserPool: started (thread-local mode, size hint=%d)", self._size)
def stop(self) -> None:
"""Drain and close all pool slots. Called at FastAPI shutdown."""
"""Close all active slots across all threads."""
with self._lock:
self._stopped = True
registry_snapshot = dict(self._slot_registry)
closed = 0
while True:
try:
slot = self._q.get_nowait()
for slot in registry_snapshot.values():
_close_slot(slot)
closed += 1
except queue.Empty:
break
self._slot_registry.clear()
log.info("BrowserPool: stopped, closed %d slot(s)", closed)
# ------------------------------------------------------------------
@ -242,28 +212,13 @@ class BrowserPool:
) -> str:
"""Navigate to *url* and return the rendered HTML.
Borrows a browser context from the pool (blocks up to 3s), uses it to
fetch the page, then replenishes the slot with a fresh context.
Falls back to a fully fresh browser if the pool is empty after the
timeout or if Playwright is unavailable.
Args:
wait_for_selector: CSS/data-testid selector to wait for before capturing
HTML (e.g. ``"[data-testid='SearchResults']"``). When set, the fixed
*wait_for_timeout_ms* sleep is skipped; the page is captured as soon
as the selector appears (or after 15s timeout, whichever comes first).
wait_for_timeout_ms: static post-navigation sleep in ms when
*wait_for_selector* is None. Default 2000; set higher (e.g. 8000)
for sites with JS challenge pages (Cloudflare Turnstile).
Uses the calling thread's browser slot (creates one if needed).
Falls back to a fresh browser if Playwright is unavailable or the
slot fails.
"""
time.sleep(delay)
slot: Optional[_PooledBrowser] = None
try:
slot = self._q.get(timeout=_QUEUE_TIMEOUT_SECS)
except queue.Empty:
log.debug("BrowserPool: pool empty after %.1fs — using fresh browser", _QUEUE_TIMEOUT_SECS)
slot = self._get_or_create_thread_slot()
if slot is not None:
try:
@ -272,32 +227,65 @@ class BrowserPool:
wait_for_selector=wait_for_selector,
wait_for_timeout_ms=wait_for_timeout_ms,
)
# Replenish: close dirty context, open fresh one, return to queue.
try:
fresh_slot = _replenish_slot(slot)
self._q.put(fresh_slot)
self._register_slot(fresh_slot)
except Exception as exc:
log.warning("BrowserPool: replenish failed, slot discarded: %s", exc)
_close_slot(slot)
self._unregister_slot()
return html
except Exception as exc:
log.warning("BrowserPool: pooled fetch failed (%s) — closing slot", exc)
_close_slot(slot)
# Fall through to fresh browser below.
self._unregister_slot()
# Fallback: fresh browser (same code as old scraper._fetch_url).
return self._fetch_fresh(
url,
wait_for_selector=wait_for_selector,
wait_for_timeout_ms=wait_for_timeout_ms,
)
# ------------------------------------------------------------------
# Thread-local slot management
# ------------------------------------------------------------------
def _get_or_create_thread_slot(self) -> Optional[_PooledBrowser]:
"""Return the calling thread's slot, creating it if absent."""
if not self._check_playwright():
return None
slot: Optional[_PooledBrowser] = getattr(_thread_local, "slot", None)
if slot is not None:
return slot
try:
slot = _launch_slot()
self._register_slot(slot)
log.debug("BrowserPool: launched slot :%d for thread %d",
slot.display_num, threading.get_ident())
return slot
except Exception as exc:
log.warning("BrowserPool: slot launch failed: %s", exc)
return None
def _register_slot(self, slot: _PooledBrowser) -> None:
"""Bind slot to the calling thread (both thread-local and registry)."""
_thread_local.slot = slot
with self._lock:
self._slot_registry[threading.get_ident()] = slot
def _unregister_slot(self) -> None:
"""Remove the calling thread's slot from thread-local and registry."""
_thread_local.slot = None
with self._lock:
self._slot_registry.pop(threading.get_ident(), None)
# ------------------------------------------------------------------
# Internal helpers
# ------------------------------------------------------------------
def _check_playwright(self) -> bool:
"""Return True if Playwright and Xvfb are importable/runnable."""
if self._playwright_available is not None:
return self._playwright_available
try:
@ -315,7 +303,6 @@ class BrowserPool:
wait_for_selector: Optional[str] = None,
wait_for_timeout_ms: int = 2000,
) -> str:
"""Open a new page on *slot.ctx*, navigate to *url*, return HTML."""
from playwright_stealth import Stealth
page = slot.ctx.new_page()
@ -326,7 +313,7 @@ class BrowserPool:
try:
page.wait_for_selector(wait_for_selector, timeout=15_000)
except Exception:
pass # selector didn't appear; return whatever loaded
pass
else:
page.wait_for_timeout(wait_for_timeout_ms)
return page.content()
@ -342,7 +329,6 @@ class BrowserPool:
wait_for_selector: Optional[str] = None,
wait_for_timeout_ms: int = 2000,
) -> str:
"""Launch a fully fresh browser, fetch *url*, close everything."""
import subprocess as _subprocess
try:
@ -364,7 +350,7 @@ class BrowserPool:
stdout=_subprocess.DEVNULL,
stderr=_subprocess.DEVNULL,
)
time.sleep(0.3) # wait for Xvfb to bind the display socket before Chromium starts
time.sleep(0.3)
try:
with sync_playwright() as pw:
browser = pw.chromium.launch(
@ -383,7 +369,7 @@ class BrowserPool:
try:
page.wait_for_selector(wait_for_selector, timeout=15_000)
except Exception:
pass # selector didn't appear; return whatever loaded
pass
else:
page.wait_for_timeout(wait_for_timeout_ms)
html = page.content()
@ -394,32 +380,6 @@ class BrowserPool:
return html
def _idle_cleanup_loop(self) -> None:
"""Daemon thread: drain slots idle for >5 minutes every 60 seconds."""
while not self._stopped:
time.sleep(_CLEANUP_INTERVAL_SECS)
if self._stopped:
break
now = time.time()
idle_cutoff = now - _IDLE_TIMEOUT_SECS
# Drain the entire queue, keep non-idle slots, close idle ones.
kept: list[_PooledBrowser] = []
closed = 0
while True:
try:
slot = self._q.get_nowait()
except queue.Empty:
break
if slot.last_used_ts < idle_cutoff:
_close_slot(slot)
closed += 1
else:
kept.append(slot)
for slot in kept:
self._q.put(slot)
if closed:
log.info("BrowserPool: idle cleanup closed %d slot(s)", closed)
# ---------------------------------------------------------------------------
# Module-level singleton
@ -430,11 +390,7 @@ _pool_lock = threading.Lock()
def get_pool() -> BrowserPool:
"""Return the module-level BrowserPool singleton (creates it if needed).
Pool size is read from ``BROWSER_POOL_SIZE`` env var (default: 2).
Call ``get_pool().start()`` at FastAPI startup to pre-warm slots.
"""
"""Return the module-level BrowserPool singleton (creates it if needed)."""
global _pool
if _pool is None:
with _pool_lock:
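
The thread-local design in this diff (one lazily created slot per thread, plus a registry keyed by thread id so that shutdown can reach every slot) can be sketched in isolation. This is a minimal illustration and not the project's code: `ThreadLocalRegistry`, `factory`, and `closer` are hypothetical names standing in for `BrowserPool`, `_launch_slot`, and `_close_slot`.

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class ThreadLocalRegistry(Generic[T]):
    """Hypothetical helper: one lazily created resource per calling thread."""

    def __init__(self, factory: Callable[[], T], closer: Callable[[T], None]) -> None:
        self._factory = factory          # stands in for _launch_slot
        self._closer = closer            # stands in for _close_slot
        self._local = threading.local()  # per-thread attribute storage
        self._lock = threading.Lock()    # guards the shared registry only
        self._registry: Dict[int, T] = {}

    def get(self) -> T:
        """Return the calling thread's resource, creating it on first use."""
        res = getattr(self._local, "resource", None)
        if res is None:
            res = self._factory()
            self._local.resource = res
            with self._lock:
                self._registry[threading.get_ident()] = res
        return res

    def close_all(self) -> int:
        """Close every registered resource; return how many were closed."""
        with self._lock:
            snapshot = list(self._registry.values())
            self._registry.clear()
        # Close outside the lock so a slow closer cannot block get().
        for res in snapshot:
            self._closer(res)
        return len(snapshot)
```

One caveat worth keeping in mind with this pattern: `threading.get_ident()` values may be reused after a thread exits, so a registry keyed this way tracks live threads only and can overwrite the entry of a thread that has already finished.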


@ -126,7 +126,12 @@ class Aggregator:
# Hard filters
if seller and seller.account_age_days is not None and seller.account_age_days < HARD_FILTER_AGE_DAYS:
red_flags.append("new_account")
if seller and seller.feedback_ratio < HARD_FILTER_BAD_RATIO_THRESHOLD:
if seller and seller.feedback_ratio == 0.0 and seller.feedback_count > 0:
# 12-month ratio missing from page — returning seller or buyer-only account.
# Score will be partial (metadata._feedback_ratio returns None). Soft flag
# only: do NOT fire established_bad_actor on what is likely missing data.
red_flags.append("no_recent_seller_data")
elif seller and seller.feedback_ratio < HARD_FILTER_BAD_RATIO_THRESHOLD:
if HARD_FILTER_BAD_RATIO_MIN_COUNT < seller.feedback_count <= HARD_FILTER_BAD_RATIO_MAX_COUNT:
# Moderate-volume account with consistently bad ratio → hard flag.
red_flags.append("established_bad_actor")


@ -44,7 +44,13 @@ class MetadataScorer:
if count < 200: return 15
return 20
def _feedback_ratio(self, ratio: float, count: int) -> int:
def _feedback_ratio(self, ratio: float, count: int) -> Optional[int]:
# ratio=0.0 with count>0 means the 12-month percentage wasn't on the page —
# eBay omits the ratio for returning/buyer-only sellers with no recent sales.
# Treat as missing rather than "literally 0% positive" (which eBay doesn't allow
# on active accounts — those get suspended long before reaching 0%).
if ratio == 0.0 and count > 0:
return None
if ratio < 0.80 and count > 20: return 0
if ratio < 0.90: return 5
if ratio < 0.95: return 10
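
The "missing data, not bad data" rule above can be illustrated on its own. The sketch below assumes, as the tests in this diff suggest, that a `None` score marks an unavailable signal and that the aggregate is then reported as partial. `feedback_ratio_is_missing` and `summarize` are hypothetical helpers, not functions from the codebase, and the real aggregator may combine scores differently.

```python
from typing import Dict, Optional, Tuple


def feedback_ratio_is_missing(ratio: float, count: int) -> bool:
    """True when a 0.0 ratio most likely means 'not shown' rather than '0% positive'."""
    return ratio == 0.0 and count > 0


def summarize(scores: Dict[str, Optional[int]]) -> Tuple[int, bool]:
    """Sum the known signal scores and report whether any signal was missing.

    Assumption: None marks 'data unavailable'; such signals are skipped
    instead of being counted as zero.
    """
    known = [value for value in scores.values() if value is not None]
    is_partial = len(known) < len(scores)
    return sum(known), is_partial
```

Skipping `None` instead of treating it as `0` is the point of the change: a seller whose 12-month ratio is simply absent is not penalised as if the ratio were catastrophic, and the partial flag tells the caller that the total is based on fewer signals.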


@ -1,16 +1,15 @@
"""Tests for app.platforms.ebay.browser_pool.
"""Tests for app.platforms.ebay.browser_pool (thread-local design).
All tests run without real Chromium / Xvfb / Playwright.
Playwright, Xvfb subprocess calls, and Stealth are mocked throughout.
"""
from __future__ import annotations
import queue
import subprocess
import threading
import time
from typing import Any
from unittest.mock import MagicMock, patch, call
from unittest.mock import MagicMock, patch
import pytest
@ -19,40 +18,35 @@ import pytest
# ---------------------------------------------------------------------------
def _reset_pool_singleton():
"""Force the module-level _pool singleton back to None."""
import app.platforms.ebay.browser_pool as _mod
_mod._pool = None
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
def _reset_thread_local():
import app.platforms.ebay.browser_pool as _mod
_mod._thread_local.slot = None
@pytest.fixture(autouse=True)
def reset_singleton():
"""Reset the singleton before and after every test."""
def reset_pool():
_reset_pool_singleton()
_reset_thread_local()
yield
_reset_pool_singleton()
_reset_thread_local()
def _make_fake_slot():
"""Build a mock _PooledBrowser with all necessary attributes."""
from app.platforms.ebay.browser_pool import _PooledBrowser
xvfb = MagicMock(spec=subprocess.Popen)
pw = MagicMock()
browser = MagicMock()
ctx = MagicMock()
slot = _PooledBrowser(
xvfb=xvfb,
pw=pw,
browser=browser,
ctx=ctx,
display_num=100,
last_used_ts=time.time(),
return _PooledBrowser(
xvfb=xvfb, pw=pw, browser=browser, ctx=ctx,
display_num=100, last_used_ts=time.time(),
)
return slot
# ---------------------------------------------------------------------------
@ -62,9 +56,7 @@ def _make_fake_slot():
class TestGetPoolSingleton:
def test_returns_same_instance(self):
from app.platforms.ebay.browser_pool import get_pool, BrowserPool
p1 = get_pool()
p2 = get_pool()
assert p1 is p2
assert get_pool() is get_pool()
def test_returns_browser_pool_instance(self):
from app.platforms.ebay.browser_pool import get_pool, BrowserPool
@ -72,14 +64,12 @@ class TestGetPoolSingleton:
def test_default_size_is_two(self):
from app.platforms.ebay.browser_pool import get_pool
pool = get_pool()
assert pool._size == 2
assert get_pool()._size == 2
def test_custom_size_from_env(self, monkeypatch):
monkeypatch.setenv("BROWSER_POOL_SIZE", "5")
from app.platforms.ebay.browser_pool import get_pool
pool = get_pool()
assert pool._size == 5
assert get_pool()._size == 5
# ---------------------------------------------------------------------------
@ -88,17 +78,15 @@ class TestGetPoolSingleton:
class TestLifecycle:
def test_start_is_noop_when_playwright_unavailable(self):
"""Pool should handle missing Playwright gracefully — no error raised."""
from app.platforms.ebay.browser_pool import BrowserPool
pool = BrowserPool(size=2)
with patch.object(pool, "_check_playwright", return_value=False):
pool.start() # must not raise
# Pool queue is empty — no slots launched.
assert pool._q.empty()
pool.start()
assert pool._started is True
assert pool._slot_registry == {}
def test_start_only_runs_once(self):
"""Calling start() twice must not double-warm."""
from app.platforms.ebay.browser_pool import BrowserPool
pool = BrowserPool(size=1)
@ -107,47 +95,46 @@ class TestLifecycle:
pool.start()
assert pool._started is True
def test_stop_drains_queue(self):
"""stop() should close every slot in the queue."""
def test_stop_closes_all_registry_slots(self):
from app.platforms.ebay.browser_pool import BrowserPool
pool = BrowserPool(size=2)
slot1 = _make_fake_slot()
slot2 = _make_fake_slot()
pool._q.put(slot1)
pool._q.put(slot2)
pool._slot_registry[1001] = slot1
pool._slot_registry[1002] = slot2
with patch("app.platforms.ebay.browser_pool._close_slot") as mock_close:
pool.stop()
assert mock_close.call_count == 2
assert pool._q.empty()
assert pool._slot_registry == {}
assert pool._stopped is True
def test_stop_on_empty_pool_is_safe(self):
def test_stop_on_empty_registry_is_safe(self):
from app.platforms.ebay.browser_pool import BrowserPool
pool = BrowserPool(size=2)
pool.stop() # must not raise
BrowserPool(size=2).stop()
# ---------------------------------------------------------------------------
# fetch_html — pool hit path
# fetch_html — thread-local slot hit path
# ---------------------------------------------------------------------------
class TestFetchHtmlPoolHit:
def test_uses_pooled_slot_and_replenishes(self):
"""fetch_html should borrow a slot, call _fetch_with_slot, replenish."""
class TestFetchHtmlSlotHit:
def test_uses_existing_slot_and_replenishes(self):
from app.platforms.ebay.browser_pool import BrowserPool
import app.platforms.ebay.browser_pool as _mod
pool = BrowserPool(size=1)
slot = _make_fake_slot()
pool._q.put(slot)
_mod._thread_local.slot = slot
fresh_slot = _make_fake_slot()
with (
patch.object(pool, "_fetch_with_slot", return_value="<html>ok</html>") as mock_fetch,
patch("app.platforms.ebay.browser_pool._replenish_slot", return_value=fresh_slot) as mock_replenish,
patch("app.platforms.ebay.browser_pool._replenish_slot", return_value=fresh_slot),
patch.object(pool, "_register_slot") as mock_register,
patch("time.sleep"),
):
html = pool.fetch_html("https://www.ebay.com/sch/i.html?_nkw=test", delay=0)
@ -157,21 +144,19 @@ class TestFetchHtmlPoolHit:
slot, "https://www.ebay.com/sch/i.html?_nkw=test",
wait_for_selector=None, wait_for_timeout_ms=2000,
)
mock_replenish.assert_called_once_with(slot)
# Fresh slot returned to queue
assert pool._q.get_nowait() is fresh_slot
mock_register.assert_called_once_with(fresh_slot)
def test_delay_is_respected(self):
"""fetch_html must call time.sleep(delay)."""
from app.platforms.ebay.browser_pool import BrowserPool
import app.platforms.ebay.browser_pool as _mod
pool = BrowserPool(size=1)
slot = _make_fake_slot()
pool._q.put(slot)
_mod._thread_local.slot = _make_fake_slot()
with (
patch.object(pool, "_fetch_with_slot", return_value="<html/>"),
patch("app.platforms.ebay.browser_pool._replenish_slot", return_value=_make_fake_slot()),
patch.object(pool, "_register_slot"),
patch("app.platforms.ebay.browser_pool.time") as mock_time,
):
pool.fetch_html("https://example.com", delay=1.5)
@ -180,22 +165,19 @@ class TestFetchHtmlPoolHit:
# ---------------------------------------------------------------------------
# fetch_html — pool empty / fallback path
# fetch_html — no slot / fallback path
# ---------------------------------------------------------------------------
class TestFetchHtmlFallback:
def test_falls_back_to_fresh_browser_when_pool_empty(self):
"""When pool is empty after timeout, _fetch_fresh should be called."""
def test_falls_back_when_no_slot_and_playwright_unavailable(self):
from app.platforms.ebay.browser_pool import BrowserPool
pool = BrowserPool(size=1)
# Queue is empty — no slots available.
# No thread-local slot; playwright unavailable → _get_or_create returns None.
with (
patch.object(pool, "_get_or_create_thread_slot", return_value=None),
patch.object(pool, "_fetch_fresh", return_value="<html>fresh</html>") as mock_fresh,
patch("time.sleep"),
# Make Queue.get raise Empty after a short wait.
patch.object(pool._q, "get", side_effect=queue.Empty),
):
html = pool.fetch_html("https://www.ebay.com/sch/i.html?_nkw=widget", delay=0)
@ -206,17 +188,18 @@ class TestFetchHtmlFallback:
)
def test_falls_back_when_pooled_fetch_raises(self):
"""If _fetch_with_slot raises, the slot is closed and _fetch_fresh is used."""
from app.platforms.ebay.browser_pool import BrowserPool
import app.platforms.ebay.browser_pool as _mod
pool = BrowserPool(size=1)
slot = _make_fake_slot()
pool._q.put(slot)
_mod._thread_local.slot = slot
with (
patch.object(pool, "_fetch_with_slot", side_effect=RuntimeError("Chromium crashed")),
patch.object(pool, "_fetch_fresh", return_value="<html>recovered</html>") as mock_fresh,
patch("app.platforms.ebay.browser_pool._close_slot") as mock_close,
patch.object(pool, "_unregister_slot"),
patch("time.sleep"),
):
html = pool.fetch_html("https://www.ebay.com/", delay=0)
@ -226,19 +209,107 @@ class TestFetchHtmlFallback:
mock_fresh.assert_called_once()
# ---------------------------------------------------------------------------
# Thread-local slot management
# ---------------------------------------------------------------------------
class TestThreadLocalSlotManagement:
def test_get_or_create_returns_existing_slot(self):
import app.platforms.ebay.browser_pool as _mod
from app.platforms.ebay.browser_pool import BrowserPool
pool = BrowserPool(size=1)
pool._playwright_available = True
existing = _make_fake_slot()
_mod._thread_local.slot = existing
result = pool._get_or_create_thread_slot()
assert result is existing
def test_get_or_create_launches_new_slot_when_absent(self):
import app.platforms.ebay.browser_pool as _mod
from app.platforms.ebay.browser_pool import BrowserPool
pool = BrowserPool(size=1)
pool._playwright_available = True
_mod._thread_local.slot = None
new_slot = _make_fake_slot()
with (
patch("app.platforms.ebay.browser_pool._launch_slot", return_value=new_slot),
patch.object(pool, "_register_slot") as mock_register,
):
result = pool._get_or_create_thread_slot()
assert result is new_slot
mock_register.assert_called_once_with(new_slot)
def test_get_or_create_returns_none_when_playwright_unavailable(self):
from app.platforms.ebay.browser_pool import BrowserPool
pool = BrowserPool(size=1)
pool._playwright_available = False
assert pool._get_or_create_thread_slot() is None
def test_register_slot_sets_thread_local_and_registry(self):
import app.platforms.ebay.browser_pool as _mod
from app.platforms.ebay.browser_pool import BrowserPool
pool = BrowserPool(size=1)
slot = _make_fake_slot()
pool._register_slot(slot)
assert _mod._thread_local.slot is slot
assert threading.get_ident() in pool._slot_registry
def test_unregister_slot_clears_thread_local_and_registry(self):
import app.platforms.ebay.browser_pool as _mod
from app.platforms.ebay.browser_pool import BrowserPool
pool = BrowserPool(size=1)
slot = _make_fake_slot()
pool._register_slot(slot)
pool._unregister_slot()
assert getattr(_mod._thread_local, "slot", None) is None
assert threading.get_ident() not in pool._slot_registry
def test_different_threads_get_independent_slots(self):
from app.platforms.ebay.browser_pool import BrowserPool
pool = BrowserPool(size=2)
pool._playwright_available = True
slots_seen: list = []
errors: list = []
def worker():
new_slot = _make_fake_slot()
with patch("app.platforms.ebay.browser_pool._launch_slot", return_value=new_slot):
s = pool._get_or_create_thread_slot()
slots_seen.append(s)
t1 = threading.Thread(target=worker)
t2 = threading.Thread(target=worker)
t1.start(); t2.start()
t1.join(); t2.join()
assert len(slots_seen) == 2
# Each thread got its own slot object (they may differ or coincidentally share
# the same mock; what matters is both threads succeeded without interference).
assert all(s is not None for s in slots_seen)
# ---------------------------------------------------------------------------
# ImportError graceful fallback
# ---------------------------------------------------------------------------
class TestImportErrorHandling:
def test_check_playwright_returns_false_on_import_error(self):
"""_check_playwright should cache False when playwright is not installed."""
from app.platforms.ebay.browser_pool import BrowserPool
pool = BrowserPool(size=2)
with patch.dict("sys.modules", {"playwright": None, "playwright_stealth": None}):
# Force re-check by clearing the cached value.
pool._playwright_available = None
result = pool._check_playwright()
@ -246,12 +317,11 @@ class TestImportErrorHandling:
assert pool._playwright_available is False
def test_start_logs_warning_when_playwright_missing(self, caplog):
"""start() should log a warning and not crash when Playwright is absent."""
import logging
from app.platforms.ebay.browser_pool import BrowserPool
pool = BrowserPool(size=1)
pool._playwright_available = False # simulate missing
pool._playwright_available = False
with patch.object(pool, "_check_playwright", return_value=False):
with caplog.at_level(logging.WARNING, logger="app.platforms.ebay.browser_pool"):
@ -260,87 +330,14 @@ class TestImportErrorHandling:
assert any("not available" in r.message for r in caplog.records)
def test_fetch_fresh_raises_runtime_error_when_playwright_missing(self):
"""_fetch_fresh must raise RuntimeError (not ImportError) when PW absent."""
from app.platforms.ebay.browser_pool import BrowserPool
pool = BrowserPool(size=1)
with patch.dict("sys.modules", {"playwright": None, "playwright.sync_api": None}):
with pytest.raises(RuntimeError, match="Playwright not installed"):
pool._fetch_fresh("https://www.ebay.com/")
# ---------------------------------------------------------------------------
# Idle cleanup
# ---------------------------------------------------------------------------
class TestIdleCleanup:
def test_idle_cleanup_closes_stale_slots(self):
"""_idle_cleanup_loop should close slots whose last_used_ts is too old."""
from app.platforms.ebay.browser_pool import BrowserPool, _IDLE_TIMEOUT_SECS
pool = BrowserPool(size=2)
stale_slot = _make_fake_slot()
stale_slot.last_used_ts = time.time() - (_IDLE_TIMEOUT_SECS + 60)
fresh_slot = _make_fake_slot()
fresh_slot.last_used_ts = time.time()
pool._q.put(stale_slot)
pool._q.put(fresh_slot)
closed_slots = []
def fake_close(s):
closed_slots.append(s)
with patch("app.platforms.ebay.browser_pool._close_slot", side_effect=fake_close):
# Run one cleanup tick directly (not the full loop).
now = time.time()
idle_cutoff = now - _IDLE_TIMEOUT_SECS
kept = []
while True:
try:
s = pool._q.get_nowait()
except queue.Empty:
break
if s.last_used_ts < idle_cutoff:
fake_close(s)
else:
kept.append(s)
for s in kept:
pool._q.put(s)
assert stale_slot in closed_slots
assert fresh_slot not in closed_slots
assert pool._q.qsize() == 1
def test_idle_cleanup_loop_stops_when_pool_stopped(self):
"""Cleanup daemon should exit when _stopped is True."""
from app.platforms.ebay.browser_pool import BrowserPool, _CLEANUP_INTERVAL_SECS
pool = BrowserPool(size=1)
pool._stopped = True
# The loop should return after one iteration of the while check.
# Use a very short sleep mock so the test doesn't actually wait 60s.
sleep_calls = []
def fake_sleep(secs):
sleep_calls.append(secs)
with patch("app.platforms.ebay.browser_pool.time") as mock_time:
mock_time.time.return_value = time.time()
mock_time.sleep.side_effect = fake_sleep
# Run in a thread with a short timeout to confirm it exits.
t = threading.Thread(target=pool._idle_cleanup_loop)
t.start()
t.join(timeout=2.0)
assert not t.is_alive(), "idle cleanup loop did not exit when _stopped=True"
# ---------------------------------------------------------------------------
# _replenish_slot helper
# ---------------------------------------------------------------------------
@ -355,12 +352,8 @@ class TestReplenishSlot:
browser.new_context.return_value = new_ctx
slot = _PooledBrowser(
xvfb=MagicMock(),
pw=MagicMock(),
browser=browser,
ctx=old_ctx,
display_num=101,
last_used_ts=time.time() - 10,
xvfb=MagicMock(), pw=MagicMock(), browser=browser,
ctx=old_ctx, display_num=101, last_used_ts=time.time() - 10,
)
result = _replenish_slot(slot)
@ -370,7 +363,6 @@ class TestReplenishSlot:
assert result.ctx is new_ctx
assert result.browser is browser
assert result.xvfb is slot.xvfb
# last_used_ts is refreshed
assert result.last_used_ts > slot.last_used_ts
@ -391,7 +383,6 @@ class TestCloseSlot:
xvfb=xvfb, pw=pw, browser=browser, ctx=ctx,
display_num=102, last_used_ts=time.time(),
)
_close_slot(slot)
ctx.close.assert_called_once()
@ -401,7 +392,6 @@ class TestCloseSlot:
xvfb.wait.assert_called_once()
def test_close_slot_ignores_exceptions(self):
"""_close_slot must not raise even if components throw."""
from app.platforms.ebay.browser_pool import _close_slot, _PooledBrowser
xvfb = MagicMock(spec=subprocess.Popen)
@ -418,7 +408,6 @@ class TestCloseSlot:
xvfb=xvfb, pw=pw, browser=browser, ctx=ctx,
display_num=103, last_used_ts=time.time(),
)
_close_slot(slot) # must not raise
@ -428,7 +417,6 @@ class TestCloseSlot:
class TestScraperUsesPool:
def test_fetch_url_delegates_to_pool(self):
"""ScrapedEbayAdapter._fetch_url must use the pool, not launch its own browser."""
from app.platforms.ebay.browser_pool import BrowserPool
from app.platforms.ebay.scraper import ScrapedEbayAdapter
from app.db.store import Store
@ -440,7 +428,6 @@ class TestScraperUsesPool:
fake_pool.fetch_html.return_value = "<html>pooled</html>"
with patch("app.platforms.ebay.browser_pool.get_pool", return_value=fake_pool):
# Clear the cache so fetch_url actually hits the pool.
import app.platforms.ebay.scraper as scraper_mod
scraper_mod._html_cache.clear()
html = adapter._fetch_url("https://www.ebay.com/sch/i.html?_nkw=test")
@ -451,7 +438,6 @@ class TestScraperUsesPool:
)
def test_fetch_url_uses_cache_before_pool(self):
"""_fetch_url should return cached HTML without hitting the pool."""
from app.platforms.ebay.scraper import ScrapedEbayAdapter, _html_cache, _HTML_CACHE_TTL
from app.db.store import Store
@ -467,6 +453,4 @@ class TestScraperUsesPool:
assert html == "<html>cached</html>"
fake_pool.fetch_html.assert_not_called()
# Cleanup
_html_cache.pop(url, None)


@ -296,3 +296,37 @@ def test_non_retailer_does_not_suppress_duplicate_photo():
)
result = agg.aggregate(_ALL_20.copy(), photo_hash_duplicate=True, seller=seller)
assert "duplicate_photo" in result.red_flags_json
# ── #52: buyer-only / returning seller (ratio=0.0, count>0) ──────────────────
def test_zero_ratio_with_count_gives_no_recent_seller_data_flag():
"""Seller with 117 lifetime feedbacks (buyer-only) has ratio=0.0 parsed from page.
Must get no_recent_seller_data soft flag, NOT established_bad_actor."""
agg = Aggregator()
scores = {k: 10 for k in ["account_age", "feedback_count",
"feedback_ratio", "price_vs_market", "category_history"]}
buyer_only = Seller(
platform="ebay", platform_seller_id="u", username="jjcpryz",
account_age_days=1200, feedback_count=117, feedback_ratio=0.0,
category_history_json="{}",
)
result = agg.aggregate(scores, photo_hash_duplicate=False, seller=buyer_only)
assert "no_recent_seller_data" in result.red_flags_json
assert "established_bad_actor" not in result.red_flags_json
def test_established_bad_actor_still_fires_for_genuinely_bad_ratio():
"""ratio=0.75 (not zero) with moderate count → established_bad_actor still fires."""
agg = Aggregator()
scores = {k: 10 for k in ["account_age", "feedback_count",
"feedback_ratio", "price_vs_market", "category_history"]}
bad = Seller(
platform="ebay", platform_seller_id="u", username="u",
account_age_days=500, feedback_count=100, feedback_ratio=0.75,
category_history_json="{}",
)
result = agg.aggregate(scores, photo_hash_duplicate=False, seller=bad)
assert "established_bad_actor" in result.red_flags_json
assert "no_recent_seller_data" not in result.red_flags_json


@ -43,3 +43,26 @@ def test_no_market_data_returns_none():
scores = scorer.score(_seller(), market_median=None, listing_price=950.0)
# None signals "data unavailable" — aggregator will set score_is_partial=True
assert scores["price_vs_market"] is None
def test_zero_ratio_with_nonzero_count_returns_none():
"""ratio=0.0 with count>0 means eBay didn't show a 12-month percentage.
Must return None (missing data) not 0 (catastrophically bad)."""
scorer = MetadataScorer()
scores = scorer.score(
_seller(feedback_ratio=0.0, feedback_count=117),
market_median=None, listing_price=500.0,
)
assert scores["feedback_ratio"] is None
def test_zero_ratio_with_zero_count_scores_low():
"""feedback_ratio=0.0 with count=0 is a real 'no data at all' case, not missing."""
scorer = MetadataScorer()
scores = scorer.score(
_seller(feedback_ratio=0.0, feedback_count=0),
market_median=None, listing_price=500.0,
)
# count=0 means zero_feedback; ratio=0 with count=0 is the standard no-history path
# (not the "missing 12-month window" path)
assert scores["feedback_ratio"] == 5 # ratio < 0.90 → 5