peregrine

Author	SHA1	Message	Date
pyr0ball	88908ceca2	feat: add DUAL_GPU_MODE default, VRAM warning, and download size report to preflight - Add _mixed_mode_vram_warning() to flag low VRAM on GPU 1 in mixed mode - Wire download size report block into main() before closing border line - Wire mixed-mode VRAM warning into report if triggered - Write DUAL_GPU_MODE=ollama default to .env for new 2-GPU setups (no override if already set) - Promote import os to top-level (was local import inside get_cpu_cores)	2026-02-27 00:17:00 -08:00
pyr0ball	be28aba07f	feat: add _download_size_mb() pure function for preflight size warning	2026-02-27 00:15:26 -08:00
pyr0ball	637e8379b6	feat: add ollama_research to preflight service table and LLM backend map	2026-02-27 00:14:04 -08:00
pyr0ball	124b950ca3	fix: GPU detection + pdfplumber + pass GPU env vars into app container - preflight.py now writes PEREGRINE_GPU_COUNT and PEREGRINE_GPU_NAMES to .env so the app container gets GPU info without needing nvidia-smi access - compose.yml passes PEREGRINE_GPU_COUNT, PEREGRINE_GPU_NAMES, and RECOMMENDED_PROFILE as env vars to the app service - 0_Setup.py _detect_gpus() reads PEREGRINE_GPU_NAMES env var first; falls back to nvidia-smi (bare / GPU-passthrough environments) - 0_Setup.py _suggest_profile() reads RECOMMENDED_PROFILE env var first - requirements.txt: add pdfplumber (needed for resume PDF parsing)	2026-02-25 21:58:28 -08:00
pyr0ball	26fc97dfe5	fix: stub-port adoption — stubs bind free ports, app routes to external via host.docker.internal Three inter-related fixes for the service adoption flow: - preflight: stub_port field — adopted services get a free port for their no-op container (avoids binding conflict with external service on real port) while update_llm_yaml still uses the real external port for host.docker.internal URLs - preflight: write_env now uses stub_port (not resolved) for adopted services so SEARXNG_PORT etc point to the stub's harmless port, not the occupied one - preflight: stub containers use sleep infinity + CMD true healthcheck so depends_on: service_healthy is satisfied without holding any real port - Makefile: finetune profile changed from [cpu,single-gpu,dual-gpu] to [finetune] so the pytorch/cuda base image is not built during make start	2026-02-25 21:38:23 -08:00
pyr0ball	8e3f58cf46	fix: ollama docker_owned=True; finetune gets own profile to avoid build on start - preflight: ollama was incorrectly marked docker_owned=False — Docker does define an ollama service, so external detection now correctly disables it via compose.override.yml when host Ollama is already running - compose.yml: finetune moves from [cpu,single-gpu,dual-gpu] profiles to [finetune] profile so it is never built during 'make start' (pytorch/cuda base is 3.7GB+ and unnecessary for the UI) - compose.yml: remove depends_on ollama from finetune — it reaches Ollama via OLLAMA_URL env var which works whether Ollama is Docker or host - Makefile: finetune target uses --profile finetune + compose.gpu.yml overlay	2026-02-25 21:24:33 -08:00
pyr0ball	2662bab1e6	feat: smart service adoption in preflight — use external services instead of conflicting preflight.py now detects when a managed service (ollama, vllm, vision, searxng) is already running on its configured port and adopts it rather than reassigning or conflicting: - Generates compose.override.yml disabling Docker containers for adopted services (profiles: [_external_] — a profile never passed via --profile) - Rewrites config/llm.yaml base_url entries to host.docker.internal:<port> so the app container can reach host-side services through Docker's host-gateway mapping - compose.yml: adds extra_hosts host.docker.internal:host-gateway to the app service (required on Linux; no-op on macOS Docker Desktop) - .gitignore: excludes compose.override.yml (auto-generated, host-specific) Only streamlit is non-adoptable and continues to reassign on conflict.	2026-02-25 19:23:02 -08:00
pyr0ball	e332b8a069	feat: startup preflight — port collision avoidance + resource checks scripts/preflight.py (stdlib-only, no psutil): - Port probing: owned services auto-reassign to next free port; external services (Ollama) show ✓ reachable / ⚠ not responding - System resources: CPU cores, RAM (total + available), GPU VRAM via nvidia-smi; works on Linux + macOS - Profile recommendation: remote / cpu / single-gpu / dual-gpu - vLLM KV cache offload: calculates CPU_OFFLOAD_GB when VRAM < 10 GB free and RAM headroom > 4 GB (uses up to 25% of available headroom) - Writes resolved values to .env for docker compose; single-service mode (--service streamlit) for scripted port queries - Exit 0 unless an owned port genuinely can't be resolved scripts/manage-ui.sh: - Calls preflight.py --service streamlit before bind; falls back to pure-bash port scan if Python/yaml unavailable compose.yml: - vllm command: adds --cpu-offload-gb ${CPU_OFFLOAD_GB:-0} Makefile: - start / restart depend on preflight target - PYTHON variable for env portability - test target uses PYTHON variable	2026-02-24 20:36:16 -08:00

8 commits