Commit graph

8 commits

Author SHA1 Message Date
49513cc081 fix: port drift on restart — down before preflight, read port from .env
Makefile restart target now runs compose down before preflight so ports
are free when preflight assigns them; previously preflight ran first while
the old container still held 8502, causing it to bump to 8503.

manage.sh start/restart/open now read STREAMLIT_PORT from .env instead
of re-running preflight after startup (which would see the live container
and bump the reported port again).
2026-02-26 13:57:12 -08:00
1dcf9d47a4 fix: stub-port adoption — stubs bind free ports, app routes to external via host.docker.internal
Three inter-related fixes for the service adoption flow:
- preflight: stub_port field — adopted services get a free port for their
  no-op container (avoids binding conflict with external service on real port)
  while update_llm_yaml still uses the real external port for host.docker.internal URLs
- preflight: write_env now uses stub_port (not resolved) for adopted services
  so SEARXNG_PORT etc point to the stub's harmless port, not the occupied one
- preflight: stub containers use sleep infinity + CMD true healthcheck so
  depends_on: service_healthy is satisfied without holding any real port
- Makefile: finetune profile changed from [cpu,single-gpu,dual-gpu] to [finetune]
  so the pytorch/cuda base image is not built during make start
2026-02-25 21:38:23 -08:00
010abe6339 fix: ollama docker_owned=True; finetune gets own profile to avoid build on start
- preflight: ollama was incorrectly marked docker_owned=False — Docker does
  define an ollama service, so external detection now correctly disables it
  via compose.override.yml when host Ollama is already running
- compose.yml: finetune moves from [cpu,single-gpu,dual-gpu] profiles to
  [finetune] profile so it is never built during 'make start' (pytorch/cuda
  base is 3.7GB+ and unnecessary for the UI)
- compose.yml: remove depends_on ollama from finetune — it reaches Ollama
  via OLLAMA_URL env var which works whether Ollama is Docker or host
- Makefile: finetune target uses --profile finetune + compose.gpu.yml overlay
2026-02-25 21:24:33 -08:00
cc01f67b04 fix: fix dual-gpu port conflict + move GPU config to overlay files
- Remove ollama-gpu service (was colliding with ollama on port 11434)
- Strip inline deploy.resources GPU blocks from vision and vllm
- Add compose.gpu.yml: Docker NVIDIA overlay for ollama (GPU 0),
  vision (GPU 0), vllm (GPU 1), finetune (GPU 0)
- Fix compose.podman-gpu.yml: rename ollama-gpu → ollama to match
  service name after removal of ollama-gpu
- Update Makefile: apply compose.gpu.yml for Docker + GPU profiles
  (was only applying podman-gpu.yml for Podman + GPU profiles)
2026-02-25 16:44:59 -08:00
54de37e5fa feat: containerize fine-tune pipeline (Dockerfile.finetune + make finetune)
- Dockerfile.finetune: PyTorch 2.3/CUDA 12.1 base + unsloth + training stack
- finetune_local.py: auto-register model via Ollama HTTP API after GGUF
  export; path-translate between finetune container mount and Ollama's view;
  update config/llm.yaml automatically; DOCS_DIR env override for Docker
- prepare_training_data.py: DOCS_DIR env override so make prepare-training
  works correctly inside the app container
- compose.yml: add finetune service (cpu/single-gpu/dual-gpu profiles);
  DOCS_DIR=/docs injected into app + finetune containers
- compose.podman-gpu.yml: CDI device override for finetune service
- Makefile: make prepare-training + make finetune targets
2026-02-25 16:22:48 -08:00
41aef225cd feat: Podman support — auto-detect COMPOSE, CDI GPU override, podman-compose in setup.sh 2026-02-25 15:36:36 -08:00
78917c8460 feat: startup preflight — port collision avoidance + resource checks
scripts/preflight.py (stdlib-only, no psutil):
- Port probing: owned services auto-reassign to next free port; external
  services (Ollama) show ✓ reachable / ⚠ not responding
- System resources: CPU cores, RAM (total + available), GPU VRAM via
  nvidia-smi; works on Linux + macOS
- Profile recommendation: remote / cpu / single-gpu / dual-gpu
- vLLM KV cache offload: calculates CPU_OFFLOAD_GB when VRAM < 10 GB
  free and RAM headroom > 4 GB (uses up to 25% of available headroom)
- Writes resolved values to .env for docker compose; single-service mode
  (--service streamlit) for scripted port queries
- Exit 0 unless an owned port genuinely can't be resolved

scripts/manage-ui.sh:
- Calls preflight.py --service streamlit before bind; falls back to
  pure-bash port scan if Python/yaml unavailable

compose.yml:
- vllm command: adds --cpu-offload-gb ${CPU_OFFLOAD_GB:-0}

Makefile:
- start / restart depend on preflight target
- PYTHON variable for env portability
- test target uses PYTHON variable
2026-02-24 20:36:16 -08:00
63fcfeadef feat: add cross-platform dependency installer and Makefile for Linux/macOS 2026-02-24 19:47:06 -08:00