# Minerva: Developer Context

**Product code:** `MNRV`
**Status:** Concept / early prototype
**Domain:** Privacy-first, local-only voice assistant hardware platform

---

## What Minerva Is

A 100% local, FOSS voice assistant hardware platform. No cloud. No subscriptions. No data leaving the local network.

The goal is a reference hardware + software stack for a privacy-first voice assistant that anyone can build, extend, or self-host, including people without technical backgrounds if the assembly docs are good enough.

Core design principles (same as all CF products):

- **Local-first inference:** Whisper STT, Piper TTS, Mycroft Precise wake word all run on the host server
- **Edge where possible:** wake word detection moves to edge hardware over time (K210 → ESP32-S3 → custom)
- **No cloud dependency:** Home Assistant optional, not required
- **100% FOSS stack**

---

## Hardware Targets

### Phase 1 (current): Maix Duino (K210)

- K210 dual-core RISC-V @ 400MHz with KPU neural accelerator
- Audio: I2S microphone + speaker output
- Connectivity: ESP32 WiFi/BLE co-processor
- Programming: MaixPy (MicroPython)
- Status: server-side wake word working; edge inference in progress

### Phase 2: ESP32-S3

- More accessible, cheaper, better WiFi
- On-device wake word with Espressif ESP-SR
- See `docs/ESP32_S3_VOICE_ASSISTANT_SPEC.md`

### Phase 3: Custom hardware

- Dedicated PCB for CF reference platform
- Hardware-accelerated wake word + VAD
- Designed for accessibility: large buttons, LED feedback, easy mounting

---

## Software Stack

### Edge device (Maix Duino / ESP32-S3)

- Firmware: MaixPy or ESP-IDF
- Client: `hardware/maixduino/maix_voice_client.py`
- Audio: I2S capture and playback
- Network: WiFi → Minerva server

### Server (runs on Heimdall or any Linux box)

- Voice server: `scripts/voice_server.py` (Flask + Whisper + Precise)
- Enhanced version: `scripts/voice_server_enhanced.py` (adds speaker ID via pyannote)
- STT: Whisper (local)
- Wake word: Mycroft Precise
- TTS: Piper
- Home Assistant: REST API integration (optional)
- Conda env: `whisper_cli` (existing on Heimdall)

---

## Directory Structure

```
minerva/
├── docs/                              # Architecture, guides, reference docs
│   ├── maix-voice-assistant-architecture.md
│   ├── MYCROFT_PRECISE_GUIDE.md
│   ├── PRECISE_DEPLOYMENT.md
│   ├── ESP32_S3_VOICE_ASSISTANT_SPEC.md
│   ├── HARDWARE_BUYING_GUIDE.md
│   ├── LCD_CAMERA_FEATURES.md
│   ├── K210_PERFORMANCE_VERIFICATION.md
│   ├── WAKE_WORD_ADVANCED.md
│   ├── ADVANCED_WAKE_WORD_TOPICS.md
│   └── QUESTIONS_ANSWERED.md
├── scripts/                           # Server-side scripts
│   ├── voice_server.py                # Core Flask + Whisper + Precise server
│   ├── voice_server_enhanced.py       # + speaker identification (pyannote)
│   ├── setup_voice_assistant.sh       # Server setup
│   ├── setup_precise.sh               # Mycroft Precise training environment
│   └── download_pretrained_models.sh
├── hardware/
│   └── maixduino/                     # K210 edge device scripts
│       ├── maix_voice_client.py       # Production client
│       ├── maix_simple_record_test.py # Audio capture test
│       ├── maix_test_simple.py        # Hardware/network test
│       ├── maix_debug_wifi.py         # WiFi diagnostics
│       ├── maix_discover_modules.py   # Module discovery
│       ├── secrets.py.example         # WiFi/server credential template
│       ├── MICROPYTHON_QUIRKS.md
│       └── README.md
├── config/
│   └── .env.example                   # Server config template
├── models/                            # Wake word models (gitignored, large)
└── CLAUDE.md                          # This file
```

---

## Credentials / Secrets

**Never commit real credentials.** Pattern:

- Server: copy `config/.env.example` → `config/.env`, fill in real values
- Edge device: copy `hardware/maixduino/secrets.py.example` → `secrets.py`, fill in WiFi + server URL

Both files are gitignored. `.example` files are committed as templates.

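
As an illustration of the server-side pattern, the sketch below reads a `config/.env`-style file into a dict and reports which required keys are still empty. It is a minimal, dependency-free example, not how `voice_server.py` actually loads its configuration (this file does not say). The helper names `load_env` and `missing_keys` are hypothetical, and so are any key names passed to them.

```python
from pathlib import Path


def load_env(path):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and comments.

    Hypothetical helper: handles only plain KEY=VALUE lines, with optional
    single or double quotes around the value.
    """
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" or ":".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required):
    """Return the required keys that are absent or empty, in the order given."""
    return [key for key in required if not values.get(key)]
```

A server script could call `load_env("config/.env")` at startup and refuse to start if `missing_keys` returns anything, which catches a template copied from `.env.example` but never filled in.
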

---

## Running the Server

```bash
# Activate environment
conda activate whisper_cli

# Basic server (Whisper + Precise wake word)
python scripts/voice_server.py \
    --enable-precise \
    --precise-model models/hey-minerva.net \
    --precise-sensitivity 0.5

# Enhanced server (+ speaker identification)
python scripts/voice_server_enhanced.py \
    --enable-speaker-id \
    --hf-token $HF_TOKEN

# Test health
curl http://localhost:5000/health
curl http://localhost:5000/wake-word/status
```

---

## Connection to CF Voice Infrastructure

Minerva is the **hardware platform** for cf-voice. As `circuitforge_core.voice` matures:

- `cf_voice.io` (STT/TTS) → replaces the ad hoc Whisper/Piper calls in `voice_server.py`
- `cf_voice.context` (parallel classifier) → augments Mycroft Precise with tone/environment detection
- `cf_voice.telephony` → future: Minerva as an always-on household linnet node

Minerva hardware + cf-voice software = the CF reference voice assistant stack.

---

## Roadmap

See Forgejo milestones on this repo. High-level:

1. **Alpha (server-side pipeline):** Whisper + Precise + Piper working end-to-end on Heimdall
2. **Beta (edge wake word):** wake word on K210 or ESP32-S3; audio only streams post-wake
3. **Hardware v1:** documented reference build; buying guide; assembly instructions
4. **cf-voice integration:** Minerva uses cf_voice modules from circuitforge-core
5. **Platform:** multiple hardware targets; custom PCB design

---

## Related

- `cf-voice` module design: `circuitforge-plans/circuitforge-core/2026-04-06-cf-voice-design.md`
- `linnet` product: real-time tone annotation, will eventually embed Minerva as a hardware node
- Heimdall server: primary dev/deployment target (10.1.10.71 on LAN)
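
To confirm from another machine that a server on the dev target is reachable, the two `curl` checks under Running the Server can also be scripted. The sketch below uses only the standard library and reports the HTTP status per endpoint. It assumes nothing about the response bodies, since their format is not documented here, and the function names `check_endpoint` and `check_server` are hypothetical. Proxies are bypassed on the assumption that the server sits on the local network.

```python
import urllib.error
import urllib.request

# Assumption: the server is on the LAN, so ignore any system HTTP proxy settings.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_endpoint(base_url, path, timeout=3.0):
    """Return the HTTP status code for base_url + path, or None if unreachable."""
    url = base_url.rstrip("/") + path
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 404 or 500).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return None


def check_server(base_url, paths=("/health", "/wake-word/status")):
    """Map each endpoint path to its status code (None means unreachable)."""
    return {path: check_endpoint(base_url, path) for path in paths}
```

Calling `check_server("http://localhost:5000")` returns a dict keyed by path. A value of `None` for every path suggests the server is not running or the port is blocked, while a numeric error code means the server responded but that endpoint is unavailable.
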