Ports prior voice assistant research and prototypes from devl/Devops into the Minerva repo. Includes:
- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap
# Minerva — Developer Context

**Product code:** MNRV
**Status:** Concept / early prototype
**Domain:** Privacy-first, local-only voice assistant hardware platform
## What Minerva Is
A 100% local, FOSS voice assistant hardware platform. No cloud. No subscriptions. No data leaving the local network.
The goal is a reference hardware + software stack for a privacy-first voice assistant that anyone can build, extend, or self-host — including people without technical backgrounds if the assembly docs are good enough.
Core design principles (same as all CF products):
- Local-first inference — Whisper STT, Piper TTS, Mycroft Precise wake word all run on the host server
- Edge where possible — wake word detection moves to edge hardware over time (K210 → ESP32-S3 → custom)
- No cloud dependency — Home Assistant optional, not required
- 100% FOSS stack
## Hardware Targets
### Phase 1 (current): Maix Duino (K210)
- K210 dual-core RISC-V @ 400MHz with KPU neural accelerator
- Audio: I2S microphone + speaker output
- Connectivity: ESP32 WiFi/BLE co-processor
- Programming: MaixPy (MicroPython)
- Status: server-side wake word working; edge inference in progress
### Phase 2: ESP32-S3
- More accessible, cheaper, better WiFi
- On-device wake word with Espressif ESP-SR
- See `docs/ESP32_S3_VOICE_ASSISTANT_SPEC.md`
### Phase 3: Custom hardware
- Dedicated PCB for CF reference platform
- Hardware-accelerated wake word + VAD
- Designed for accessibility: large buttons, LED feedback, easy mounting
## Software Stack
### Edge device (Maix Duino / ESP32-S3)
- Firmware: MaixPy or ESP-IDF
- Client: `hardware/maixduino/maix_voice_client.py`
- Audio: I2S capture and playback
- Network: WiFi → Minerva server
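The client's job is simple: capture I2S audio, wrap it, and ship it to the server over WiFi. A minimal sketch of the wrapping step, assuming 16 kHz mono 16-bit PCM — the actual capture format and upload logic live in `maix_voice_client.py`, and this sketch uses CPython's `wave` module, whereas the on-device client runs MicroPython:

```python
import io
import wave

def pcm_to_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container.

    Sample rate and width are assumptions for illustration; the real
    client's capture format is defined in maix_voice_client.py.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # mono microphone
        w.setsampwidth(2)            # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```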
### Server (runs on Heimdall or any Linux box)
- Voice server: `scripts/voice_server.py` (Flask + Whisper + Precise)
- Enhanced version: `scripts/voice_server_enhanced.py` (adds speaker ID via pyannote)
- STT: Whisper (local)
- Wake word: Mycroft Precise
- TTS: Piper
- Home Assistant: REST API integration (optional)
- Conda env: `whisper_cli` (existing on Heimdall)
## Directory Structure
```
minerva/
├── docs/                              # Architecture, guides, reference docs
│   ├── maix-voice-assistant-architecture.md
│   ├── MYCROFT_PRECISE_GUIDE.md
│   ├── PRECISE_DEPLOYMENT.md
│   ├── ESP32_S3_VOICE_ASSISTANT_SPEC.md
│   ├── HARDWARE_BUYING_GUIDE.md
│   ├── LCD_CAMERA_FEATURES.md
│   ├── K210_PERFORMANCE_VERIFICATION.md
│   ├── WAKE_WORD_ADVANCED.md
│   ├── ADVANCED_WAKE_WORD_TOPICS.md
│   └── QUESTIONS_ANSWERED.md
├── scripts/                           # Server-side scripts
│   ├── voice_server.py                # Core Flask + Whisper + Precise server
│   ├── voice_server_enhanced.py       # + speaker identification (pyannote)
│   ├── setup_voice_assistant.sh       # Server setup
│   ├── setup_precise.sh               # Mycroft Precise training environment
│   └── download_pretrained_models.sh
├── hardware/
│   └── maixduino/                     # K210 edge device scripts
│       ├── maix_voice_client.py       # Production client
│       ├── maix_simple_record_test.py # Audio capture test
│       ├── maix_test_simple.py        # Hardware/network test
│       ├── maix_debug_wifi.py         # WiFi diagnostics
│       ├── maix_discover_modules.py   # Module discovery
│       ├── secrets.py.example         # WiFi/server credential template
│       ├── MICROPYTHON_QUIRKS.md
│       └── README.md
├── config/
│   └── .env.example                   # Server config template
├── models/                            # Wake word models (gitignored, large)
└── CLAUDE.md                          # This file
```
## Credentials / Secrets
Never commit real credentials. Pattern:
- Server: copy `config/.env.example` → `config/.env`, fill in real values
- Edge device: copy `hardware/maixduino/secrets.py.example` → `secrets.py`, fill in WiFi + server URL

Both files are gitignored. `.example` files are committed as templates.
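For reference, a `secrets.py.example`-style template might look like the following; the field names here are assumptions for illustration, and the committed template in `hardware/maixduino/` is authoritative:

```python
# secrets.py.example — copy to secrets.py and fill in real values.
# Field names are illustrative; check the committed template in the repo.
WIFI_SSID = "your-ssid"
WIFI_PASSWORD = "your-password"
SERVER_URL = "http://10.1.10.71:5000"  # Minerva voice server (Heimdall)
```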
## Running the Server
```bash
# Activate environment
conda activate whisper_cli

# Basic server (Whisper + Precise wake word)
python scripts/voice_server.py \
    --enable-precise \
    --precise-model models/hey-minerva.net \
    --precise-sensitivity 0.5

# Enhanced server (+ speaker identification)
python scripts/voice_server_enhanced.py \
    --enable-speaker-id \
    --hf-token $HF_TOKEN

# Test health
curl http://localhost:5000/health
curl http://localhost:5000/wake-word/status
```
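The curl checks above can also be scripted. A small stdlib-only sketch, assuming the `/health` endpoint returns JSON with a `status` field — the actual response shape is whatever `voice_server.py` emits, so adjust `parse_health` to match:

```python
import json
from urllib.request import urlopen

def parse_health(payload: str) -> bool:
    """True if the health JSON reports 'ok'. Response shape is assumed."""
    data = json.loads(payload)
    return data.get("status") == "ok"

def check_health(base_url: str = "http://localhost:5000") -> bool:
    """Hit /health on a running voice server and parse the result."""
    with urlopen(f"{base_url}/health", timeout=5) as resp:
        return parse_health(resp.read().decode())
```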
## Connection to CF Voice Infrastructure
Minerva is the hardware platform for cf-voice. As `circuitforge_core.voice` matures:
- `cf_voice.io` (STT/TTS) → replaces the ad hoc Whisper/Piper calls in `voice_server.py`
- `cf_voice.context` (parallel classifier) → augments Mycroft Precise with tone/environment detection
- `cf_voice.telephony` → future: Minerva as an always-on household linnet node
Minerva hardware + cf-voice software = the CF reference voice assistant stack.
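One way to read that mapping is as an interface seam: `voice_server.py` can code against a small STT protocol today, so swapping in `cf_voice.io` later becomes a drop-in. A hypothetical sketch — all names here are assumptions, since the cf_voice API is still being designed:

```python
from typing import Protocol

class STTBackend(Protocol):
    """Seam that cf_voice.io could eventually satisfy (hypothetical)."""
    def transcribe(self, wav_bytes: bytes) -> str: ...

class LocalWhisperSTT:
    """Stand-in for today's ad hoc local Whisper call (sketch only)."""
    def transcribe(self, wav_bytes: bytes) -> str:
        return "<transcript>"  # real impl would run Whisper locally

def handle_utterance(stt: STTBackend, wav: bytes) -> str:
    # The server only touches the protocol, never a concrete backend.
    return stt.transcribe(wav)
```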
## Roadmap
See Forgejo milestones on this repo. High-level:
- Alpha — Server-side pipeline — Whisper + Precise + Piper working end-to-end on Heimdall
- Beta — Edge wake word — wake word on K210 or ESP32-S3; audio only streams post-wake
- Hardware v1 — documented reference build; buying guide; assembly instructions
- cf-voice integration — Minerva uses cf_voice modules from circuitforge-core
- Platform — multiple hardware targets; custom PCB design
## Related
- `cf-voice` module design: `circuitforge-plans/circuitforge-core/2026-04-06-cf-voice-design.md`
- `linnet` product: real-time tone annotation, will eventually embed Minerva as a hardware node
- Heimdall server: primary dev/deployment target (10.1.10.71 on LAN)