Ports prior voice assistant research and prototypes from devl/Devops into the Minerva repo. Includes:

- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap

# Minerva — Developer Context

**Product code:** `MNRV`
**Status:** Concept / early prototype
**Domain:** Privacy-first, local-only voice assistant hardware platform

---

## What Minerva Is

A 100% local, FOSS voice assistant hardware platform. No cloud. No subscriptions. No data leaving the local network.

The goal is a reference hardware + software stack for a privacy-first voice assistant that anyone can build, extend, or self-host, including people without technical backgrounds, provided the assembly docs are good enough.

Core design principles (same as all CF products):

- **Local-first inference**: Whisper STT, Piper TTS, and Mycroft Precise wake word all run on the host server
- **Edge where possible**: wake word detection moves to edge hardware over time (K210 → ESP32-S3 → custom)
- **No cloud dependency**: Home Assistant is optional, not required
- **100% FOSS stack**

---

## Hardware Targets

### Phase 1 (current): Maix Duino (K210)

- K210 dual-core RISC-V @ 400 MHz with KPU neural accelerator
- Audio: I2S microphone + speaker output
- Connectivity: ESP32 WiFi/BLE co-processor
- Programming: MaixPy (MicroPython)
- Status: server-side wake word working; edge inference in progress

### Phase 2: ESP32-S3

- More accessible and cheaper than the Maix Duino, with better WiFi
- On-device wake word with Espressif ESP-SR
- See `docs/ESP32_S3_VOICE_ASSISTANT_SPEC.md`

### Phase 3: Custom hardware

- Dedicated PCB for CF reference platform
- Hardware-accelerated wake word + VAD
- Designed for accessibility: large buttons, LED feedback, easy mounting

---

## Software Stack

### Edge device (Maix Duino / ESP32-S3)

- Firmware: MaixPy or ESP-IDF
- Client: `hardware/maixduino/maix_voice_client.py`
- Audio: I2S capture and playback
- Network: WiFi → Minerva server

### Server (runs on Heimdall or any Linux box)

- Voice server: `scripts/voice_server.py` (Flask + Whisper + Precise)
- Enhanced version: `scripts/voice_server_enhanced.py` (adds speaker ID via pyannote)
- STT: Whisper (local)
- Wake word: Mycroft Precise
- TTS: Piper
- Home Assistant: REST API integration (optional)
- Conda env: `whisper_cli` (existing on Heimdall)

---

## Directory Structure

```
minerva/
├── docs/                               # Architecture, guides, reference docs
│   ├── maix-voice-assistant-architecture.md
│   ├── MYCROFT_PRECISE_GUIDE.md
│   ├── PRECISE_DEPLOYMENT.md
│   ├── ESP32_S3_VOICE_ASSISTANT_SPEC.md
│   ├── HARDWARE_BUYING_GUIDE.md
│   ├── LCD_CAMERA_FEATURES.md
│   ├── K210_PERFORMANCE_VERIFICATION.md
│   ├── WAKE_WORD_ADVANCED.md
│   ├── ADVANCED_WAKE_WORD_TOPICS.md
│   └── QUESTIONS_ANSWERED.md
├── scripts/                            # Server-side scripts
│   ├── voice_server.py                 # Core Flask + Whisper + Precise server
│   ├── voice_server_enhanced.py        # + speaker identification (pyannote)
│   ├── setup_voice_assistant.sh        # Server setup
│   ├── setup_precise.sh                # Mycroft Precise training environment
│   └── download_pretrained_models.sh
├── hardware/
│   └── maixduino/                      # K210 edge device scripts
│       ├── maix_voice_client.py        # Production client
│       ├── maix_simple_record_test.py  # Audio capture test
│       ├── maix_test_simple.py         # Hardware/network test
│       ├── maix_debug_wifi.py          # WiFi diagnostics
│       ├── maix_discover_modules.py    # Module discovery
│       ├── secrets.py.example          # WiFi/server credential template
│       ├── MICROPYTHON_QUIRKS.md
│       └── README.md
├── config/
│   └── .env.example                    # Server config template
├── models/                             # Wake word models (gitignored, large)
└── CLAUDE.md                           # This file
```

---

## Credentials / Secrets

**Never commit real credentials.** Pattern:

- Server: copy `config/.env.example` → `config/.env`, fill in real values
- Edge device: copy `hardware/maixduino/secrets.py.example` → `secrets.py`, fill in WiFi + server URL

Both files are gitignored. `.example` files are committed as templates.

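As a minimal sketch of the server-side half of this pattern, the snippet below parses a `.env`-style file using only the standard library and fails loudly when the real file has not been created from the template. The function names `parse_env` and `load_env_file` are hypothetical, and the actual keys in `config/.env.example` are not shown in this document; `voice_server.py` may well load its configuration differently. Calling `load_env_file()` from the repo root would return a plain dict of strings.

```python
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks, '#' comments, and lines without '='."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path="config/.env"):
    """Read the gitignored config file (hypothetical helper, not from the repo)."""
    env_path = Path(path)
    if not env_path.exists():
        raise FileNotFoundError(
            f"{path} not found: copy config/.env.example to {path} and fill it in"
        )
    return parse_env(env_path.read_text(encoding="utf-8"))
```
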
---

## Running the Server

```bash
# Activate environment
conda activate whisper_cli

# Basic server (Whisper + Precise wake word)
python scripts/voice_server.py \
    --enable-precise \
    --precise-model models/hey-minerva.net \
    --precise-sensitivity 0.5

# Enhanced server (+ speaker identification)
# Quote the token so an empty or unusual value cannot split into extra arguments
python scripts/voice_server_enhanced.py \
    --enable-speaker-id \
    --hf-token "$HF_TOKEN"

# Test health
curl http://localhost:5000/health
curl http://localhost:5000/wake-word/status
```

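The two `curl` checks above can also be scripted. The following is a sketch only: `check_endpoints` is a hypothetical helper, the response bodies of `/health` and `/wake-word/status` are not documented here and are treated as opaque text, and the injectable `fetch` parameter exists purely so the function can be exercised without a running server. With the server up, `check_endpoints()` returns a dict mapping each path to an `(ok, detail)` pair.

```python
import urllib.request


def check_endpoints(base_url="http://localhost:5000",
                    paths=("/health", "/wake-word/status"),
                    fetch=None, timeout=5):
    """Return {path: (ok, detail)} for each endpoint; ok means HTTP 200."""

    def default_fetch(url):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")

    fetch = fetch or default_fetch
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            status, body = fetch(url)
            results[path] = (status == 200, body)
        except OSError as exc:
            # urllib's URLError/HTTPError and socket timeouts all subclass OSError.
            results[path] = (False, str(exc))
    return results
```
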
---

## Connection to CF Voice Infrastructure

Minerva is the **hardware platform** for cf-voice. As `circuitforge_core.voice` matures:

- `cf_voice.io` (STT/TTS) → replaces the ad hoc Whisper/Piper calls in `voice_server.py`
- `cf_voice.context` (parallel classifier) → augments Mycroft Precise with tone/environment detection
- `cf_voice.telephony` → future: Minerva as an always-on household linnet node

Minerva hardware + cf-voice software = the CF reference voice assistant stack.

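One way to prepare for swapping out the ad hoc Whisper/Piper calls is to route speech through a narrow interface, so the backend can change without touching request handling. This is an illustrative sketch, not the design of `voice_server.py` or `cf_voice.io`: `SpeechBackend`, `EchoBackend`, and `handle_utterance` are all hypothetical names, and `EchoBackend` performs no real STT or TTS. For example, `handle_utterance(EchoBackend(), b"lights on", str.upper)` exercises the flow end to end with no models loaded.

```python
from typing import Callable, Protocol


class SpeechBackend(Protocol):
    """Hypothetical seam between the server and whichever STT/TTS library is in use."""

    def transcribe(self, audio: bytes) -> str: ...

    def synthesize(self, text: str) -> bytes: ...


class EchoBackend:
    """Stand-in backend for tests: treats audio bytes as UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


def handle_utterance(backend: SpeechBackend, audio: bytes,
                     respond: Callable[[str], str]) -> bytes:
    """Transcribe audio, compute a text reply, and synthesize the reply."""
    text = backend.transcribe(audio)
    reply = respond(text)
    return backend.synthesize(reply)
```
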
---

## Roadmap

See Forgejo milestones on this repo. High-level:

1. **Alpha: Server-side pipeline.** Whisper + Precise + Piper working end-to-end on Heimdall
2. **Beta: Edge wake word.** Wake word runs on K210 or ESP32-S3; audio streams to the server only after a wake event
3. **Hardware v1.** Documented reference build; buying guide; assembly instructions
4. **cf-voice integration.** Minerva uses cf_voice modules from circuitforge-core
5. **Platform.** Multiple hardware targets; custom PCB design

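The Beta milestone's "stream only after wake" behaviour can be sketched as a small gate that drops every audio frame until a wake score crosses a threshold, then forwards a fixed number of frames. This is an assumption-laden illustration: `WakeGate`, the per-frame `wake_score` input, and the default values are hypothetical, and the real edge firmware (KPU model or ESP-SR) may expose detection quite differently. With `WakeGate(threshold=0.5, frames_after_wake=2)` and scores `0.1, 0.9, 0.0, 0.0, 0.0`, only the third and fourth frames are forwarded.

```python
class WakeGate:
    """Forward audio frames only for a fixed window after a wake detection."""

    def __init__(self, threshold=0.5, frames_after_wake=50):
        self.threshold = threshold
        self.frames_after_wake = frames_after_wake
        self.remaining = 0  # frames still to forward in the current window

    def process(self, frame, wake_score):
        """Return the frame if it should be streamed, otherwise None."""
        if self.remaining > 0:
            self.remaining -= 1
            return frame
        if wake_score >= self.threshold:
            # Open the window; the frame containing the wake word is not sent.
            self.remaining = self.frames_after_wake
        return None
```
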
---

## Related

- `cf-voice` module design: `circuitforge-plans/circuitforge-core/2026-04-06-cf-voice-design.md`
- `linnet` product: real-time tone annotation, will eventually embed Minerva as a hardware node
- Heimdall server: primary dev/deployment target (10.1.10.71 on LAN)