minerva/CLAUDE.md
pyr0ball 173f7f37d4 feat: import mycroft-precise work as Minerva foundation
Ports prior voice assistant research and prototypes from devl/Devops
into the Minerva repo. Includes:

- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed
  (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap
2026-04-06 22:21:12 -07:00


# Minerva — Developer Context
**Product code:** `MNRV`
**Status:** Concept / early prototype
**Domain:** Privacy-first, local-only voice assistant hardware platform
---
## What Minerva Is
A 100% local, FOSS voice assistant hardware platform. No cloud. No subscriptions. No data leaving the local network.
The goal is a reference hardware + software stack for a privacy-first voice assistant that anyone can build, extend, or self-host — including people without technical backgrounds if the assembly docs are good enough.
Core design principles (same as all CF products):
- **Local-first inference** — Whisper STT, Piper TTS, Mycroft Precise wake word all run on the host server
- **Edge where possible** — wake word detection moves to edge hardware over time (K210 → ESP32-S3 → custom)
- **No cloud dependency** — Home Assistant optional, not required
- **100% FOSS stack**
---
## Hardware Targets
### Phase 1 (current): Maix Duino (K210)
- K210 dual-core RISC-V @ 400 MHz with KPU neural accelerator
- Audio: I2S microphone + speaker output
- Connectivity: ESP32 WiFi/BLE co-processor
- Programming: MaixPy (MicroPython)
- Status: server-side wake word working; edge inference in progress
### Phase 2: ESP32-S3
- More accessible, cheaper, better WiFi
- On-device wake word with Espressif ESP-SR
- See `docs/ESP32_S3_VOICE_ASSISTANT_SPEC.md`
### Phase 3: Custom hardware
- Dedicated PCB for CF reference platform
- Hardware-accelerated wake word + VAD
- Designed for accessibility: large buttons, LED feedback, easy mounting
---
## Software Stack
### Edge device (Maix Duino / ESP32-S3)
- Firmware: MaixPy or ESP-IDF
- Client: `hardware/maixduino/maix_voice_client.py`
- Audio: I2S capture and playback
- Network: WiFi → Minerva server
### Server (runs on Heimdall or any Linux box)
- Voice server: `scripts/voice_server.py` (Flask + Whisper + Precise)
- Enhanced version: `scripts/voice_server_enhanced.py` (adds speaker ID via pyannote)
- STT: Whisper (local)
- Wake word: Mycroft Precise
- TTS: Piper
- Home Assistant: REST API integration (optional)
- Conda env: `whisper_cli` (existing on Heimdall)
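
As a rough sketch of how the server-side components listed above fit together, the flow is wake word, then STT, then a response step, then TTS. The `VoicePipeline` class and all four callables below are hypothetical stand-ins for illustration; they are not the actual interfaces in `scripts/voice_server.py`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoicePipeline:
    """Hypothetical composition of the server-side stages (names are assumptions)."""

    detect_wake: Callable[[bytes], bool]  # stand-in for the wake word check
    transcribe: Callable[[bytes], str]    # stand-in for local STT
    respond: Callable[[str], str]         # stand-in for intent handling
    synthesize: Callable[[str], bytes]    # stand-in for local TTS

    def handle(self, audio: bytes) -> Optional[bytes]:
        """Return synthesized reply audio, or None if no wake word was heard."""
        if not self.detect_wake(audio):
            return None
        text = self.transcribe(audio)
        reply = self.respond(text)
        return self.synthesize(reply)
```

Keeping each stage behind a plain callable means a stage can be swapped (for example, moving wake word detection to the edge device) without touching the others.
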
---
## Directory Structure
```
minerva/
├── docs/                                # Architecture, guides, reference docs
│   ├── maix-voice-assistant-architecture.md
│   ├── MYCROFT_PRECISE_GUIDE.md
│   ├── PRECISE_DEPLOYMENT.md
│   ├── ESP32_S3_VOICE_ASSISTANT_SPEC.md
│   ├── HARDWARE_BUYING_GUIDE.md
│   ├── LCD_CAMERA_FEATURES.md
│   ├── K210_PERFORMANCE_VERIFICATION.md
│   ├── WAKE_WORD_ADVANCED.md
│   ├── ADVANCED_WAKE_WORD_TOPICS.md
│   └── QUESTIONS_ANSWERED.md
├── scripts/                             # Server-side scripts
│   ├── voice_server.py                  # Core Flask + Whisper + Precise server
│   ├── voice_server_enhanced.py         # + speaker identification (pyannote)
│   ├── setup_voice_assistant.sh         # Server setup
│   ├── setup_precise.sh                 # Mycroft Precise training environment
│   └── download_pretrained_models.sh
├── hardware/
│   └── maixduino/                       # K210 edge device scripts
│       ├── maix_voice_client.py         # Production client
│       ├── maix_simple_record_test.py   # Audio capture test
│       ├── maix_test_simple.py          # Hardware/network test
│       ├── maix_debug_wifi.py           # WiFi diagnostics
│       ├── maix_discover_modules.py     # Module discovery
│       ├── secrets.py.example           # WiFi/server credential template
│       ├── MICROPYTHON_QUIRKS.md
│       └── README.md
├── config/
│   └── .env.example                     # Server config template
├── models/                              # Wake word models (gitignored, large)
└── CLAUDE.md                            # This file
```
---
## Credentials / Secrets
**Never commit real credentials.** Pattern:
- Server: copy `config/.env.example` → `config/.env`, fill in real values
- Edge device: copy `hardware/maixduino/secrets.py.example` → `secrets.py`, fill in WiFi + server URL
Both files are gitignored. `.example` files are committed as templates.
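
One way the server side of this pattern could be consumed is a small stdlib-only loader that reads `config/.env` into the process environment. This is a minimal sketch, not the loader the scripts actually use: the function name `load_env_file` is hypothetical, and it handles only plain `KEY=VALUE` lines with optional surrounding quotes.

```python
import os
from pathlib import Path


def load_env_file(path, environ=None):
    """Load simple KEY=VALUE lines from a .env file (hypothetical helper).

    Keys already present in the environment are left untouched, so values
    exported in the shell take precedence over the file.
    Returns a dict of the keys that were actually set.
    """
    environ = os.environ if environ is None else environ
    loaded = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if key and key not in environ:
            environ[key] = value
            loaded[key] = value
    return loaded
```

Calling `load_env_file("config/.env")` early in server startup would make the values available through `os.environ` without adding a third-party dependency.
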
---
## Running the Server
```bash
# Activate environment
conda activate whisper_cli

# Basic server (Whisper + Precise wake word)
python scripts/voice_server.py \
  --enable-precise \
  --precise-model models/hey-minerva.net \
  --precise-sensitivity 0.5

# Enhanced server (+ speaker identification)
python scripts/voice_server_enhanced.py \
  --enable-speaker-id \
  --hf-token "$HF_TOKEN"

# Test health
curl http://localhost:5000/health
curl http://localhost:5000/wake-word/status
```
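
The two `curl` checks above can also be scripted, for example from a setup or monitoring script. The sketch below is an assumption-laden illustration: `probe` is a hypothetical helper, it reports only HTTP status codes, and it makes no assumption about the response bodies, which are not documented here.

```python
import urllib.error
import urllib.request

# The server is local-only, so bypass any HTTP proxy configured in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(base_url, paths=("/health", "/wake-word/status"), timeout=2.0):
    """Return {path: HTTP status code, or None if the server is unreachable}."""
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            with _opener.open(url, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as err:
            # The server answered, but with an error status (e.g. 404 or 500).
            results[path] = err.code
        except OSError:
            # Connection refused, timeout, DNS failure, and similar.
            results[path] = None
    return results
```

For the default setup shown above, `probe("http://localhost:5000")` returns one entry per endpoint; a `None` value means the server could not be reached at all.
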
---
## Connection to CF Voice Infrastructure
Minerva is the **hardware platform** for cf-voice. As `circuitforge_core.voice` matures:
- `cf_voice.io` (STT/TTS) → replaces the ad hoc Whisper/Piper calls in `voice_server.py`
- `cf_voice.context` (parallel classifier) → augments Mycroft Precise with tone/environment detection
- `cf_voice.telephony` → future: Minerva as an always-on household linnet node
Minerva hardware + cf-voice software = the CF reference voice assistant stack.
---
## Roadmap
See Forgejo milestones on this repo. High-level:
1. **Alpha — Server-side pipeline** — Whisper + Precise + Piper working end-to-end on Heimdall
2. **Beta — Edge wake word** — wake word on K210 or ESP32-S3; audio only streams post-wake
3. **Hardware v1** — documented reference build; buying guide; assembly instructions
4. **cf-voice integration** — Minerva uses cf_voice modules from circuitforge-core
5. **Platform** — multiple hardware targets; custom PCB design
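
The Beta goal of streaming audio only after the wake word can be sketched as a simple gate over captured frames. Everything here is hypothetical: `stream_after_wake`, the `is_wake` callable, and the fixed `max_frames` window are illustrative assumptions. A real client would more likely end the window on detected silence (VAD) than on a frame count.

```python
from typing import Callable, Iterable, Iterator


def stream_after_wake(
    frames: Iterable[bytes],
    is_wake: Callable[[bytes], bool],
    max_frames: int = 50,
) -> Iterator[bytes]:
    """Yield only the frames that follow a wake detection (hypothetical gate).

    Frames captured while idle are checked locally and then dropped, so
    nothing leaves the device until the wake word fires. The frame that
    triggered the wake word is not forwarded either.
    """
    remaining = 0
    for frame in frames:
        if remaining > 0:
            # Inside the post-wake window: forward audio to the server.
            remaining -= 1
            yield frame
        elif is_wake(frame):
            # Open a window covering the next max_frames frames.
            remaining = max_frames
```

On the edge device, the yielded frames would be what gets sent over WiFi to the Minerva server; everything else stays on the device.
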
---
## Related
- `cf-voice` module design: `circuitforge-plans/circuitforge-core/2026-04-06-cf-voice-design.md`
- `linnet` product: real-time tone annotation, will eventually embed Minerva as a hardware node
- Heimdall server: primary dev/deployment target (10.1.10.71 on LAN)