# Minerva — Developer Context

**Product code:** `MNRV`
**Status:** Concept / early prototype
**Domain:** Privacy-first, local-only voice assistant hardware platform

---

## What Minerva Is

A 100% local, FOSS voice assistant hardware platform. No cloud. No subscriptions. No data leaving the local network.

The goal is a reference hardware + software stack for a privacy-first voice assistant that anyone can build, extend, or self-host — including people without technical backgrounds if the assembly docs are good enough.

Core design principles (same as all CF products):
- **Local-first inference** — Whisper STT, Piper TTS, Mycroft Precise wake word all run on the host server
- **Edge where possible** — wake word detection moves to edge hardware over time (K210 → ESP32-S3 → custom)
- **No cloud dependency** — Home Assistant optional, not required
- **100% FOSS stack**

---

## Hardware Targets

### Phase 1 (current): Maix Duino (K210)
- K210 dual-core RISC-V @ 400MHz with KPU neural accelerator
- Audio: I2S microphone + speaker output
- Connectivity: ESP32 WiFi/BLE co-processor
- Programming: MaixPy (MicroPython)
- Status: server-side wake word working; edge inference in progress

### Phase 2: ESP32-S3
- More accessible and cheaper than the Maix Duino, with better WiFi
- On-device wake word with Espressif ESP-SR
- See `docs/ESP32_S3_VOICE_ASSISTANT_SPEC.md`

### Phase 3: Custom hardware
- Dedicated PCB for CF reference platform
- Hardware-accelerated wake word + VAD
- Designed for accessibility: large buttons, LED feedback, easy mounting

---

## Software Stack

### Edge device (Maix Duino / ESP32-S3)
- Firmware: MaixPy or ESP-IDF
- Client: `hardware/maixduino/maix_voice_client.py`
- Audio: I2S capture and playback
- Network: WiFi → Minerva server

### Server (runs on Heimdall or any Linux box)
- Voice server: `scripts/voice_server.py` (Flask + Whisper + Precise)
- Enhanced version: `scripts/voice_server_enhanced.py` (adds speaker ID via pyannote)
- STT: Whisper (local)
- Wake word: Mycroft Precise
- TTS: Piper
- Home Assistant: REST API integration (optional)
- Conda env: `whisper_cli` (existing on Heimdall)
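
The optional Home Assistant integration listed above goes through HA's REST API. How `voice_server.py` actually makes these calls is not shown in this file, so the following is only a minimal sketch of a service call using the standard library. The function name, base URL, token placeholder, and entity ID are all hypothetical; the `/api/services/<domain>/<service>` path and Bearer-token header follow the Home Assistant REST API.

```python
import json
import urllib.request


def build_ha_service_request(base_url, token, domain, service, data):
    """Build (but do not send) a Home Assistant service-call request."""
    url = f"{base_url.rstrip('/')}/api/services/{domain}/{service}"
    return urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values: replace with a real HA URL, token, and entity.
request = build_ha_service_request(
    "http://homeassistant.local:8123",
    "YOUR_LONG_LIVED_ACCESS_TOKEN",
    "light",
    "turn_on",
    {"entity_id": "light.kitchen"},
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request, timeout=5)
```

Building the request separately from sending it keeps the sketch testable without a running Home Assistant instance, which matches the "optional, not required" design principle.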

---

## Directory Structure

```
minerva/
├── docs/                        # Architecture, guides, reference docs
│   ├── maix-voice-assistant-architecture.md
│   ├── MYCROFT_PRECISE_GUIDE.md
│   ├── PRECISE_DEPLOYMENT.md
│   ├── ESP32_S3_VOICE_ASSISTANT_SPEC.md
│   ├── HARDWARE_BUYING_GUIDE.md
│   ├── LCD_CAMERA_FEATURES.md
│   ├── K210_PERFORMANCE_VERIFICATION.md
│   ├── WAKE_WORD_ADVANCED.md
│   ├── ADVANCED_WAKE_WORD_TOPICS.md
│   └── QUESTIONS_ANSWERED.md
├── scripts/                     # Server-side scripts
│   ├── voice_server.py          # Core Flask + Whisper + Precise server
│   ├── voice_server_enhanced.py # + speaker identification (pyannote)
│   ├── setup_voice_assistant.sh # Server setup
│   ├── setup_precise.sh         # Mycroft Precise training environment
│   └── download_pretrained_models.sh
├── hardware/
│   └── maixduino/               # K210 edge device scripts
│       ├── maix_voice_client.py # Production client
│       ├── maix_simple_record_test.py  # Audio capture test
│       ├── maix_test_simple.py  # Hardware/network test
│       ├── maix_debug_wifi.py   # WiFi diagnostics
│       ├── maix_discover_modules.py    # Module discovery
│       ├── secrets.py.example   # WiFi/server credential template
│       ├── MICROPYTHON_QUIRKS.md
│       └── README.md
├── config/
│   └── .env.example             # Server config template
├── models/                      # Wake word models (gitignored, large)
└── CLAUDE.md                    # This file
```

---

## Credentials / Secrets

**Never commit real credentials.** Pattern:

- Server: copy `config/.env.example` → `config/.env`, fill in real values
- Edge device: copy `hardware/maixduino/secrets.py.example` → `hardware/maixduino/secrets.py`, fill in WiFi credentials + server URL

Both files are gitignored. `.example` files are committed as templates.
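
As an illustration of the server-side pattern, a `.env` file can be read with a few lines of standard-library Python. This is a sketch only: it assumes plain `KEY=VALUE` lines, the function name `parse_env` is hypothetical, and the keys in the sample text are invented for the example (the real ones are in `config/.env.example`). The actual server may load its config differently.

```python
def parse_env(text):
    """Parse KEY=VALUE lines from .env-style text into a dict."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line, ignore it
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical keys, for illustration only.
sample = """
# Server settings
EXAMPLE_PORT=5000
EXAMPLE_MODEL_PATH="models/hey-minerva.net"
"""
config = parse_env(sample)
print(config)
```

To read the real file, pass its contents in, for example `parse_env(open("config/.env", encoding="utf-8").read())`.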

---

## Running the Server

```bash
# Activate environment
conda activate whisper_cli

# Basic server (Whisper + Precise wake word)
python scripts/voice_server.py \
    --enable-precise \
    --precise-model models/hey-minerva.net \
    --precise-sensitivity 0.5

# Enhanced server (+ speaker identification)
python scripts/voice_server_enhanced.py \
    --enable-speaker-id \
    --hf-token "$HF_TOKEN"

# Test health
curl http://localhost:5000/health
curl http://localhost:5000/wake-word/status
```
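
The same health checks can be scripted, for example from a deployment or monitoring job. The sketch below uses only the two endpoints from the curl commands above; the function names are hypothetical, and because the response format is not documented here, it treats any HTTP 200 as healthy and returns the raw body without parsing it.

```python
import urllib.error
import urllib.request

# Endpoints taken from the curl commands above.
ENDPOINTS = ("/health", "/wake-word/status")


def check_endpoint(base_url, path, timeout=5.0):
    """GET one endpoint; return (ok, detail) instead of raising on failure."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200, body
    except (urllib.error.URLError, OSError) as exc:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False, str(exc)


def check_server(base_url="http://localhost:5000"):
    """Check every known endpoint and return {path: (ok, detail)}."""
    return {path: check_endpoint(base_url, path) for path in ENDPOINTS}
```

Calling `check_server()` against a running server returns one `(ok, detail)` pair per endpoint; if the server is down, each pair is `(False, <error message>)` rather than an exception.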

---

## Connection to CF Voice Infrastructure

Minerva is the **hardware platform** for cf-voice. As `circuitforge_core.voice` matures:

- `cf_voice.io` (STT/TTS) → replaces the ad hoc Whisper/Piper calls in `voice_server.py`
- `cf_voice.context` (parallel classifier) → augments Mycroft Precise with tone/environment detection
- `cf_voice.telephony` → future: Minerva as an always-on household linnet node

Minerva hardware + cf-voice software = the CF reference voice assistant stack.

---

## Roadmap

See the Forgejo milestones on this repo. At a high level:

1. **Alpha — Server-side pipeline** — Whisper + Precise + Piper working end-to-end on Heimdall
2. **Beta — Edge wake word** — wake word on K210 or ESP32-S3; audio only streams post-wake
3. **Hardware v1** — documented reference build; buying guide; assembly instructions
4. **cf-voice integration** — Minerva uses cf_voice modules from circuitforge-core
5. **Platform** — multiple hardware targets; custom PCB design
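
The Beta milestone's "stream only after wake" behavior can be pictured as a small gate in front of the network sender. The sketch below is an assumption-heavy illustration, not the planned firmware design: the `WakeGate` class, the score threshold, and the fixed post-wake window are all hypothetical, and the wake score would come from whatever detector runs on the device.

```python
class WakeGate:
    """Forward audio frames only for a fixed window after a wake word fires."""

    def __init__(self, threshold=0.5, frames_after_wake=50):
        self.threshold = threshold
        self.frames_after_wake = frames_after_wake
        self._remaining = 0  # frames still allowed through the gate

    def process(self, frame, wake_score):
        """Return the frame if it should be streamed, else None."""
        if self._remaining > 0:
            self._remaining -= 1
            return frame
        if wake_score >= self.threshold:
            # Wake word detected: open the gate for the following frames.
            self._remaining = self.frames_after_wake
        return None


# Illustrative run: (frame, wake_score) pairs with a 2-frame window.
gate = WakeGate(threshold=0.5, frames_after_wake=2)
stream = [(b"a", 0.1), (b"b", 0.9), (b"c", 0.0), (b"d", 0.0), (b"e", 0.0)]
sent = [gate.process(frame, score) for frame, score in stream]
print(sent)  # [None, None, b'c', b'd', None]
```

In this sketch the frame that triggers the wake word is itself dropped, so nothing leaves the device until after detection. A real implementation would more likely close the gate on end of speech (VAD) than on a fixed frame count.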

---

## Related

- `cf-voice` module design: `circuitforge-plans/circuitforge-core/2026-04-06-cf-voice-design.md`
- `linnet` product: real-time tone annotation, will eventually embed Minerva as a hardware node
- Heimdall server: primary dev/deployment target (10.1.10.71 on LAN)