minerva/CLAUDE.md
pyr0ball 173f7f37d4 feat: import mycroft-precise work as Minerva foundation
Ports prior voice assistant research and prototypes from devl/Devops
into the Minerva repo. Includes:

- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed
  (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap
2026-04-06 22:21:12 -07:00


Minerva — Developer Context

Product code: MNRV
Status: Concept / early prototype
Domain: Privacy-first, local-only voice assistant hardware platform


What Minerva Is

A 100% local, FOSS voice assistant hardware platform. No cloud. No subscriptions. No data leaving the local network.

The goal is a reference hardware + software stack for a privacy-first voice assistant that anyone can build, extend, or self-host — including people without technical backgrounds if the assembly docs are good enough.

Core design principles (same as all CF products):

  • Local-first inference — Whisper STT, Piper TTS, Mycroft Precise wake word all run on the host server
  • Edge where possible — wake word detection moves to edge hardware over time (K210 → ESP32-S3 → custom)
  • No cloud dependency — Home Assistant optional, not required
  • 100% FOSS stack

Hardware Targets

Phase 1 (current): Maix Duino (K210)

  • K210 dual-core RISC-V @ 400MHz with KPU neural accelerator
  • Audio: I2S microphone + speaker output
  • Connectivity: ESP32 WiFi/BLE co-processor
  • Programming: MaixPy (MicroPython)
  • Status: server-side wake word working; edge inference in progress

Phase 2: ESP32-S3

  • More accessible, cheaper, better WiFi
  • On-device wake word with Espressif ESP-SR
  • See docs/ESP32_S3_VOICE_ASSISTANT_SPEC.md

Phase 3: Custom hardware

  • Dedicated PCB for CF reference platform
  • Hardware-accelerated wake word + VAD
  • Designed for accessibility: large buttons, LED feedback, easy mounting

Software Stack

Edge device (Maix Duino / ESP32-S3)

  • Firmware: MaixPy or ESP-IDF
  • Client: hardware/maixduino/maix_voice_client.py
  • Audio: I2S capture and playback
  • Network: WiFi → Minerva server

Server (runs on Heimdall or any Linux box)

  • Voice server: scripts/voice_server.py (Flask + Whisper + Precise)
  • Enhanced version: scripts/voice_server_enhanced.py (adds speaker ID via pyannote)
  • STT: Whisper (local)
  • Wake word: Mycroft Precise
  • TTS: Piper
  • Home Assistant: REST API integration (optional)
  • Conda env: whisper_cli (existing on Heimdall)
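The server components listed above imply a simple flow: check for the wake word, transcribe, optionally hand the text to something like Home Assistant, then synthesize a reply. The sketch below only illustrates that ordering. The class, its field names, and the injected callables are hypothetical stand-ins and are not taken from scripts/voice_server.py, which may be structured quite differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoicePipeline:
    """Wires the server-side stages together (hypothetical structure)."""
    detect_wake: Callable[[bytes], bool]   # stand-in for Mycroft Precise
    transcribe: Callable[[bytes], str]     # stand-in for Whisper
    synthesize: Callable[[str], bytes]     # stand-in for Piper
    # Optional text handler, e.g. a Home Assistant call; defaults to echo.
    handle_text: Callable[[str], str] = lambda text: text

    def process(self, audio: bytes) -> Optional[bytes]:
        """Return reply audio, or None when no wake word was heard."""
        if not self.detect_wake(audio):
            return None
        text = self.transcribe(audio)
        reply = self.handle_text(text)
        return self.synthesize(reply)
```

Passing the stages in as callables keeps the sketch independent of any particular library API and makes each stage easy to replace with a test double.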

Directory Structure

minerva/
├── docs/                        # Architecture, guides, reference docs
│   ├── maix-voice-assistant-architecture.md
│   ├── MYCROFT_PRECISE_GUIDE.md
│   ├── PRECISE_DEPLOYMENT.md
│   ├── ESP32_S3_VOICE_ASSISTANT_SPEC.md
│   ├── HARDWARE_BUYING_GUIDE.md
│   ├── LCD_CAMERA_FEATURES.md
│   ├── K210_PERFORMANCE_VERIFICATION.md
│   ├── WAKE_WORD_ADVANCED.md
│   ├── ADVANCED_WAKE_WORD_TOPICS.md
│   └── QUESTIONS_ANSWERED.md
├── scripts/                     # Server-side scripts
│   ├── voice_server.py          # Core Flask + Whisper + Precise server
│   ├── voice_server_enhanced.py # + speaker identification (pyannote)
│   ├── setup_voice_assistant.sh # Server setup
│   ├── setup_precise.sh         # Mycroft Precise training environment
│   └── download_pretrained_models.sh
├── hardware/
│   └── maixduino/               # K210 edge device scripts
│       ├── maix_voice_client.py # Production client
│       ├── maix_simple_record_test.py  # Audio capture test
│       ├── maix_test_simple.py  # Hardware/network test
│       ├── maix_debug_wifi.py   # WiFi diagnostics
│       ├── maix_discover_modules.py    # Module discovery
│       ├── secrets.py.example   # WiFi/server credential template
│       ├── MICROPYTHON_QUIRKS.md
│       └── README.md
├── config/
│   └── .env.example             # Server config template
├── models/                      # Wake word models (gitignored, large)
└── CLAUDE.md                    # This file

Credentials / Secrets

Never commit real credentials. Pattern:

  • Server: copy config/.env.example → config/.env, fill in real values
  • Edge device: copy hardware/maixduino/secrets.py.example → secrets.py, fill in WiFi + server URL

Both files are gitignored. .example files are committed as templates.
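As a rough illustration of the server-side half of this pattern, the sketch below reads KEY=VALUE pairs from config/.env using only the standard library. The parsing rules (skip blank lines and # comments, strip surrounding quotes) and the function names are assumptions for illustration; the actual scripts may load their configuration differently.

```python
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines; ignore blanks, '#' comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env(path="config/.env"):
    """Return settings from the gitignored .env file, or {} if it is missing."""
    env_file = Path(path)
    if not env_file.exists():
        return {}
    return parse_env(env_file.read_text(encoding="utf-8"))
```

Returning an empty dict for a missing file lets a caller fall back to defaults on a fresh checkout where only .env.example exists.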


Running the Server

# Activate environment
conda activate whisper_cli

# Basic server (Whisper + Precise wake word)
python scripts/voice_server.py \
    --enable-precise \
    --precise-model models/hey-minerva.net \
    --precise-sensitivity 0.5

# Enhanced server (+ speaker identification)
python scripts/voice_server_enhanced.py \
    --enable-speaker-id \
    --hf-token "$HF_TOKEN"

# Test health
curl http://localhost:5000/health
curl http://localhost:5000/wake-word/status
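The same two health checks can be scripted, for example in a setup or monitoring step. This is a minimal sketch: the paths come from the curl commands above, but the function names are hypothetical and nothing is assumed about the response bodies, so only the HTTP status code is reported.

```python
import urllib.error
import urllib.request

# Paths taken from the curl commands above; response bodies are not inspected.
HEALTH_PATHS = ("/health", "/wake-word/status")


def probe(base_url, path, timeout=3.0, opener=urllib.request.urlopen):
    """Return the HTTP status code for base_url + path, or None if unreachable."""
    url = base_url.rstrip("/") + path
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, ...


def check_server(base_url="http://localhost:5000", opener=urllib.request.urlopen):
    """Map each health path to its status code (None means no answer)."""
    return {path: probe(base_url, path, opener=opener) for path in HEALTH_PATHS}
```

Calling check_server() with no arguments targets the default address used in the commands above; the opener parameter exists only so the function can be exercised without a running server.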

Connection to CF Voice Infrastructure

Minerva is the hardware platform for cf-voice. As circuitforge_core.voice matures:

  • cf_voice.io (STT/TTS) → replaces the ad hoc Whisper/Piper calls in voice_server.py
  • cf_voice.context (parallel classifier) → augments Mycroft Precise with tone/environment detection
  • cf_voice.telephony → future: Minerva as an always-on household linnet node

Minerva hardware + cf-voice software = the CF reference voice assistant stack.


Roadmap

See Forgejo milestones on this repo. High-level:

  1. Alpha — Server-side pipeline — Whisper + Precise + Piper working end-to-end on Heimdall
  2. Beta — Edge wake word — wake word on K210 or ESP32-S3; audio only streams post-wake
  3. Hardware v1 — documented reference build; buying guide; assembly instructions
  4. cf-voice integration — Minerva uses cf_voice modules from circuitforge-core
  5. Platform — multiple hardware targets; custom PCB design
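The Beta milestone's "audio only streams post-wake" behavior can be pictured as a gate in front of the network send. The sketch below is an assumption-laden illustration, not the planned firmware: score_frame is a hypothetical on-device wake scorer, and the threshold and frame budget are illustrative values.

```python
from typing import Callable, Iterable, Iterator


def gate_after_wake(
    frames: Iterable[bytes],
    score_frame: Callable[[bytes], float],  # hypothetical on-device wake scorer
    threshold: float = 0.5,                 # illustrative value
    frames_after_wake: int = 3,             # illustrative utterance budget
) -> Iterator[bytes]:
    """Yield only the frames that follow a wake detection.

    Frames heard while idle are dropped locally, so nothing is sent
    to the server until the wake word fires.
    """
    remaining = 0
    for frame in frames:
        if remaining > 0:
            remaining -= 1
            yield frame
        elif score_frame(frame) >= threshold:
            # The wake frame itself is not forwarded in this sketch.
            remaining = frames_after_wake
```

A real client would more likely end the utterance on silence (VAD) than on a fixed frame count; the fixed budget just keeps the example short.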

Related

  • cf-voice module design: circuitforge-plans/circuitforge-core/2026-04-06-cf-voice-design.md
  • linnet product: real-time tone annotation, will eventually embed Minerva as a hardware node
  • Heimdall server: primary dev/deployment target (10.1.10.71 on LAN)