Ports prior voice assistant research and prototypes from devl/Devops into the Minerva repo. Includes:
- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap
# Minerva — Developer Context

**Product code:** MNRV
**Status:** Concept / early prototype
**Domain:** Privacy-first, local-only voice assistant hardware platform
## What Minerva Is
A 100% local, FOSS voice assistant hardware platform. No cloud. No subscriptions. No data leaving the local network.
The goal is a reference hardware + software stack for a privacy-first voice assistant that anyone can build, extend, or self-host — including people without technical backgrounds if the assembly docs are good enough.
Core design principles (same as all CF products):
- Local-first inference — Whisper STT, Piper TTS, Mycroft Precise wake word all run on the host server
- Edge where possible — wake word detection moves to edge hardware over time (K210 → ESP32-S3 → custom)
- No cloud dependency — Home Assistant optional, not required
- 100% FOSS stack
## Hardware Targets
### Phase 1 (current): Maix Duino (K210)
- K210 dual-core RISC-V @ 400MHz with KPU neural accelerator
- Audio: I2S microphone + speaker output
- Connectivity: ESP32 WiFi/BLE co-processor
- Programming: MaixPy (MicroPython)
- Status: server-side wake word working; edge inference in progress
### Phase 2: ESP32-S3
- More accessible, cheaper, better WiFi
- On-device wake word with Espressif ESP-SR
- See `docs/ESP32_S3_VOICE_ASSISTANT_SPEC.md`
### Phase 3: Custom hardware
- Dedicated PCB for CF reference platform
- Hardware-accelerated wake word + VAD
- Designed for accessibility: large buttons, LED feedback, easy mounting
## Software Stack
### Edge device (Maix Duino / ESP32-S3)
- Firmware: MaixPy or ESP-IDF
- Client: `hardware/maixduino/maix_voice_client.py`
- Audio: I2S capture and playback
- Network: WiFi → Minerva server
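The client's job is simple: capture I2S audio, wrap it, and ship it to the server over WiFi. A minimal sketch of the wrapping step, assuming 16 kHz mono 16-bit PCM — the actual capture format and upload logic live in `maix_voice_client.py`, and this sketch uses CPython's `wave` module, whereas the on-device client runs MicroPython:

```python
import io
import wave

def pcm_to_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container.

    Sample rate and width are assumptions for illustration; the real
    client's capture format is defined in maix_voice_client.py.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # mono microphone
        w.setsampwidth(2)            # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```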
### Server (runs on Heimdall or any Linux box)
- Voice server: `scripts/voice_server.py` (Flask + Whisper + Precise)
- Enhanced version: `scripts/voice_server_enhanced.py` (adds speaker ID via pyannote)
- STT: Whisper (local)
- Wake word: Mycroft Precise
- TTS: Piper
- Home Assistant: REST API integration (optional)
- Conda env: `whisper_cli` (existing on Heimdall)
## Directory Structure
```
minerva/
├── docs/                              # Architecture, guides, reference docs
│   ├── maix-voice-assistant-architecture.md
│   ├── MYCROFT_PRECISE_GUIDE.md
│   ├── PRECISE_DEPLOYMENT.md
│   ├── ESP32_S3_VOICE_ASSISTANT_SPEC.md
│   ├── HARDWARE_BUYING_GUIDE.md
│   ├── LCD_CAMERA_FEATURES.md
│   ├── K210_PERFORMANCE_VERIFICATION.md
│   ├── WAKE_WORD_ADVANCED.md
│   ├── ADVANCED_WAKE_WORD_TOPICS.md
│   └── QUESTIONS_ANSWERED.md
├── scripts/                           # Server-side scripts
│   ├── voice_server.py                # Core Flask + Whisper + Precise server
│   ├── voice_server_enhanced.py       # + speaker identification (pyannote)
│   ├── setup_voice_assistant.sh       # Server setup
│   ├── setup_precise.sh               # Mycroft Precise training environment
│   └── download_pretrained_models.sh
├── hardware/
│   └── maixduino/                     # K210 edge device scripts
│       ├── maix_voice_client.py       # Production client
│       ├── maix_simple_record_test.py # Audio capture test
│       ├── maix_test_simple.py        # Hardware/network test
│       ├── maix_debug_wifi.py         # WiFi diagnostics
│       ├── maix_discover_modules.py   # Module discovery
│       ├── secrets.py.example         # WiFi/server credential template
│       ├── MICROPYTHON_QUIRKS.md
│       └── README.md
├── config/
│   └── .env.example                   # Server config template
├── models/                            # Wake word models (gitignored, large)
└── CLAUDE.md                          # This file
```
## Credentials / Secrets
Never commit real credentials. Pattern:
- Server: copy `config/.env.example` → `config/.env`, fill in real values
- Edge device: copy `hardware/maixduino/secrets.py.example` → `secrets.py`, fill in WiFi + server URL

Both files are gitignored. `.example` files are committed as templates.
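For reference, a `secrets.py.example`-style template might look like the following; the field names here are assumptions for illustration, and the committed template in `hardware/maixduino/` is authoritative:

```python
# secrets.py.example — copy to secrets.py and fill in real values.
# Field names are illustrative; check the committed template in the repo.
WIFI_SSID = "your-ssid"
WIFI_PASSWORD = "your-password"
SERVER_URL = "http://10.1.10.71:5000"  # Minerva voice server (Heimdall)
```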
## Running the Server
```bash
# Activate environment
conda activate whisper_cli

# Basic server (Whisper + Precise wake word)
python scripts/voice_server.py \
    --enable-precise \
    --precise-model models/hey-minerva.net \
    --precise-sensitivity 0.5

# Enhanced server (+ speaker identification)
python scripts/voice_server_enhanced.py \
    --enable-speaker-id \
    --hf-token $HF_TOKEN

# Test health
curl http://localhost:5000/health
curl http://localhost:5000/wake-word/status
```
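The curl checks above can also be scripted. A small stdlib-only sketch, assuming the `/health` endpoint returns JSON with a `status` field — the actual response shape is whatever `voice_server.py` emits, so adjust `parse_health` to match:

```python
import json
from urllib.request import urlopen

def parse_health(payload: str) -> bool:
    """True if the health JSON reports 'ok'. Response shape is assumed."""
    data = json.loads(payload)
    return data.get("status") == "ok"

def check_health(base_url: str = "http://localhost:5000") -> bool:
    """Hit /health on a running voice server and parse the result."""
    with urlopen(f"{base_url}/health", timeout=5) as resp:
        return parse_health(resp.read().decode())
```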
## Connection to CF Voice Infrastructure
Minerva is the hardware platform for cf-voice. As `circuitforge_core.voice` matures:
- `cf_voice.io` (STT/TTS) → replaces the ad hoc Whisper/Piper calls in `voice_server.py`
- `cf_voice.context` (parallel classifier) → augments Mycroft Precise with tone/environment detection
- `cf_voice.telephony` → future: Minerva as an always-on household linnet node
Minerva hardware + cf-voice software = the CF reference voice assistant stack.
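One way to read that mapping is as an interface seam: `voice_server.py` can code against a small STT protocol today, so swapping in `cf_voice.io` later becomes a drop-in. A hypothetical sketch — all names here are assumptions, since the cf_voice API is still being designed:

```python
from typing import Protocol

class STTBackend(Protocol):
    """Seam that cf_voice.io could eventually satisfy (hypothetical)."""
    def transcribe(self, wav_bytes: bytes) -> str: ...

class LocalWhisperSTT:
    """Stand-in for today's ad hoc local Whisper call (sketch only)."""
    def transcribe(self, wav_bytes: bytes) -> str:
        return "<transcript>"  # real impl would run Whisper locally

def handle_utterance(stt: STTBackend, wav: bytes) -> str:
    # The server only touches the protocol, never a concrete backend.
    return stt.transcribe(wav)
```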
## Roadmap
See Forgejo milestones on this repo. High-level:
- Alpha — Server-side pipeline — Whisper + Precise + Piper working end-to-end on Heimdall
- Beta — Edge wake word — wake word on K210 or ESP32-S3; audio only streams post-wake
- Hardware v1 — documented reference build; buying guide; assembly instructions
- cf-voice integration — Minerva uses cf_voice modules from circuitforge-core
- Platform — multiple hardware targets; custom PCB design
## Related
- `cf-voice` module design: `circuitforge-plans/circuitforge-core/2026-04-06-cf-voice-design.md`
- `linnet` product: real-time tone annotation, will eventually embed Minerva as a hardware node
- Heimdall server: primary dev/deployment target (10.1.10.71 on LAN)