pyr0ball 173f7f37d4 feat: import mycroft-precise work as Minerva foundation

Ports prior voice assistant research and prototypes from devl/Devops
into the Minerva repo. Includes:

- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed
  (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap

2026-04-06 22:21:12 -07:00

Mycroft Precise Deployment Guide

Quick Reference: Server vs Edge Detection

Server-Side Detection (Recommended for Start)

Setup:

# 1. On Heimdall: set up Precise
./setup_precise.sh --wake-word "hey computer"

# 2. Train your model (follow scripts in ~/precise-models/hey-computer/)
cd ~/precise-models/hey-computer
./1-record-wake-word.sh
./2-record-not-wake-word.sh
# Organize samples, then:
./3-train-model.sh
./4-test-model.sh

# 3. Start voice server with Precise
cd ~/voice-assistant
conda activate precise
python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/hey-computer/hey-computer.net \
    --precise-sensitivity 0.5

Architecture:

Maix Duino → Continuous audio stream → Heimdall
Heimdall runs Precise on audio stream
On wake word: Process command with Whisper
Response → TTS → Stream back to Maix Duino

Pros: Easier setup, better accuracy, simple updates

Cons: More network traffic, requires stable connection

Edge Detection (Advanced - Future Phase)

Setup:

# 1. Train model on Heimdall (same as above)
# 2. Convert to KMODEL for K210
# 3. Deploy to Maix Duino
# (See MYCROFT_PRECISE_GUIDE.md for detailed conversion steps)

Architecture:

Maix Duino runs Precise locally on K210
Only sends audio after wake word detected
Lower latency, less network traffic

Pros: Lower latency, less bandwidth, works offline

Cons: Complex conversion, lower accuracy, harder updates
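
To put "more network traffic" in numbers, here is a rough estimate of what continuous streaming costs. It is a sketch that assumes uncompressed PCM at 16 kHz, 16-bit, mono (the format used by the arecord test later in this guide); the format the Maix Duino client actually sends may differ.

```python
def stream_bandwidth(sample_rate_hz=16_000, bits_per_sample=16, channels=1):
    """Return raw PCM bandwidth as (bytes_per_second, kilobits_per_second)."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    kilobits_per_second = bytes_per_second * 8 / 1000
    return bytes_per_second, kilobits_per_second


# Assumed stream format: 16 kHz, 16-bit, mono, no compression.
bytes_s, kbit_s = stream_bandwidth()
print(f"{bytes_s} B/s, {kbit_s:.0f} kbit/s, {bytes_s * 3600 / 1e6:.1f} MB/hour")
```

Under those assumptions the stream is about 256 kbit/s per device, which is small on wired LAN but constant, so WiFi stability matters more than raw throughput.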

Phase-by-Phase Deployment

Phase 1: Server Setup (Day 1)

# On Heimdall
ssh alan@10.1.10.71

# 1. Set up the voice assistant base
./setup_voice_assistant.sh

# 2. Set up Mycroft Precise
./setup_precise.sh --wake-word "hey computer"

# 3. Configure environment
vim ~/voice-assistant/config/.env

Update .env:

HA_URL=http://your-home-assistant:8123
HA_TOKEN=your_token_here
PRECISE_MODEL=/home/alan/precise-models/hey-computer/hey-computer.net
PRECISE_SENSITIVITY=0.5
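
A quick sanity check of the .env file before starting the server can catch a missing key early. The following is a minimal sketch, not part of the repo: the helper names are hypothetical, the required keys are simply the four shown above, and the 0 to 1 range for PRECISE_SENSITIVITY is an assumption. It does not handle inline comments after a value.

```python
REQUIRED_KEYS = ("HA_URL", "HA_TOKEN", "PRECISE_MODEL", "PRECISE_SENSITIVITY")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comment lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values):
    """Return a list of problems; an empty list means the file looks usable."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    sensitivity = values.get("PRECISE_SENSITIVITY")
    if sensitivity:
        try:
            # Assumed valid range; adjust if the server accepts other values.
            if not 0.0 <= float(sensitivity) <= 1.0:
                problems.append("PRECISE_SENSITIVITY should be between 0 and 1")
        except ValueError:
            problems.append("PRECISE_SENSITIVITY is not a number")
    return problems
```

Usage would be along the lines of `check_env(parse_env(open("config/.env").read()))`, printing whatever problems come back.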

Phase 2: Wake Word Training (Day 1-2)

# Navigate to training directory
cd ~/precise-models/hey-computer
conda activate precise

# Record samples (30-60 minutes)
./1-record-wake-word.sh    # Record 50-100 wake word samples
./2-record-not-wake-word.sh # Record 200-500 negative samples

# Organize samples
# Move 80% of wake-word recordings to wake-word/
# Move 20% of wake-word recordings to test/wake-word/
# Move 80% of not-wake-word to not-wake-word/
# Move 20% of not-wake-word to test/not-wake-word/

# Train model (30-60 minutes)
./3-train-model.sh

# Test model
./4-test-model.sh

# Evaluate on test set
./5-evaluate-model.sh

# Tune threshold
./6-tune-threshold.sh
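
The 80/20 organize step can be scripted rather than done by hand. This is a sketch under assumptions: the function name and the `recordings/` source folder are hypothetical (the guide does not say where the recording scripts write their files), and it only handles .wav files.

```python
import random
import shutil
from pathlib import Path


def split_samples(source_dir, train_dir, test_dir, test_fraction=0.2, seed=0):
    """Move .wav files from source_dir into train_dir and test_dir.

    Returns (number_moved_to_train, number_moved_to_test).
    """
    files = sorted(Path(source_dir).glob("*.wav"))
    random.Random(seed).shuffle(files)  # fixed seed keeps the split repeatable
    n_test = round(len(files) * test_fraction)
    groups = ((Path(test_dir), files[:n_test]), (Path(train_dir), files[n_test:]))
    for dest, group in groups:
        dest.mkdir(parents=True, exist_ok=True)
        for wav in group:
            shutil.move(str(wav), str(dest / wav.name))
    return len(files) - n_test, n_test
```

For example, `split_samples("recordings/wake", "wake-word", "test/wake-word")` followed by the same call for the negative samples with `not-wake-word` and `test/not-wake-word`.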

Phase 3: Server Integration (Day 2)

Option A: Manual Testing

cd ~/voice-assistant
conda activate precise

# Start server with Precise enabled
python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/hey-computer/hey-computer.net \
    --precise-sensitivity 0.5 \
    --ha-url http://your-ha:8123 \
    --ha-token your_token

Option B: Systemd Service

Update the systemd service to use the Precise conda environment:

sudo vim /etc/systemd/system/voice-assistant.service

[Unit]
Description=Voice Assistant with Wake Word Detection
After=network.target

[Service]
Type=simple
User=alan
WorkingDirectory=/home/alan/voice-assistant
Environment="PATH=/home/alan/miniconda3/envs/precise/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/home/alan/voice-assistant/config/.env
ExecStart=/home/alan/miniconda3/envs/precise/bin/python voice_server.py \
    --enable-precise \
    --precise-model /home/alan/precise-models/hey-computer/hey-computer.net \
    --precise-sensitivity 0.5
Restart=on-failure
RestartSec=10
StandardOutput=append:/home/alan/voice-assistant/logs/voice_assistant.log
StandardError=append:/home/alan/voice-assistant/logs/voice_assistant_error.log

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable voice-assistant
sudo systemctl start voice-assistant
sudo systemctl status voice-assistant

Phase 4: Maix Duino Setup (Day 2-3)

For server-side wake word detection, the Maix Duino only streams audio; the server does the detection.

Update maix_voice_client.py:

# Use simplified mode - just stream audio
# Server handles wake word detection
CONTINUOUS_STREAM = True  # Enable continuous streaming
WAKE_WORD_CHECK_INTERVAL = 0  # Server-side detection

Flash and test:

1. Copy updated script to SD card
2. Boot Maix Duino
3. Check serial console for connection
4. Speak wake word
5. Verify server logs show detection

Phase 5: Testing & Tuning (Day 3-7)

Test Wake Word Detection

# Monitor server logs
journalctl -u voice-assistant -f

# Or check detections via API
curl http://10.1.10.71:5000/wake-word/detections

Test End-to-End Flow

1. Say wake word: "Hey Computer"
2. Wait for LED/beep on Maix Duino
3. Say command: "Turn on the living room lights"
4. Verify HA command executes
5. Hear TTS response

Monitor Performance

# Check wake word log
tail -f ~/voice-assistant/logs/wake_words.log

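# Count logged detections. This counts every detection, so compare the total
# against how often you actually said the wake word to estimate false positives.
grep -c "wake_word" ~/voice-assistant/logs/wake_words.log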

# Check accuracy
# Should see detections when you say wake word
# Should NOT see detections during normal conversation

Tune Sensitivity

If too many false positives:

# Increase threshold (more conservative)
# Edit systemd service or restart with:
python voice_server.py --precise-sensitivity 0.7

If missing wake words:

# Decrease threshold (more aggressive)
python voice_server.py --precise-sensitivity 0.3
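
The trade-off behind these two adjustments can be seen by sweeping a threshold over scored clips. This sketch treats the value as a plain score threshold, the way this guide describes it; how voice_server.py actually maps `--precise-sensitivity` onto the detector is not shown here. The function name is hypothetical and the scores are made up for illustration, not measurements.

```python
def sweep_thresholds(scored, thresholds=(0.3, 0.5, 0.7)):
    """Count errors at each threshold.

    scored: list of (score, is_wake_word) pairs.
    Returns {threshold: (false_positives, missed_wake_words)}.
    """
    results = {}
    for t in thresholds:
        false_pos = sum(1 for score, is_wake in scored if score >= t and not is_wake)
        missed = sum(1 for score, is_wake in scored if score < t and is_wake)
        results[t] = (false_pos, missed)
    return results


# Made-up scores for illustration only.
example = [(0.9, True), (0.6, True), (0.4, True),
           (0.65, False), (0.35, False), (0.1, False)]
for t, (fp, miss) in sweep_thresholds(example).items():
    print(f"threshold {t}: {fp} false positives, {miss} missed")
```

Raising the threshold trades false positives for misses, which is why retraining on hard examples is usually a better long-term fix than pushing the value far in either direction.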

Collect Hard Examples

# When you notice false positives, record them
cd ~/precise-models/hey-computer
precise-collect -f not-wake-word/false-positive-$(date +%s).wav

# When wake word is missed, record it
precise-collect -f wake-word/missed-$(date +%s).wav

# After collecting 10-20 examples, retrain
./3-train-model.sh

Monitoring Commands

Check System Status

# Service status
sudo systemctl status voice-assistant

# Server health
curl http://10.1.10.71:5000/health

# Wake word status
curl http://10.1.10.71:5000/wake-word/status

# Recent detections
curl http://10.1.10.71:5000/wake-word/detections
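
The three curl checks can be combined into one script. This is a sketch only: the endpoint paths are the ones listed above, but the helper names are hypothetical and the shape of each JSON response is not documented in this guide, so the script just reports whatever comes back.

```python
import json
import urllib.request

ENDPOINTS = ("/health", "/wake-word/status", "/wake-word/detections")


def fetch_json(url, timeout=5):
    """GET a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_server(base_url, fetch=fetch_json):
    """Return {endpoint: parsed JSON, or an error string} for each endpoint."""
    report = {}
    for path in ENDPOINTS:
        try:
            report[path] = fetch(base_url.rstrip("/") + path)
        except (OSError, ValueError) as exc:  # network errors, invalid JSON
            report[path] = f"error: {exc}"
    return report
```

Calling `check_server("http://10.1.10.71:5000")` and printing the result gives a one-shot status summary; the `fetch` parameter exists so the function can be exercised without a running server.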

View Logs

# Real-time server logs
journalctl -u voice-assistant -f

# Last 50 lines
journalctl -u voice-assistant -n 50

# Specific log file
tail -f ~/voice-assistant/logs/voice_assistant.log

# Wake word detections
tail -f ~/voice-assistant/logs/wake_words.log

# Maix Duino serial console
screen /dev/ttyUSB0 115200

Performance Metrics

# CPU usage (expect roughly 5-10% when idle, with spikes during processing)
top -p "$(pgrep -d, -f voice_server.py)"

# Memory usage (the [v] keeps grep from matching its own process)
ps aux | grep '[v]oice_server.py'

# Network traffic (if streaming audio)
iftop -i eth0  # or your network interface

Troubleshooting

Wake Word Not Detecting

Check model is loaded:

curl http://10.1.10.71:5000/wake-word/status
# Should show: "enabled": true

Test model directly:

conda activate precise
precise-listen ~/precise-models/hey-computer/hey-computer.net
# Speak wake word - should see "!"

Check sensitivity:

# Try lower threshold
precise-listen ~/precise-models/hey-computer/hey-computer.net -t 0.3

Verify audio input:

# Test microphone
arecord -d 5 test.wav
aplay test.wav

Too Many False Positives

Increase threshold:

# Edit the service file or restart with a higher --precise-sensitivity value
python voice_server.py --precise-sensitivity 0.7

Retrain with false positives:

cd ~/precise-models/hey-computer
# Record false triggers in not-wake-word/
precise-collect -f not-wake-word/false-triggers.wav
# Add to not-wake-word training set
./3-train-model.sh

Server Won't Start with Precise

Check Precise installation:

conda activate precise
python -c "from precise_runner import PreciseRunner; print('OK')"

Check engine:

precise-engine --version
# Should show: Precise v0.3.0

Check model file:

ls -lh ~/precise-models/hey-computer/hey-computer.net
file ~/precise-models/hey-computer/hey-computer.net

Check permissions:

sudo chmod +x /usr/local/bin/precise-engine
chmod 644 ~/precise-models/hey-computer/hey-computer.net

Audio Quality Issues

Test audio path:

# Record test on server
arecord -f S16_LE -r 16000 -c 1 -d 5 test.wav

# Transcribe with Whisper
conda activate voice-assistant
python -c "
import whisper
model = whisper.load_model('base')
result = model.transcribe('test.wav')
print(result['text'])
"

If poor quality:

Check microphone connection
Verify sample rate (16kHz)
Test with USB microphone
Check for interference/noise
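
The sample rate check can be done on the recorded file itself using Python's standard `wave` module. A small sketch, with hypothetical helper names, that compares a WAV file against the 16 kHz, mono, 16-bit settings used in the arecord command above:

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, bits_per_sample, seconds) for a WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getsampwidth() * 8, wav.getnframes() / rate


def matches_expected(path, rate=16_000, channels=1, bits=16):
    """True if the file is 16 kHz, mono, 16-bit (or whatever values are passed)."""
    return describe_wav(path)[:3] == (rate, channels, bits)
```

Running `describe_wav("test.wav")` on the file recorded above shows at a glance whether the capture device silently fell back to a different rate or channel count.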

Maix Duino Connection Issues

Check WiFi:

# In Maix Duino serial console
import network
wlan = network.WLAN(network.STA_IF)
print(wlan.isconnected())
print(wlan.ifconfig())

Check server reachability:

# From Maix Duino
import urequests
response = urequests.get('http://10.1.10.71:5000/health')
print(response.json())

Check audio streaming:

# On Heimdall, monitor network
sudo tcpdump -i any -n host <maix-duino-ip>
# Should see continuous packets when streaming

Optimization Tips

Reduce Latency

Use smaller Whisper model:

# Edit .env
WHISPER_MODEL=base  # or tiny

Optimize Precise sensitivity:

# Find sweet spot between false positives and latency
# Lower threshold = faster trigger but more false positives

Pre-load models:

# Models load on startup, not first request
# Adds ~30s startup time but eliminates first-request delay
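
The pre-loading idea can be sketched as a small registry that loads everything once at startup. This is an illustration only: `ModelRegistry` is a hypothetical name, and the guide does not show how voice_server.py actually loads its models.

```python
class ModelRegistry:
    """Load every model once at startup so requests never pay the load cost."""

    def __init__(self, loaders):
        self._loaders = dict(loaders)  # name -> zero-argument callable
        self._models = {}

    def preload(self):
        """Call every loader now; this is where the startup delay is spent."""
        for name, load in self._loaders.items():
            self._models[name] = load()

    def get(self, name):
        # Fall back to loading on demand if preload() was skipped.
        if name not in self._models:
            self._models[name] = self._loaders[name]()
        return self._models[name]
```

A server would build the registry with its real loaders, call `preload()` before accepting connections, and use `get()` inside request handlers.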

Improve Accuracy

Use larger Whisper model:
```
WHISPER_MODEL=large
```

Train more wake word samples:

# Aim for 100+ high-quality samples
# Diverse speakers, conditions, distances

Increase training epochs:

# In 3-train-model.sh
precise-train -e 120 hey-computer.net .  # vs default 60

Reduce False Positives

Collect hard negatives:

# Record TV, music, similar phrases
# Add to not-wake-word training set