Mycroft Precise Deployment Guide
Quick Reference: Server vs Edge Detection
Server-Side Detection (Recommended for Start)
Setup:
# 1. On Heimdall: Set up Precise
./setup_precise.sh --wake-word "hey computer"
# 2. Train your model (follow scripts in ~/precise-models/hey-computer/)
cd ~/precise-models/hey-computer
./1-record-wake-word.sh
./2-record-not-wake-word.sh
# Organize samples, then:
./3-train-model.sh
./4-test-model.sh
# 3. Start voice server with Precise
cd ~/voice-assistant
conda activate precise
python voice_server.py \
--enable-precise \
--precise-model ~/precise-models/hey-computer/hey-computer.net \
--precise-sensitivity 0.5
Architecture:
- Maix Duino → Continuous audio stream → Heimdall
- Heimdall runs Precise on audio stream
- On wake word: Process command with Whisper
- Response → TTS → Stream back to Maix Duino
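The flow above can be pictured as a small state machine. This is only an illustrative sketch: the state names, the event strings and the `step` function are hypothetical and are not taken from voice_server.py.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()   # Precise scans the incoming audio stream
    CAPTURING = auto()   # wake word heard, the command is being recorded
    RESPONDING = auto()  # command handled, TTS is streamed back


# Hypothetical event names; the real server may structure this differently.
TRANSITIONS = {
    (State.LISTENING, "wake_word"): State.CAPTURING,
    (State.CAPTURING, "command_end"): State.RESPONDING,
    (State.RESPONDING, "tts_done"): State.LISTENING,
}


def step(state: State, event: str) -> State:
    """Return the next state, or stay put if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    state = State.LISTENING
    for event in ["noise", "wake_word", "command_end", "tts_done"]:
        state = step(state, event)
        print(event, "->", state.name)
```

Events that do not apply to the current state (such as background noise while listening) leave the state unchanged, which matches the idea that nothing happens until the wake word is heard.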
Pros: Easier setup, better accuracy, simple updates
Cons: More network traffic, requires stable connection
Edge Detection (Advanced - Future Phase)
Setup:
# 1. Train model on Heimdall (same as above)
# 2. Convert to KMODEL for K210
# 3. Deploy to Maix Duino
# (See MYCROFT_PRECISE_GUIDE.md for detailed conversion steps)
Architecture:
- Maix Duino runs Precise locally on K210
- Only sends audio after wake word detected
- Lower latency, less network traffic
Pros: Lower latency, less bandwidth, works offline
Cons: Complex conversion, lower accuracy, harder updates
Phase-by-Phase Deployment
Phase 1: Server Setup (Day 1)
# On Heimdall
ssh alan@10.1.10.71
# 1. Set up voice assistant base
./setup_voice_assistant.sh
# 2. Set up Mycroft Precise
./setup_precise.sh --wake-word "hey computer"
# 3. Configure environment
vim ~/voice-assistant/config/.env
Update .env:
HA_URL=http://your-home-assistant:8123
HA_TOKEN=your_token_here
PRECISE_MODEL=/home/alan/precise-models/hey-computer/hey-computer.net
PRECISE_SENSITIVITY=0.5
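Before starting the server it can help to sanity-check the .env values. The sketch below is an assumption-laden helper, not part of the repository: `parse_env` and `check_env` are hypothetical names, the parser only handles plain KEY=VALUE lines, and the 0 to 1 range for PRECISE_SENSITIVITY is assumed from the values used in this guide.

```python
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def check_env(values: dict) -> list:
    """Return a list of human-readable problems (empty means OK)."""
    problems = []
    for key in ("HA_URL", "HA_TOKEN", "PRECISE_MODEL", "PRECISE_SENSITIVITY"):
        if not values.get(key):
            problems.append(f"{key} is missing")
    if values.get("HA_TOKEN") == "your_token_here":
        problems.append("HA_TOKEN is still the placeholder")
    try:
        sensitivity = float(values.get("PRECISE_SENSITIVITY", ""))
        # Assumed range; adjust if voice_server.py accepts something else.
        if not 0.0 <= sensitivity <= 1.0:
            problems.append("PRECISE_SENSITIVITY must be between 0 and 1")
    except ValueError:
        if values.get("PRECISE_SENSITIVITY"):
            problems.append("PRECISE_SENSITIVITY is not a number")
    return problems


if __name__ == "__main__":
    env_path = Path.home() / "voice-assistant" / "config" / ".env"
    if env_path.exists():
        for problem in check_env(parse_env(env_path.read_text())):
            print("WARNING:", problem)
```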
Phase 2: Wake Word Training (Day 1-2)
# Navigate to training directory
cd ~/precise-models/hey-computer
conda activate precise
# Record samples (30-60 minutes)
./1-record-wake-word.sh # Record 50-100 wake word samples
./2-record-not-wake-word.sh # Record 200-500 negative samples
# Organize samples
# Move 80% of wake-word recordings to wake-word/
# Move 20% of wake-word recordings to test/wake-word/
# Move 80% of not-wake-word to not-wake-word/
# Move 20% of not-wake-word to test/not-wake-word/
# Train model (30-60 minutes)
./3-train-model.sh
# Test model
./4-test-model.sh
# Evaluate on test set
./5-evaluate-model.sh
# Tune threshold
./6-tune-threshold.sh
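The 80/20 split described under "Organize samples" can be scripted rather than done by hand. The following is a minimal sketch under stated assumptions: `split_samples` is a hypothetical helper, and it assumes the recordings are .wav files sitting in a separate source folder (the `recordings/` paths in the usage comment are made up, so point them at wherever the recording scripts actually save files).

```python
import random
import shutil
from pathlib import Path


def split_samples(source: Path, train_dir: Path, test_dir: Path,
                  train_fraction: float = 0.8, seed: int = 0) -> tuple:
    """Move .wav files from source into train_dir and test_dir.

    Returns (number moved to train_dir, number moved to test_dir).
    """
    files = sorted(source.glob("*.wav"))
    random.Random(seed).shuffle(files)  # fixed seed keeps the split repeatable
    cut = int(len(files) * train_fraction)
    train_dir.mkdir(parents=True, exist_ok=True)
    test_dir.mkdir(parents=True, exist_ok=True)
    for path in files[:cut]:
        shutil.move(str(path), str(train_dir / path.name))
    for path in files[cut:]:
        shutil.move(str(path), str(test_dir / path.name))
    return cut, len(files) - cut


# Example usage (hypothetical source folders):
# split_samples(Path("recordings/wake-word"), Path("wake-word"), Path("test/wake-word"))
# split_samples(Path("recordings/not-wake-word"), Path("not-wake-word"), Path("test/not-wake-word"))
```

Shuffling before the cut avoids putting all recordings from one session into the test set.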
Phase 3: Server Integration (Day 2)
Option A: Manual Testing
cd ~/voice-assistant
conda activate precise
# Start server with Precise enabled
python voice_server.py \
--enable-precise \
--precise-model ~/precise-models/hey-computer/hey-computer.net \
--precise-sensitivity 0.5 \
--ha-url http://your-ha:8123 \
--ha-token your_token
Option B: Systemd Service
Update systemd service to use Precise environment:
sudo vim /etc/systemd/system/voice-assistant.service
[Unit]
Description=Voice Assistant with Wake Word Detection
After=network.target
[Service]
Type=simple
User=alan
WorkingDirectory=/home/alan/voice-assistant
Environment="PATH=/home/alan/miniconda3/envs/precise/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/home/alan/voice-assistant/config/.env
ExecStart=/home/alan/miniconda3/envs/precise/bin/python voice_server.py \
--enable-precise \
--precise-model /home/alan/precise-models/hey-computer/hey-computer.net \
--precise-sensitivity 0.5
Restart=on-failure
RestartSec=10
StandardOutput=append:/home/alan/voice-assistant/logs/voice_assistant.log
StandardError=append:/home/alan/voice-assistant/logs/voice_assistant_error.log
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable voice-assistant
sudo systemctl start voice-assistant
sudo systemctl status voice-assistant
Phase 4: Maix Duino Setup (Day 2-3)
For server-side wake word detection, the Maix Duino only needs to stream audio to Heimdall.
Update maix_voice_client.py:
# Use simplified mode - just stream audio
# Server handles wake word detection
CONTINUOUS_STREAM = True # Enable continuous streaming
WAKE_WORD_CHECK_INTERVAL = 0 # Server-side detection
Flash and test:
- Copy updated script to SD card
- Boot Maix Duino
- Check serial console for connection
- Speak wake word
- Verify server logs show detection
Phase 5: Testing & Tuning (Day 3-7)
Test Wake Word Detection
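# Monitor service events (with the unit file above, server output is written to the log files instead of the journal)
journalctl -u voice-assistant -f
# Monitor server output
tail -f ~/voice-assistant/logs/voice_assistant.log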
# Or check detections via API
curl http://10.1.10.71:5000/wake-word/detections
Test End-to-End Flow
- Say wake word: "Hey Computer"
- Wait for LED/beep on Maix Duino
- Say command: "Turn on the living room lights"
- Verify HA command executes
- Hear TTS response
Monitor Performance
# Check wake word log
tail -f ~/voice-assistant/logs/wake_words.log
# Count logged detections (compare with how often you actually said the wake word to estimate false positives)
grep -c "wake_word" ~/voice-assistant/logs/wake_words.log
# Check accuracy
# Should see detections when you say wake word
# Should NOT see detections during normal conversation
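To turn the detection count into a rough false positive rate, subtract the triggers you caused on purpose and divide by the observation time. The helper below is a hypothetical sketch: it assumes each detection produces one log line containing "wake_word", which is the same assumption the grep above makes, and it ignores missed wake words.

```python
def false_positives_per_hour(log_lines, intentional: int, hours: float) -> float:
    """Estimate false triggers per hour from a wake word log.

    log_lines:   iterable of log lines (matched the same way as the grep above)
    intentional: how many times you actually said the wake word
    hours:       length of the observation window
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    detections = sum(1 for line in log_lines if "wake_word" in line)
    # Detections beyond the intentional ones are counted as false positives.
    return max(detections - intentional, 0) / hours


# Example usage:
# with open("/home/alan/voice-assistant/logs/wake_words.log") as log:
#     print(false_positives_per_hour(log, intentional=10, hours=2.0))
```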
Tune Sensitivity
If too many false positives:
# Increase threshold (more conservative)
# Edit systemd service or restart with:
python voice_server.py --precise-sensitivity 0.7
If missing wake words:
# Decrease threshold (more aggressive)
python voice_server.py --precise-sensitivity 0.3
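The trade-off can be made concrete by sweeping a threshold over scored test clips. This sketch follows this guide's convention that a higher value is more conservative; how voice_server.py maps `--precise-sensitivity` onto the model output is an assumption here, `sweep` is a hypothetical helper, and the scores are made up for illustration.

```python
def sweep(scores, labels, thresholds=(0.3, 0.5, 0.7)):
    """For each threshold, count false positives and missed wake words.

    scores: model outputs between 0 and 1, one per test clip
    labels: True if the clip really contains the wake word
    """
    results = {}
    for t in thresholds:
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        missed = sum(1 for s, y in zip(scores, labels) if s < t and y)
        results[t] = (false_pos, missed)
    return results


if __name__ == "__main__":
    # Made-up scores purely for illustration.
    scores = [0.9, 0.6, 0.4, 0.65, 0.35, 0.1]
    labels = [True, True, True, False, False, False]
    for t, (fp, miss) in sweep(scores, labels).items():
        print(f"threshold {t}: {fp} false positives, {miss} missed")
```

With these example numbers, raising the threshold from 0.3 to 0.7 removes both false positives but misses two of the three real wake words, which is the pattern to watch for when tuning.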
Collect Hard Examples
# When you notice false positives, record them
cd ~/precise-models/hey-computer
precise-collect -f not-wake-word/false-positive-$(date +%s).wav
# When wake word is missed, record it
precise-collect -f wake-word/missed-$(date +%s).wav
# After collecting 10-20 examples, retrain
./3-train-model.sh
Monitoring Commands
Check System Status
# Service status
sudo systemctl status voice-assistant
# Server health
curl http://10.1.10.71:5000/health
# Wake word status
curl http://10.1.10.71:5000/wake-word/status
# Recent detections
curl http://10.1.10.71:5000/wake-word/detections
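The same checks can be run from Python, for example from a cron job or a monitoring script. This is a sketch only: the endpoint paths and the `"enabled": true` field come from this guide, but the rest of the response format is unknown, and `fetch_json` and `wake_word_enabled` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://10.1.10.71:5000"  # server address used throughout this guide


def fetch_json(path: str, timeout: float = 5.0) -> dict:
    """GET a JSON document from the voice server."""
    with urllib.request.urlopen(BASE_URL + path, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wake_word_enabled(status: dict) -> bool:
    """True if the status payload reports "enabled": true."""
    return status.get("enabled") is True


# Example usage (needs network access to the server):
# try:
#     print("health:", fetch_json("/health"))
#     print("wake word enabled:", wake_word_enabled(fetch_json("/wake-word/status")))
# except OSError as exc:  # urllib's URLError is a subclass of OSError
#     print("server unreachable:", exc)
```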
View Logs
# Real-time service events (with the unit file above, server output goes to the log files below)
journalctl -u voice-assistant -f
# Last 50 lines
journalctl -u voice-assistant -n 50
# Specific log file
tail -f ~/voice-assistant/logs/voice_assistant.log
# Wake word detections
tail -f ~/voice-assistant/logs/wake_words.log
# Maix Duino serial console
screen /dev/ttyUSB0 115200
Performance Metrics
# CPU usage (expect roughly 5-10% when idle, with spikes during processing)
top -p "$(pgrep -d, -f voice_server.py)"
# Memory usage (the [v] keeps grep from matching its own process)
ps aux | grep '[v]oice_server.py'
# Network traffic (if streaming audio)
iftop -i eth0 # or your network interface
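As a reference point for the iftop reading, the expected bandwidth of a continuous stream can be estimated from the audio format. The sketch assumes raw, uncompressed PCM at the 16 kHz, 16-bit, mono format used in the audio tests in this guide; the actual Maix Duino client may frame or encode audio differently.

```python
def stream_bandwidth(sample_rate_hz: int = 16000, bits_per_sample: int = 16,
                     channels: int = 1) -> tuple:
    """Return (bytes per second, kilobits per second) for raw PCM audio."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    kilobits_per_second = bytes_per_second * 8 / 1000
    return bytes_per_second, kilobits_per_second


if __name__ == "__main__":
    bps, kbps = stream_bandwidth()
    print(f"{bps} bytes/s, {kbps:.0f} kbit/s before protocol overhead")
```

Under these assumptions one device streams about 32,000 bytes per second (256 kbit/s), so a much higher reading suggests other traffic on the interface and a much lower one suggests the stream is stalling.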
Troubleshooting
Wake Word Not Detecting
Check model is loaded:
curl http://10.1.10.71:5000/wake-word/status
# Should show: "enabled": true
Test model directly:
conda activate precise
precise-listen ~/precise-models/hey-computer/hey-computer.net
# Speak wake word - should see "!"
Check sensitivity:
# Try lower threshold
precise-listen ~/precise-models/hey-computer/hey-computer.net -t 0.3
Verify audio input:
# Test microphone
arecord -d 5 test.wav
aplay test.wav
Too Many False Positives
Increase threshold:
# Edit service or restart with higher sensitivity
python voice_server.py --precise-sensitivity 0.7
Retrain with false positives:
cd ~/precise-models/hey-computer
# Record false triggers in not-wake-word/
precise-collect -f not-wake-word/false-triggers.wav
# Add to not-wake-word training set
./3-train-model.sh
Server Won't Start with Precise
Check Precise installation:
conda activate precise
python -c "from precise_runner import PreciseRunner; print('OK')"
Check engine:
precise-engine --version
# Should show: Precise v0.3.0
Check model file:
ls -lh ~/precise-models/hey-computer/hey-computer.net
file ~/precise-models/hey-computer/hey-computer.net
Check permissions:
sudo chmod +x /usr/local/bin/precise-engine
chmod 644 ~/precise-models/hey-computer/hey-computer.net
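The file and engine checks above can be combined into one preflight function. This is a hypothetical helper, not something voice_server.py is known to provide; it only checks that the model file exists, is readable and non-empty, and that an executable named `precise-engine` is on the PATH.

```python
import os
import shutil
from pathlib import Path


def preflight(model_path: str, engine: str = "precise-engine") -> list:
    """Return a list of problems that would stop Precise from loading."""
    problems = []
    model = Path(model_path).expanduser()
    if not model.is_file():
        problems.append(f"model not found: {model}")
    elif not os.access(model, os.R_OK):
        problems.append(f"model not readable: {model}")
    elif model.stat().st_size == 0:
        problems.append(f"model is empty: {model}")
    if shutil.which(engine) is None:
        problems.append(f"{engine} not found on PATH or not executable")
    return problems


# Example usage:
# for problem in preflight("~/precise-models/hey-computer/hey-computer.net"):
#     print("PROBLEM:", problem)
```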
Audio Quality Issues
Test audio path:
# Record test on server
arecord -f S16_LE -r 16000 -c 1 -d 5 test.wav
# Transcribe with Whisper
conda activate voice-assistant
python -c "
import whisper
model = whisper.load_model('base')
result = model.transcribe('test.wav')
print(result['text'])
"
If poor quality:
- Check microphone connection
- Verify sample rate (16kHz)
- Test with USB microphone
- Check for interference/noise
Maix Duino Connection Issues
Check WiFi:
# In Maix Duino serial console
import network
wlan = network.WLAN(network.STA_IF)
print(wlan.isconnected())
print(wlan.ifconfig())
Check server reachability:
# From Maix Duino
import urequests
response = urequests.get('http://10.1.10.71:5000/health')
print(response.json())
Check audio streaming:
# On Heimdall, monitor network
sudo tcpdump -i any -n host <maix-duino-ip>
# Should see continuous packets when streaming
Optimization Tips
Reduce Latency
- Use smaller Whisper model:
  # Edit .env
  WHISPER_MODEL=base # or tiny
- Optimize Precise sensitivity:
  # Find sweet spot between false positives and latency
  # Lower threshold = faster trigger but more false positives
- Pre-load models:
  # Models load on startup, not first request
  # Adds ~30s startup time but eliminates first-request delay
Improve Accuracy
- Use larger Whisper model:
  WHISPER_MODEL=large
- Train more wake word samples:
  # Aim for 100+ high-quality samples
  # Diverse speakers, conditions, distances
- Increase training epochs:
  # In 3-train-model.sh
  precise-train -e 120 hey-computer.net . # vs default 60
Reduce False Positives
- Collect hard negatives:
  # Record TV, music, similar phrases
  # Add to not-wake-word training set
- Increase threshold:
  --precise-sensitivity 0.7 # vs default 0.5
- Use ensemble model:
  # Run multiple models, require agreement
  # Advanced - requires code modification
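The "require agreement" idea behind an ensemble can be sketched as a simple vote. This is an illustration only: `ensemble_triggered` is a hypothetical function, the scores are assumed to be per-model outputs between 0 and 1 for the same audio chunk, and wiring it into voice_server.py is not shown.

```python
def ensemble_triggered(scores, threshold: float = 0.5, min_agree: int = 2) -> bool:
    """Trigger only if at least min_agree models score at or above threshold."""
    votes = sum(1 for score in scores if score >= threshold)
    return votes >= min_agree


# Example usage with three models scoring the same audio chunk:
# ensemble_triggered([0.8, 0.6, 0.2])  # two models agree
# ensemble_triggered([0.8, 0.3, 0.2])  # only one model fires
```

Requiring two of three models to agree trades some sensitivity for fewer false triggers, at the cost of running every model on each chunk.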
Production Checklist
- Wake word model trained with 50+ samples
- Model tested with <5% false positive rate
- Server service enabled and auto-starting
- Home Assistant token configured
- Maix Duino WiFi configured
- End-to-end test successful
- Logs rotating properly
- Monitoring in place
- Backup of trained model
- Documentation updated
Backup and Recovery
Backup Trained Model
# Backup model
cp ~/precise-models/hey-computer/hey-computer.net \
~/precise-models/hey-computer/hey-computer.net.backup
# Backup to another host
scp ~/precise-models/hey-computer/hey-computer.net \
user@backup-host:/path/to/backups/
Restore from Backup
# Restore model
cp ~/precise-models/hey-computer/hey-computer.net.backup \
~/precise-models/hey-computer/hey-computer.net
# Restart service
sudo systemctl restart voice-assistant
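If backups are taken regularly, timestamped copies with a checksum comparison make it easier to roll back to a known-good model. The sketch below is a hypothetical alternative to the cp commands above; `backup_model` and the backup file naming are assumptions, and it only copies the single file it is given.

```python
import hashlib
import shutil
import time
from pathlib import Path


def sha256(path: Path) -> str:
    """Hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_model(model: Path, backup_dir: Path) -> Path:
    """Copy model into backup_dir with a timestamp and verify the copy."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{model.name}.{stamp}.backup"
    shutil.copy2(model, target)
    if sha256(model) != sha256(target):
        raise IOError(f"backup verification failed for {target}")
    return target


# Example usage:
# model = Path.home() / "precise-models" / "hey-computer" / "hey-computer.net"
# print(backup_model(model, model.parent / "backups"))
```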
Next Steps
Once basic server-side detection is working:
- Add more intents - Expand Home Assistant control
- Implement TTS playback - Complete the audio response loop
- Multi-room support - Deploy multiple Maix Duino units
- Voice profiles - Train the model on family members' voices
- Edge deployment - Convert model for K210 (advanced)
Resources
- Main guide: MYCROFT_PRECISE_GUIDE.md
- Quick start: QUICKSTART.md
- Architecture: maix-voice-assistant-architecture.md
- Mycroft Docs: https://github.com/MycroftAI/mycroft-precise
- Community: https://community.mycroft.ai/
Support
Log an Issue
# Collect debug info
echo "=== System Info ===" > debug.log
uname -a >> debug.log
conda list >> debug.log
echo "=== Service Status ===" >> debug.log
systemctl status voice-assistant >> debug.log
echo "=== Recent Logs ===" >> debug.log
journalctl -u voice-assistant -n 100 >> debug.log
echo "=== Wake Word Status ===" >> debug.log
curl http://10.1.10.71:5000/wake-word/status >> debug.log
Then share debug.log when asking for help.
Common Issues Database
| Symptom | Likely Cause | Solution |
|---|---|---|
| No wake detection | Model not loaded | Check /wake-word/status |
| Service won't start | Missing dependencies | Reinstall Precise |
| High false positives | Low threshold | Increase to 0.7+ |
| Missing wake words | High threshold | Decrease to 0.3-0.4 |
| Poor transcription | Bad audio quality | Check microphone |
| HA commands fail | Wrong token | Update .env |
| High CPU usage | Large Whisper model | Use smaller model |
Conclusion
With Mycroft Precise, you have complete control over your wake word detection. Start with server-side detection for easier debugging, collect good training data, and tune the threshold for your environment. Once it's working well, you can optionally move to edge detection for lower latency.
The key to success: the quality of your training data matters more than the quantity.
Happy voice assisting! 🎙️