# Mycroft Precise Deployment Guide

## Quick Reference: Server vs Edge Detection

### Server-Side Detection (Recommended for Start)

**Setup:**

```bash
# 1. On Heimdall: Set up Precise
./setup_precise.sh --wake-word "hey computer"

# 2. Train your model (follow scripts in ~/precise-models/hey-computer/)
cd ~/precise-models/hey-computer
./1-record-wake-word.sh
./2-record-not-wake-word.sh
# Organize samples into train/test folders (see Phase 2), then:
./3-train-model.sh
./4-test-model.sh

# 3. Start voice server with Precise
cd ~/voice-assistant
conda activate precise
python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/hey-computer/hey-computer.net \
    --precise-sensitivity 0.5
```

**Architecture:**

- Maix Duino → Continuous audio stream → Heimdall
- Heimdall runs Precise on the audio stream
- On wake word: Process command with Whisper
- Response → TTS → Stream back to Maix Duino

**Pros:** Easier setup, better accuracy, simple updates

**Cons:** More network traffic, requires a stable connection

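To put a rough number on "more network traffic": assuming the Maix Duino streams uncompressed 16 kHz, 16-bit mono PCM (the format used for the test recordings later in this guide; the actual client may encode differently), the continuous stream costs about 256 kbit/s, or roughly 2.8 GB per device per day. A small sketch of the arithmetic:

```python
"""Back-of-the-envelope bandwidth for continuous server-side streaming.

Assumes raw PCM with no compression and no protocol overhead; real traffic
will be somewhat higher once TCP/IP or HTTP framing is added.
"""


def stream_bytes_per_second(sample_rate_hz: int = 16_000,
                            bytes_per_sample: int = 2,
                            channels: int = 1) -> int:
    """Raw PCM payload rate in bytes per second."""
    return sample_rate_hz * bytes_per_sample * channels


def daily_gigabytes(bytes_per_second: int) -> float:
    """Decimal gigabytes (10**9 bytes) streamed over 24 hours."""
    return bytes_per_second * 86_400 / 1e9


if __name__ == "__main__":
    bps = stream_bytes_per_second()
    print(f"{bps} B/s = {bps * 8 / 1000:.0f} kbit/s")
    print(f"{daily_gigabytes(bps):.2f} GB/day per device")
```

Multiply by the number of units for a multi-room setup. This is the traffic that edge detection avoids, since it only sends audio after a wake word.
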
### Edge Detection (Advanced - Future Phase)

**Setup:**

```bash
# 1. Train model on Heimdall (same as above)
# 2. Convert to KMODEL for K210
# 3. Deploy to Maix Duino
# (See MYCROFT_PRECISE_GUIDE.md for detailed conversion steps)
```

**Architecture:**

- Maix Duino runs the converted Precise model locally on the K210
- Only sends audio after a wake word is detected
- Lower latency, less network traffic

**Pros:** Lower latency, less bandwidth, wake word detection keeps working without the network

**Cons:** Complex conversion, lower accuracy, harder updates

## Phase-by-Phase Deployment

### Phase 1: Server Setup (Day 1)

```bash
# On Heimdall
ssh alan@10.1.10.71

# 1. Set up voice assistant base
./setup_voice_assistant.sh

# 2. Set up Mycroft Precise
./setup_precise.sh --wake-word "hey computer"

# 3. Configure environment
vim ~/voice-assistant/config/.env
```

Update `.env`:

```bash
HA_URL=http://your-home-assistant:8123
HA_TOKEN=your_token_here
PRECISE_MODEL=/home/alan/precise-models/hey-computer/hey-computer.net
PRECISE_SENSITIVITY=0.5
```

Use absolute paths here: this file is also loaded by systemd's `EnvironmentFile=` (Phase 3), which does not expand `~`.

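Before starting the server it can help to sanity-check the file. The sketch below is a hypothetical checker, not part of `voice_server.py`: it parses simple `KEY=VALUE` lines roughly the way systemd's `EnvironmentFile=` does (comments skipped, surrounding quotes stripped, no variable expansion), reports missing keys, and checks that `PRECISE_SENSITIVITY` is a number between 0 and 1. The list of required keys is an assumption taken from the example above.

```python
"""Hypothetical sanity check for ~/voice-assistant/config/.env."""
import os
import sys
from typing import Dict, List

# Assumed required keys, taken from the example .env in this guide.
REQUIRED = ("HA_URL", "HA_TOKEN", "PRECISE_MODEL", "PRECISE_SENSITIVITY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments; strip matching quotes."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def check_env(env: Dict[str, str], check_files: bool = True) -> List[str]:
    """Return human-readable problems; an empty list means the file looks OK."""
    problems = [f"missing {key}" for key in REQUIRED if not env.get(key)]
    sens = env.get("PRECISE_SENSITIVITY")
    if sens:
        try:
            if not 0.0 <= float(sens) <= 1.0:
                problems.append("PRECISE_SENSITIVITY must be between 0 and 1")
        except ValueError:
            problems.append("PRECISE_SENSITIVITY is not a number")
    model = env.get("PRECISE_MODEL", "")
    if model.startswith("~"):
        problems.append("PRECISE_MODEL uses ~, which systemd will not expand")
    elif check_files and model and not os.path.isfile(model):
        problems.append(f"model file not found: {model}")
    return problems


if __name__ == "__main__":
    path = os.path.expanduser("~/voice-assistant/config/.env")
    with open(path, encoding="utf-8") as fh:
        issues = check_env(parse_env(fh.read()))
    print("\n".join(issues) or "OK")
    sys.exit(1 if issues else 0)
```

It uses `typing.Dict`/`List` rather than `dict[...]` so it also runs on the older Python versions that Precise environments tend to use.
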
### Phase 2: Wake Word Training (Day 1-2)

```bash
# Navigate to training directory
cd ~/precise-models/hey-computer
conda activate precise

# Record samples (30-60 minutes)
./1-record-wake-word.sh      # Record 50-100 wake word samples
./2-record-not-wake-word.sh  # Record 200-500 negative samples

# Organize samples:
#   move 80% of wake-word recordings to wake-word/
#   move 20% of wake-word recordings to test/wake-word/
#   move 80% of not-wake-word recordings to not-wake-word/
#   move 20% of not-wake-word recordings to test/not-wake-word/

# Train model (30-60 minutes)
./3-train-model.sh

# Test model
./4-test-model.sh

# Evaluate on test set
./5-evaluate-model.sh

# Tune threshold
./6-tune-threshold.sh
```

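The 80/20 split can be done by hand, but a script makes it repeatable. This is a sketch under assumptions: the recording scripts leave `.wav` files in one source folder per class (the `recordings/<label>` location below is hypothetical; point it wherever the scripts actually save), and the target is the `wake-word/`, `not-wake-word/`, `test/wake-word/`, `test/not-wake-word/` layout that `precise-train` reads. A fixed random seed keeps the split reproducible.

```python
"""Hypothetical helper: split recordings 80/20 into Precise's train/test folders."""
import random
import shutil
from pathlib import Path
from typing import Tuple


def split_samples(src_dir: Path, data_dir: Path, label: str,
                  test_fraction: float = 0.2, seed: int = 42) -> Tuple[int, int]:
    """Move src_dir/*.wav into data_dir/<label>/ and data_dir/test/<label>/.

    label is "wake-word" or "not-wake-word". Returns (n_train, n_test).
    """
    files = sorted(src_dir.glob("*.wav"))   # sort first so the seeded shuffle is reproducible
    random.Random(seed).shuffle(files)
    n_test = round(len(files) * test_fraction)
    train_dir = data_dir / label
    test_dir = data_dir / "test" / label
    train_dir.mkdir(parents=True, exist_ok=True)
    test_dir.mkdir(parents=True, exist_ok=True)
    for i, wav in enumerate(files):
        dest = test_dir if i < n_test else train_dir
        shutil.move(str(wav), str(dest / wav.name))
    return len(files) - n_test, n_test


if __name__ == "__main__":
    root = Path.home() / "precise-models" / "hey-computer"
    for label in ("wake-word", "not-wake-word"):
        # Assumed source location for the raw recordings (hypothetical).
        n_train, n_test = split_samples(root / "recordings" / label, root, label)
        print(f"{label}: {n_train} train, {n_test} test")
```
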
### Phase 3: Server Integration (Day 2)

#### Option A: Manual Testing

```bash
cd ~/voice-assistant
conda activate precise

# Start server with Precise enabled
python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/hey-computer/hey-computer.net \
    --precise-sensitivity 0.5 \
    --ha-url http://your-ha:8123 \
    --ha-token your_token
```

#### Option B: Systemd Service

Update the systemd service to use the Precise environment:

```bash
sudo vim /etc/systemd/system/voice-assistant.service
```

```ini
[Unit]
Description=Voice Assistant with Wake Word Detection
After=network.target

[Service]
Type=simple
User=alan
WorkingDirectory=/home/alan/voice-assistant
Environment="PATH=/home/alan/miniconda3/envs/precise/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/home/alan/voice-assistant/config/.env
ExecStart=/home/alan/miniconda3/envs/precise/bin/python voice_server.py \
    --enable-precise \
    --precise-model /home/alan/precise-models/hey-computer/hey-computer.net \
    --precise-sensitivity 0.5
Restart=on-failure
RestartSec=10
StandardOutput=append:/home/alan/voice-assistant/logs/voice_assistant.log
StandardError=append:/home/alan/voice-assistant/logs/voice_assistant_error.log

[Install]
WantedBy=multi-user.target
```

Enable and start:

```bash
sudo systemctl daemon-reload
sudo systemctl enable voice-assistant
sudo systemctl start voice-assistant
sudo systemctl status voice-assistant
```

Notes on this unit:

- `append:` requires systemd 240 or newer, and the `logs/` directory must already exist.
- Because stdout and stderr are redirected to those files, `journalctl -u voice-assistant` shows only systemd's own start/stop messages. Follow the log files with `tail -f` instead, or delete the two `Standard*=` lines to log to the journal.
- Python buffers stdout when it is not a terminal; if log lines appear late, add `Environment=PYTHONUNBUFFERED=1` to the `[Service]` section.

### Phase 4: Maix Duino Setup (Day 2-3)

For server-side wake word detection, the Maix Duino simply streams audio to Heimdall. Update `maix_voice_client.py`:

```python
# Use simplified mode - just stream audio
# Server handles wake word detection
CONTINUOUS_STREAM = True       # Enable continuous streaming
WAKE_WORD_CHECK_INTERVAL = 0   # Server-side detection
```

Deploy and test:

1. Copy the updated script to the SD card
2. Boot the Maix Duino
3. Check the serial console for a successful connection
4. Speak the wake word
5. Verify the server logs show a detection

### Phase 5: Testing & Tuning (Day 3-7)

#### Test Wake Word Detection

```bash
# Monitor server logs (the Phase 3 unit writes here, not to the journal)
tail -f ~/voice-assistant/logs/voice_assistant.log

# Or check detections via API
curl http://10.1.10.71:5000/wake-word/detections
```

#### Test End-to-End Flow

1. Say wake word: "Hey Computer"
2. Wait for LED/beep on Maix Duino
3. Say command: "Turn on the living room lights"
4. Verify HA command executes
5. Hear TTS response

#### Monitor Performance

```bash
# Check wake word log
tail -f ~/voice-assistant/logs/wake_words.log

# Count total detections; compare with how many times you actually said
# the wake word to estimate false positives and misses
grep -c "wake_word" ~/voice-assistant/logs/wake_words.log

# Check accuracy:
# - you should see a detection each time you say the wake word
# - you should NOT see detections during normal conversation
```

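To turn a raw detection count into actual hit and false-positive numbers, note the times you deliberately said the wake word during a test session and compare them with the detection timestamps. The sketch below is hypothetical: it assumes you can extract detection times in seconds from `wake_words.log` or the `/wake-word/detections` endpoint (this guide does not specify their format), and it counts a detection within `tolerance` seconds of a spoken wake word as a hit.

```python
"""Hypothetical scorer: match detections against the times you said the wake word."""
from typing import Dict, List


def score_session(spoken: List[float], detected: List[float],
                  tolerance: float = 2.0) -> Dict[str, float]:
    """Greedy one-to-one matching of detections to utterances.

    spoken   -- timestamps (s) when you actually said the wake word
    detected -- timestamps (s) of detections reported by the server
    """
    unmatched = sorted(detected)
    hits = 0
    for t in sorted(spoken):
        # First unused detection close enough to this utterance, if any.
        match = next((d for d in unmatched if abs(d - t) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            hits += 1
    misses = len(spoken) - hits
    return {
        "hits": hits,
        "misses": misses,
        "false_positives": len(unmatched),   # detections with no matching utterance
        "miss_rate": misses / len(spoken) if spoken else 0.0,
    }


if __name__ == "__main__":
    spoken = [10.0, 60.0, 120.0, 180.0]
    detected = [10.8, 61.2, 95.0, 181.0]   # 120.0 was missed; 95.0 is a false trigger
    print(score_session(spoken, detected))
```

For the sample data this reports 3 hits, 1 miss and 1 false positive (a 25% miss rate).
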
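#### Tune Sensitivity

If there are too many false positives:

```bash
# Increase threshold (more conservative)
# Edit the systemd service (then: sudo systemctl daemon-reload &&
# sudo systemctl restart voice-assistant), or restart manually with
# the other Phase 3 flags plus:
python voice_server.py --precise-sensitivity 0.7
```

If wake words are being missed:

```bash
# Decrease threshold (more aggressive)
python voice_server.py --precise-sensitivity 0.3
```

A note on direction: in the `precise-runner` library, `sensitivity` works inversely to a threshold. A chunk counts as activated when the network output exceeds `1 - sensitivity`, so a *higher* sensitivity triggers *more* easily. The advice above assumes `voice_server.py` treats `--precise-sensitivity` as a threshold; if it passes the value straight to `PreciseRunner`, the directions are reversed, so check the server code before tuning.
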
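To see why the direction matters, here is a simplified sketch of the activation rule in `precise-runner`'s trigger logic. It is a reduced illustration, not the library's code: the real implementation also applies a cooldown after each activation, which this sketch replaces with a plain counter reset.

```python
"""Simplified sketch of precise-runner's activation rule (no cooldown)."""
from typing import Iterable


def count_triggers(probs: Iterable[float], sensitivity: float = 0.5,
                   trigger_level: int = 3) -> int:
    """Count wake word triggers in a sequence of per-chunk model outputs.

    A chunk is 'activated' when prob > 1 - sensitivity, so a HIGHER
    sensitivity lowers the effective threshold. A trigger fires once more
    than trigger_level activated chunks have accumulated; the counter
    decays by one on each non-activated chunk.
    """
    threshold = 1.0 - sensitivity
    activation = 0
    triggers = 0
    for prob in probs:
        if prob > threshold:
            activation += 1
            if activation > trigger_level:
                triggers += 1
                activation = 0   # simplification: the real runner enters a cooldown
        elif activation > 0:
            activation -= 1
    return triggers
```

With five consecutive chunk outputs of 0.6, a sensitivity of 0.5 fires once while 0.3 never fires: the opposite of how a threshold behaves.
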
#### Collect Hard Examples

```bash
# When you notice false positives, record them
cd ~/precise-models/hey-computer
precise-collect -f not-wake-word/false-positive-$(date +%s).wav

# When a wake word is missed, record it
precise-collect -f wake-word/missed-$(date +%s).wav

# After collecting 10-20 examples, retrain
./3-train-model.sh
```

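To know when you have reached the 10-20 example mark, you can count recordings that are newer than the current model file. A minimal sketch, assuming each training run rewrites `hey-computer.net` (so its modification time marks the last training) and that new samples land in the `wake-word/` and `not-wake-word/` folders used above; the helper name is hypothetical.

```python
"""Hypothetical helper: count samples added since the model was last trained."""
from pathlib import Path
from typing import Dict, Iterable


def new_samples_since_training(model: Path,
                               sample_dirs: Iterable[Path]) -> Dict[str, int]:
    """Count .wav files in each folder modified after the model file.

    If the model does not exist yet, every sample counts as new.
    """
    trained_at = model.stat().st_mtime if model.exists() else 0.0
    return {
        d.name: sum(1 for wav in d.glob("*.wav") if wav.stat().st_mtime > trained_at)
        for d in sample_dirs
    }


if __name__ == "__main__":
    root = Path.home() / "precise-models" / "hey-computer"
    counts = new_samples_since_training(
        root / "hey-computer.net", [root / "wake-word", root / "not-wake-word"])
    total = sum(counts.values())
    print(counts, "-> retrain" if total >= 10 else "-> keep collecting")
```
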
## Monitoring Commands

### Check System Status

```bash
# Service status
sudo systemctl status voice-assistant

# Server health
curl http://10.1.10.71:5000/health

# Wake word status
curl http://10.1.10.71:5000/wake-word/status

# Recent detections
curl http://10.1.10.71:5000/wake-word/detections
```

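For cron jobs or a monitoring agent, the same checks can be scripted. A minimal stdlib-only sketch; it assumes `/health` returns HTTP 200 when the server is healthy and that `/wake-word/status` returns a JSON object with an `enabled` field (as the troubleshooting section expects). No other response fields are assumed, and the exit codes are an arbitrary choice.

```python
"""Hypothetical health probe for the voice server (stdlib only)."""
import json
import sys
import urllib.request

BASE_URL = "http://10.1.10.71:5000"


def fetch(url: str, timeout: float = 5.0) -> bytes:
    """GET url and return the body; raises on HTTP errors or timeouts."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def wake_word_enabled(body: bytes) -> bool:
    """True if the /wake-word/status JSON reports "enabled": true."""
    try:
        data = json.loads(body)
    except ValueError:   # includes json.JSONDecodeError
        return False
    return isinstance(data, dict) and data.get("enabled") is True


def main() -> int:
    try:
        fetch(f"{BASE_URL}/health")
        status = fetch(f"{BASE_URL}/wake-word/status")
    except OSError as exc:   # URLError, HTTPError and timeouts are OSError subclasses
        print(f"server unreachable or unhealthy: {exc}")
        return 2
    if not wake_word_enabled(status):
        print("wake word detection is not enabled")
        return 1
    print("OK")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Exit code 0 means healthy, 1 means the server is up but wake word detection is off, and 2 means the server could not be reached or returned an HTTP error.
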
### View Logs

```bash
# Real-time service logs (with the Phase 3 unit, this shows systemd's
# start/stop messages; the server's own output goes to the files below)
journalctl -u voice-assistant -f

# Last 50 lines
journalctl -u voice-assistant -n 50

# Specific log file
tail -f ~/voice-assistant/logs/voice_assistant.log

# Wake word detections
tail -f ~/voice-assistant/logs/wake_words.log

# Maix Duino serial console
screen /dev/ttyUSB0 115200
```

### Performance Metrics

```bash
# CPU usage (expect roughly 5-10% when idle, with spikes during processing)
top -p "$(pgrep -d, -f voice_server.py)"

# Memory usage (RSS column is in KiB)
ps -o pid,rss,cmd -p "$(pgrep -d, -f voice_server.py)"

# Network traffic (if streaming audio); replace eth0 with your interface
sudo iftop -i eth0
```

## Troubleshooting

### Wake Word Not Detecting

**Check model is loaded:**

```bash
curl http://10.1.10.71:5000/wake-word/status
# Should show: "enabled": true
```

**Test model directly:**

```bash
conda activate precise
precise-listen -b ~/precise-models/hey-computer/hey-computer.net
# Speak the wake word; with -b (basic mode) you should see "!" instead of "."
```

**Check sensitivity:**

```bash
# Try a higher sensitivity so the model triggers more easily
# (precise-listen's -s/--sensitivity defaults to 0.5)
precise-listen -s 0.7 ~/precise-models/hey-computer/hey-computer.net
```

**Verify audio input:**

```bash
# Test microphone: record 5 s of 16 kHz, 16-bit mono audio and play it back
arecord -f S16_LE -r 16000 -c 1 -d 5 test.wav
aplay test.wav
```

### Too Many False Positives

**Increase threshold:**

```bash
# Edit the service, or restart with a higher --precise-sensitivity value
# (this guide treats the value as a detection threshold)
python voice_server.py --precise-sensitivity 0.7
```

**Retrain with false positives:**

```bash
cd ~/precise-models/hey-computer
# Record each false trigger into the not-wake-word/ training set
precise-collect -f not-wake-word/false-trigger-$(date +%s).wav
# Retrain with the new negatives included
./3-train-model.sh
```

### Server Won't Start with Precise

**Check Precise installation:**

```bash
conda activate precise
python -c "from precise_runner import PreciseRunner; print('OK')"
```

**Check engine:**

```bash
precise-engine --version
# Should show: Precise v0.3.0
```

**Check model file:**

```bash
ls -lh ~/precise-models/hey-computer/hey-computer.net
file ~/precise-models/hey-computer/hey-computer.net
```

**Check permissions:**

```bash
sudo chmod +x /usr/local/bin/precise-engine
chmod 644 ~/precise-models/hey-computer/hey-computer.net
```

### Audio Quality Issues

**Test audio path:**

```bash
# Record test on server
arecord -f S16_LE -r 16000 -c 1 -d 5 test.wav

# Transcribe with Whisper
conda activate voice-assistant
python -c "
import whisper
model = whisper.load_model('base')
result = model.transcribe('test.wav')
print(result['text'])
"
```

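Whisper resamples its input, so a clean transcription does not prove the recording format is right, while Precise and the rest of this pipeline expect 16 kHz, 16-bit mono audio. A quick stdlib check of a WAV file's format; the expected values are taken from the `arecord` flags above, and the 0.5 s minimum duration is an arbitrary sanity bound.

```python
"""Check that a WAV file matches the 16 kHz / 16-bit / mono format used here."""
import sys
import wave
from typing import List

EXPECTED = {"channels": 1, "sample_width_bytes": 2, "rate_hz": 16_000}


def wav_format_problems(path: str) -> List[str]:
    """Return mismatches between the file's format and EXPECTED (empty = OK)."""
    with wave.open(path, "rb") as wf:
        actual = {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "rate_hz": wf.getframerate(),
        }
        duration = wf.getnframes() / wf.getframerate()
    problems = [f"{key}: expected {EXPECTED[key]}, got {actual[key]}"
                for key in EXPECTED if actual[key] != EXPECTED[key]]
    if duration < 0.5:
        problems.append(f"only {duration:.2f} s of audio recorded")
    return problems


if __name__ == "__main__":
    issues = wav_format_problems(sys.argv[1] if len(sys.argv) > 1 else "test.wav")
    print("\n".join(issues) or "format OK")
```
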
**If poor quality:**

- Check microphone connection
- Verify sample rate (16 kHz)
- Test with a USB microphone
- Check for interference/noise

### Maix Duino Connection Issues

**Check WiFi:**

```python
# In Maix Duino serial console
import network
wlan = network.WLAN(network.STA_IF)
print(wlan.isconnected())
print(wlan.ifconfig())
```

**Check server reachability:**

```python
# From Maix Duino
import urequests
response = urequests.get('http://10.1.10.71:5000/health')
print(response.json())
response.close()  # free the socket; MicroPython has few to spare
```

**Check audio streaming:**

```bash
# On Heimdall, monitor traffic from the device
sudo tcpdump -i any -n host <maix-duino-ip>
# Should see continuous packets when streaming
```

## Optimization Tips

### Reduce Latency

1. **Use smaller Whisper model:**

   ```bash
   # Edit .env (base or tiny). Keep comments on their own line:
   # systemd's EnvironmentFile= does not strip inline comments.
   WHISPER_MODEL=base
   ```

2. **Optimize Precise sensitivity:**

   ```bash
   # Find the sweet spot between false positives and latency
   # Lower threshold = faster trigger but more false positives
   ```

3. **Pre-load models:**

   ```python
   # Load models on startup, not on the first request
   # Adds ~30 s to startup but eliminates the first-request delay
   ```

### Improve Accuracy

1. **Use larger Whisper model:**

   ```bash
   # Slower; per the Whisper README, large needs ~10 GB of VRAM
   WHISPER_MODEL=large
   ```

2. **Record more wake word samples:**

   ```bash
   # Aim for 100+ high-quality samples
   # Diverse speakers, conditions, distances
   ```

3. **Increase training epochs:**

   ```bash
   # In 3-train-model.sh, e.g. 120 epochs instead of 60
   precise-train -e 120 hey-computer.net .
   ```

### Reduce False Positives

1. **Collect hard negatives:**

   ```bash
   # Record TV, music, similar phrases
   # Add to not-wake-word training set
   ```

2. **Increase threshold:**

   ```bash
   --precise-sensitivity 0.7  # vs. default 0.5
   ```

3. **Use an ensemble of models:**

   ```python
   # Run multiple models, require agreement
   # Advanced - requires code modification
   ```

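The ensemble idea can be sketched independently of how the models are run. The helper below is hypothetical (nothing in this guide says `voice_server.py` supports it): each model reports its activation timestamps, and a wake word is accepted only when at least `min_votes` different models fire within `window` seconds of each other.

```python
"""Hypothetical N-of-M agreement filter for an ensemble of wake word models."""
from typing import Dict, List


def agreed_detections(activations: Dict[str, List[float]],
                      min_votes: int = 2, window: float = 0.5) -> List[float]:
    """Return timestamps where >= min_votes distinct models fired within window s.

    activations maps a model name to the times (s) that model triggered.
    Each accepted detection consumes the events inside its window, so one
    utterance is reported once.
    """
    events = sorted((t, name) for name, times in activations.items() for t in times)
    accepted = []
    i = 0
    while i < len(events):
        start = events[i][0]
        j = i
        voters = set()
        while j < len(events) and events[j][0] - start <= window:
            voters.add(events[j][1])   # a model firing twice still counts once
            j += 1
        if len(voters) >= min_votes:
            accepted.append(start)
            i = j      # skip the events that formed this detection
        else:
            i += 1     # slide the window forward
    return accepted


if __name__ == "__main__":
    acts = {"model_a": [10.0, 42.0], "model_b": [10.2], "model_c": [10.4, 77.0]}
    print(agreed_detections(acts))   # only the 10 s utterance has agreement
```

Requiring agreement trades some latency (in a live system you must wait up to `window` seconds for the other models) and some recall for fewer false triggers. It helps most when the models' mistakes are not correlated, for example models trained on different negative data.
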
## Production Checklist

- [ ] Wake word model trained with 50+ samples
- [ ] Model tested with <5% false positive rate
- [ ] Server service enabled and auto-starting
- [ ] Home Assistant token configured
- [ ] Maix Duino WiFi configured
- [ ] End-to-end test successful
- [ ] Logs rotating properly
- [ ] Monitoring in place
- [ ] Backup of trained model
- [ ] Documentation updated

## Backup and Recovery

### Backup Trained Model

```bash
# Backup model
cp ~/precise-models/hey-computer/hey-computer.net \
   ~/precise-models/hey-computer/hey-computer.net.backup

# Backup to another host
scp ~/precise-models/hey-computer/hey-computer.net \
    user@backup-host:/path/to/backups/
```

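A single `.backup` copy is overwritten every time you back up. Below is a sketch of a timestamped alternative that keeps every copy and records SHA-256 checksums for later verification. It also copies the `hey-computer.net.params` file that Precise typically writes next to the model, if one exists; the `~/precise-backups` location is an assumption.

```python
"""Sketch: timestamped model backup with SHA-256 checksums (stdlib only)."""
import hashlib
import shutil
import time
from pathlib import Path
from typing import List


def sha256sum(path: Path) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_model(model: Path, backup_root: Path) -> List[Path]:
    """Copy model (+ .params sidecar if present) into backup_root/<timestamp>/."""
    dest = backup_root / time.strftime("%Y%m%d-%H%M%S")
    dest.mkdir(parents=True, exist_ok=False)   # never overwrite an older backup
    sources = [model, model.with_name(model.name + ".params")]
    copied = []
    with (dest / "SHA256SUMS").open("w") as sums:
        for src in sources:
            if not src.exists():
                continue
            target = dest / src.name
            shutil.copy2(str(src), str(target))
            digest = sha256sum(target)
            if digest != sha256sum(src):
                raise IOError(f"checksum mismatch copying {src}")
            sums.write(f"{digest}  {src.name}\n")   # same format as sha256sum(1)
            copied.append(target)
    return copied


if __name__ == "__main__":
    model = Path.home() / "precise-models" / "hey-computer" / "hey-computer.net"
    for path in backup_model(model, Path.home() / "precise-backups"):
        print("backed up", path)
```

Verify a backup later by running `sha256sum -c SHA256SUMS` inside its folder, and copy the whole folder off-host with `scp -r`.
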
### Restore from Backup

```bash
# Restore model
cp ~/precise-models/hey-computer/hey-computer.net.backup \
   ~/precise-models/hey-computer/hey-computer.net

# Restart service
sudo systemctl restart voice-assistant
```

## Next Steps

Once basic server-side detection is working:

1. **Add more intents** - Expand Home Assistant control
2. **Implement TTS playback** - Complete the audio response loop
3. **Multi-room support** - Deploy multiple Maix Duino units
4. **Voice profiles** - Train the model on samples from each family member
5. **Edge deployment** - Convert the model for the K210 (advanced)

## Resources

- Main guide: MYCROFT_PRECISE_GUIDE.md
- Quick start: QUICKSTART.md
- Architecture: maix-voice-assistant-architecture.md
- Mycroft Precise repository: https://github.com/MycroftAI/mycroft-precise
- Community: https://community.mycroft.ai/

## Support

### Log an Issue

```bash
# Collect debug info
{
    echo "=== System Info ==="
    uname -a
    conda list -n precise
    echo "=== Service Status ==="
    systemctl status voice-assistant --no-pager
    echo "=== Recent Logs ==="
    journalctl -u voice-assistant -n 100 --no-pager
    tail -n 100 ~/voice-assistant/logs/voice_assistant_error.log
    echo "=== Wake Word Status ==="
    curl -s http://10.1.10.71:5000/wake-word/status
    echo
} > debug.log 2>&1
```

Then share `debug.log` when asking for help.

### Common Issues Database

| Symptom | Likely Cause | Solution |
|---------|--------------|----------|
| No wake detection | Model not loaded | Check `/wake-word/status` |
| Service won't start | Missing dependencies | Reinstall Precise |
| High false positives | Low threshold | Increase to 0.7+ |
| Missing wake words | High threshold | Decrease to 0.3-0.4 |
| Poor transcription | Bad audio quality | Check microphone |
| HA commands fail | Wrong token | Update `.env` |
| High CPU usage | Large Whisper model | Use smaller model |

## Conclusion

With Mycroft Precise, you have complete control over your wake word detection. Start with server-side detection for easier debugging, collect good training data, and tune the threshold for your environment. Once it's working well, you can optionally move detection to the edge for lower latency.

The key to success: **Quality training data > Quantity**

Happy voice assisting! 🎙️