# Maix Duino Voice Assistant - Quick Start Guide

## Overview

This guide will walk you through setting up a local, privacy-focused voice assistant using your Maix Duino board and Home Assistant integration. All processing happens on your local network - no cloud services required.

## What You'll Build

- Wake word detection on Maix Duino (edge device)
- Speech-to-text using Whisper on Heimdall
- Home Assistant integration for smart home control
- Text-to-speech responses using Piper
- All processing local to your 10.1.10.0/24 network

## Hardware Requirements

- [x] Sipeed Maix Duino board (you have this!)
- [ ] I2S MEMS microphone (or microphone array)
- [ ] Small speaker (3-5W) or audio output
- [ ] MicroSD card (4GB+) formatted as FAT32
- [ ] USB-C cable for power and programming

## Network Prerequisites

- Maix Duino will need WiFi access to your 10.1.10.0/24 network
- Heimdall (10.1.10.71) for AI processing
- Home Assistant instance (configure URL in setup)

## Setup Process

### Phase 1: Server Setup (Heimdall)

#### Step 1: Run the setup script

```bash
# Transfer files to Heimdall
scp setup_voice_assistant.sh voice_server.py alan@10.1.10.71:~/

# SSH to Heimdall
ssh alan@10.1.10.71

# Make setup script executable and run it
chmod +x setup_voice_assistant.sh
./setup_voice_assistant.sh
```

#### Step 2: Configure Home Assistant access

```bash
# Edit the config file
vim ~/voice-assistant/config/.env
```

Update these values:

```env
HA_URL=http://your-home-assistant:8123
HA_TOKEN=your_long_lived_access_token_here
```

To get a long-lived access token:

1. Open Home Assistant
2. Click your profile (bottom left)
3. Scroll to "Long-Lived Access Tokens" (recent Home Assistant versions place this under the profile's Security tab)
4. Click "Create Token"
5. Copy the token and paste it into `.env` as `HA_TOKEN`

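If you want to sanity-check the file before starting the server, the two settings above can be read with a few lines of Python. This is a minimal sketch, not the loader used by `voice_server.py` (which may rely on a library instead); the `parse_env` and `missing_settings` helpers are hypothetical names introduced here for illustration.

```python
def parse_env(text):
    """Parse KEY=VALUE lines from .env-style text into a dict."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes from the value
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_settings(settings, required=("HA_URL", "HA_TOKEN")):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not settings.get(key)]


sample = """
# Home Assistant access
HA_URL=http://your-home-assistant:8123
HA_TOKEN=your_long_lived_access_token_here
"""
config = parse_env(sample)
print(config["HA_URL"])
print(missing_settings(config))  # → []
```

Note that this sketch does not strip trailing inline comments, so a value followed by `# ...` on the same line would keep the comment text.
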
#### Step 3: Test the server

```bash
cd ~/voice-assistant
./test_server.sh
```

You should see:

```
Loading Whisper model: medium
Whisper model loaded successfully
Starting voice processing server on 0.0.0.0:5000
```

#### Step 4: Test with curl (from another terminal)

```bash
# Test health endpoint
curl http://10.1.10.71:5000/health

# Should return:
# {"status":"healthy","whisper_loaded":true,"ha_connected":true}
```

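The same check can be scripted. The sketch below assumes the response has exactly the three fields shown above; `summarize_health` and `fetch_health` are hypothetical helper names, and the demonstration runs offline against the sample payload rather than the live server.

```python
import json
import urllib.request


def summarize_health(payload):
    """Return the names of checks that are not passing in a /health payload."""
    problems = []
    if payload.get("status") != "healthy":
        problems.append("status")
    for flag in ("whisper_loaded", "ha_connected"):
        if not payload.get(flag):
            problems.append(flag)
    return problems


def fetch_health(base_url, timeout=5):
    """GET <base_url>/health and decode the JSON body."""
    with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Offline demonstration using the response shown above
sample = json.loads('{"status":"healthy","whisper_loaded":true,"ha_connected":true}')
print(summarize_health(sample))  # → []
```

On your network, `summarize_health(fetch_health("http://10.1.10.71:5000"))` would return an empty list when everything is up, or the names of the failing checks otherwise.
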
### Phase 2: Maix Duino Setup

#### Step 1: Flash MaixPy firmware

1. Download latest MaixPy firmware from: https://dl.sipeed.com/MAIX/MaixPy/release/
2. Download Kflash GUI: https://github.com/sipeed/kflash_gui
3. Connect Maix Duino via USB
4. Flash firmware using Kflash GUI

#### Step 2: Prepare SD card

```bash
# Format SD card as FAT32
# Create directory structure:
mkdir -p /path/to/sdcard/models

# Copy the client script
cp maix_voice_client.py /path/to/sdcard/main.py
```

#### Step 3: Configure WiFi settings

Edit `/path/to/sdcard/main.py`:

```python
# WiFi Settings
WIFI_SSID = "YourNetworkName"
WIFI_PASSWORD = "YourPassword"

# Server Settings
VOICE_SERVER_URL = "http://10.1.10.71:5000"
```

#### Step 4: Test the board

1. Insert SD card into Maix Duino
2. Connect to serial console (115200 baud)
   ```bash
   screen /dev/ttyUSB0 115200
   # or
   minicom -D /dev/ttyUSB0 -b 115200
   ```
3. Power on the board
4. Watch the serial output for connection status

### Phase 3: Integration & Testing

#### Test 1: Basic connectivity

1. Maix Duino should connect to WiFi and display its IP address on the LCD
2. The server logs should show when the Maix Duino connects

#### Test 2: Audio capture

The current implementation uses amplitude-based triggering as a placeholder for wake word detection, so any sufficiently loud sound starts a capture. To test:

1. Clap loudly near the microphone
2. Speak a command (e.g., "turn on the living room lights")
3. Watch the LCD for transcription and response

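Amplitude triggering can be sketched in plain Python as follows. This illustrates the idea only and is not the code in `maix_voice_client.py`: it assumes 16-bit signed little-endian mono PCM, and the `THRESHOLD` value and function names are hypothetical.

```python
import math
import struct

THRESHOLD = 3000  # assumed RMS level (0-32767); tune for your microphone


def rms_amplitude(pcm_bytes):
    """Root-mean-square level of 16-bit signed little-endian mono PCM."""
    count = len(pcm_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, pcm_bytes[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_triggered(pcm_bytes, threshold=THRESHOLD):
    """True when the chunk is loud enough to count as a trigger."""
    return rms_amplitude(pcm_bytes) >= threshold


quiet = struct.pack("<4h", 10, -12, 8, -9)
loud = struct.pack("<4h", 12000, -11000, 13000, -12500)
print(is_triggered(quiet), is_triggered(loud))  # → False True
```

Because the check looks only at loudness, a clap, a slammed door, or speech all trigger it equally, which is why it is a placeholder rather than a real wake word detector.
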
#### Test 3: Home Assistant control

Supported commands (add more in voice_server.py):

- "Turn on the living room lights"
- "Turn off the bedroom lights"
- "What's the temperature?"
- "Toggle the kitchen lights"

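### Phase 4: Wake Word Training (Advanced)

The placeholder wake word detection uses simple amplitude triggering. For production use:

#### Option A: Use Porcupine (easiest)

1. Sign up at: https://console.picovoice.ai/
2. Train custom wake word
3. Download .ppn model
4. Run the model with the Porcupine engine on a platform Picovoice supports. A `.ppn` file is a proprietary Porcupine format, and there is no known converter to `.kmodel` for the K210, so this option likely means moving wake word detection off the Maix Duino.
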
#### Option B: Use Mycroft Precise (FOSS)

```bash
# On a machine with GPU
conda create -n precise python=3.6
conda activate precise
pip install precise-runner
# Note: precise-collect and precise-train ship with the mycroft-precise
# training package (see its README for installation), not precise-runner

# Record wake word samples
precise-collect

# Train model
precise-train -e 60 my-wake-word.net my-wake-word/

# Convert to .kmodel
# (requires additional tools - see MaixPy docs)
```

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────┐
│              Your Home Network (10.1.10.0/24)               │
│                                                             │
│  ┌──────────────┐         ┌──────────────┐                  │
│  │  Maix Duino  │────────>│   Heimdall   │                  │
│  │ 10.1.10.xxx  │  Audio  │  10.1.10.71  │                  │
│  │              │<────────│              │                  │
│  │ - Wake Word  │ Response│ - Whisper    │                  │
│  │ - Mic Input  │         │ - Piper TTS  │                  │
│  │ - Speaker    │         │ - Flask API  │                  │
│  └──────────────┘         └──────┬───────┘                  │
│                                  │                          │
│                                  │ REST API                 │
│                                  v                          │
│                           ┌──────────────┐                  │
│                           │  Home Asst.  │                  │
│                           │ homeassistant│                  │
│                           │              │                  │
│                           │ - Devices    │                  │
│                           │ - Automation │                  │
│                           └──────────────┘                  │
└─────────────────────────────────────────────────────────────┘
```

## Troubleshooting

### Maix Duino won't connect to WiFi

```python
# Check serial output for errors
# Common issues:
# - Wrong SSID/password
# - WPA3 not supported (use WPA2)
# - 5GHz network (use 2.4GHz)
```

### Whisper transcription is slow

```bash
# Use a smaller model on Heimdall
# Edit ~/voice-assistant/config/.env:
WHISPER_MODEL=base  # or tiny for fastest
```

### Home Assistant commands don't work

```bash
# Check server logs
journalctl -u voice-assistant -f

# Test HA connection manually
curl -H "Authorization: Bearer YOUR_TOKEN" \
  http://your-ha:8123/api/states
```

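If you would rather script the manual check, the request the `curl` command sends can be built with Python's standard library. This is a sketch under assumptions: `build_states_request` is a hypothetical helper, and `http://your-ha:8123` and `YOUR_TOKEN` are the same placeholders used above. The example only constructs the request, so it runs without network access.

```python
import urllib.request


def build_states_request(ha_url, token):
    """Build a GET request for Home Assistant's /api/states endpoint."""
    return urllib.request.Request(
        ha_url.rstrip("/") + "/api/states",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )


req = build_states_request("http://your-ha:8123", "YOUR_TOKEN")
print(req.full_url)                     # → http://your-ha:8123/api/states
print(req.get_header("Authorization"))  # → Bearer YOUR_TOKEN
```

To send it, pass the request to `urllib.request.urlopen(req, timeout=5)`. A rejected token surfaces as an `urllib.error.HTTPError` with status 401, which points at `HA_TOKEN` rather than at the voice server.
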
### Audio quality is poor

1. Check microphone connections
2. Adjust `SAMPLE_RATE` in maix_voice_client.py
3. Test with USB microphone first
4. Consider microphone array for better pickup

### Out of memory on Maix Duino

```python
# In main_loop(), add more frequent GC:
if gc.mem_free() < 200000:  # Increase threshold
    gc.collect()
```

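The check above can be wrapped in a small helper so it is easy to call from several places in the loop. This is a sketch, not part of `maix_voice_client.py`: `collect_if_low` and `LOW_MEMORY_BYTES` are hypothetical names. `gc.mem_free()` exists in MicroPython but not in desktop Python, so the helper falls back to always collecting when it is unavailable.

```python
import gc

LOW_MEMORY_BYTES = 200000  # threshold taken from the snippet above


def collect_if_low(threshold=LOW_MEMORY_BYTES):
    """Run the garbage collector when free heap drops below threshold.

    Returns True if a collection was run.
    """
    mem_free = getattr(gc, "mem_free", None)  # present on MicroPython only
    if mem_free is None or mem_free() < threshold:
        gc.collect()
        return True
    return False


print(collect_if_low())
```

Calling it right after each audio buffer is released, rather than once per loop iteration, is one way to keep peak heap usage down.
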
## Adding New Intents

Edit `voice_server.py` and add patterns to `IntentParser.PATTERNS`:

```python
PATTERNS = {
    # Existing patterns...

    'set_temperature': [
        r'set (?:the )?temperature to (\d+)',
        r'make it (\d+) degrees',
    ],
}
```

Then add the handler in `execute_intent()`:

```python
elif intent == 'set_temperature':
    # entity_id is assumed to be resolved earlier in execute_intent()
    temp = params.get('temperature')
    success = ha_client.call_service(
        'climate', 'set_temperature',
        entity_id, temperature=temp
    )
    # Assumes call_service() returns a truthy value on success
    if not success:
        return "Sorry, I couldn't set the temperature"
    return f"Set temperature to {temp} degrees"
```

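To see how a pattern table like this turns an utterance into an intent plus parameters, here is a self-contained sketch. It is an illustration rather than the actual `IntentParser` code: `parse_intent` is a hypothetical function, and mapping the captured number to a `temperature` key is an assumption based on the handler above.

```python
import re

PATTERNS = {
    "set_temperature": [
        r"set (?:the )?temperature to (\d+)",
        r"make it (\d+) degrees",
    ],
}


def parse_intent(text):
    """Return (intent, params) for the first matching pattern, else (None, {})."""
    normalized = text.lower().strip()
    for intent, patterns in PATTERNS.items():
        for pattern in patterns:
            match = re.search(pattern, normalized)
            if match:
                # Assumption: the captured number is passed on as 'temperature'
                return intent, {"temperature": int(match.group(1))}
    return None, {}


print(parse_intent("Set the temperature to 72"))  # → ('set_temperature', {'temperature': 72})
print(parse_intent("Make it 68 degrees please"))  # → ('set_temperature', {'temperature': 68})
print(parse_intent("What time is it?"))           # → (None, {})
```

Lowercasing the input before matching keeps the patterns short, since Whisper output usually arrives capitalized and punctuated.
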
## Entity Mapping

Add your Home Assistant entities to `IntentParser.ENTITY_MAP`:

```python
ENTITY_MAP = {
    # Lights
    'living room light': 'light.living_room',
    'bedroom light': 'light.bedroom',

    # Climate
    'thermostat': 'climate.main_floor',
    'temperature': 'sensor.main_floor_temperature',

    # Switches
    'coffee maker': 'switch.coffee_maker',
    'fan': 'switch.bedroom_fan',

    # Media
    'tv': 'media_player.living_room_tv',
    'music': 'media_player.whole_house',
}
```

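A lookup against a map like this could work as sketched below. The `resolve_entity` function is hypothetical and the real `IntentParser` may resolve names differently; the map here is a shortened copy of the one above.

```python
ENTITY_MAP = {
    "living room light": "light.living_room",
    "bedroom light": "light.bedroom",
    "thermostat": "climate.main_floor",
    "fan": "switch.bedroom_fan",
}


def resolve_entity(text, entity_map=ENTITY_MAP):
    """Return the entity ID for the longest friendly name found in text."""
    normalized = text.lower()
    # Try longer names first so specific names win over short, overlapping ones
    for name in sorted(entity_map, key=len, reverse=True):
        if name in normalized:
            return entity_map[name]
    return None


print(resolve_entity("Turn on the living room lights"))  # → light.living_room
print(resolve_entity("Open the garage"))                 # → None
```

Plain substring matching is easy to follow but loose: a short name such as `fan` also matches inside unrelated words, so word-boundary matching is worth considering as the map grows.
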
## Performance Tuning

### Reduce latency

1. Use Whisper `tiny` or `base` model
2. Implement streaming audio (currently batch)
3. Pre-load TTS models
4. Use faster TTS engine (e.g., espeak)

### Improve accuracy

1. Use Whisper `large` model (slower)
2. Train custom wake word
3. Add NLU layer (Rasa, spaCy)
4. Collect and fine-tune on your voice

## Next Steps

### Short term

- [ ] Add more Home Assistant entity mappings
- [ ] Implement Piper TTS playback on Maix Duino
- [ ] Train custom wake word model
- [ ] Add LED animations for better feedback
- [ ] Implement conversation context

### Medium term

- [ ] Multi-room support (multiple Maix Duino units)
- [ ] Voice profiles for different users
- [ ] Integration with Plex for media control
- [ ] Calendar and reminder functionality
- [ ] Weather updates from local weather station

### Long term

- [ ] Custom skills/plugins system
- [ ] Integration with other services (Nextcloud, Matrix)
- [ ] Sound event detection (doorbell, smoke alarm)
- [ ] Intercom functionality between rooms
- [ ] Voice-controlled automation creation

## Alternatives & Fallbacks

If the Maix Duino proves limiting:

### Raspberry Pi Zero 2 W

- More processing power
- Better software support
- USB audio support
- Cost: ~$15

### ESP32-S3

- Better WiFi
- Up to 8MB PSRAM on common modules
- Cheaper (~$10)
- Good community support

### Orange Pi Zero 2

- ARM Cortex-A53 quad-core
- 512MB-1GB RAM
- Full Linux support
- Cost: ~$20

## Resources

### Documentation

- Maix Duino: https://wiki.sipeed.com/hardware/en/maix/
- MaixPy: https://maixpy.sipeed.com/
- Whisper: https://github.com/openai/whisper
- Piper TTS: https://github.com/rhasspy/piper
- Home Assistant API: https://developers.home-assistant.io/

### Community Projects

- Rhasspy: https://rhasspy.readthedocs.io/
- Willow: https://github.com/toverainc/willow
- Mycroft: https://mycroft.ai/

### Wake Word Tools

- Porcupine: https://picovoice.ai/platform/porcupine/
- Mycroft Precise: https://github.com/MycroftAI/mycroft-precise
- Snowboy (archived): https://github.com/Kitt-AI/snowboy

## Getting Help

### Check logs

```bash
# Server logs (if using systemd)
sudo journalctl -u voice-assistant -f

# Or read the log file directly
tail -f ~/voice-assistant/logs/voice_assistant.log

# Maix Duino serial console
screen /dev/ttyUSB0 115200
```

### Common issues and solutions

See the Troubleshooting section above.

### Useful commands

```bash
# Restart service
sudo systemctl restart voice-assistant

# Check service status
sudo systemctl status voice-assistant

# Check server health (the response includes the HA connection status)
curl http://10.1.10.71:5000/health

# Monitor Maix Duino
minicom -D /dev/ttyUSB0 -b 115200
```

## Cost Breakdown

| Item | Cost | Status |
|------|------|--------|
| Maix Duino | $30 | Have it! |
| I2S Microphone | $5-10 | Need |
| Speaker | $10 | Need (or use existing) |
| MicroSD Card | $5 | Have it? |
| **Total (still to buy)** | **$15-25** | (vs $50+ commercial) |

**Benefits of local solution:**

- No subscription fees
- Complete privacy (no cloud)
- Customizable to your needs
- Integration with existing infrastructure
- Learning experience!

## Conclusion

You now have everything you need to build a local, privacy-focused voice assistant! The setup leverages your existing infrastructure (Heimdall for processing, Home Assistant for automation) while keeping costs minimal.

Start with the basic setup, test each component, then iterate and improve. The beauty of this approach is you can enhance it over time without being locked into a commercial platform.

Good luck, and enjoy your new voice assistant! 🎙️