Maix Duino Voice Assistant - Quick Start Guide
Overview
This guide walks you through setting up a local, privacy-focused voice assistant using your Maix Duino board and Home Assistant integration. All processing happens on your local network; no cloud services are required.
What You'll Build
- Wake word detection on Maix Duino (edge device)
- Speech-to-text using Whisper on Heimdall
- Home Assistant integration for smart home control
- Text-to-speech responses using Piper
- All processing local to your 10.1.10.0/24 network
Hardware Requirements
- Sipeed Maix Duino board (you have this!)
- I2S MEMS microphone (or microphone array)
- Small speaker (3-5W) or audio output
- MicroSD card (4GB+) formatted as FAT32
- USB-C cable for power and programming
Network Prerequisites
- Maix Duino will need WiFi access to your 10.1.10.0/24 network
- Heimdall (10.1.10.71) for AI processing
- Home Assistant instance (configure URL in setup)
Setup Process
Phase 1: Server Setup (Heimdall)
Step 1: Run the setup script
# Transfer files to Heimdall
scp setup_voice_assistant.sh voice_server.py alan@10.1.10.71:~/
# SSH to Heimdall
ssh alan@10.1.10.71
# Make setup script executable and run it
chmod +x setup_voice_assistant.sh
./setup_voice_assistant.sh
Step 2: Configure Home Assistant access
# Edit the config file
vim ~/voice-assistant/config/.env
Update these values:
HA_URL=http://your-home-assistant:8123
HA_TOKEN=your_long_lived_access_token_here
To get a long-lived access token:
- Open Home Assistant
- Click your profile (bottom left)
- Scroll to "Long-Lived Access Tokens"
- Click "Create Token"
- Copy the token and paste it in .env
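Before starting the server, it can help to confirm that the placeholders in .env were actually replaced. The following is a minimal sketch, not part of the shipped scripts: parse_env and unconfigured_keys are hypothetical helper names, 10.1.10.50 is only an example address, and the parser handles plain KEY=VALUE lines only (no quoting or inline comments).

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, ignoring blanks and # comment lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Placeholder values copied from this guide; a match means the key was never edited.
PLACEHOLDERS = {
    "HA_URL": "http://your-home-assistant:8123",
    "HA_TOKEN": "your_long_lived_access_token_here",
}


def unconfigured_keys(values):
    """Return the HA settings that are missing or still set to a placeholder."""
    return [
        key for key, placeholder in PLACEHOLDERS.items()
        if values.get(key, "") in ("", placeholder)
    ]


# Example: the URL was edited (example address), the token was not.
sample = "HA_URL=http://10.1.10.50:8123\nHA_TOKEN=your_long_lived_access_token_here\n"
print(unconfigured_keys(parse_env(sample)))  # ['HA_TOKEN']
```

To check the real file, read ~/voice-assistant/config/.env and pass its contents to parse_env.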
Step 3: Test the server
cd ~/voice-assistant
./test_server.sh
You should see:
Loading Whisper model: medium
Whisper model loaded successfully
Starting voice processing server on 0.0.0.0:5000
Step 4: Test with curl (from another terminal)
# Test health endpoint
curl http://10.1.10.71:5000/health
# Should return:
# {"status":"healthy","whisper_loaded":true,"ha_connected":true}
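The same check can be scripted. This is a rough sketch that assumes the /health response has exactly the three fields shown above; failing_checks and fetch_health are hypothetical names, and the example runs offline on a hand-written response rather than contacting Heimdall.

```python
import json
import urllib.request


def failing_checks(health):
    """Return the names of health fields that are not in their expected state."""
    expected = {"status": "healthy", "whisper_loaded": True, "ha_connected": True}
    return [name for name, want in expected.items() if health.get(name) != want]


def fetch_health(base_url="http://10.1.10.71:5000", timeout=5):
    """Fetch and decode the /health JSON from the voice server."""
    with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Offline example: the server is up but cannot reach Home Assistant.
example = json.loads('{"status":"healthy","whisper_loaded":true,"ha_connected":false}')
print(failing_checks(example))  # ['ha_connected']
```

Against a running server, failing_checks(fetch_health()) should return an empty list when everything is ready.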
Phase 2: Maix Duino Setup
Step 1: Flash MaixPy firmware
- Download latest MaixPy firmware from: https://dl.sipeed.com/MAIX/MaixPy/release/
- Download Kflash GUI: https://github.com/sipeed/kflash_gui
- Connect Maix Duino via USB
- Flash firmware using Kflash GUI
Step 2: Prepare SD card
# Format SD card as FAT32
# Create directory structure:
mkdir -p /path/to/sdcard/models
# Copy the client script
cp maix_voice_client.py /path/to/sdcard/main.py
Step 3: Configure WiFi settings
Edit /path/to/sdcard/main.py:
# WiFi Settings
WIFI_SSID = "YourNetworkName"
WIFI_PASSWORD = "YourPassword"
# Server Settings
VOICE_SERVER_URL = "http://10.1.10.71:5000"
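If you would rather not keep the WiFi password inside main.py, one option is to read it from a separate file that stays out of version control. This is a sketch under assumptions: the module name wifi_secrets and the helper load_wifi_credentials are hypothetical, and the file is expected to define WIFI_SSID and WIFI_PASSWORD as above.

```python
def load_wifi_credentials(module_name="wifi_secrets"):
    """Return (ssid, password) from a separate module, or None if unavailable."""
    try:
        module = __import__(module_name)
    except ImportError:
        return None
    ssid = getattr(module, "WIFI_SSID", None)
    password = getattr(module, "WIFI_PASSWORD", None)
    if not ssid or password is None:
        return None
    return ssid, password


creds = load_wifi_credentials()
if creds is None:
    # No separate credentials file on the card: keep using the values in main.py.
    print("No wifi_secrets.py found; falling back to values set in main.py")
```

Returning None instead of raising lets the client fall back to the constants in main.py during first-time setup.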
Step 4: Test the board
- Insert SD card into Maix Duino
- Connect to serial console (115200 baud)
screen /dev/ttyUSB0 115200  # or minicom -D /dev/ttyUSB0 -b 115200
- Power on the board
- Watch the serial output for connection status
Phase 3: Integration & Testing
Test 1: Basic connectivity
- The Maix Duino should connect to WiFi and display its IP address on the LCD
- The server logs should show an entry when the Maix Duino connects
Test 2: Audio capture
The current implementation uses amplitude-based wake word detection as a placeholder. To test:
- Clap loudly near the microphone
- Speak a command (e.g., "turn on the living room lights")
- Watch the LCD for transcription and response
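To make the idea of an amplitude trigger concrete, here is a desktop-Python sketch of the technique. It is not the code in maix_voice_client.py: it assumes little-endian signed 16-bit mono PCM frames, and the threshold of 3000 is an arbitrary guess that would need tuning for a real microphone.

```python
import math
import struct


def rms_amplitude(pcm_bytes):
    """Root-mean-square level of little-endian signed 16-bit mono PCM."""
    count = len(pcm_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, pcm_bytes[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_triggered(pcm_bytes, threshold=3000.0):
    """True when the frame is loud enough to count as a trigger (threshold is a guess)."""
    return rms_amplitude(pcm_bytes) >= threshold


# Synthetic frames: background noise versus a clap-like burst.
quiet = struct.pack("<4h", 10, -12, 8, -9)
loud = struct.pack("<4h", 12000, -11000, 13000, -12500)
print(is_triggered(quiet), is_triggered(loud))  # False True
```

Any loud sound crosses the threshold, which is why a clap works in this test and why a trained wake word model is needed for real use.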
Test 3: Home Assistant control
Supported commands (add more in voice_server.py):
- "Turn on the living room lights"
- "Turn off the bedroom lights"
- "What's the temperature?"
- "Toggle the kitchen lights"
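Each of these commands ultimately becomes a Home Assistant service call. The sketch below shows what such a call looks like over the Home Assistant REST API (POST to /api/services/<domain>/<service> with a bearer token). How voice_server.py actually sends it is not shown in this guide, so build_service_request is a hypothetical helper, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_service_request(ha_url, token, domain, service, entity_id):
    """Build (but do not send) a Home Assistant REST service call."""
    url = "%s/api/services/%s/%s" % (ha_url.rstrip("/"), domain, service)
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    headers = {
        "Authorization": "Bearer " + token,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# "Turn on the living room lights" as a service call (placeholder URL and token).
req = build_service_request(
    "http://your-home-assistant:8123", "YOUR_TOKEN",
    "light", "turn_on", "light.living_room",
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req, timeout=5)
```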
Phase 4: Wake Word Training (Advanced)
The placeholder wake word detection uses simple amplitude triggering. For production use:
Option A: Use Porcupine (easiest)
- Sign up at: https://console.picovoice.ai/
- Train custom wake word
- Download .ppn model
- Convert to .kmodel for K210
Option B: Use Mycroft Precise (FOSS)
# On a machine with GPU
conda create -n precise python=3.6
conda activate precise
pip install mycroft-precise  # provides precise-collect and precise-train; precise-runner is only the runtime wrapper
# Record wake word samples
precise-collect
# Train model
precise-train -e 60 my-wake-word.net my-wake-word/
# Convert to .kmodel
# (requires additional tools - see MaixPy docs)
Architecture Diagram
┌─────────────────────────────────────────────────────────────┐
│              Your Home Network (10.1.10.0/24)               │
│                                                             │
│  ┌──────────────┐         ┌──────────────┐                  │
│  │  Maix Duino  │────────>│   Heimdall   │                  │
│  │ 10.1.10.xxx  │  Audio  │  10.1.10.71  │                  │
│  │              │<────────│              │                  │
│  │ - Wake Word  │Response │ - Whisper    │                  │
│  │ - Mic Input  │         │ - Piper TTS  │                  │
│  │ - Speaker    │         │ - Flask API  │                  │
│  └──────────────┘         └──────┬───────┘                  │
│                                  │                          │
│                                  │ REST API                 │
│                                  v                          │
│                           ┌──────────────┐                  │
│                           │  Home Asst.  │                  │
│                           │ homeassistant│                  │
│                           │              │                  │
│                           │ - Devices    │                  │
│                           │ - Automation │                  │
│                           └──────────────┘                  │
└─────────────────────────────────────────────────────────────┘
Troubleshooting
Maix Duino won't connect to WiFi
# Check serial output for errors
# Common issues:
# - Wrong SSID/password
# - WPA3 not supported (use WPA2)
# - 5GHz network (use 2.4GHz)
Whisper transcription is slow
# Use a smaller model on Heimdall
# Edit ~/voice-assistant/config/.env:
WHISPER_MODEL=base # or tiny for fastest
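A typo in WHISPER_MODEL would otherwise only surface when the model fails to load. A small sketch of validating the setting follows; choose_whisper_model is a hypothetical helper, the default of medium matches the startup log shown earlier, and the list covers only the basic model names from the openai/whisper repository.

```python
import os

# Basic model names from the openai/whisper repository (".en" and versioned variants omitted).
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")


def choose_whisper_model(environ=None, default="medium"):
    """Return the configured model name, or the default if unset or unrecognized."""
    environ = os.environ if environ is None else environ
    name = environ.get("WHISPER_MODEL", default).strip().lower()
    return name if name in KNOWN_MODELS else default


print(choose_whisper_model({"WHISPER_MODEL": "base"}))  # base
print(choose_whisper_model({"WHISPER_MODEL": "huge"}))  # medium
```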
Home Assistant commands don't work
# Check server logs
journalctl -u voice-assistant -f
# Test HA connection manually
curl -H "Authorization: Bearer YOUR_TOKEN" \
http://your-ha:8123/api/states
Audio quality is poor
- Check microphone connections
- Adjust SAMPLE_RATE in maix_voice_client.py
- Test with USB microphone first
- Consider microphone array for better pickup
Out of memory on Maix Duino
# In main_loop(), add more frequent GC (requires "import gc" at the top of the script):
if gc.mem_free() < 200000:  # Increase threshold
    gc.collect()
Adding New Intents
Edit voice_server.py and add patterns to IntentParser.PATTERNS:
PATTERNS = {
    # Existing patterns...
    'set_temperature': [
        r'set (?:the )?temperature to (\d+)',
        r'make it (\d+) degrees',
    ],
}
Then add the handler in execute_intent():
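elif intent == 'set_temperature':
    temp = params.get('temperature')
    # Assumes 'thermostat' is defined in IntentParser.ENTITY_MAP (see Entity Mapping)
    entity_id = IntentParser.ENTITY_MAP['thermostat']
    success = ha_client.call_service(
        'climate', 'set_temperature',
        entity_id, temperature=temp
    )
    if not success:
        return "Sorry, I couldn't set the temperature"
    return f"Set temperature to {temp} degrees"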
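To see how patterns like these might be matched against a transcription, here is a standalone sketch. It is not the IntentParser implementation from voice_server.py: parse_intent is a hypothetical function, and it hardcodes the temperature parameter because set_temperature is the only intent defined in this example.

```python
import re

PATTERNS = {
    "set_temperature": [
        r"set (?:the )?temperature to (\d+)",
        r"make it (\d+) degrees",
    ],
}


def parse_intent(text):
    """Return (intent, params) for the first matching pattern, else (None, {})."""
    lowered = text.lower()
    for intent, patterns in PATTERNS.items():
        for pattern in patterns:
            match = re.search(pattern, lowered)
            if match:
                # The single capture group holds the target temperature.
                return intent, {"temperature": int(match.group(1))}
    return None, {}


print(parse_intent("Please set the temperature to 72"))  # ('set_temperature', {'temperature': 72})
print(parse_intent("What time is it?"))                  # (None, {})
```

Lowercasing the text first keeps the patterns simple, since Whisper output may capitalize the start of a sentence.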
Entity Mapping
Add your Home Assistant entities to IntentParser.ENTITY_MAP:
ENTITY_MAP = {
    # Lights
    'living room light': 'light.living_room',
    'bedroom light': 'light.bedroom',
    # Climate
    'thermostat': 'climate.main_floor',
    'temperature': 'sensor.main_floor_temperature',
    # Switches
    'coffee maker': 'switch.coffee_maker',
    'fan': 'switch.bedroom_fan',
    # Media
    'tv': 'media_player.living_room_tv',
    'music': 'media_player.whole_house',
}
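A spoken name rarely matches a map key exactly ("living room lights" versus "living room light"), so the lookup needs some tolerance. The sketch below uses a longest-substring match. This is an assumption about how the lookup could work, not the behavior of voice_server.py, and resolve_entity is a hypothetical name.

```python
ENTITY_MAP = {
    "living room light": "light.living_room",
    "bedroom light": "light.bedroom",
    "thermostat": "climate.main_floor",
    "fan": "switch.bedroom_fan",
}


def resolve_entity(text, entity_map=ENTITY_MAP):
    """Return the entity_id for the longest spoken name found in text, or None."""
    lowered = text.lower()
    # Try longer names first so a more specific name wins over a shorter one.
    for name in sorted(entity_map, key=len, reverse=True):
        if name in lowered:
            return entity_map[name]
    return None


print(resolve_entity("Turn on the living room lights"))  # light.living_room
print(resolve_entity("Open the garage door"))            # None
```

Plain substring matching is easy to fool (a short key such as "fan" also matches "fantastic"), so word-boundary checks may be worth adding as the map grows.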
Performance Tuning
Reduce latency
- Use Whisper tiny or base model
- Implement streaming audio (currently batch)
- Pre-load TTS models
- Use faster TTS engine (e.g., espeak)
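Before changing models or engines, it helps to measure which stage dominates the latency. This is a generic timing sketch, not something the server already does: timed is a hypothetical helper, and the two blocks are stand-ins for the real transcription and intent steps.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage):
    """Record how long the wrapped block takes, in seconds, under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


with timed("transcribe"):
    time.sleep(0.01)  # stand-in for the Whisper call
with timed("intent"):
    sum(range(1000))  # stand-in for intent parsing

slowest = max(timings, key=timings.get)
print(slowest, round(timings[slowest], 3))
```

Wrapping the real calls the same way shows whether a smaller Whisper model or a faster TTS engine would give the larger gain.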
Improve accuracy
- Use Whisper large model (slower)
- Train custom wake word
- Add NLU layer (Rasa, spaCy)
- Collect and fine-tune on your voice
Next Steps
Short term
- Add more Home Assistant entity mappings
- Implement Piper TTS playback on Maix Duino
- Train custom wake word model
- Add LED animations for better feedback
- Implement conversation context
Medium term
- Multi-room support (multiple Maix Duino units)
- Voice profiles for different users
- Integration with Plex for media control
- Calendar and reminder functionality
- Weather updates from local weather station
Long term
- Custom skills/plugins system
- Integration with other services (Nextcloud, Matrix)
- Sound event detection (doorbell, smoke alarm)
- Intercom functionality between rooms
- Voice-controlled automation creation
Alternatives & Fallbacks
If the Maix Duino proves limiting:
Raspberry Pi Zero 2 W
- More processing power
- Better software support
- USB audio support
- Cost: ~$15
ESP32-S3
- Better WiFi
- More RAM (8MB)
- Cheaper (~$10)
- Good community support
Orange Pi Zero 2
- ARM Cortex-A53 quad-core
- 512MB-1GB RAM
- Full Linux support
- Cost: ~$20
Resources
Documentation
- Maix Duino: https://wiki.sipeed.com/hardware/en/maix/
- MaixPy: https://maixpy.sipeed.com/
- Whisper: https://github.com/openai/whisper
- Piper TTS: https://github.com/rhasspy/piper
- Home Assistant API: https://developers.home-assistant.io/
Community Projects
- Rhasspy: https://rhasspy.readthedocs.io/
- Willow: https://github.com/toverainc/willow
- Mycroft: https://mycroft.ai/
Wake Word Tools
- Porcupine: https://picovoice.ai/platform/porcupine/
- Mycroft Precise: https://github.com/MycroftAI/mycroft-precise
- Snowboy (archived): https://github.com/Kitt-AI/snowboy
Getting Help
Check logs
# Server logs (if using systemd)
sudo journalctl -u voice-assistant -f
# Or manual log file
tail -f ~/voice-assistant/logs/voice_assistant.log
# Maix Duino serial console
screen /dev/ttyUSB0 115200
Common issues and solutions
See the Troubleshooting section above
Useful commands
# Restart service
sudo systemctl restart voice-assistant
# Check service status
sudo systemctl status voice-assistant
# Check server health (the response includes ha_connected)
curl http://10.1.10.71:5000/health
# Monitor Maix Duino
minicom -D /dev/ttyUSB0 -b 115200
Cost Breakdown
| Item | Cost | Status |
|---|---|---|
| Maix Duino | $30 | Have it! |
| I2S Microphone | $5-10 | Need |
| Speaker | $10 | Need (or use existing) |
| MicroSD Card | $5 | Have it? |
| Total (items still needed) | $15-25 | (vs $50+ commercial) |
Benefits of local solution:
- No subscription fees
- Complete privacy (no cloud)
- Customizable to your needs
- Integration with existing infrastructure
- Learning experience!
Conclusion
You now have everything you need to build a local, privacy-focused voice assistant! The setup leverages your existing infrastructure (Heimdall for processing, Home Assistant for automation) while keeping costs minimal.
Start with the basic setup, test each component, then iterate and improve. The beauty of this approach is you can enhance it over time without being locked into a commercial platform.
Good luck, and enjoy your new voice assistant! 🎙️