# Maix Duino Voice Assistant - Quick Start Guide

## Overview

This guide walks you through setting up a local, privacy-focused voice assistant using your Maix Duino board and Home Assistant integration. All processing happens on your local network; no cloud services are required.

## What You'll Build

- Wake word detection on Maix Duino (edge device)
- Speech-to-text using Whisper on Heimdall
- Home Assistant integration for smart home control
- Text-to-speech responses using Piper
- All processing local to your 10.1.10.0/24 network

## Hardware Requirements

- [x] Sipeed Maix Duino board (you have this!)
- [ ] I2S MEMS microphone (or microphone array)
- [ ] Small speaker (3-5W) or audio output
- [ ] MicroSD card (4GB+) formatted as FAT32
- [ ] USB-C cable for power and programming

## Network Prerequisites

- Maix Duino needs WiFi access to your 10.1.10.0/24 network
- Heimdall (10.1.10.71) for AI processing
- Home Assistant instance (configure URL in setup)

## Setup Process

### Phase 1: Server Setup (Heimdall)

#### Step 1: Run the setup script

```bash
# Transfer files to Heimdall
scp setup_voice_assistant.sh voice_server.py alan@10.1.10.71:~/

# SSH to Heimdall
ssh alan@10.1.10.71

# Make setup script executable and run it
chmod +x setup_voice_assistant.sh
./setup_voice_assistant.sh
```

#### Step 2: Configure Home Assistant access

```bash
# Edit the config file
vim ~/voice-assistant/config/.env
```

Update these values:

```env
HA_URL=http://your-home-assistant:8123
HA_TOKEN=your_long_lived_access_token_here
```

To get a long-lived access token:

1. Open Home Assistant
2. Click your profile (bottom left)
3. Scroll to "Long-Lived Access Tokens"
4. Click "Create Token"
5. Copy the token and paste it into `.env` as `HA_TOKEN`

#### Step 3: Test the server

```bash
cd ~/voice-assistant
./test_server.sh
```

You should see:

```
Loading Whisper model: medium
Whisper model loaded successfully
Starting voice processing server on 0.0.0.0:5000
```

#### Step 4: Test with curl (from another terminal)

```bash
# Test health endpoint
curl http://10.1.10.71:5000/health

# Should return:
# {"status":"healthy","whisper_loaded":true,"ha_connected":true}
```

### Phase 2: Maix Duino Setup

#### Step 1: Flash MaixPy firmware

1. Download latest MaixPy firmware from: https://dl.sipeed.com/MAIX/MaixPy/release/
2. Download Kflash GUI: https://github.com/sipeed/kflash_gui
3. Connect Maix Duino via USB
4. Flash firmware using Kflash GUI

#### Step 2: Prepare SD card

```bash
# Format SD card as FAT32
# Create directory structure:
mkdir -p /path/to/sdcard/models

# Copy the client script
cp maix_voice_client.py /path/to/sdcard/main.py
```

#### Step 3: Configure WiFi settings

Edit `/path/to/sdcard/main.py`:

```python
# WiFi Settings
WIFI_SSID = "YourNetworkName"
WIFI_PASSWORD = "YourPassword"

# Server Settings
VOICE_SERVER_URL = "http://10.1.10.71:5000"
```

#### Step 4: Test the board

1. Insert SD card into Maix Duino
2. Connect to serial console (115200 baud)

   ```bash
   screen /dev/ttyUSB0 115200
   # or
   minicom -D /dev/ttyUSB0 -b 115200
   ```

3. Power on the board
4. Watch the serial output for connection status

### Phase 3: Integration & Testing

#### Test 1: Basic connectivity

1. Maix Duino should connect to WiFi and display its IP on the LCD
2. The server logs should show when the Maix Duino connects

#### Test 2: Audio capture

The current implementation uses amplitude-based wake word detection as a placeholder, so any sufficiently loud sound triggers it. To test:

1. Clap loudly near the microphone
2. Speak a command (e.g., "turn on the living room lights")
3. Watch the LCD for transcription and response

#### Test 3: Home Assistant control

Supported commands (add more in voice_server.py):

- "Turn on the living room lights"
- "Turn off the bedroom lights"
- "What's the temperature?"
- "Toggle the kitchen lights"

### Phase 4: Wake Word Training (Advanced)

The placeholder wake word detection uses simple amplitude triggering. For production use:

#### Option A: Use Porcupine (easiest)

1. Sign up at: https://console.picovoice.ai/
2. Train custom wake word
3. Download .ppn model
4. Convert to .kmodel for K210 (`.ppn` files are tied to the Porcupine engine, so confirm this conversion is actually possible before committing to this option)

#### Option B: Use Mycroft Precise (FOSS)

```bash
# On a machine with GPU
conda create -n precise python=3.6
conda activate precise

# Note: precise-runner only provides the runtime wrapper; precise-collect
# and precise-train come from a full mycroft-precise install (see its README)
pip install precise-runner

# Record wake word samples
precise-collect

# Train model
precise-train -e 60 my-wake-word.net my-wake-word/

# Convert to .kmodel
# (requires additional tools - see MaixPy docs)
```

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────┐
│                  Your Home Network (10.1.10.0/24)           │
│                                                             │
│  ┌──────────────┐         ┌──────────────┐                  │
│  │  Maix Duino  │────────>│   Heimdall   │                  │
│  │ 10.1.10.xxx  │  Audio  │  10.1.10.71  │                  │
│  │              │<────────│              │                  │
│  │ - Wake Word  │ Response│ - Whisper    │                  │
│  │ - Mic Input  │         │ - Piper TTS  │                  │
│  │ - Speaker    │         │ - Flask API  │                  │
│  └──────────────┘         └──────┬───────┘                  │
│                                  │                          │
│                                  │ REST API                 │
│                                  v                          │
│                           ┌──────────────┐                  │
│                           │  Home Asst.  │                  │
│                           │ homeassistant│                  │
│                           │              │                  │
│                           │ - Devices    │                  │
│                           │ - Automation │                  │
│                           └──────────────┘                  │
└─────────────────────────────────────────────────────────────┘
```

## Troubleshooting

### Maix Duino won't connect to WiFi

```python
# Check serial output for errors
# Common issues:
# - Wrong SSID/password
# - WPA3 not supported (use WPA2)
# - 5GHz network (use 2.4GHz)
```

### Whisper transcription is slow

```bash
# Use a smaller model on Heimdall
# Edit ~/voice-assistant/config/.env (base, or tiny for fastest):
WHISPER_MODEL=base
```

### Home Assistant commands don't work

```bash
# Check server logs
journalctl -u voice-assistant -f

# Test HA connection manually
curl -H "Authorization: Bearer YOUR_TOKEN" \
  http://your-ha:8123/api/states
```

### Audio quality is poor

1. Check microphone connections
2. Adjust `SAMPLE_RATE` in maix_voice_client.py
3. Test with USB microphone first
4. Consider microphone array for better pickup

### Out of memory on Maix Duino

```python
import gc

# In main_loop(), add more frequent GC
# (gc.mem_free() is available in MicroPython/MaixPy):
if gc.mem_free() < 200000:  # Increase threshold
    gc.collect()
```

## Adding New Intents

Edit `voice_server.py` and add patterns to `IntentParser.PATTERNS`:

```python
PATTERNS = {
    # Existing patterns...
    'set_temperature': [
        r'set (?:the )?temperature to (\d+)',
        r'make it (\d+) degrees',
    ],
}
```

Then add the handler in `execute_intent()`:

```python
elif intent == 'set_temperature':
    temp = params.get('temperature')
    # entity_id is assumed to be resolved earlier in execute_intent()
    success = ha_client.call_service(
        'climate', 'set_temperature', entity_id, temperature=temp
    )
    if not success:
        return "Sorry, I couldn't set the temperature"
    return f"Set temperature to {temp} degrees"
```

## Entity Mapping

Add your Home Assistant entities to `IntentParser.ENTITY_MAP`:

```python
ENTITY_MAP = {
    # Lights
    'living room light': 'light.living_room',
    'bedroom light': 'light.bedroom',

    # Climate
    'thermostat': 'climate.main_floor',
    'temperature': 'sensor.main_floor_temperature',

    # Switches
    'coffee maker': 'switch.coffee_maker',
    'fan': 'switch.bedroom_fan',

    # Media
    'tv': 'media_player.living_room_tv',
    'music': 'media_player.whole_house',
}
```

## Performance Tuning

### Reduce latency

1. Use Whisper `tiny` or `base` model
2. Implement streaming audio (currently batch)
3. Pre-load TTS models
4. Use faster TTS engine (e.g., espeak)

### Improve accuracy

1. Use Whisper `large` model (slower)
2. Train custom wake word
3. Add NLU layer (Rasa, spaCy)
4. Collect and fine-tune on your voice

## Next Steps

### Short term

- [ ] Add more Home Assistant entity mappings
- [ ] Implement Piper TTS playback on Maix Duino
- [ ] Train custom wake word model
- [ ] Add LED animations for better feedback
- [ ] Implement conversation context

### Medium term

- [ ] Multi-room support (multiple Maix Duino units)
- [ ] Voice profiles for different users
- [ ] Integration with Plex for media control
- [ ] Calendar and reminder functionality
- [ ] Weather updates from local weather station

### Long term

- [ ] Custom skills/plugins system
- [ ] Integration with other services (Nextcloud, Matrix)
- [ ] Sound event detection (doorbell, smoke alarm)
- [ ] Intercom functionality between rooms
- [ ] Voice-controlled automation creation

## Alternatives & Fallbacks

If the Maix Duino proves limiting:

### Raspberry Pi Zero 2 W

- More processing power
- Better software support
- USB audio support
- Cost: ~$15

### ESP32-S3

- Better WiFi
- Up to 8MB PSRAM (module dependent)
- Cheaper (~$10)
- Good community support

### Orange Pi Zero 2

- ARM Cortex-A53 quad-core
- 512MB-1GB RAM
- Full Linux support
- Cost: ~$20

## Resources

### Documentation

- Maix Duino: https://wiki.sipeed.com/hardware/en/maix/
- MaixPy: https://maixpy.sipeed.com/
- Whisper: https://github.com/openai/whisper
- Piper TTS: https://github.com/rhasspy/piper
- Home Assistant API: https://developers.home-assistant.io/

### Community Projects

- Rhasspy: https://rhasspy.readthedocs.io/
- Willow: https://github.com/toverainc/willow
- Mycroft: https://mycroft.ai/

### Wake Word Tools

- Porcupine: https://picovoice.ai/platform/porcupine/
- Mycroft Precise: https://github.com/MycroftAI/mycroft-precise
- Snowboy (archived): https://github.com/Kitt-AI/snowboy

## Getting Help

### Check logs

```bash
# Server logs (if using systemd)
sudo journalctl -u voice-assistant -f

# Or manual log file
tail -f ~/voice-assistant/logs/voice_assistant.log

# Maix Duino serial console
screen /dev/ttyUSB0 115200
```

### Common issues and solutions

See the Troubleshooting section above.

### Useful commands

```bash
# Restart service
sudo systemctl restart voice-assistant

# Check service status
sudo systemctl status voice-assistant

# Check server health (the response includes ha_connected)
curl http://10.1.10.71:5000/health

# Monitor Maix Duino
minicom -D /dev/ttyUSB0 -b 115200
```

## Cost Breakdown

| Item | Cost | Status |
|------|------|--------|
| Maix Duino | $30 | Have it! |
| I2S Microphone | $5-10 | Need |
| Speaker | $10 | Need (or use existing) |
| MicroSD Card | $5 | Have it? |
| **Total (still to buy)** | **$15-25** | (vs $50+ commercial) |

**Benefits of local solution:**

- No subscription fees
- Complete privacy (no cloud)
- Customizable to your needs
- Integration with existing infrastructure
- Learning experience!

## Conclusion

You now have everything you need to build a local, privacy-focused voice assistant! The setup leverages your existing infrastructure (Heimdall for processing, Home Assistant for automation) while keeping costs minimal.

Start with the basic setup, test each component, then iterate and improve. The beauty of this approach is that you can enhance it over time without being locked into a commercial platform.

Good luck, and enjoy your new voice assistant! 🎙️
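
## Appendix: Intent Matching Sketch

For anyone extending `IntentParser.PATTERNS` and `IntentParser.ENTITY_MAP`, here is a minimal sketch of how regex patterns and an entity map could work together. It is not the code in `voice_server.py`: the function names `parse_intent` and `resolve_entity`, the named capture groups, and the `turn_on`/`turn_off` patterns are assumptions made for illustration, and the real structures may differ.

```python
import re

# Hypothetical stand-ins for IntentParser.PATTERNS and IntentParser.ENTITY_MAP.
# The real definitions in voice_server.py may be shaped differently.
PATTERNS = {
    "turn_on": [r"turn on (?:the )?(?P<entity>.+)"],
    "turn_off": [r"turn off (?:the )?(?P<entity>.+)"],
    "set_temperature": [
        r"set (?:the )?temperature to (?P<temperature>\d+)",
        r"make it (?P<temperature>\d+) degrees",
    ],
}

ENTITY_MAP = {
    "living room light": "light.living_room",
    "bedroom light": "light.bedroom",
}


def parse_intent(text):
    """Return (intent, params) for the first matching pattern, else (None, {})."""
    # Normalize case and drop trailing punctuation that Whisper often adds.
    cleaned = text.lower().strip().rstrip(".!?")
    for intent, patterns in PATTERNS.items():
        for pattern in patterns:
            match = re.search(pattern, cleaned)
            if match:
                return intent, match.groupdict()
    return None, {}


def resolve_entity(spoken_name):
    """Map a spoken name to an entity ID, tolerating a trailing plural 's'."""
    name = spoken_name.strip()
    if name in ENTITY_MAP:
        return ENTITY_MAP[name]
    # "living room lights" should still find the 'living room light' entry.
    if name.endswith("s") and name[:-1] in ENTITY_MAP:
        return ENTITY_MAP[name[:-1]]
    return None
```

With these assumed definitions, `parse_intent("Turn on the living room lights.")` yields the `turn_on` intent with `entity` set to `"living room lights"`, and `resolve_entity` maps that phrase to `light.living_room`. The plural handling is there because the example commands say "lights" while the entity map keys use "light"; a real NLU layer would handle this more robustly.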