minerva/docs/QUICKSTART.md
pyr0ball 173f7f37d4 feat: import mycroft-precise work as Minerva foundation
Ports prior voice assistant research and prototypes from devl/Devops
into the Minerva repo. Includes:

- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed
  (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap
2026-04-06 22:21:12 -07:00


Maix Duino Voice Assistant - Quick Start Guide

Overview

This guide walks you through setting up a local, privacy-focused voice assistant using your Maix Duino board and Home Assistant. All processing happens on your local network; no cloud services are required.

What You'll Build

  • Wake word detection on Maix Duino (edge device)
  • Speech-to-text using Whisper on Heimdall
  • Home Assistant integration for smart home control
  • Text-to-speech responses using Piper
  • All processing local to your 10.1.10.0/24 network

Hardware Requirements

  • Sipeed Maix Duino board (you have this!)
  • I2S MEMS microphone (or microphone array)
  • Small speaker (3-5W) or audio output
  • MicroSD card (4GB+) formatted as FAT32
  • USB-C cable for power and programming

Network Prerequisites

  • Maix Duino will need WiFi access to your 10.1.10.0/24 network
  • Heimdall (10.1.10.71) for AI processing
  • Home Assistant instance (configure URL in setup)

Setup Process

Phase 1: Server Setup (Heimdall)

Step 1: Run the setup script

# Transfer files to Heimdall
scp setup_voice_assistant.sh voice_server.py alan@10.1.10.71:~/

# SSH to Heimdall
ssh alan@10.1.10.71

# Make setup script executable and run it
chmod +x setup_voice_assistant.sh
./setup_voice_assistant.sh

Step 2: Configure Home Assistant access

# Edit the config file
vim ~/voice-assistant/config/.env

Update these values:

HA_URL=http://your-home-assistant:8123
HA_TOKEN=your_long_lived_access_token_here

To get a long-lived access token:

  1. Open Home Assistant
  2. Click your profile (bottom left)
  3. Scroll to "Long-Lived Access Tokens" (in recent Home Assistant versions this is on the profile's Security tab)
  4. Click "Create Token"
  5. Copy the token and paste it in .env
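
As a rough illustration of how the two values in .env end up being used, the sketch below parses KEY=VALUE lines and builds the bearer-token header that Home Assistant's REST API expects. The helper names parse_env and ha_headers are hypothetical and are not taken from voice_server.py; the parser is deliberately simple and does not strip inline comments.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comment lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def ha_headers(token):
    """Headers for token-authenticated Home Assistant REST calls."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# Placeholder values copied from the template above, not real credentials.
sample = """
# Home Assistant
HA_URL=http://your-home-assistant:8123
HA_TOKEN=your_long_lived_access_token_here
"""
config = parse_env(sample)
print(config["HA_URL"])
print(ha_headers(config["HA_TOKEN"])["Authorization"])
```

Running this against your real .env contents is a quick way to spot a stray space or quote in the token before starting the server.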

Step 3: Test the server

cd ~/voice-assistant
./test_server.sh

You should see:

Loading Whisper model: medium
Whisper model loaded successfully
Starting voice processing server on 0.0.0.0:5000

Step 4: Test with curl (from another terminal)

# Test health endpoint
curl http://10.1.10.71:5000/health

# Should return:
# {"status":"healthy","whisper_loaded":true,"ha_connected":true}
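
If you would rather script this check than read the JSON by eye, a minimal sketch follows. It assumes the response has exactly the three fields shown in the sample above; is_ready and fetch_health are hypothetical helper names, and only the offline check runs here.

```python
import json
import urllib.request


def is_ready(payload):
    """True only when every field from the sample response looks good."""
    return (
        payload.get("status") == "healthy"
        and payload.get("whisper_loaded") is True
        and payload.get("ha_connected") is True
    )


def fetch_health(base_url, timeout=5):
    """GET <base_url>/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Offline check against the sample response shown above.
sample = json.loads('{"status":"healthy","whisper_loaded":true,"ha_connected":true}')
print(is_ready(sample))
```

On the network, is_ready(fetch_health("http://10.1.10.71:5000")) would perform the same check against the live server.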

Phase 2: Maix Duino Setup

Step 1: Flash MaixPy firmware

  1. Download latest MaixPy firmware from: https://dl.sipeed.com/MAIX/MaixPy/release/
  2. Download Kflash GUI: https://github.com/sipeed/kflash_gui
  3. Connect Maix Duino via USB
  4. Flash firmware using Kflash GUI

Step 2: Prepare SD card

# Format SD card as FAT32
# Create directory structure:
mkdir -p /path/to/sdcard/models

# Copy the client script
cp maix_voice_client.py /path/to/sdcard/main.py

Step 3: Configure WiFi settings

Edit /path/to/sdcard/main.py:

# WiFi Settings
WIFI_SSID = "YourNetworkName"
WIFI_PASSWORD = "YourPassword"

# Server Settings
VOICE_SERVER_URL = "http://10.1.10.71:5000"

Step 4: Test the board

  1. Insert SD card into Maix Duino
  2. Connect to serial console (115200 baud)
    screen /dev/ttyUSB0 115200
    # or
    minicom -D /dev/ttyUSB0 -b 115200
    
  3. Power on the board
  4. Watch the serial output for connection status

Phase 3: Integration & Testing

Test 1: Basic connectivity

  1. The Maix Duino should connect to WiFi and display its IP address on the LCD
  2. The server logs should show an entry when the Maix Duino connects

Test 2: Audio capture

The current implementation uses simple amplitude triggering as a placeholder for wake word detection, so any sufficiently loud sound starts a capture. To test:

  1. Clap loudly near the microphone
  2. Speak a command (e.g., "turn on the living room lights")
  3. Watch the LCD for transcription and response
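
To make the amplitude-based trigger concrete, here is a minimal sketch in plain Python, not the MaixPy client code. It assumes little-endian signed 16-bit mono PCM, and the threshold of 3000 is a hypothetical starting value that you would tune for your microphone.

```python
import math
import struct


def rms(pcm_bytes):
    """Root-mean-square level of little-endian signed 16-bit mono PCM."""
    count = len(pcm_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, pcm_bytes[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_triggered(pcm_bytes, threshold=3000):
    """Hypothetical threshold; a clap should exceed it, room noise should not."""
    return rms(pcm_bytes) >= threshold


# Two tiny synthetic buffers: background noise and a loud transient.
quiet = struct.pack("<4h", 10, -12, 8, -9)
loud = struct.pack("<4h", 12000, -11000, 9000, -13000)
print(is_triggered(quiet), is_triggered(loud))
```

This also shows why the placeholder is not a real wake word detector: it reacts to loudness alone and cannot tell a clap from speech.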

Test 3: Home Assistant control

Supported commands (add more in voice_server.py):

  • "Turn on the living room lights"
  • "Turn off the bedroom lights"
  • "What's the temperature?"
  • "Toggle the kitchen lights"

Phase 4: Wake Word Training (Advanced)

The placeholder wake word detection uses simple amplitude triggering. For production use:

Option A: Use Porcupine (easiest)

  1. Sign up at: https://console.picovoice.ai/
  2. Train custom wake word
  3. Download .ppn model
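  4. Check hardware support before relying on this route: .ppn is a proprietary format for the Porcupine engine, and there is no documented conversion from .ppn to .kmodel for the K210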

Option B: Use Mycroft Precise (FOSS)

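# On a machine with GPU
conda create -n precise python=3.6
conda activate precise
# precise-runner only provides the runtime wrapper. The precise-collect and
# precise-train tools used below come from the mycroft-precise source tree
# (https://github.com/MycroftAI/mycroft-precise, installed with its setup.sh).
pip install precise-runner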

# Record wake word samples
precise-collect

# Train model
precise-train -e 60 my-wake-word.net my-wake-word/

# Convert to .kmodel
# (requires additional tools - see MaixPy docs)

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                     Your Home Network (10.1.10.0/24)        │
│                                                              │
│  ┌──────────────┐         ┌──────────────┐                 │
│  │  Maix Duino  │────────>│  Heimdall    │                 │
│  │  10.1.10.xxx │ Audio   │  10.1.10.71  │                 │
│  │              │<────────│              │                 │
│  │ - Wake Word  │ Response│ - Whisper    │                 │
│  │ - Mic Input  │         │ - Piper TTS  │                 │
│  │ - Speaker    │         │ - Flask API  │                 │
│  └──────────────┘         └──────┬───────┘                 │
│                                   │                          │
│                                   │ REST API                │
│                                   v                          │
│                          ┌──────────────┐                   │
│                          │ Home Asst.   │                   │
│                          │ homeassistant│                   │
│                          │              │                   │
│                          │ - Devices    │                   │
│                          │ - Automation │                   │
│                          └──────────────┘                   │
└─────────────────────────────────────────────────────────────┘

Troubleshooting

Maix Duino won't connect to WiFi

# Check serial output for errors
# Common issues:
# - Wrong SSID/password
# - WPA3 not supported (use WPA2)
# - 5GHz network (use 2.4GHz)

Whisper transcription is slow

# Use a smaller model on Heimdall
# Edit ~/voice-assistant/config/.env:
WHISPER_MODEL=base  # or tiny for fastest
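
A typo in WHISPER_MODEL is easy to make, so it can help to validate the value before loading a model. The sketch below assumes the server reads WHISPER_MODEL from its environment and defaults to medium, as the startup log earlier in this guide suggests. pick_whisper_model is a hypothetical helper, and the list covers only the basic size names, not variants such as the English-only models.

```python
import os

# Basic Whisper size names, smallest (fastest) to largest (most accurate).
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")


def pick_whisper_model(env=None, default="medium"):
    """Return a validated model name from a mapping such as os.environ."""
    env = os.environ if env is None else env
    name = env.get("WHISPER_MODEL", default).strip().lower()
    if name not in KNOWN_MODELS:
        raise ValueError(
            f"Unknown WHISPER_MODEL {name!r}; expected one of {KNOWN_MODELS}"
        )
    return name


print(pick_whisper_model({"WHISPER_MODEL": "base"}))
print(pick_whisper_model({}))
```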

Home Assistant commands don't work

# Check server logs
journalctl -u voice-assistant -f

# Test HA connection manually
curl -H "Authorization: Bearer YOUR_TOKEN" \
     http://your-ha:8123/api/states
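
The same check can be done from Python with only the standard library. This is a sketch rather than code from voice_server.py: build_states_request and fetch_states are hypothetical names, and the URL and token are the placeholders from the curl command above.

```python
import json
import urllib.request


def build_states_request(ha_url, token):
    """Build a GET request for Home Assistant's /api/states endpoint."""
    url = ha_url.rstrip("/") + "/api/states"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_states(ha_url, token, timeout=5):
    """Send the request and decode the JSON list of entity states."""
    request = build_states_request(ha_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Inspect the request without touching the network.
req = build_states_request("http://your-ha:8123/", "YOUR_TOKEN")
print(req.full_url)
print(req.get_header("Authorization"))
```

If fetch_states raises urllib.error.HTTPError, the status code usually points at the cause; an authentication failure most likely means the token was mistyped or revoked.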

Audio quality is poor

  1. Check microphone connections
  2. Adjust SAMPLE_RATE in maix_voice_client.py
  3. Test with USB microphone first
  4. Consider microphone array for better pickup

Out of memory on Maix Duino

# In main_loop(), collect garbage more often (needs `import gc` at the top):
if gc.mem_free() < 200000:  # Raise this threshold to collect sooner
    gc.collect()

Adding New Intents

Edit voice_server.py and add patterns to IntentParser.PATTERNS:

PATTERNS = {
    # Existing patterns...
    
    'set_temperature': [
        r'set (?:the )?temperature to (\d+)',
        r'make it (\d+) degrees',
    ],
}

Then add the handler in execute_intent():

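elif intent == 'set_temperature':
    temp = params.get('temperature')
    # entity_id must be the climate entity to control,
    # e.g. the 'thermostat' entry in ENTITY_MAP
    success = ha_client.call_service(
        'climate', 'set_temperature',
        entity_id, temperature=temp
    )
    # Only report success if the service call went through
    if success:
        return f"Set temperature to {temp} degrees"
    return "Sorry, I couldn't set the temperature"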
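
Before wiring new patterns into voice_server.py, it can be worth checking them in isolation. The sketch below reuses the two set_temperature patterns from above. match_intent is a hypothetical stand-in for whatever matching IntentParser actually does, assumed here to be a case-insensitive regex search.

```python
import re

PATTERNS = {
    "set_temperature": [
        r"set (?:the )?temperature to (\d+)",
        r"make it (\d+) degrees",
    ],
}


def match_intent(text):
    """Return (intent, captured groups) for the first matching pattern."""
    lowered = text.lower()
    for intent, patterns in PATTERNS.items():
        for pattern in patterns:
            found = re.search(pattern, lowered)
            if found:
                return intent, found.groups()
    return None, ()


print(match_intent("Set the temperature to 72"))
print(match_intent("please make it 68 degrees"))
print(match_intent("what time is it"))
```

Note that the captured value is a string, so a handler would need to convert it with int() before passing it to Home Assistant as a number.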

Entity Mapping

Add your Home Assistant entities to IntentParser.ENTITY_MAP:

ENTITY_MAP = {
    # Lights
    'living room light': 'light.living_room',
    'bedroom light': 'light.bedroom',
    
    # Climate
    'thermostat': 'climate.main_floor',
    'temperature': 'sensor.main_floor_temperature',
    
    # Switches
    'coffee maker': 'switch.coffee_maker',
    'fan': 'switch.bedroom_fan',
    
    # Media
    'tv': 'media_player.living_room_tv',
    'music': 'media_player.whole_house',
}
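
One way such a map could be applied to a transcription is a longest-phrase-first substring lookup, sketched below. This is an assumption about the lookup strategy, not the logic in voice_server.py, and resolve_entity is a hypothetical name. The map is a shortened copy of the example above.

```python
ENTITY_MAP = {
    "living room light": "light.living_room",
    "bedroom light": "light.bedroom",
    "thermostat": "climate.main_floor",
    "fan": "switch.bedroom_fan",
}


def resolve_entity(text, entity_map=ENTITY_MAP):
    """Return the entity ID for the longest mapped phrase found in text."""
    lowered = text.lower()
    # Longest phrases first, so specific names win over short generic ones.
    for phrase in sorted(entity_map, key=len, reverse=True):
        if phrase in lowered:
            return entity_map[phrase]
    return None


print(resolve_entity("Turn on the living room lights"))
print(resolve_entity("toggle the kitchen lights"))
```

The second call returns None because no kitchen entry exists in the map, which is the kind of gap to look for when a spoken command is transcribed correctly but nothing happens.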

Performance Tuning

Reduce latency

  1. Use Whisper tiny or base model
  2. Implement streaming audio (currently batch)
  3. Pre-load TTS models
  4. Use faster TTS engine (e.g., espeak)

Improve accuracy

  1. Use Whisper large model (slower)
  2. Train custom wake word
  3. Add NLU layer (Rasa, spaCy)
  4. Collect and fine-tune on your voice

Next Steps

Short term

  • Add more Home Assistant entity mappings
  • Implement Piper TTS playback on Maix Duino
  • Train custom wake word model
  • Add LED animations for better feedback
  • Implement conversation context

Medium term

  • Multi-room support (multiple Maix Duino units)
  • Voice profiles for different users
  • Integration with Plex for media control
  • Calendar and reminder functionality
  • Weather updates from local weather station

Long term

  • Custom skills/plugins system
  • Integration with other services (Nextcloud, Matrix)
  • Sound event detection (doorbell, smoke alarm)
  • Intercom functionality between rooms
  • Voice-controlled automation creation

Alternatives & Fallbacks

If the Maix Duino proves limiting:

Raspberry Pi Zero 2 W

  • More processing power
  • Better software support
  • USB audio support
  • Cost: ~$15

ESP32-S3

  • Better WiFi
  • Up to 8MB PSRAM (module-dependent)
  • Cheaper (~$10)
  • Good community support

Orange Pi Zero 2

  • ARM Cortex-A53 quad-core
  • 512MB-1GB RAM
  • Full Linux support
  • Cost: ~$20

Resources

Documentation

Community Projects

Wake Word Tools

Getting Help

Check logs

# Server logs (if using systemd)
sudo journalctl -u voice-assistant -f

# Or manual log file
tail -f ~/voice-assistant/logs/voice_assistant.log

# Maix Duino serial console
screen /dev/ttyUSB0 115200

Common issues and solutions

See the Troubleshooting section above

Useful commands

# Restart service
sudo systemctl restart voice-assistant

# Check service status
sudo systemctl status voice-assistant

# Check server health (the response includes ha_connected)
curl http://10.1.10.71:5000/health

# Monitor Maix Duino
minicom -D /dev/ttyUSB0 -b 115200

Cost Breakdown

Item             Cost     Status
Maix Duino       $30      Have it!
I2S Microphone   $5-10    Need
Speaker          $10      Need (or use existing)
MicroSD Card     $5       Have it?
Total            $15-25   (vs $50+ commercial)

Benefits of local solution:

  • No subscription fees
  • Complete privacy (no cloud)
  • Customizable to your needs
  • Integration with existing infrastructure
  • Learning experience!

Conclusion

You now have everything you need to build a local, privacy-focused voice assistant! The setup leverages your existing infrastructure (Heimdall for processing, Home Assistant for automation) while keeping costs minimal.

Start with the basic setup, test each component, then iterate and improve. The beauty of this approach is you can enhance it over time without being locked into a commercial platform.

Good luck, and enjoy your new voice assistant! 🎙️