pyr0ball 173f7f37d4 feat: import mycroft-precise work as Minerva foundation

Ports prior voice assistant research and prototypes from devl/Devops
into the Minerva repo. Includes:

- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed
  (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap

2026-04-06 22:21:12 -07:00


Maix Duino Voice Assistant - Quick Start Guide

Overview

This guide walks you through setting up a local, privacy-focused voice assistant built on your Maix Duino board and integrated with Home Assistant. All processing happens on your local network; no cloud services are required.

What You'll Build

Wake word detection on Maix Duino (edge device)
Speech-to-text using Whisper on Heimdall
Home Assistant integration for smart home control
Text-to-speech responses using Piper
All processing local to your 10.1.10.0/24 network

Hardware Requirements

Sipeed Maix Duino board (you have this!)
I2S MEMS microphone (or microphone array)
Small speaker (3-5W) or audio output
MicroSD card (4GB+) formatted as FAT32
USB-C cable for power and programming

Network Prerequisites

Maix Duino will need WiFi access to your 10.1.10.0/24 network
Heimdall (10.1.10.71) for AI processing
Home Assistant instance (configure URL in setup)

Setup Process

Phase 1: Server Setup (Heimdall)

Step 1: Run the setup script

# Transfer files to Heimdall
scp setup_voice_assistant.sh voice_server.py alan@10.1.10.71:~/

# SSH to Heimdall
ssh alan@10.1.10.71

# Make setup script executable and run it
chmod +x setup_voice_assistant.sh
./setup_voice_assistant.sh

Step 2: Configure Home Assistant access

# Edit the config file
vim ~/voice-assistant/config/.env

Update these values:

HA_URL=http://your-home-assistant:8123
HA_TOKEN=your_long_lived_access_token_here

To get a long-lived access token:

Open Home Assistant
Click your profile (bottom left)
Scroll to "Long-Lived Access Tokens"
Click "Create Token"
Copy the token and paste it in .env
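
To sanity-check the values before starting the server, the following is a minimal sketch of reading the two settings above and turning the token into the bearer header that Home Assistant's REST API expects. The helper names parse_env and ha_headers are illustrative and are not taken from voice_server.py; the parser handles only plain KEY=VALUE lines and does not strip inline comments.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comment lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def ha_headers(env):
    """Build the headers used for token auth against Home Assistant."""
    return {
        "Authorization": "Bearer " + env["HA_TOKEN"],
        "Content-Type": "application/json",
    }


# Sample text mirrors the template values shown above.
sample = (
    "HA_URL=http://your-home-assistant:8123\n"
    "HA_TOKEN=your_long_lived_access_token_here\n"
)
env = parse_env(sample)
print(env["HA_URL"])
print(ha_headers(env)["Authorization"])
```

To check a real file, pass the contents of ~/voice-assistant/config/.env to parse_env instead of the sample string.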

Step 3: Test the server

cd ~/voice-assistant
./test_server.sh

You should see:

Loading Whisper model: medium
Whisper model loaded successfully
Starting voice processing server on 0.0.0.0:5000

Step 4: Test with curl (from another terminal)

# Test health endpoint
curl http://10.1.10.71:5000/health

# Should return:
# {"status":"healthy","whisper_loaded":true,"ha_connected":true}
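
The same check can be scripted. This sketch assumes only the three fields shown in the sample response above; parse_health and check_server are illustrative names, and check_server is defined but not called so the example runs without the server.

```python
import json
import urllib.request


def parse_health(body):
    """Return (ok, problems) for a /health JSON body."""
    data = json.loads(body)
    problems = []
    if data.get("status") != "healthy":
        problems.append("status is %r" % data.get("status"))
    if not data.get("whisper_loaded"):
        problems.append("Whisper model not loaded")
    if not data.get("ha_connected"):
        problems.append("Home Assistant not reachable")
    return (not problems, problems)


def check_server(base_url, timeout=5):
    """Fetch /health from the voice server and interpret the response."""
    with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
        return parse_health(resp.read().decode("utf-8"))


# Offline demonstration using the sample response from this guide.
sample = '{"status":"healthy","whisper_loaded":true,"ha_connected":true}'
print(parse_health(sample))
```

With the server running, check_server("http://10.1.10.71:5000") performs the live request.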

Phase 2: Maix Duino Setup

Step 1: Flash MaixPy firmware

Download latest MaixPy firmware from: https://dl.sipeed.com/MAIX/MaixPy/release/
Download Kflash GUI: https://github.com/sipeed/kflash_gui
Connect Maix Duino via USB
Flash firmware using Kflash GUI

Step 2: Prepare SD card

# Format SD card as FAT32
# Create directory structure:
mkdir -p /path/to/sdcard/models

# Copy the client script
cp maix_voice_client.py /path/to/sdcard/main.py

Step 3: Configure WiFi settings

Edit /path/to/sdcard/main.py:

# WiFi Settings
WIFI_SSID = "YourNetworkName"
WIFI_PASSWORD = "YourPassword"

# Server Settings
VOICE_SERVER_URL = "http://10.1.10.71:5000"
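
The scripts under hardware/maixduino/ in this repository follow a secrets.py pattern rather than hardcoding the password. One way that pattern could look is sketched below; it assumes a secrets.py file on the SD card that defines WIFI_SSID and WIFI_PASSWORD, and the function name load_wifi_credentials is hypothetical. On desktop Python, secrets resolves to the standard library module, which lacks those names, so the function returns None there.

```python
def load_wifi_credentials(module_name="secrets"):
    """Return (ssid, password) from a credentials module, or None if unavailable."""
    try:
        module = __import__(module_name)
        return module.WIFI_SSID, module.WIFI_PASSWORD
    except (ImportError, AttributeError):
        # Module missing, or present without the expected names
        return None


creds = load_wifi_credentials()
if creds is None:
    print("No WiFi credentials found; create secrets.py on the SD card")
else:
    print("Loaded credentials for network", creds[0])
```

Keeping credentials in a separate file means main.py can be committed or shared without leaking the password, provided secrets.py stays in .gitignore.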

Step 4: Test the board

Insert SD card into Maix Duino

Connect to serial console (115200 baud)

screen /dev/ttyUSB0 115200
# or
minicom -D /dev/ttyUSB0 -b 115200

Power on the board
Watch the serial output for connection status

Phase 3: Integration & Testing

Test 1: Basic connectivity

Maix Duino should connect to WiFi and display IP on LCD
Server should show in logs when Maix connects

Test 2: Audio capture

The current implementation uses amplitude-based triggering as a placeholder for wake word detection, so any sufficiently loud sound starts a capture. To test:

Clap loudly near the microphone
Speak a command (e.g., "turn on the living room lights")
Watch the LCD for transcription and response
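
For reference, amplitude triggering can be as simple as comparing the loudest sample in a buffer against a threshold. The sketch below is an illustration under stated assumptions, not the code in maix_voice_client.py: it assumes 16-bit little-endian mono PCM, and the threshold of 8000 is an arbitrary starting point. It is written for desktop Python so the logic can be tried without the board.

```python
import struct


def peak_amplitude(pcm_bytes):
    """Largest absolute sample in a buffer of 16-bit little-endian mono PCM."""
    count = len(pcm_bytes) // 2
    samples = struct.unpack("<%dh" % count, pcm_bytes[:count * 2])
    return max((abs(s) for s in samples), default=0)


def is_triggered(pcm_bytes, threshold=8000):
    """True when the buffer is loud enough to count as a trigger."""
    return peak_amplitude(pcm_bytes) >= threshold


# Two tiny synthetic buffers: background noise and a loud clap.
quiet = struct.pack("<4h", 10, -20, 15, -5)
clap = struct.pack("<4h", 10, -12000, 9000, -5)
print(is_triggered(quiet), is_triggered(clap))
```

This is why a clap works as a stand-in for the wake word, and also why the placeholder produces false triggers from any loud noise.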

Test 3: Home Assistant control

Supported commands (add more in voice_server.py):

"Turn on the living room lights"
"Turn off the bedroom lights"
"What's the temperature?"
"Toggle the kitchen lights"

Phase 4: Wake Word Training (Advanced)

The placeholder wake word detection uses simple amplitude triggering. For production use:

Option A: Use Porcupine (easiest)

Sign up at: https://console.picovoice.ai/
Train custom wake word
Download .ppn model
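Confirm K210 support before committing to this route: .ppn models are built for Picovoice's Porcupine engine, and converting one to .kmodel is not a documented workflow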

Option B: Use Mycroft Precise (FOSS)

# On a machine with GPU
conda create -n precise python=3.6
conda activate precise
pip install mycroft-precise  # provides precise-collect and precise-train (precise-runner only runs trained models)

# Record wake word samples
precise-collect

# Train model
precise-train -e 60 my-wake-word.net my-wake-word/

# Convert to .kmodel
# (requires additional tools - see MaixPy docs)

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                     Your Home Network (10.1.10.0/24)        │
│                                                              │
│  ┌──────────────┐         ┌──────────────┐                 │
│  │  Maix Duino  │────────>│  Heimdall    │                 │
│  │  10.1.10.xxx │ Audio   │  10.1.10.71  │                 │
│  │              │<────────│              │                 │
│  │ - Wake Word  │ Response│ - Whisper    │                 │
│  │ - Mic Input  │         │ - Piper TTS  │                 │
│  │ - Speaker    │         │ - Flask API  │                 │
│  └──────────────┘         └──────┬───────┘                 │
│                                   │                          │
│                                   │ REST API                │
│                                   v                          │
│                          ┌──────────────┐                   │
│                          │ Home Asst.   │                   │
│                          │ homeassistant│                   │
│                          │              │                   │
│                          │ - Devices    │                   │
│                          │ - Automation │                   │
│                          └──────────────┘                   │
└─────────────────────────────────────────────────────────────┘
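
The "Audio" arrow in the diagram is an HTTP request from the board to the Flask API. The sketch below builds such a request without sending it. The route /process and the audio/wav content type are assumptions made for illustration; check voice_server.py for the actual route and payload format.

```python
import urllib.request

VOICE_SERVER_URL = "http://10.1.10.71:5000"
PROCESS_PATH = "/process"  # hypothetical; use the route defined in voice_server.py


def build_audio_request(wav_bytes, base_url=VOICE_SERVER_URL, path=PROCESS_PATH):
    """Build (but do not send) the POST that carries recorded audio to the server."""
    return urllib.request.Request(
        base_url + path,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


# Placeholder bytes stand in for a real recording.
req = build_audio_request(b"RIFF....WAVE")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it. On the board itself, MaixPy's own networking modules would take the place of urllib.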

Troubleshooting

Maix Duino won't connect to WiFi

# Check serial output for errors
# Common issues:
# - Wrong SSID/password
# - WPA3 not supported (use WPA2)
# - 5GHz network (use 2.4GHz)

Whisper transcription is slow

# Use a smaller model on Heimdall
# Edit ~/voice-assistant/config/.env (use "tiny" for the fastest option):
WHISPER_MODEL=base

Home Assistant commands don't work

# Check server logs
journalctl -u voice-assistant -f

# Test HA connection manually
curl -H "Authorization: Bearer YOUR_TOKEN" \
     http://your-ha:8123/api/states

Audio quality is poor

Check microphone connections
Adjust SAMPLE_RATE in maix_voice_client.py
Test with USB microphone first
Consider microphone array for better pickup

Out of memory on Maix Duino

# At the top of main.py:
import gc

# In main_loop(), collect earlier by raising the free-memory threshold (bytes):
if gc.mem_free() < 200000:
    gc.collect()

Adding New Intents

Edit voice_server.py and add patterns to IntentParser.PATTERNS:

PATTERNS = {
    # Existing patterns...
    
    'set_temperature': [
        r'set (?:the )?temperature to (\d+)',
        r'make it (\d+) degrees',
    ],
}

Then add the handler in execute_intent():

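elif intent == 'set_temperature':
    # Regex captures are strings; convert before sending to Home Assistant
    temp = int(params['temperature'])
    # Look up the climate entity defined in IntentParser.ENTITY_MAP
    entity_id = IntentParser.ENTITY_MAP['thermostat']
    success = ha_client.call_service(
        'climate', 'set_temperature',
        entity_id, temperature=temp
    )
    if not success:
        return "Sorry, I couldn't set the temperature"
    return f"Set temperature to {temp} degrees"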
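
New patterns are easier to debug outside the server. The following is a minimal sketch of a matcher that uses the same pattern format. match_intent is an illustrative helper, not necessarily how IntentParser works internally.

```python
import re

# Same pattern format as IntentParser.PATTERNS; the matcher itself is illustrative.
PATTERNS = {
    "set_temperature": [
        r"set (?:the )?temperature to (\d+)",
        r"make it (\d+) degrees",
    ],
}


def match_intent(text, patterns=PATTERNS):
    """Return (intent, captured_groups) for the first matching pattern, else (None, ())."""
    cleaned = text.lower().strip()
    for intent, regexes in patterns.items():
        for regex in regexes:
            found = re.search(regex, cleaned)
            if found:
                return intent, found.groups()
    return None, ()


print(match_intent("Set the temperature to 72"))
print(match_intent("Please make it 68 degrees"))
print(match_intent("What time is it?"))
```

Captured values come back as strings, so numeric parameters such as the temperature need converting before they are sent to Home Assistant.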

Entity Mapping

Add your Home Assistant entities to IntentParser.ENTITY_MAP:

ENTITY_MAP = {
    # Lights
    'living room light': 'light.living_room',
    'bedroom light': 'light.bedroom',
    
    # Climate
    'thermostat': 'climate.main_floor',
    'temperature': 'sensor.main_floor_temperature',
    
    # Switches
    'coffee maker': 'switch.coffee_maker',
    'fan': 'switch.bedroom_fan',
    
    # Media
    'tv': 'media_player.living_room_tv',
    'music': 'media_player.whole_house',
}
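
One plausible way to use such a map is to look for each spoken name inside the transcribed text, checking longer names first so that a specific name wins over a shorter one it contains. This is a sketch under that assumption; resolve_entity is a hypothetical helper and the map is a subset of the one above.

```python
ENTITY_MAP = {
    "living room light": "light.living_room",
    "bedroom light": "light.bedroom",
    "thermostat": "climate.main_floor",
    "fan": "switch.bedroom_fan",
}


def resolve_entity(text, entity_map=ENTITY_MAP):
    """Return the entity ID whose spoken name appears in text, preferring longer names."""
    cleaned = text.lower()
    for name in sorted(entity_map, key=len, reverse=True):
        if name in cleaned:
            return entity_map[name]
    return None


print(resolve_entity("Turn on the living room lights"))
print(resolve_entity("Toggle the kitchen lights"))
```

The second call returns None because no kitchen entry exists in the map, which is the usual reason a supported command silently does nothing.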

Performance Tuning

Reduce latency

Use Whisper tiny or base model
Implement streaming audio (currently batch)
Pre-load TTS models
Use faster TTS engine (e.g., espeak)
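
Before changing models or engines, it helps to know which stage dominates the delay. Below is a small timing sketch. The stage names and the sleep calls are stand-ins for the real Whisper and Piper calls in voice_server.py.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage):
    """Record how long the wrapped block takes, in seconds, under timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


with timed("transcribe"):
    time.sleep(0.01)  # stand-in for the Whisper call
with timed("tts"):
    time.sleep(0.01)  # stand-in for the Piper call

for stage, seconds in timings.items():
    print("%-12s %.3f s" % (stage, seconds))
```

If transcription accounts for most of the total, a smaller Whisper model is the first thing to try; if TTS does, pre-loading or swapping the engine matters more.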

Improve accuracy

Use Whisper large model (slower)
Train custom wake word
Add NLU layer (Rasa, spaCy)
Collect and fine-tune on your voice

Next Steps

Short term

Add more Home Assistant entity mappings
Implement Piper TTS playback on Maix Duino
Train custom wake word model
Add LED animations for better feedback
Implement conversation context

Medium term

Multi-room support (multiple Maix Duino units)
Voice profiles for different users
Integration with Plex for media control
Calendar and reminder functionality
Weather updates from local weather station

Long term

Custom skills/plugins system
Integration with other services (Nextcloud, Matrix)
Sound event detection (doorbell, smoke alarm)
Intercom functionality between rooms
Voice-controlled automation creation

Alternatives & Fallbacks

If the Maix Duino proves limiting:

Raspberry Pi Zero 2 W

More processing power
Better software support
USB audio support
Cost: ~$15

ESP32-S3

Better WiFi
Up to 8MB PSRAM, depending on the module
Cheaper (~$10)
Good community support

Orange Pi Zero 2

ARM Cortex-A53 quad-core
512MB-1GB RAM
Full Linux support
Cost: ~$20

Resources

Getting Help

Check logs

# Server logs (if using systemd)
sudo journalctl -u voice-assistant -f

# Or manual log file
tail -f ~/voice-assistant/logs/voice_assistant.log

# Maix Duino serial console
screen /dev/ttyUSB0 115200

Common issues and solutions

See the Troubleshooting section above

Useful commands

# Restart service
sudo systemctl restart voice-assistant

# Check service status
sudo systemctl status voice-assistant

# Check server health (response includes ha_connected)
curl http://10.1.10.71:5000/health

# Monitor Maix Duino
minicom -D /dev/ttyUSB0 -b 115200

Cost Breakdown

Item	Cost	Status
Maix Duino	$30	Have it!
I2S Microphone	$5-10	Need
Speaker	$10	Need (or use existing)
MicroSD Card	$5	Have it?
Total (items still needed)	$15-25	(vs $50+ commercial)

Benefits of local solution:

No subscription fees
Complete privacy (no cloud)
Customizable to your needs
Integration with existing infrastructure
Learning experience!

Conclusion

You now have everything you need to build a local, privacy-focused voice assistant! The setup leverages your existing infrastructure (Heimdall for processing, Home Assistant for automation) while keeping costs minimal.

Start with the basic setup, test each component, then iterate and improve. The beauty of this approach is you can enhance it over time without being locked into a commercial platform.

Good luck, and enjoy your new voice assistant! 🎙️
