# Maix Duino Voice Assistant - Quick Start Guide

## Overview
This guide will walk you through setting up a local, privacy-focused voice assistant using your Maix Duino board and Home Assistant integration. All processing happens on your local network - no cloud services required.

## What You'll Build
- Wake word detection on Maix Duino (edge device)
- Speech-to-text using Whisper on Heimdall
- Home Assistant integration for smart home control
- Text-to-speech responses using Piper
- All processing local to your 10.1.10.0/24 network

## Hardware Requirements
- [x] Sipeed Maix Duino board (you have this!)
- [ ] I2S MEMS microphone (or microphone array)
- [ ] Small speaker (3-5W) or audio output
- [ ] MicroSD card (4GB+) formatted as FAT32
- [ ] USB-C cable for power and programming

## Network Prerequisites
- Maix Duino will need WiFi access to your 10.1.10.0/24 network
- Heimdall (10.1.10.71) for AI processing
- Home Assistant instance (configure URL in setup)

## Setup Process

### Phase 1: Server Setup (Heimdall)

#### Step 1: Run the setup script
```bash
# Transfer files to Heimdall
scp setup_voice_assistant.sh voice_server.py alan@10.1.10.71:~/

# SSH to Heimdall
ssh alan@10.1.10.71

# Make setup script executable and run it
chmod +x setup_voice_assistant.sh
./setup_voice_assistant.sh
```

#### Step 2: Configure Home Assistant access
```bash
# Edit the config file
vim ~/voice-assistant/config/.env
```

Update these values:
```env
HA_URL=http://your-home-assistant:8123
HA_TOKEN=your_long_lived_access_token_here
```

To get a long-lived access token:
1. Open Home Assistant
2. Click your profile (bottom left)
3. Find "Long-Lived Access Tokens" (at the bottom of the profile page; on recent releases it is on the Security tab)
4. Click "Create Token"
5. Copy the token and paste it in .env
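
Before starting the server, it can help to confirm that the `.env` file no longer holds the placeholder values. The following is a minimal sketch, not part of the project: `parse_env` and `check_ha_settings` are hypothetical helper names, and it assumes the file uses plain `KEY=value` lines like the ones shown above.

```python
def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def check_ha_settings(settings):
    """Return a list of problems found in the Home Assistant settings."""
    problems = []
    url = settings.get("HA_URL", "")
    token = settings.get("HA_TOKEN", "")
    if not url.startswith(("http://", "https://")):
        problems.append("HA_URL must start with http:// or https://")
    if "your-home-assistant" in url:
        problems.append("HA_URL still contains the placeholder host")
    if not token or token == "your_long_lived_access_token_here":
        problems.append("HA_TOKEN is missing or still the placeholder")
    return problems


# The unedited template from this guide should report two problems.
sample = """
HA_URL=http://your-home-assistant:8123
HA_TOKEN=your_long_lived_access_token_here
"""
for problem in check_ha_settings(parse_env(sample)):
    print("config problem:", problem)
```

To check the real file, read `~/voice-assistant/config/.env` and pass its contents to `parse_env` instead of `sample`.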

#### Step 3: Test the server
```bash
cd ~/voice-assistant
./test_server.sh
```

You should see:
```
Loading Whisper model: medium
Whisper model loaded successfully
Starting voice processing server on 0.0.0.0:5000
```

#### Step 4: Test with curl (from another terminal)
```bash
# Test health endpoint
curl http://10.1.10.71:5000/health

# Should return:
# {"status":"healthy","whisper_loaded":true,"ha_connected":true}
```
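
If you want to script this check, the response can be inspected field by field. This is a rough sketch that assumes the three field names from the sample response above; what the server returns when a component is down is an assumption, and `summarize_health` is a hypothetical helper name.

```python
import json


def summarize_health(body):
    """Return the names of the fields that do not look healthy."""
    health = json.loads(body)
    failing = []
    if health.get("status") != "healthy":
        failing.append("status")
    for key in ("whisper_loaded", "ha_connected"):
        if not health.get(key):
            failing.append(key)
    return failing


ok = '{"status":"healthy","whisper_loaded":true,"ha_connected":true}'
print(summarize_health(ok))  # prints: []
```

An empty list means all three fields look good; `["ha_connected"]` would point at the Home Assistant URL or token.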

### Phase 2: Maix Duino Setup

#### Step 1: Flash MaixPy firmware
1. Download latest MaixPy firmware from: https://dl.sipeed.com/MAIX/MaixPy/release/
2. Download Kflash GUI: https://github.com/sipeed/kflash_gui
3. Connect Maix Duino via USB
4. Flash firmware using Kflash GUI

#### Step 2: Prepare SD card
```bash
# Format SD card as FAT32
# Create directory structure:
mkdir -p /path/to/sdcard/models

# Copy the client script
cp maix_voice_client.py /path/to/sdcard/main.py
```

#### Step 3: Configure WiFi settings
Edit `/path/to/sdcard/main.py`:
```python
# WiFi Settings
WIFI_SSID = "YourNetworkName"
WIFI_PASSWORD = "YourPassword"

# Server Settings
VOICE_SERVER_URL = "http://10.1.10.71:5000"
```
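
A typo in these values only shows up once the board is booted, so a quick check on your desktop before copying the file can save a round trip. The sketch below is illustrative only: `validate_client_settings` is a hypothetical helper, it is meant for desktop Python rather than MaixPy, and it checks only the general limits for SSIDs (1-32 bytes) and WPA2 passphrases (8-63 characters).

```python
from urllib.parse import urlparse


def validate_client_settings(ssid, password, server_url):
    """Return a list of problems with the values destined for main.py."""
    problems = []
    if not 1 <= len(ssid.encode("utf-8")) <= 32:
        problems.append("SSID must be 1-32 bytes long")
    if not 8 <= len(password) <= 63:
        problems.append("WPA2 passphrase must be 8-63 characters long")
    parsed = urlparse(server_url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        problems.append("server URL needs a scheme and host")
    return problems


print(validate_client_settings("YourNetworkName", "YourPassword",
                               "http://10.1.10.71:5000"))  # prints: []
```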

#### Step 4: Test the board
1. Insert SD card into Maix Duino
2. Connect to serial console (115200 baud)
   ```bash
   screen /dev/ttyUSB0 115200
   # or
   minicom -D /dev/ttyUSB0 -b 115200
   ```
3. Power on the board
4. Watch the serial output for connection status

### Phase 3: Integration & Testing

#### Test 1: Basic connectivity
1. The Maix Duino should connect to WiFi and display its IP address on the LCD
2. The server logs should show an entry when the Maix Duino connects

#### Test 2: Audio capture
The current implementation uses amplitude-based wake word detection as a placeholder. To test:
1. Clap loudly near the microphone
2. Speak a command (e.g., "turn on the living room lights")
3. Watch the LCD for transcription and response
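
For reference, amplitude-based triggering can be as simple as comparing the RMS level of an audio chunk against a threshold. This is a minimal sketch of that idea, not the code in `maix_voice_client.py`: it assumes little-endian signed 16-bit mono PCM, the threshold of 3000 is an arbitrary illustrative value, and it is written for desktop Python (MaixPy may need small adaptations).

```python
import math
import struct


def rms_level(pcm_bytes):
    """RMS amplitude of little-endian signed 16-bit mono PCM."""
    count = len(pcm_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, pcm_bytes[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_triggered(pcm_bytes, threshold=3000):
    """True when the chunk is louder than the (assumed) threshold."""
    return rms_level(pcm_bytes) > threshold


quiet = struct.pack("<4h", 10, -12, 8, -9)
loud = struct.pack("<4h", 20000, -21000, 19000, -20500)
print(is_triggered(quiet), is_triggered(loud))  # prints: False True
```

Because any loud sound crosses the threshold, a clap works as well as a spoken phrase, which is why this is only a placeholder for a trained wake word model.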

#### Test 3: Home Assistant control
Supported commands (add more in voice_server.py):
- "Turn on the living room lights"
- "Turn off the bedroom lights"
- "What's the temperature?"
- "Toggle the kitchen lights"
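
Each of these commands ends up as a call to Home Assistant's REST API, which exposes services at `/api/services/<domain>/<service>`. The sketch below shows what such a request could look like for "turn on the living room lights". It is an assumption about the shape of the call, not the code in `voice_server.py`; `build_service_request` is a hypothetical helper, and the request is built but not sent.

```python
import json
import urllib.request


def build_service_request(ha_url, token, domain, service, entity_id, **data):
    """Build (but do not send) a POST to /api/services/<domain>/<service>."""
    payload = dict(data, entity_id=entity_id)
    return urllib.request.Request(
        "%s/api/services/%s/%s" % (ha_url.rstrip("/"), domain, service),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_service_request(
    "http://your-home-assistant:8123", "YOUR_TOKEN",
    "light", "turn_on", "light.living_room",
)
print(req.get_method(), req.full_url)
# Send it with urllib.request.urlopen(req) once the URL and token are real.
```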

### Phase 4: Wake Word Training (Advanced)

The placeholder wake word detection uses simple amplitude triggering. For production use:

#### Option A: Use Porcupine (easiest)
1. Sign up at: https://console.picovoice.ai/
2. Train custom wake word
3. Download .ppn model
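4. Check platform support before relying on this option: `.ppn` is a proprietary format that runs only in Picovoice's Porcupine engine, and there is no documented `.ppn` to `.kmodel` conversion for the K210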

#### Option B: Use Mycroft Precise (FOSS)
```bash
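# On a machine with GPU
# precise-collect and precise-train come with the mycroft-precise source
# install; the precise-runner pip package only runs already-trained models
git clone https://github.com/MycroftAI/mycroft-precise
cd mycroft-precise
./setup.sh
source .venv/bin/activate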

# Record wake word samples
precise-collect

# Train model
precise-train -e 60 my-wake-word.net my-wake-word/

# Convert to .kmodel
# (requires additional tools - see MaixPy docs)
```

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────┐
│                     Your Home Network (10.1.10.0/24)        │
│                                                              │
│  ┌──────────────┐         ┌──────────────┐                 │
│  │  Maix Duino  │────────>│  Heimdall    │                 │
│  │  10.1.10.xxx │ Audio   │  10.1.10.71  │                 │
│  │              │<────────│              │                 │
│  │ - Wake Word  │ Response│ - Whisper    │                 │
│  │ - Mic Input  │         │ - Piper TTS  │                 │
│  │ - Speaker    │         │ - Flask API  │                 │
│  └──────────────┘         └──────┬───────┘                 │
│                                   │                          │
│                                   │ REST API                │
│                                   v                          │
│                          ┌──────────────┐                   │
│                          │ Home Asst.   │                   │
│                          │ homeassistant│                   │
│                          │              │                   │
│                          │ - Devices    │                   │
│                          │ - Automation │                   │
│                          └──────────────┘                   │
└─────────────────────────────────────────────────────────────┘
```
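
To exercise the "Audio" arrow from a desktop machine, without the board, raw PCM samples can be wrapped in a WAV container first. This is a hedged sketch: it assumes mono 16-bit audio at 16 kHz (the rate Whisper works with internally), `pcm_to_wav` is a hypothetical helper, and it stops at producing the bytes because the server's upload endpoint is not described in this guide.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=16000):
    """Wrap mono 16-bit PCM in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buffer.getvalue()


wav_bytes = pcm_to_wav(b"\x00\x00" * 16000)  # one second of silence
print(len(wav_bytes), "bytes")
```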

## Troubleshooting

### Maix Duino won't connect to WiFi
Check the serial output for errors. Common causes:
- Wrong SSID/password
- WPA3 not supported (use WPA2)
- 5GHz network (use 2.4GHz)

### Whisper transcription is slow
```bash
# Use a smaller model on Heimdall
# Edit ~/voice-assistant/config/.env:
WHISPER_MODEL=base  # or tiny for fastest
```
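
A misspelled model name tends to fail only when the model is loaded, so validating the setting early gives a clearer error. The sketch below is one defensive way to read it, not how `voice_server.py` necessarily does it: `choose_whisper_model` is a hypothetical helper, the default of `medium` is taken from the sample log output earlier in this guide, and the list of names is a subset of the sizes Whisper publishes.

```python
import os

# The sizes named in this guide plus "small"; Whisper also publishes
# other variants (for example English-only ".en" models).
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")


def choose_whisper_model(environ=None, default="medium"):
    """Read WHISPER_MODEL, drop any trailing '# comment', and validate it."""
    environ = os.environ if environ is None else environ
    raw = environ.get("WHISPER_MODEL", default)
    name = raw.split("#", 1)[0].strip().lower()
    if name not in KNOWN_MODELS:
        raise ValueError("unknown Whisper model: %r" % name)
    return name


print(choose_whisper_model({"WHISPER_MODEL": "base  # or tiny for fastest"}))
# prints: base
```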

### Home Assistant commands don't work
```bash
# Check server logs
journalctl -u voice-assistant -f

# Test HA connection manually
curl -H "Authorization: Bearer YOUR_TOKEN" \
     http://your-ha:8123/api/states
```

### Audio quality is poor
1. Check microphone connections
2. Adjust `SAMPLE_RATE` in maix_voice_client.py
3. Test with USB microphone first
4. Consider microphone array for better pickup

### Out of memory on Maix Duino
```python
# In main_loop(), collect garbage more often (needs `import gc` at the top of main.py):
if gc.mem_free() < 200000:  # Free bytes; raise this value to collect sooner
    gc.collect()
```

## Adding New Intents

Edit `voice_server.py` and add patterns to `IntentParser.PATTERNS`:

```python
PATTERNS = {
    # Existing patterns...

    'set_temperature': [
        r'set (?:the )?temperature to (\d+)',
        r'make it (\d+) degrees',
    ],
}
```

Then add the handler in `execute_intent()`:

```python
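elif intent == 'set_temperature':
    temp = params.get('temperature')
    # Assumes a 'thermostat' entry exists in ENTITY_MAP (see Entity Mapping)
    entity_id = IntentParser.ENTITY_MAP['thermostat']
    success = ha_client.call_service(
        'climate', 'set_temperature',
        entity_id, temperature=temp
    )
    # Assumes call_service() returns a truthy value on success
    if not success:
        return "Sorry, I couldn't set the temperature"
    return f"Set temperature to {temp} degrees"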
```
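
New patterns are easy to get subtly wrong, so it is worth trying them against sample phrases before restarting the server. The following is a standalone sketch: `PATTERNS` here is a stand-in copy rather than the real `IntentParser.PATTERNS`, and `parse_intent` is a hypothetical function that may not match how `voice_server.py` does its matching.

```python
import re

# Stand-in for IntentParser.PATTERNS, holding only the new intent
PATTERNS = {
    'set_temperature': [
        r'set (?:the )?temperature to (\d+)',
        r'make it (\d+) degrees',
    ],
}


def parse_intent(text):
    """Return (intent, params) for the first matching pattern, else (None, {})."""
    cleaned = text.lower().strip()
    for intent, patterns in PATTERNS.items():
        for pattern in patterns:
            match = re.search(pattern, cleaned)
            if match:
                # Only one intent is defined here, so the single capture
                # group is always the temperature.
                return intent, {'temperature': int(match.group(1))}
    return None, {}


print(parse_intent("Set the temperature to 72"))
# prints: ('set_temperature', {'temperature': 72})
```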

## Entity Mapping

Add your Home Assistant entities to `IntentParser.ENTITY_MAP`:

```python
ENTITY_MAP = {
    # Lights
    'living room light': 'light.living_room',
    'bedroom light': 'light.bedroom',

    # Climate
    'thermostat': 'climate.main_floor',
    'temperature': 'sensor.main_floor_temperature',

    # Switches
    'coffee maker': 'switch.coffee_maker',
    'fan': 'switch.bedroom_fan',

    # Media
    'tv': 'media_player.living_room_tv',
    'music': 'media_player.whole_house',
}
```
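
Spoken phrases rarely match the map keys exactly ("the living room lights" versus `'living room light'`), so some normalization before the lookup is useful. Whether `voice_server.py` already does this is not stated here; the sketch below is one possible approach, with `resolve_entity` as a hypothetical helper and `ENTITY_MAP` as a small stand-in copy.

```python
# Stand-in subset of IntentParser.ENTITY_MAP
ENTITY_MAP = {
    'living room light': 'light.living_room',
    'bedroom light': 'light.bedroom',
    'coffee maker': 'switch.coffee_maker',
}


def resolve_entity(spoken):
    """Map a spoken name to an entity ID, tolerating 'the' and a plural 's'."""
    name = spoken.lower().strip()
    if name.startswith('the '):
        name = name[4:]
    # Try the name as spoken first, then with a trailing plural 's' removed.
    for candidate in (name, name.rstrip('s')):
        if candidate in ENTITY_MAP:
            return ENTITY_MAP[candidate]
    return None


print(resolve_entity("the living room lights"))  # prints: light.living_room
```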

## Performance Tuning

### Reduce latency
1. Use Whisper `tiny` or `base` model
2. Implement streaming audio (currently batch)
3. Pre-load TTS models
4. Use faster TTS engine (e.g., espeak)
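
Before changing models or engines, it helps to measure where the time actually goes. This is a small timing sketch, assuming you can wrap each stage of the server's request handling; `timed` and the stage names are hypothetical, and the `time.sleep` calls are placeholders for the real Whisper and Piper calls.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage):
    """Record how long the wrapped block takes, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


with timed("transcribe"):
    time.sleep(0.05)    # placeholder for the speech-to-text call
with timed("tts"):
    time.sleep(0.005)   # placeholder for the text-to-speech call

slowest = max(timings, key=timings.get)
print("slowest stage:", slowest)
```

Whichever stage dominates is the one worth optimizing first.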

### Improve accuracy
1. Use Whisper `large` model (slower)
2. Train custom wake word
3. Add NLU layer (Rasa, spaCy)
4. Collect and fine-tune on your voice

## Next Steps

### Short term
- [ ] Add more Home Assistant entity mappings
- [ ] Implement Piper TTS playback on Maix Duino
- [ ] Train custom wake word model
- [ ] Add LED animations for better feedback
- [ ] Implement conversation context

### Medium term
- [ ] Multi-room support (multiple Maix Duino units)
- [ ] Voice profiles for different users
- [ ] Integration with Plex for media control
- [ ] Calendar and reminder functionality
- [ ] Weather updates from local weather station

### Long term
- [ ] Custom skills/plugins system
- [ ] Integration with other services (Nextcloud, Matrix)
- [ ] Sound event detection (doorbell, smoke alarm)
- [ ] Intercom functionality between rooms
- [ ] Voice-controlled automation creation

## Alternatives & Fallbacks

If the Maix Duino proves limiting:

### Raspberry Pi Zero 2 W
- More processing power
- Better software support
- USB audio support
- Cost: ~$15

### ESP32-S3
- Better WiFi
- Up to 8MB of PSRAM on some modules (comparable to the K210's 8MB of SRAM, not more)
- Cheaper (~$10)
- Good community support

### Orange Pi Zero 2
- ARM Cortex-A53 quad-core
- 512MB-1GB RAM
- Full Linux support
- Cost: ~$20

## Resources

### Documentation
- Maix Duino: https://wiki.sipeed.com/hardware/en/maix/
- MaixPy: https://maixpy.sipeed.com/
- Whisper: https://github.com/openai/whisper
- Piper TTS: https://github.com/rhasspy/piper
- Home Assistant API: https://developers.home-assistant.io/

### Community Projects
- Rhasspy: https://rhasspy.readthedocs.io/
- Willow: https://github.com/toverainc/willow
- Mycroft: https://mycroft.ai/

### Wake Word Tools
- Porcupine: https://picovoice.ai/platform/porcupine/
- Mycroft Precise: https://github.com/MycroftAI/mycroft-precise
- Snowboy (archived): https://github.com/Kitt-AI/snowboy

## Getting Help

### Check logs
```bash
# Server logs (if using systemd)
sudo journalctl -u voice-assistant -f

# Or manual log file
tail -f ~/voice-assistant/logs/voice_assistant.log

# Maix Duino serial console
screen /dev/ttyUSB0 115200
```

### Common issues and solutions
See the Troubleshooting section above

### Useful commands
```bash
# Restart service
sudo systemctl restart voice-assistant

# Check service status
sudo systemctl status voice-assistant

# Check server health (the response includes ha_connected)
curl http://10.1.10.71:5000/health

# Monitor Maix Duino
minicom -D /dev/ttyUSB0 -b 115200
```

## Cost Breakdown

| Item | Cost | Status |
|------|------|--------|
| Maix Duino | $30 | Have it! |
| I2S Microphone | $5-10 | Need |
| Speaker | $10 | Need (or use existing) |
| MicroSD Card | $5 | Have it? |
| **Total (still to buy)** | **$15-25** | (vs $50+ commercial) |

**Benefits of local solution:**
- No subscription fees
- Complete privacy (no cloud)
- Customizable to your needs
- Integration with existing infrastructure
- Learning experience!

## Conclusion

You now have everything you need to build a local, privacy-focused voice assistant! The setup leverages your existing infrastructure (Heimdall for processing, Home Assistant for automation) while keeping costs minimal.

Start with the basic setup, test each component, then iterate and improve. The beauty of this approach is you can enhance it over time without being locked into a commercial platform.

Good luck, and enjoy your new voice assistant! 🎙️