feat: import mycroft-precise work as Minerva foundation
Ports prior voice assistant research and prototypes from devl/Devops into the Minerva repo. Includes:

- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed (hardcoded password replaced with the secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap
Parent: fca5a107de
Commit: 173f7f37d4
30 changed files with 12519 additions and 0 deletions
.gitignore (vendored, new file, 29 lines)
@@ -0,0 +1,29 @@
# Credentials
secrets.py
config/.env
*.env
!*.env.example

# Models (large binary files)
models/*.pb
models/*.pb.params
models/*.net
models/*.tflite
models/*.kmodel

# OEM firmware blobs
*.elf
*.7z
*.bin

# Python
__pycache__/
*.pyc
*.pyo

# Logs
logs/

# IDE
.vscode/
.idea/
CLAUDE.md (new file, 165 lines)
@@ -0,0 +1,165 @@
# Minerva — Developer Context

**Product code:** `MNRV`
**Status:** Concept / early prototype
**Domain:** Privacy-first, local-only voice assistant hardware platform

---

## What Minerva Is

A 100% local, FOSS voice assistant hardware platform. No cloud. No subscriptions. No data leaving the local network.

The goal is a reference hardware + software stack for a privacy-first voice assistant that anyone can build, extend, or self-host — including people without technical backgrounds if the assembly docs are good enough.

Core design principles (same as all CF products):
- **Local-first inference** — Whisper STT, Piper TTS, Mycroft Precise wake word all run on the host server
- **Edge where possible** — wake word detection moves to edge hardware over time (K210 → ESP32-S3 → custom)
- **No cloud dependency** — Home Assistant optional, not required
- **100% FOSS stack**

---

## Hardware Targets

### Phase 1 (current): Maix Duino (K210)
- K210 dual-core RISC-V @ 400MHz with KPU neural accelerator
- Audio: I2S microphone + speaker output
- Connectivity: ESP32 WiFi/BLE co-processor
- Programming: MaixPy (MicroPython)
- Status: server-side wake word working; edge inference in progress

### Phase 2: ESP32-S3
- More accessible, cheaper, better WiFi
- On-device wake word with Espressif ESP-SR
- See `docs/ESP32_S3_VOICE_ASSISTANT_SPEC.md`

### Phase 3: Custom hardware
- Dedicated PCB for CF reference platform
- Hardware-accelerated wake word + VAD
- Designed for accessibility: large buttons, LED feedback, easy mounting

---

## Software Stack

### Edge device (Maix Duino / ESP32-S3)
- Firmware: MaixPy or ESP-IDF
- Client: `hardware/maixduino/maix_voice_client.py`
- Audio: I2S capture and playback
- Network: WiFi → Minerva server

### Server (runs on Heimdall or any Linux box)
- Voice server: `scripts/voice_server.py` (Flask + Whisper + Precise)
- Enhanced version: `scripts/voice_server_enhanced.py` (adds speaker ID via pyannote)
- STT: Whisper (local)
- Wake word: Mycroft Precise
- TTS: Piper
- Home Assistant: REST API integration (optional)
- Conda env: `whisper_cli` (existing on Heimdall)

---

## Directory Structure

```
minerva/
├── docs/                               # Architecture, guides, reference docs
│   ├── maix-voice-assistant-architecture.md
│   ├── MYCROFT_PRECISE_GUIDE.md
│   ├── PRECISE_DEPLOYMENT.md
│   ├── ESP32_S3_VOICE_ASSISTANT_SPEC.md
│   ├── HARDWARE_BUYING_GUIDE.md
│   ├── LCD_CAMERA_FEATURES.md
│   ├── K210_PERFORMANCE_VERIFICATION.md
│   ├── WAKE_WORD_ADVANCED.md
│   ├── ADVANCED_WAKE_WORD_TOPICS.md
│   └── QUESTIONS_ANSWERED.md
├── scripts/                            # Server-side scripts
│   ├── voice_server.py                 # Core Flask + Whisper + Precise server
│   ├── voice_server_enhanced.py        # + speaker identification (pyannote)
│   ├── setup_voice_assistant.sh        # Server setup
│   ├── setup_precise.sh                # Mycroft Precise training environment
│   └── download_pretrained_models.sh
├── hardware/
│   └── maixduino/                      # K210 edge device scripts
│       ├── maix_voice_client.py        # Production client
│       ├── maix_simple_record_test.py  # Audio capture test
│       ├── maix_test_simple.py         # Hardware/network test
│       ├── maix_debug_wifi.py          # WiFi diagnostics
│       ├── maix_discover_modules.py    # Module discovery
│       ├── secrets.py.example          # WiFi/server credential template
│       ├── MICROPYTHON_QUIRKS.md
│       └── README.md
├── config/
│   └── .env.example                    # Server config template
├── models/                             # Wake word models (gitignored, large)
└── CLAUDE.md                           # This file
```

---

## Credentials / Secrets

**Never commit real credentials.** Pattern:

- Server: copy `config/.env.example` → `config/.env`, fill in real values
- Edge device: copy `hardware/maixduino/secrets.py.example` → `secrets.py`, fill in WiFi + server URL

Both files are gitignored. `.example` files are committed as templates.
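The copy pattern can be sketched as a script. This demo runs in a scratch directory with placeholder template contents (the real templates ship in this commit), so it illustrates the pattern rather than the exact repo state:

```shell
set -eu
# Scratch layout standing in for the repo; template contents are placeholders.
tmp=$(mktemp -d)
mkdir -p "$tmp/config" "$tmp/hardware/maixduino"
printf 'SERVER_PORT=5000\n' > "$tmp/config/.env.example"
printf 'WIFI_SSID = "changeme"\n' > "$tmp/hardware/maixduino/secrets.py.example"

# The pattern: copy template -> real file, then edit in real values.
cp "$tmp/config/.env.example" "$tmp/config/.env"
cp "$tmp/hardware/maixduino/secrets.py.example" "$tmp/hardware/maixduino/secrets.py"

ls -A "$tmp/config"
```

Because `.env` and `secrets.py` match the `.gitignore` rules above, `git status` stays clean after this step.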
---

## Running the Server

```bash
# Activate environment
conda activate whisper_cli

# Basic server (Whisper + Precise wake word)
python scripts/voice_server.py \
    --enable-precise \
    --precise-model models/hey-minerva.net \
    --precise-sensitivity 0.5

# Enhanced server (+ speaker identification)
python scripts/voice_server_enhanced.py \
    --enable-speaker-id \
    --hf-token $HF_TOKEN

# Test health
curl http://localhost:5000/health
curl http://localhost:5000/wake-word/status
```

---

## Connection to CF Voice Infrastructure

Minerva is the **hardware platform** for cf-voice. As `circuitforge_core.voice` matures:

- `cf_voice.io` (STT/TTS) → replaces the ad hoc Whisper/Piper calls in `voice_server.py`
- `cf_voice.context` (parallel classifier) → augments Mycroft Precise with tone/environment detection
- `cf_voice.telephony` → future: Minerva as an always-on household linnet node

Minerva hardware + cf-voice software = the CF reference voice assistant stack.

---

## Roadmap

See Forgejo milestones on this repo. High-level:

1. **Alpha — Server-side pipeline** — Whisper + Precise + Piper working end-to-end on Heimdall
2. **Beta — Edge wake word** — wake word on K210 or ESP32-S3; audio only streams post-wake
3. **Hardware v1** — documented reference build; buying guide; assembly instructions
4. **cf-voice integration** — Minerva uses cf_voice modules from circuitforge-core
5. **Platform** — multiple hardware targets; custom PCB design

---

## Related

- `cf-voice` module design: `circuitforge-plans/circuitforge-core/2026-04-06-cf-voice-design.md`
- `linnet` product: real-time tone annotation, will eventually embed Minerva as a hardware node
- Heimdall server: primary dev/deployment target (10.1.10.71 on LAN)
config/.env.example (new file, 24 lines)
@@ -0,0 +1,24 @@
# Minerva Voice Server — configuration
# Copy to config/.env and fill in real values. Never commit .env.

# Server
SERVER_HOST=0.0.0.0
SERVER_PORT=5000

# Whisper STT
WHISPER_MODEL=base

# Mycroft Precise wake word
# PRECISE_MODEL=/path/to/wake-word.net
# PRECISE_SENSITIVITY=0.5

# Home Assistant integration (optional)
# HA_URL=http://homeassistant.local:8123
# HA_TOKEN=your_long_lived_access_token_here

# HuggingFace (for speaker identification, optional)
# HF_TOKEN=your_huggingface_token_here

# Logging
LOG_LEVEL=INFO
LOG_FILE=logs/minerva.log
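How `voice_server.py` actually consumes this file is not shown in this commit; as an illustration, a dependency-free `KEY=VALUE` parser over the template format above (skipping blank lines and `#` comments) could look like:

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comment lines."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, _, value = line.partition('=')
        config[key.strip()] = value.strip()
    return config

config = parse_env("""
# Server
SERVER_HOST=0.0.0.0
SERVER_PORT=5000
LOG_LEVEL=INFO
""")
# config['SERVER_PORT'] == '5000'
```

A library such as python-dotenv would handle the same format, including quoting edge cases this sketch ignores.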
docs/ADVANCED_WAKE_WORD_TOPICS.md (new executable file, 905 lines)
@@ -0,0 +1,905 @@
# Advanced Wake Word Topics - Pre-trained Models, Multiple Wake Words, and Voice Adaptation

## Pre-trained Mycroft Models

### Yes! Pre-trained Models Exist

Mycroft AI provides several pre-trained wake word models you can use immediately:

**Available Models:**
- **Hey Mycroft** - Original Mycroft wake word (most training data)
- **Hey Jarvis** - Popular alternative
- **Christopher** - Alternative wake word
- **Hey Ezra** - Another option

### Download Pre-trained Models

```bash
# On Heimdall
conda activate precise
cd ~/precise-models

# Create directory for pre-trained models
mkdir -p pretrained
cd pretrained

# Download Hey Mycroft (recommended starting point)
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz
tar xzf hey-mycroft.tar.gz

# Download other models
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-jarvis.tar.gz
tar xzf hey-jarvis.tar.gz

# List available models
ls -lh *.net
```

### Test Pre-trained Model

```bash
conda activate precise

# Test Hey Mycroft
precise-listen hey-mycroft.net

# Speak "Hey Mycroft" - should see "!" when detected
# Press Ctrl+C to exit

# Test with different threshold
precise-listen hey-mycroft.net -t 0.7  # More conservative
```

### Use Pre-trained Model in Voice Server

```bash
cd ~/voice-assistant

# Start server with Hey Mycroft model
python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/pretrained/hey-mycroft.net \
    --precise-sensitivity 0.5
```

### Fine-tune Pre-trained Models

You can use pre-trained models as a **starting point** and fine-tune with your voice:

```bash
cd ~/precise-models
mkdir -p hey-mycroft-custom

# Copy base model
cp pretrained/hey-mycroft.net hey-mycroft-custom/

# Collect your samples
cd hey-mycroft-custom
precise-collect  # Record 20-30 samples of YOUR voice

# Fine-tune from pre-trained model
precise-train -e 30 hey-mycroft-custom.net . \
    --from-checkpoint ../pretrained/hey-mycroft.net

# This is MUCH faster than training from scratch!
```

**Benefits:**
- ✅ Start with proven model
- ✅ Much less training data needed (20-30 vs 100+ samples)
- ✅ Faster training (30 mins vs 60 mins)
- ✅ Good baseline accuracy

## Multiple Wake Words

### Architecture Options

#### Option 1: Multiple Models in Parallel (Server-Side Only)

Run multiple Precise instances simultaneously:

```python
# In voice_server.py - multiple wake word detection

import queue
import time

from precise_runner import PreciseEngine, PreciseRunner

# Global runners and detection queue (shared with the request handlers)
precise_runners = {}
wake_word_queue = queue.Queue()

def on_wake_word_detected(wake_word_name):
    """Callback factory for different wake words"""
    def callback():
        print(f"Wake word detected: {wake_word_name}")
        wake_word_queue.put({
            'wake_word': wake_word_name,
            'timestamp': time.time()
        })
    return callback

def start_multiple_wake_words(wake_word_configs):
    """
    Start multiple wake word detectors

    Args:
        wake_word_configs: List of dicts with 'name', 'model', 'sensitivity'

    Example:
        configs = [
            {'name': 'hey mycroft', 'model': 'hey-mycroft.net', 'sensitivity': 0.5},
            {'name': 'hey jarvis', 'model': 'hey-jarvis.net', 'sensitivity': 0.5}
        ]
    """
    global precise_runners

    for config in wake_word_configs:
        engine = PreciseEngine(
            '/usr/local/bin/precise-engine',
            config['model']
        )

        runner = PreciseRunner(
            engine,
            sensitivity=config['sensitivity'],
            on_activation=on_wake_word_detected(config['name'])
        )

        runner.start()
        precise_runners[config['name']] = runner

        print(f"Started wake word detector: {config['name']}")
```

**Server-Side Multiple Wake Words:**
```bash
# Start server with multiple wake words
python voice_server.py \
    --enable-precise \
    --precise-models "hey-mycroft:~/models/hey-mycroft.net:0.5,hey-jarvis:~/models/hey-jarvis.net:0.5"
```

**Performance Impact:**
- CPU: ~5-10% per model (can run 2-3 easily)
- Memory: ~50-100MB per model
- Latency: minimal (all run in parallel)

#### Option 2: Single Model, Multiple Phrases (Edge or Server)

Train ONE model that responds to multiple phrases:

```bash
cd ~/precise-models/multi-wake
conda activate precise

# Record samples for BOTH wake words in the SAME dataset
# Label all as "wake-word" regardless of which phrase

mkdir -p wake-word not-wake-word

# Record "Hey Mycroft" samples
precise-collect  # Save to wake-word/hey-mycroft-*.wav

# Record "Hey Computer" samples
precise-collect  # Save to wake-word/hey-computer-*.wav

# Record negatives
precise-collect -f not-wake-word/random.wav

# Train single model on both phrases
precise-train -e 60 multi-wake.net .
```

**Pros:**
- ✅ Single model = less compute
- ✅ Works on edge (K210)
- ✅ Easy to deploy

**Cons:**
- ❌ Can't tell which wake word was used
- ❌ May reduce accuracy for each individual phrase
- ❌ Higher false positive risk

#### Option 3: Sequential Detection (Edge)

Detect wake word, then identify which one:

```python
# Pseudo-code for edge detection
if wake_word_detected():
    audio_snippet = last_2_seconds()

    # Run all models on the audio snippet
    scores = {
        'hey-mycroft': model1.score(audio_snippet),
        'hey-jarvis': model2.score(audio_snippet),
        'hey-computer': model3.score(audio_snippet)
    }

    # Use highest scoring wake word
    wake_word = max(scores, key=scores.get)
```

### Recommendations

**Server-Side (Heimdall):**
- ✅ **Use Option 1** - multiple models in parallel
- Run 2-3 wake words easily
- Each can have different sensitivity
- Can identify which wake word was used
- Example: "Hey Mycroft" for commands, "Hey Jarvis" for queries

**Edge (Maix Duino K210):**
- ✅ **Use Option 2** - single multi-phrase model
- K210 can handle 1 model efficiently
- Train on 2-3 phrases max
- Simpler deployment
- Lower latency

## Voice Adaptation & Multi-User Support

### Approach 1: Inclusive Training (Recommended)

Train ONE model on EVERYONE'S voices:

```bash
cd ~/precise-models/family-wake-word
conda activate precise

# Record samples from each family member
# Alice records 30 samples
precise-collect  # Save as wake-word/alice-*.wav

# Bob records 30 samples
precise-collect  # Save as wake-word/bob-*.wav

# Carol records 30 samples
precise-collect  # Save as wake-word/carol-*.wav

# Train on all voices
precise-train -e 60 family-wake-word.net .
```

**Pros:**
- ✅ Everyone can use the system
- ✅ Single model deployment
- ✅ Works for all family members
- ✅ Simple maintenance

**Cons:**
- ❌ Can't identify who spoke
- ❌ May need more training data
- ❌ No personalization

**Best for:** Family voice assistant, shared devices

### Approach 2: Speaker Identification (Advanced)

Detect wake word, then identify speaker:

```python
# Architecture with speaker ID (pseudo-code)

# Step 1: Precise detects wake word
if wake_word_detected():

    # Step 2: Capture voice sample
    voice_sample = record_audio(duration=3)

    # Step 3: Speaker identification
    speaker = identify_speaker(voice_sample)
    # Uses voice embeddings/neural network

    # Step 4: Process with user context
    process_command(voice_sample, user=speaker)
```

**Implementation Options:**

#### Option A: Use resemblyzer (Voice Embeddings)
```bash
pip install resemblyzer --break-system-packages

# Enrollment phase
python enroll_users.py
# Each user records 10-20 seconds of speech
# System creates voice profile (embedding)

# Runtime
python speaker_id.py
# Compares incoming audio to stored embeddings
# Returns most likely speaker
```

**Example Code:**
```python
import os

import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

# Initialize encoder
encoder = VoiceEncoder()

# Enrollment - do once per user
def enroll_user(name, audio_files):
    """Create voice profile for user"""
    embeddings = []

    for audio_file in audio_files:
        wav = preprocess_wav(audio_file)
        embedding = encoder.embed_utterance(wav)
        embeddings.append(embedding)

    # Average embeddings for robustness
    user_profile = np.mean(embeddings, axis=0)

    # Save profile
    np.save(f'profiles/{name}.npy', user_profile)
    return user_profile

# Identification - run each time
def identify_speaker(audio_file, profiles_dir='profiles'):
    """Identify which enrolled user is speaking"""
    wav = preprocess_wav(audio_file)
    test_embedding = encoder.embed_utterance(wav)

    # Load all profiles
    profiles = {}
    for profile_file in os.listdir(profiles_dir):
        name = profile_file.replace('.npy', '')
        profile = np.load(os.path.join(profiles_dir, profile_file))
        profiles[name] = profile

    # Calculate similarity to each profile
    similarities = {}
    for name, profile in profiles.items():
        similarity = np.dot(test_embedding, profile)
        similarities[name] = similarity

    # Return most similar
    best_match = max(similarities, key=similarities.get)
    confidence = similarities[best_match]

    if confidence > 0.7:  # Threshold
        return best_match
    else:
        return "unknown"
```
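A note on the `np.dot` comparison above: resemblyzer's utterance embeddings are L2-normalized, so a dot product between two of them equals cosine similarity. The *averaged* enrollment profile, however, is generally shorter than unit length, so the raw dot underestimates similarity against it; renormalizing (or computing cosine explicitly) avoids that. A small self-contained sketch with toy 2-D vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity; equals np.dot(a, b) only for unit-length vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two unit vectors: raw dot product already equals cosine similarity.
a = np.array([1.0, 0.0])
b = np.array([0.6, 0.8])
print(np.dot(a, b), cosine_similarity(a, b))

# An averaged profile is shorter than unit length (norm ~0.89 here),
# so a raw dot against it underestimates the true cosine similarity.
profile = np.mean([a, b], axis=0)
print(np.dot(b, profile), cosine_similarity(b, profile))
```

In practice this mainly shifts where the 0.7 threshold should sit; using `cosine_similarity` keeps the threshold meaningful regardless of profile norm.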

#### Option B: Use pyannote.audio (Production-grade)
```bash
pip install pyannote.audio --break-system-packages

# Requires HuggingFace token (same as diarization)
```

**Example:**
```python
from pyannote.audio import Inference
from scipy.spatial.distance import cosine

# Initialize (window="whole" yields one embedding per file)
inference = Inference(
    "pyannote/embedding",
    window="whole",
    use_auth_token="your_hf_token"
)

# Enroll users
alice_profile = inference("alice_sample.wav")
bob_profile = inference("bob_sample.wav")

# Identify
test_embedding = inference("test_audio.wav")

# Compare
alice_similarity = 1 - cosine(test_embedding, alice_profile)
bob_similarity = 1 - cosine(test_embedding, bob_profile)

if alice_similarity > bob_similarity and alice_similarity > 0.7:
    speaker = "Alice"
elif bob_similarity > 0.7:
    speaker = "Bob"
else:
    speaker = "Unknown"
```

**Pros:**
- ✅ Can identify individual users
- ✅ Personalized responses
- ✅ User-specific commands/permissions
- ✅ Better for privacy (know who's speaking)

**Cons:**
- ❌ More complex implementation
- ❌ Requires enrollment phase
- ❌ Additional processing time (~100-200ms)
- ❌ May fail with similar voices

### Approach 3: Per-User Wake Word Models

Each person has their OWN wake word:

```bash
# Alice's wake word: "Hey Mycroft"
# Train on ONLY Alice's voice

# Bob's wake word: "Hey Jarvis"
# Train on ONLY Bob's voice

# Carol's wake word: "Hey Computer"
# Train on ONLY Carol's voice
```

**Deployment:**
Run all 3 models in parallel (server-side):
```python
wake_word_configs = [
    {'name': 'Alice', 'wake_word': 'hey mycroft', 'model': 'alice-wake.net'},
    {'name': 'Bob', 'wake_word': 'hey jarvis', 'model': 'bob-wake.net'},
    {'name': 'Carol', 'wake_word': 'hey computer', 'model': 'carol-wake.net'}
]
```

**Pros:**
- ✅ Automatic user identification
- ✅ Highest accuracy per user
- ✅ Clear user separation
- ✅ No additional speaker ID needed

**Cons:**
- ❌ Requires 3x models (server only)
- ❌ Users must remember their wake word
- ❌ 3x CPU usage (~15-30%)
- ❌ Can't work on edge (K210)

### Approach 4: Context-Based Adaptation

No speaker ID, but learn from interaction:

```python
# Track command patterns
user_context = {
    'last_command': 'turn on living room lights',
    'frequent_entities': ['light.living_room', 'light.bedroom'],
    'time_of_day_patterns': {'morning': 'coffee maker', 'evening': 'tv'},
    'location': 'home'  # vs 'away'
}

# Use context to improve intent recognition (sketch)
if command == "turn on the lights" and is_morning():
    # Probably means bedroom lights (based on history)
    entity = user_context['frequent_entities'][0]
```

**Pros:**
- ✅ No enrollment needed
- ✅ Improves over time
- ✅ Simple to implement
- ✅ Works with any number of users

**Cons:**
- ❌ No true user identification
- ❌ May make incorrect assumptions
- ❌ Privacy concerns (tracking behavior)

## Recommended Strategy

### For Your Use Case

Based on your home lab setup, I recommend:

#### Phase 1: Single Wake Word, Inclusive Training (Week 1-2)
```bash
# Start simple
cd ~/precise-models/hey-computer
conda activate precise

# Have all family members record samples
# Alice: 30 samples of "Hey Computer"
# Bob: 30 samples of "Hey Computer"
# You: 30 samples of "Hey Computer"

# Train single model on all voices
precise-train -e 60 hey-computer.net .

# Deploy to server
python voice_server.py \
    --enable-precise \
    --precise-model hey-computer.net
```

**Why:**
- Simple to set up and test
- Everyone can use it immediately
- Single model = easier debugging
- Works on edge if you migrate later

#### Phase 2: Add Speaker Identification (Week 3-4)
```bash
# Install resemblyzer
pip install resemblyzer --break-system-packages

# Enroll users
python enroll_users.py
# Each person speaks for 20 seconds

# Update voice_server.py to identify speaker
# Use speaker ID for personalized responses
```

**Why:**
- Enables personalization
- Can track preferences per user
- User-specific command permissions
- Better privacy (know who's speaking)

#### Phase 3: Multiple Wake Words (Month 2+)
```bash
# Add alternative wake words for different contexts
# "Hey Mycroft" - General commands
# "Hey Jarvis" - Media/Plex control
# "Computer" - Quick commands (lights, temp)

# Deploy multiple models on server
python voice_server.py \
    --enable-precise \
    --precise-models "mycroft:hey-mycroft.net:0.5,jarvis:hey-jarvis.net:0.5"
```

**Why:**
- Different wake words for different contexts
- Reduces false positives (more specific triggers)
- Fun factor (Jarvis for media!)
- Server can handle 2-3 easily

## Implementation Guide: Multiple Wake Words

### Update voice_server.py for Multiple Wake Words

```python
# Add to voice_server.py

def start_multiple_wake_words(configs):
    """
    Start multiple wake word detectors

    Args:
        configs: List of dicts with 'name', 'model_path', 'sensitivity'
    """
    global precise_runners
    precise_runners = {}

    for config in configs:
        try:
            engine = PreciseEngine(
                DEFAULT_PRECISE_ENGINE,
                config['model_path']
            )

            def make_callback(wake_word_name):
                def callback():
                    print(f"Wake word detected: {wake_word_name}")
                    wake_word_queue.put({
                        'wake_word': wake_word_name,
                        'timestamp': time.time(),
                        'source': 'precise'
                    })
                return callback

            runner = PreciseRunner(
                engine,
                sensitivity=config['sensitivity'],
                on_activation=make_callback(config['name'])
            )

            runner.start()
            precise_runners[config['name']] = runner

            print(f"✓ Started: {config['name']} (sensitivity: {config['sensitivity']})")

        except Exception as e:
            print(f"✗ Failed to start {config['name']}: {e}")

    return len(precise_runners) > 0

# Add to main()
parser.add_argument('--precise-models',
                    help='Multiple models: name:path:sensitivity,name2:path2:sensitivity2')

# Parse multiple models
if args.precise_models:
    configs = []
    for model_spec in args.precise_models.split(','):
        name, path, sensitivity = model_spec.split(':')
        configs.append({
            'name': name,
            'model_path': os.path.expanduser(path),
            'sensitivity': float(sensitivity)
        })

    start_multiple_wake_words(configs)
```
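The `name:path:sensitivity` parsing above can be pulled out and exercised standalone (same format and same `split(':')` behavior, which assumes paths contain no literal `:`):

```python
import os

def parse_precise_models(spec):
    """Parse 'name:path:sensitivity[,name2:path2:sensitivity2]...' into
    the config dicts expected by start_multiple_wake_words()."""
    configs = []
    for model_spec in spec.split(','):
        name, path, sensitivity = model_spec.split(':')
        configs.append({
            'name': name,
            'model_path': os.path.expanduser(path),
            'sensitivity': float(sensitivity),
        })
    return configs

configs = parse_precise_models("mycroft:hey-mycroft.net:0.5,jarvis:hey-jarvis.net:0.4")
# configs[0]['name'] == 'mycroft'; configs[1]['sensitivity'] == 0.4
```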
|
||||||
|
|
||||||
|
### Usage Example
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd ~/voice-assistant
|
||||||
|
|
||||||
|
# Start with multiple wake words
|
||||||
|
python voice_server.py \
|
||||||
|
--enable-precise \
|
||||||
|
--precise-models "\
|
||||||
|
hey-mycroft:~/precise-models/pretrained/hey-mycroft.net:0.5,\
|
||||||
|
hey-jarvis:~/precise-models/pretrained/hey-jarvis.net:0.5"
|
||||||
|
```
## Implementation Guide: Speaker Identification

### Add to voice_server.py

```python
# Add resemblyzer support
try:
    from resemblyzer import VoiceEncoder, preprocess_wav
    import numpy as np
    SPEAKER_ID_AVAILABLE = True
except ImportError:
    SPEAKER_ID_AVAILABLE = False
    print("Warning: resemblyzer not available. Speaker ID disabled.")

# Initialize encoder
voice_encoder = None
speaker_profiles = {}


def load_speaker_profiles(profiles_dir='~/voice-assistant/profiles'):
    """Load enrolled speaker profiles"""
    global speaker_profiles, voice_encoder

    if not SPEAKER_ID_AVAILABLE:
        return False

    profiles_dir = os.path.expanduser(profiles_dir)

    if not os.path.exists(profiles_dir):
        print(f"No speaker profiles found at {profiles_dir}")
        return False

    # Initialize encoder
    voice_encoder = VoiceEncoder()

    # Load all profiles
    for profile_file in os.listdir(profiles_dir):
        if profile_file.endswith('.npy'):
            name = profile_file.replace('.npy', '')
            profile = np.load(os.path.join(profiles_dir, profile_file))
            speaker_profiles[name] = profile
            print(f"Loaded speaker profile: {name}")

    return len(speaker_profiles) > 0


def identify_speaker(audio_path, threshold=0.7):
    """Identify speaker from audio file"""
    if not SPEAKER_ID_AVAILABLE or not speaker_profiles:
        return None

    try:
        # Get embedding for test audio
        wav = preprocess_wav(audio_path)
        test_embedding = voice_encoder.embed_utterance(wav)

        # Compare to all profiles
        similarities = {}
        for name, profile in speaker_profiles.items():
            similarity = np.dot(test_embedding, profile)
            similarities[name] = similarity

        # Get best match
        best_match = max(similarities, key=similarities.get)
        confidence = similarities[best_match]

        print(f"Speaker ID: {best_match} (confidence: {confidence:.2f})")

        if confidence > threshold:
            return best_match
        else:
            return "unknown"

    except Exception as e:
        print(f"Error identifying speaker: {e}")
        return None


# Update process endpoint to include speaker ID
@app.route('/process', methods=['POST'])
def process():
    """Process complete voice command with speaker identification"""
    # ... existing code ...

    # Add speaker identification
    speaker = identify_speaker(temp_path) if speaker_profiles else None

    if speaker:
        print(f"Detected speaker: {speaker}")
        # Could personalize response based on speaker

    # ... rest of processing ...
```
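A note on the `np.dot(test_embedding, profile)` comparison above: it equals cosine similarity only because resemblyzer's utterance embeddings come back unit-length. If you ever average several enrollment embeddings into one profile (a common tweak, but an assumption here, not something voice_server.py does), the average is no longer unit-length, so normalize explicitly. A minimal sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    # Explicit cosine similarity; equivalent to np.dot(a, b)
    # only when both vectors are already unit-length.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])  # same direction, different magnitude
print(cosine_similarity(a, b))  # → 1.0 (a raw dot product would give 50.0)
```

Using the normalized form costs almost nothing and keeps the `threshold=0.7` comparison meaningful regardless of how the profile vector was produced.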
### Enrollment Script

Create `enroll_speaker.py`:

```python
#!/usr/bin/env python3
"""
Enroll users for speaker identification

Usage:
    python enroll_speaker.py --name Alice --audio alice_sample.wav
    python enroll_speaker.py --name Alice --duration 20  # Record live
"""

import argparse
import os
import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav
import pyaudio
import wave


def record_audio(duration=20, sample_rate=16000):
    """Record audio from microphone"""
    print(f"Recording for {duration} seconds...")
    print("Speak naturally - read a paragraph, have a conversation, etc.")

    chunk = 1024
    format = pyaudio.paInt16
    channels = 1

    p = pyaudio.PyAudio()

    stream = p.open(
        format=format,
        channels=channels,
        rate=sample_rate,
        input=True,
        frames_per_buffer=chunk
    )

    frames = []
    for i in range(0, int(sample_rate / chunk * duration)):
        data = stream.read(chunk)
        frames.append(data)

    stream.stop_stream()
    stream.close()
    p.terminate()

    # Save to temp file
    temp_file = f"/tmp/enrollment_{os.getpid()}.wav"
    wf = wave.open(temp_file, 'wb')
    wf.setnchannels(channels)
    wf.setsampwidth(p.get_sample_size(format))
    wf.setframerate(sample_rate)
    wf.writeframes(b''.join(frames))
    wf.close()

    return temp_file


def enroll_speaker(name, audio_file, profiles_dir='~/voice-assistant/profiles'):
    """Create voice profile for speaker"""
    profiles_dir = os.path.expanduser(profiles_dir)
    os.makedirs(profiles_dir, exist_ok=True)

    # Initialize encoder
    encoder = VoiceEncoder()

    # Process audio
    wav = preprocess_wav(audio_file)
    embedding = encoder.embed_utterance(wav)

    # Save profile
    profile_path = os.path.join(profiles_dir, f'{name}.npy')
    np.save(profile_path, embedding)

    print(f"✓ Enrolled speaker: {name}")
    print(f"  Profile saved to: {profile_path}")

    return profile_path


def main():
    parser = argparse.ArgumentParser(description="Enroll speaker for voice identification")
    parser.add_argument('--name', required=True, help='Speaker name')
    parser.add_argument('--audio', help='Path to audio file (wav)')
    parser.add_argument('--duration', type=int, default=20,
                        help='Recording duration if not using audio file')
    parser.add_argument('--profiles-dir', default='~/voice-assistant/profiles',
                        help='Directory to save profiles')

    args = parser.parse_args()

    # Get audio file
    if args.audio:
        audio_file = args.audio
        if not os.path.exists(audio_file):
            print(f"Error: Audio file not found: {audio_file}")
            return 1
    else:
        audio_file = record_audio(args.duration)

    # Enroll speaker
    try:
        enroll_speaker(args.name, audio_file, args.profiles_dir)
        return 0
    except Exception as e:
        print(f"Error enrolling speaker: {e}")
        return 1


if __name__ == '__main__':
    import sys
    sys.exit(main())
```
## Performance Comparison

### Single Wake Word
- **Latency:** 100-200ms
- **CPU:** ~5-10% (idle)
- **Memory:** ~100MB
- **Accuracy:** 95%+

### Multiple Wake Words (3 models)
- **Latency:** 100-200ms (parallel)
- **CPU:** ~15-30% (idle)
- **Memory:** ~300MB
- **Accuracy:** 95%+ each

### With Speaker Identification
- **Additional latency:** +100-200ms
- **Additional CPU:** +5% during ID
- **Additional memory:** +50MB
- **Accuracy:** 85-95% (depending on enrollment quality)

## Best Practices

### Wake Word Selection
1. **Different enough** - "Hey Mycroft" vs "Hey Jarvis" (not "Hey Alice" vs "Hey Alex")
2. **Clear consonants** - Easier to detect
3. **2-3 syllables** - Not too short, not too long
4. **Test in environment** - Check for false triggers

### Training
1. **Include all users** - If using a single model
2. **Diverse conditions** - Different rooms, noise levels
3. **Regular updates** - Add false positives weekly
4. **Per-user models** - Higher accuracy, more compute

### Speaker Identification
1. **Quality enrollment** - 20+ seconds of clear speech
2. **Re-enroll periodically** - Voices change (colds, etc.)
3. **Test thresholds** - Balance accuracy vs false IDs
4. **Graceful fallback** - Handle unknown speakers

## Recommended Path for You

```bash
# Week 1: Start with pre-trained "Hey Mycroft"
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz
precise-listen hey-mycroft.net  # Test it!

# Week 2: Fine-tune with your voices
precise-train -e 30 hey-mycroft-custom.net . \
    --from-checkpoint hey-mycroft.net

# Week 3: Add speaker identification
pip install resemblyzer
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name [Family Member] --duration 20

# Week 4: Add second wake word ("Hey Jarvis" for Plex?)
wget hey-jarvis.tar.gz
# Run both in parallel

# Month 2+: Optimize and expand
# - More wake words for different contexts
# - Per-user wake word models
# - Context-aware responses
```

This gives you a smooth progression from simple to advanced!
1089 docs/ESP32_S3_VOICE_ASSISTANT_SPEC.md Executable file
File diff suppressed because it is too large

542 docs/HARDWARE_BUYING_GUIDE.md Executable file
@@ -0,0 +1,542 @@
# Voice Assistant Hardware - Buying Guide for Second Unit

**Date:** 2025-11-29
**Context:** You have one Maix Duino (K210), planning multi-room deployment
**Question:** What should I buy for the second unit?

---

## Quick Answer

**Best Overall:** **Buy another Maix Duino K210** (~$30-40)
**Runner-up:** **ESP32-S3 with audio board** (~$20-30)
**Budget:** **Generic ESP32 + I2S** (~$15-20)
**Future-proof:** **Sipeed Maix-III** (~$60-80, when available)

---

## Analysis: Why Another Maix Duino K210?

### Pros ✅
- **Identical to first unit** - Code reuse, same workflow
- **Proven solution** - You'll know exactly what to expect
- **Stock availability** - Still widely available despite being "outdated"
- **Same accessories** - Microphones, displays, cables compatible
- **Edge detection ready** - Can upgrade to edge wake word later
- **Low cost** - ~$30-40 for full kit with LCD and camera
- **Multi-room consistency** - All units behave identically

### Cons ❌
- "Outdated" hardware (but doesn't matter for your use case)
- Limited future support from Sipeed

### Verdict: ✅ **RECOMMENDED - Best choice for consistency**

---

## Alternative Options

### Option 1: Another Maix Duino K210
**Price:** $30-40 (kit with LCD)
**Where:** AliExpress, Amazon, Seeed Studio

**Specific Model:**
- **Sipeed Maix Duino** (original, what you have)
  - Includes: LCD, camera module
  - Need to add: I2S microphone

**Why Choose:**
- Identical setup to first unit
- Code works without modification
- Same troubleshooting experience
- Bulk buy discount possible

**Link Examples:**
- Seeed Studio: https://www.seeedstudio.com/Sipeed-Maix-Duino-Kit-for-RISC-V-AI-IoT.html
- AliExpress: Search "Sipeed Maix Duino" (~$25-35)

---

### Option 2: Sipeed Maix Bit/Dock (K210 variant)
**Price:** $15-25 (smaller form factor)

**Differences from Maix Duino:**
- Smaller board
- May need separate LCD
- Same K210 chip
- Same capabilities

**Why Choose:**
- Cheaper
- More compact
- Same software

**Why Skip:**
- Need separate accessories
- Different form factor means different mounting
- Less convenient than all-in-one Duino

**Verdict:** ⚠️ Only if you want smaller/cheaper

---

### Option 3: ESP32-S3 with Audio Kit
**Price:** $20-30
**Chip:** ESP32-S3 (Xtensa dual-core @ 240MHz)

**Examples:**
- **ESP32-S3-Box** (~$30) - Has LCD, microphone, speaker built-in
- **Seeed XIAO ESP32-S3 Sense** (~$15) - Tiny, needs accessories
- **M5Stack Core S3** (~$50) - Premium, all-in-one

**Pros:**
- ✅ More modern than K210
- ✅ Better WiFi/BLE support
- ✅ Lower power consumption
- ✅ Active development
- ✅ Arduino/ESP-IDF support

**Cons:**
- ❌ No KPU (neural accelerator)
- ❌ Different code needed (ESP32 vs MaixPy)
- ❌ Less ML capability (for future edge wake word)
- ❌ Different ecosystem

**Best ESP32-S3 Choice:** **ESP32-S3-Box**
- All-in-one like your Maix Duino
- Built-in mic, speaker, LCD
- Good for server-side wake word
- Cheaper than Maix Duino

**Verdict:** 🤔 Good alternative if you want to experiment

---
### Option 4: Raspberry Pi Zero 2 W
**Price:** $15-20 (board only, need accessories)

**Pros:**
- ✅ Full Linux
- ✅ Familiar ecosystem
- ✅ Tons of support
- ✅ Easy Python development

**Cons:**
- ❌ No neural accelerator
- ❌ No dedicated audio hardware
- ❌ More power hungry (~500mW vs 200mW)
- ❌ Overkill for audio streaming
- ❌ Need USB sound card or I2S HAT
- ❌ Larger form factor

**Verdict:** ❌ Not ideal for this project

---

### Option 5: Sipeed Maix-III AXera-Pi (Future)
**Price:** $60-80 (when available)
**Chip:** AX620A (much more powerful than K210)

**Pros:**
- ✅ Modern hardware (2023)
- ✅ Better AI performance
- ✅ Linux + Python support
- ✅ Sipeed ecosystem continuity
- ✅ Great for edge wake word

**Cons:**
- ❌ More expensive
- ❌ Newer = less community support
- ❌ Overkill for server-side wake word
- ❌ Stock availability varies

**Verdict:** 🔮 Future-proof option if budget allows

---

### Option 6: Generic ESP32 + I2S Breakout
**Price:** $10-15 (cheapest option)

**What You Need:**
- ESP32 DevKit (~$5)
- I2S MEMS mic (~$5)
- Optional: I2S speaker amp (~$5)

**Pros:**
- ✅ Cheapest option
- ✅ Minimal, focused on audio only
- ✅ Very low power
- ✅ WiFi built-in

**Cons:**
- ❌ No LCD (would need separate)
- ❌ No camera
- ❌ DIY assembly required
- ❌ No neural accelerator
- ❌ Different code from K210

**Verdict:** 💰 Budget choice, but less polished

---

## Comparison Table

| Option | Price | Same Code? | LCD | AI Accel | Best For |
|--------|-------|------------|-----|----------|----------|
| **Maix Duino K210** | $30-40 | ✅ Yes | ✅ Included | ✅ KPU | **Multi-room consistency** |
| Maix Bit/Dock (K210) | $15-25 | ✅ Yes | ⚠️ Optional | ✅ KPU | Compact/Budget |
| ESP32-S3-Box | $25-35 | ❌ No | ✅ Included | ❌ No | Modern alternative |
| ESP32-S3 DIY | $15-25 | ❌ No | ❌ No | ❌ No | Custom build |
| Raspberry Pi Zero 2 W | $30+ | ❌ No | ❌ No | ❌ No | Linux/overkill |
| Maix-III | $60-80 | ⚠️ Similar | ✅ Varies | ✅ NPU | Future-proof |
| Generic ESP32 | $10-15 | ❌ No | ❌ No | ❌ No | Absolute budget |

---

## Recommended Purchase Plan

### Phase 1: Second Identical Unit (NOW)
**Buy:** Sipeed Maix Duino K210 (same as first)
**Cost:** ~$30-40
**Why:** Code reuse, proven solution, multi-room consistency

**What to Order:**
- [ ] Sipeed Maix Duino board with LCD and camera
- [ ] I2S MEMS microphone (if not included)
- [ ] Small speaker or audio output (3-5W)
- [ ] USB-C cable
- [ ] MicroSD card (4GB+)

**Total Cost:** ~$40-50 with accessories

---

### Phase 2: Third+ Units (LATER)
**Option A:** More Maix Duinos (if still available)
**Option B:** Switch to ESP32-S3-Box for variety/testing
**Option C:** Wait for Maix-III if you want cutting edge

---

## Where to Buy Maix Duino

### Recommended Sellers

**1. Seeed Studio (Official Partner)**
- URL: https://www.seeedstudio.com/
- Search: "Sipeed Maix Duino"
- Price: ~$35-45
- Shipping: International, good support
- **Pro:** Official, reliable, good documentation
- **Con:** Can be out of stock

**2. AliExpress (Direct from Sipeed/China)**
- Search: "Sipeed Maix Duino"
- Price: ~$25-35
- Shipping: 2-4 weeks (free or cheap)
- **Pro:** Cheapest, often bundled with accessories
- **Con:** Longer shipping, variable quality control
- **Tip:** Look for "Sipeed Official Store"

**3. Amazon**
- Search: "Maix Duino K210"
- Price: ~$40-50
- Shipping: Fast (Prime eligible sometimes)
- **Pro:** Fast shipping, easy returns
- **Con:** Higher price, limited stock

**4. Adafruit / SparkFun**
- May carry Sipeed products
- Higher price but US-based support
- Check availability

---
## Accessories to Buy

### Essential (for each unit)

**1. I2S MEMS Microphone**
- **Recommended:** Adafruit I2S MEMS Microphone Breakout (~$7)
  - Model: SPH0645LM4H
  - URL: https://www.adafruit.com/product/3421
- **Alternative:** INMP441 I2S Microphone (~$3 on AliExpress)
  - Cheaper, works well
  - Search: "INMP441 I2S microphone"

**2. Speaker / Audio Output**
- **Option A:** Small 3-5W speaker (~$5-10)
  - Search: "3W 8 ohm speaker"
- **Option B:** I2S speaker amplifier + speaker
  - MAX98357A I2S amp (~$5)
  - 4-8 ohm speaker (~$5)
- **Option C:** Line out to existing speakers (cheapest)

**3. MicroSD Card**
- 4GB or larger
- FAT32 formatted
- Class 10 recommended
- ~$5

**4. USB-C Cable**
- For power and programming
- ~$3-5

---

### Optional but Nice

**1. Enclosure/Case**
- 3D print custom case
- Find STL files on Thingiverse
- Or use small project box (~$5)

**2. Microphone Array** (for better pickup)
- 2 or 4-mic array board (~$15-25)
- Better voice detection
- Phase 2+ enhancement

**3. Battery Pack** (for portable testing)
- USB-C power bank
- Makes testing easier
- Already have? Use it!

**4. Mounting Hardware**
- Velcro strips
- 3M command strips
- Wall mount brackets
- ~$5

---

## Multi-Unit Strategy

### Same Hardware (Recommended)
**Buy:** 2-4x Maix Duino K210 units
**Benefit:**
- All units identical
- Same code deployment
- Easy troubleshooting
- Bulk buy discount

**Deployment:**
- Unit 1: Living room
- Unit 2: Bedroom
- Unit 3: Kitchen
- Unit 4: Office

### Mixed Hardware (Experimental)
**Buy:**
- 2x Maix Duino K210 (proven)
- 1x ESP32-S3-Box (modern)
- 1x Maix-III (future-proof)

**Benefit:**
- Test different platforms
- Evaluate performance
- Future-proofing

**Drawback:**
- More complex code
- Different troubleshooting
- Inconsistent UX

**Verdict:** ⚠️ Only if you want to experiment

---
## Budget Options

### Ultra-Budget Multi-Room (~$50 total)
- 2x Generic ESP32 + I2S mic ($10 each = $20)
- 2x Speakers ($5 each = $10)
- 2x SD cards ($5 each = $10)
- Cables ($10)
- **Total:** ~$50 for 2 units

**Pros:** Cheap
**Cons:** No LCD, DIY assembly, different code

---

### Mid-Budget Multi-Room (~$100 total)
- 2x Maix Duino K210 ($35 each = $70)
- 2x I2S mics ($5 each = $10)
- 2x Speakers ($5 each = $10)
- Accessories ($10)
- **Total:** ~$100 for 2 units

**Pros:** Proven, consistent, LCD included
**Cons:** "Outdated" hardware (doesn't matter for your use)

---

### Premium Multi-Room (~$200 total)
- 2x Maix-III AXera-Pi ($70 each = $140)
- 2x I2S mics ($10 each = $20)
- 2x Speakers ($10 each = $20)
- Accessories ($20)
- **Total:** ~$200 for 2 units

**Pros:** Future-proof, modern, powerful
**Cons:** More expensive, newer = less support

---

## My Recommendation

### For Second Unit: Buy Another Maix Duino K210 ✅

**Reasoning:**
1. **Code reuse** - Everything you develop for unit 1 works on unit 2
2. **Known quantity** - No surprises, you know it works
3. **Multi-room consistency** - All units behave the same
4. **Edge wake word ready** - Can upgrade later if desired
5. **Cost-effective** - ~$40 for full kit with LCD
6. **Stock available** - Still widely sold despite being "outdated"

**Where to Buy:**
- **Best:** AliExpress "Sipeed Official Store" (~$30 + shipping)
- **Fastest:** Amazon (~$45 with Prime)
- **Support:** Seeed Studio (~$40 + shipping)

**What to Order:**
```
Shopping List for Second Unit:
[ ] 1x Sipeed Maix Duino Kit (board + LCD + camera) - $30-35
[ ] 1x I2S MEMS microphone (INMP441 or SPH0645) - $5-7
[ ] 1x Small speaker (3W, 8 ohm) - $5-10
[ ] 1x MicroSD card (8GB+, Class 10) - $5
[ ] 1x USB-C cable - $3-5
[ ] Optional: Enclosure/mounting - $5-10

Total: ~$50-75 (depending on shipping and options)
```

---

### For Third+ Units: Evaluate

By the time you're ready for 3rd/4th units:
- You'll have experience with K210
- You'll know if you want consistency (more K210s)
- Or variety (try ESP32-S3 or Maix-III)
- Maix-III may have better availability
- Prices may have changed

**Decision:** Revisit when units 1 and 2 are working

---
## Future-Proofing Considerations

### Will K210 be Supported?
- **MaixPy:** Still actively maintained for K210
- **Community:** Large existing user base
- **Models:** Pre-trained models still work
- **Lifespan:** Good for 3-5+ years

**Verdict:** ✅ Safe to buy more K210s now

### When to Switch Hardware?
Consider switching when:
- [ ] K210 becomes hard to find
- [ ] You need better performance (edge ML)
- [ ] Power consumption is critical
- [ ] New features require newer hardware

**Timeline:** Probably 2-3 years out

---

## Special Considerations

### Different Rooms, Different Needs?

**Living Room (Primary):**
- Needs: Best audio, LCD display, polish
- **Hardware:** Maix Duino K210 with all features

**Bedroom (Secondary):**
- Needs: Simple, no bright LCD at night
- **Hardware:** Maix Duino K210, disable LCD at night

**Kitchen (Ambient Noise):**
- Needs: Better microphone array
- **Hardware:** Maix Duino K210 + 4-mic array

**Office (Minimal):**
- Needs: Cheap, basic audio only
- **Hardware:** Generic ESP32 + I2S mic

### All Same vs Customized?

**Recommendation:** Start with all same (Maix Duino), customize later if needed.

---

## Action Plan

### This Week
1. **Order second Maix Duino K210** (~$30-40)
2. **Order I2S microphone** (~$5-7)
3. **Order speaker** (~$5-10)
4. **Order SD card** (~$5)

**Total Investment:** ~$50-65

### Next Month
1. Wait for delivery (2-4 weeks from AliExpress)
2. Test unit 1 while waiting
3. Refine code and setup process
4. Prepare for unit 2 deployment

### In 2-3 Months
1. Deploy unit 2 (should be easy after unit 1)
2. Test multi-room
3. Decide on unit 3/4 based on experience
4. Consider bulk order if expanding

---

## Summary

**Buy for Second Unit:**
- ✅ **Sipeed Maix Duino K210** (same as first) - ~$35
- ✅ **I2S MEMS microphone** (INMP441) - ~$5
- ✅ **Small speaker** (3W, 8 ohm) - ~$8
- ✅ **MicroSD card** (8GB Class 10) - ~$5
- ✅ **USB-C cable** - ~$5

**Total:** ~$60 shipped

**Why:** Code reuse, consistency, proven solution, future-expandable

**Where:** AliExpress (cheap) or Amazon (fast)

**When:** Order now, 2-4 weeks delivery

**Third+ Units:** Decide after testing 2 units (probably buy more K210s)

---

## Quick Links

**Official Sipeed Store (AliExpress):**
https://sipeed.aliexpress.com/store/1101739727

**Seeed Studio:**
https://www.seeedstudio.com/catalogsearch/result/?q=maix+duino

**Amazon Search:**
"Sipeed Maix Duino K210"

**Microphone (Adafruit):**
https://www.adafruit.com/product/3421

**Alternative Mic (AliExpress):**
Search: "INMP441 I2S microphone breakout"

---

**Happy Building! 🏠🎙️**
223 docs/K210_PERFORMANCE_VERIFICATION.md Executable file
@@ -0,0 +1,223 @@
|
# K210 Performance Verification for Voice Assistant
|
||||||
|
|
||||||
|
**Date:** 2025-11-29
|
||||||
|
**Source:** https://github.com/sipeed/MaixPy Performance Comparison
|
||||||
|
**Question:** Is K210 suitable for our Mycroft Precise wake word detection project?
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## K210 Specifications
|
||||||
|
|
||||||
|
- **Processor:** K210 dual-core RISC-V @ 400MHz
|
||||||
|
- **AI Accelerator:** KPU (Neural Network Processor)
|
||||||
|
- **SRAM:** 8MB
|
||||||
|
- **Status:** Considered "outdated" by Sipeed (2018 release)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Performance Comparison (from MaixPy GitHub)
|
||||||
|
|
||||||
|
### YOLOv2 Object Detection
|
||||||
|
| Chip | Performance | Notes |
|
||||||
|
|------|------------|-------|
|
||||||
|
| K210 | 1.8 ms | Limited to older models |
|
||||||
|
| V831 | 20-40 ms | More modern, but slower |
|
||||||
|
| R329 | N/A | Newer hardware |
|
||||||
|
|
||||||
|
### Our Use Case: Audio Processing
|
||||||
|
|
||||||
|
**For wake word detection, we need:**
|
||||||
|
- Audio input (16kHz, mono) ✅ K210 has I2S
|
||||||
|
- Real-time processing ✅ K210 KPU can handle this
|
||||||
|
- Network communication ✅ K210 has ESP32 WiFi
|
||||||
|
- Low latency (<100ms) ✅ Achievable
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Deployment Strategy Analysis
|
||||||
|
|
||||||
|
### Option A: Server-Side Wake Word (Recommended)
|
||||||
|
**K210 Role:** Audio I/O only
|
||||||
|
- Capture audio from I2S microphone ✅ Well supported
|
||||||
|
- Stream to Heimdall via WiFi ✅ No problem
|
||||||
|
- Receive and play TTS audio ✅ Works fine
|
||||||
|
- LED/display feedback ✅ Easy
|
||||||
|
|
||||||
|
**K210 Requirements:** MINIMAL
|
||||||
|
- No AI processing needed
|
||||||
|
- Simple audio streaming
|
||||||
|
- Network communication only
|
||||||
|
- **Verdict:** ✅ K210 is MORE than capable
### Option B: Edge Wake Word (Future)

**K210 Role:** Wake word detection on-device

- Load KMODEL wake word model ⚠️ Needs conversion
- Run inference on KPU ⚠️ Quantization required
- Detect wake word locally ⚠️ Possible but limited

**K210 Limitations:**

- KMODEL conversion complex (TF→ONNX→KMODEL)
- Quantization may reduce accuracy (80-90% vs 95%+)
- Limited to simpler models
- **Verdict:** ⚠️ Possible but challenging

---

## Why K210 is PERFECT for Our Project

### 1. We're Starting with Server-Side Detection

- K210 only does audio I/O
- All AI processing on Heimdall (powerful server)
- No need for cutting-edge hardware
- **K210 is ideal for this role**

### 2. Audio Processing is Not Computationally Intensive

Unlike YOLOv2 (60 FPS video processing):

- Audio: 16kHz sample rate = 16,000 samples/second
- Wake word: Simple streaming
- No real-time neural network inference needed (server-side)
- **K210's "old" specs don't matter**

### 3. Edge Detection is Optional (Future Enhancement)

- We can prove the concept with server-side first
- Edge detection is a nice-to-have optimization
- If we need edge later, we can:
  - Use simpler wake word models
  - Accept slightly lower accuracy
  - Or upgrade hardware then
- **Starting point doesn't require latest hardware**

### 4. K210 Advantages We Actually Care About

- ✅ Well-documented (mature platform)
- ✅ Stable MaixPy firmware
- ✅ Large community and examples
- ✅ Proven audio processing
- ✅ Already have the hardware!
- ✅ Cost-effective ($30 vs $100+ newer boards)

---

## Performance Targets vs K210 Capabilities

### What We Need:

- Audio capture: 16kHz, 1 channel ✅ K210: Easy
- Audio streaming: ~128 kbps over WiFi ✅ K210: No problem
- Wake word latency: <200ms ✅ K210: Achievable (server-side)
- LED feedback: Instant ✅ K210: Trivial
- Audio playback: 16kHz TTS ✅ K210: Supported
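The streaming figure above is worth sanity-checking: raw 16-bit PCM at 16 kHz is 256 kbps, so the ~128 kbps target implies 8-bit samples or light compression. The arithmetic, as a quick sketch:

```python
# Sanity-check the audio bandwidth target above.
sample_rate = 16000   # Hz, mono capture
bits_raw = 16         # raw PCM sample depth

raw_kbps = sample_rate * bits_raw // 1000   # 256 kbps uncompressed
halved_kbps = raw_kbps // 2                 # 128 kbps (8-bit samples or a ~2:1 codec)

# Either figure is far below typical 802.11n throughput,
# so WiFi bandwidth is not the bottleneck here.
```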
### What We DON'T Need (for initial deployment):

- ❌ Real-time video processing
- ❌ Complex neural networks on device
- ❌ Multi-model inference
- ❌ High-resolution image processing
- ❌ Latest and greatest AI accelerator

---

## Comparison to Alternatives

### If we bought newer hardware:

**V831 ($50-70):**

- Pros: Newer, better supported
- Cons:
  - More expensive
  - SLOWER at neural networks than K210
  - Still need server for Whisper anyway
  - Overkill for audio I/O

**ESP32-S3 ($10-20):**

- Pros: Cheap, WiFi built-in
- Cons:
  - No KPU (if we want edge detection later)
  - Less capable for ML
  - Would work for server-side though

**Raspberry Pi Zero 2 W ($15):**

- Pros: Full Linux, familiar
- Cons:
  - No dedicated audio hardware
  - No neural accelerator
  - More power hungry
  - Overkill for our needs

**Verdict:** K210 is actually the sweet spot for this project!

---

## Real-World Comparison

### What K210 CAN Do (Proven):

- Audio classification ✅
- Simple keyword spotting ✅
- Voice activity detection ✅
- Audio streaming ✅
- Multi-microphone beamforming ✅

### What We're Asking It To Do:

- Stream audio to server ✅ Much easier than the proven tasks above
- (Optional future) Simple wake word detection ✅ Proven capability

---

## Recommendation: Proceed with K210

### Phase 1: Server-Side (Now)

K210 role: Audio I/O device

- **Difficulty:** Easy
- **Performance:** Excellent
- **K210 utilization:** ~10-20%
- **Status:** No concerns whatsoever

### Phase 2: Edge Detection (Future)

K210 role: Wake word detection + audio I/O

- **Difficulty:** Moderate (model conversion)
- **Performance:** Good enough (80-90% accuracy)
- **K210 utilization:** ~30-40%
- **Status:** Feasible, community has done it

---

## Conclusion

**Is K210 outdated?** Yes, for cutting-edge ML applications.

**Is K210 suitable for our project?** ABSOLUTELY YES!

**Why:**

1. We're using server-side processing (K210 just streams audio)
2. K210's audio capabilities are excellent
3. Mature platform = more examples and stability
4. Already have the hardware
5. Cost-effective
6. Can optionally upgrade to edge detection later

**The "outdated" warning is for people wanting the latest ML performance. We're using the K210 as an audio I/O device with WiFi, and it's perfect for that!**

---

## Additional Notes

### From MaixPy GitHub Warning:

> "We now recommend users choose the MaixCAM ... For 2018 K210 ... limited performance"

**Our Response:**

- We don't need 2024 performance for audio streaming
- Server does the heavy lifting (Heimdall with NVIDIA GPU)
- K210's mature platform is actually an advantage
- If we need more later, we can upgrade the edge device while keeping the server

### Community Validation:

Many Mycroft Precise + K210 projects exist:

- Audio streaming: Proven ✅
- Edge wake word: Proven ✅
- Full voice assistant: Proven ✅

**The K210 is "outdated" for video/vision ML, not for audio projects.**

---

**Final Verdict:** ✅ PROCEED WITH CONFIDENCE

The K210 is perfect for our use case. Ignore the "outdated" warning - that's for people doing real-time video processing or wanting the latest ML features. For a voice assistant where the heavy lifting happens server-side, the K210 is an excellent, mature, cost-effective choice!
---

**docs/LCD_CAMERA_FEATURES.md** (new executable file, +566 lines)
# Maix Duino LCD & Camera Feature Analysis

**Date:** 2025-11-29
**Hardware:** Sipeed Maix Duino (K210)
**Question:** What's the overhead for using the LCD display and camera?

---

## Hardware Capabilities

### LCD Display

- **Resolution:** Typically 320x240 or 240x135 (depending on model)
- **Interface:** SPI
- **Color:** RGB565 (16-bit color)
- **Frame Rate:** Up to 60 FPS (limited by SPI bandwidth)
- **Status:** ✅ Included with most Maix Duino kits

### Camera

- **Resolution:** Various (OV2640 common: 2MP, up to 1600x1200)
- **Interface:** DVP (Digital Video Port)
- **Frame Rate:** Up to 60 FPS (lower at high resolution)
- **Status:** ✅ Often included with Maix Duino kits

### K210 Resources

- **CPU:** Dual-core RISC-V @ 400MHz
- **KPU:** Neural network accelerator
- **SRAM:** 8MB total (6MB available for apps)
- **Flash:** 16MB

---

## LCD Usage for Voice Assistant

### Use Case 1: Status Display (Minimal Overhead)

**What to Show:**

- Current state (idle/listening/processing/responding)
- Wake word detected indicator
- WiFi status and signal strength
- Server connection status
- Volume level
- Time/date

**Overhead:**

- **CPU:** ~2-5% (simple text/icons)
- **RAM:** ~200KB (framebuffer + assets)
- **Power:** ~50mW additional
- **Complexity:** Low (MaixPy has built-in LCD support)

**Code Example:**

```python
import lcd
import image

lcd.init()
lcd.rotation(2)  # Rotate if needed

# Simple status display
img = image.Image(size=(320, 240))
img.draw_string(10, 10, "Listening...", color=(0, 255, 0), scale=3)
img.draw_circle(300, 20, 10, color=(0, 255, 0), fill=True)  # Status LED
lcd.display(img)
```

**Verdict:** ✅ **Very Low Overhead - Highly Recommended**

---

### Use Case 2: Audio Waveform Visualizer (Moderate Overhead)

#### Input Waveform (Microphone)

**What to Show:**

- Real-time audio level meter
- Waveform display (oscilloscope style)
- VU meter
- Frequency spectrum (simple bars)

**Overhead:**

- **CPU:** ~10-15% (real-time drawing)
- **RAM:** ~300KB (framebuffer + audio buffer)
- **Frame Rate:** 15-30 FPS (sufficient for audio visualization)
- **Complexity:** Moderate (drawing primitives + FFT)

**Implementation:**

```python
import lcd, audio, image
import array

lcd.init()
audio.init()

def draw_waveform(audio_buffer):
    img = image.Image(size=(320, 240))

    # Draw waveform
    width = 320
    height = 240
    center = height // 2

    # Sample every Nth point to fit on screen
    step = max(1, len(audio_buffer) // width)  # guard against short buffers

    for x in range(width - 1):
        # Scale int16 samples (±32768) down to pixel offsets (±128)
        y1 = center + (audio_buffer[x * step] // 256)
        y2 = center + (audio_buffer[(x + 1) * step] // 256)
        img.draw_line(x, y1, x + 1, y2, color=(0, 255, 0))

    # Add level meter
    level = max(abs(min(audio_buffer)), abs(max(audio_buffer)))
    bar_height = (level * height) // 32768
    img.draw_rectangle(0, height - bar_height, 20, bar_height,
                       color=(0, 255, 0), fill=True)

    lcd.display(img)
```
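The per-column downsampling inside `draw_waveform` is independent of the LCD API, so it can be checked on a desktop. A small sketch of the same index math (function name is illustrative, not a MaixPy API):

```python
def downsample_indices(n_samples: int, width: int):
    """Pick one sample index per screen column, mirroring draw_waveform's step logic."""
    step = max(1, n_samples // width)  # guard against buffers narrower than the screen
    return [x * step for x in range(width) if x * step < n_samples]

idx = downsample_indices(1024, 320)    # map a 1024-sample buffer onto a 320-px screen
```

With 1024 samples and a 320-pixel screen the step is 3, so columns sample indices 0, 3, ..., 957 and never run past the buffer.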
**Verdict:** ✅ **Moderate Overhead - Feasible and Cool!**

---

#### Output Waveform (TTS Response)

**What to Show:**

- TTS audio being played back
- Speaking animation (mouth/sound waves)
- Response text scrolling

**Overhead:**

- **CPU:** ~10-15% (similar to input)
- **RAM:** ~300KB
- **Complexity:** Moderate

**Note:** Can reuse the same visualization code as the input waveform.

**Verdict:** ✅ **Same as Input - Totally Doable**

---

### Use Case 3: Spectrum Analyzer (Higher Overhead)

**What to Show:**

- Frequency bars (FFT visualization)
- 8-16 frequency bands
- Classic "equalizer" look

**Overhead:**

- **CPU:** ~20-30% (FFT computation + drawing)
- **RAM:** ~500KB (FFT buffers + framebuffer)
- **Complexity:** Moderate-High (FFT required)

**Implementation Note:**

- The K210 includes a hardware FFT accelerator (separate from the KPU)
- Can do simple 8-band analysis with minimal CPU
- More bands = more CPU

**Verdict:** ⚠️ **Higher Overhead - Use Sparingly**
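The band math itself is straightforward: take the magnitude spectrum and average it into a handful of buckets. A deliberately naive sketch of just that math (an O(n²) DFT for clarity; a real build would use the K210's hardware FFT, and all names here are illustrative):

```python
import math

def band_levels(samples, n_bands=8):
    """Naive DFT -> n_bands averaged magnitudes. Illustrative only."""
    n = len(samples)
    half = n // 2  # keep positive frequencies only
    mags = []
    for k in range(half):
        re = sum(s * math.cos(-2 * math.pi * k * t / n) for t, s in enumerate(samples))
        im = sum(s * math.sin(-2 * math.pi * k * t / n) for t, s in enumerate(samples))
        mags.append(math.hypot(re, im))
    per_band = half // n_bands
    return [sum(mags[b * per_band:(b + 1) * per_band]) / per_band
            for b in range(n_bands)]

# A pure tone at bin 4 of a 64-sample window should dominate band 1 (bins 4-7).
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
levels = band_levels(tone)
```

Each band level would then be drawn as one rectangle, exactly like the level meter in the waveform example.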
### Use Case 4: Interactive UI (High Overhead)

**What to Show:**

- Touchscreen controls (if touchscreen available)
- Settings menu
- Volume slider
- Wake word selection
- Network configuration

**Overhead:**

- **CPU:** ~20-40% (touch detection + UI rendering)
- **RAM:** ~1MB (UI framework + assets)
- **Complexity:** High (needs a UI framework)

**Verdict:** ⚠️ **High Overhead - Nice-to-Have Later**

---

## Camera Usage for Voice Assistant

### Use Case 1: Person Detection (Wake on Face)

**What to Do:**

- Detect a person in frame
- Only listen when someone is present
- Privacy mode: disable when no one is around

**Overhead:**

- **CPU:** ~30-40% (KPU handles inference)
- **RAM:** ~1.5MB (model + frame buffers)
- **Power:** ~200mW additional
- **Complexity:** Moderate (pre-trained models available)

**Pros:**

- ✅ Privacy enhancement (only listen when the room is occupied)
- ✅ Power saving (sleep when the room is empty)
- ✅ Pre-trained models available for K210

**Cons:**

- ❌ Adds latency (check camera before listening)
- ❌ Privacy concerns (camera always on)
- ❌ Moderate resource usage

**Verdict:** 🤔 **Interesting but Complex - Phase 2+**

---

### Use Case 2: Visual Context (Future AI Integration)

**What to Do:**

- "What am I holding?" queries
- Visual scene understanding
- QR code scanning
- Gesture control

**Overhead:**

- **CPU:** 40-60% (vision processing)
- **RAM:** 2-3MB (models + buffers)
- **Complexity:** High (requires vision models)

**Verdict:** ❌ **Too Complex for Initial Release - Future Feature**

---

### Use Case 3: Visual Wake Word (Gesture Detection)

**What to Do:**

- Wave a hand to activate
- Thumbs up/down for feedback
- Alternative to the voice wake word

**Overhead:**

- **CPU:** ~30-40% (gesture detection)
- **RAM:** ~1.5MB
- **Complexity:** Moderate-High

**Verdict:** 🤔 **Novel Idea - Phase 3+**

---

## Recommended LCD Implementation

### Phase 1: Basic Status Display (Recommended NOW)

```
┌─────────────────────────┐
│  Voice Assistant        │
│                         │
│  Status: Listening ●    │
│  WiFi:   ████░░ 75%     │
│  Server: Connected      │
│                         │
│  Volume: [██████░░░]    │
│                         │
│  Time: 14:23            │
└─────────────────────────┘
```
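The WiFi and volume bars in the mock-up reduce to one small function. A sketch (the glyphs and widths are chosen to match the mock-up, not a MaixPy API):

```python
def bar(percent: int, width: int = 10, filled: str = "█", empty: str = "░") -> str:
    """Render a 0-100 percent value as a fixed-width text bar."""
    cells = round(percent * width / 100)
    return filled * cells + empty * (width - cells)

wifi = bar(75, width=6)  # matches the "WiFi: ████░░ 75%" line above
```

The same function covers the volume bar by changing `width`, and on the real display each character cell would become a filled rectangle.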
**Features:**

- Current state indicator
- WiFi signal strength
- Server connection status
- Volume level bar
- Clock
- Wake word indicator (pulsing circle)

**Overhead:** ~2-5% CPU, 200KB RAM

---

### Phase 2: Waveform Visualization (Cool Addition)

```
┌─────────────────────────┐
│ Listening...        [●] │
├─────────────────────────┤
│   ╱╲    ╱╲    ╱╲   ╱╲   │
│  ╱  ╲╱ ╲  ╱ ╲╱  ╲       │
│                         │
│ Level: [████░░░░░░]     │
└─────────────────────────┘
```

**Features:**

- Real-time waveform (15-30 FPS)
- Audio level meter
- State indicator
- Simple and clean

**Overhead:** ~10-15% CPU, 300KB RAM

---

### Phase 3: Enhanced Visualizer (Polish)

```
┌─────────────────────────┐
│ Hey Computer!       [●] │
├─────────────────────────┤
│ ▁▂▃▄▅▆▇█ ▁▂▃▄▅▆▇█       │
│ ▁▂▃▄▅▆▇█ ▁▂▃▄▅▆▇█       │
│                         │
│ "Turn off the lights"   │
└─────────────────────────┘
```

**Features:**

- Spectrum analyzer (8-16 bands)
- Transcription display
- Animated response
- More polished UI

**Overhead:** ~20-30% CPU, 500KB RAM

---

## Resource Budget Analysis

### Total K210 Resources

- **CPU:** 2 cores @ 400MHz (assume ~100% available)
- **RAM:** 6MB available for the app
- **Bandwidth:** SPI (LCD), I2S (audio), WiFi

### Current Voice Assistant Usage (Server-Side Wake Word)

| Component | CPU % | RAM (KB) |
|-----------|-------|----------|
| Audio Capture (I2S) | 5% | 128 |
| Audio Playback | 5% | 128 |
| WiFi Streaming | 10% | 256 |
| Network Stack | 5% | 512 |
| MaixPy Runtime | 10% | 1024 |
| **Base Total** | **35%** | **~2MB** |
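The table arithmetic is easy to keep honest with a tiny budget checker. A sketch (the figures are the estimates from the table above, not measurements; the 80% CPU limit is an assumed headroom policy):

```python
# Estimated base load from the table above: (CPU %, RAM KB) per component.
BASE = {
    "audio_capture":  (5, 128),
    "audio_playback": (5, 128),
    "wifi_streaming": (10, 256),
    "network_stack":  (5, 512),
    "maixpy_runtime": (10, 1024),
}

def totals(components):
    cpu = sum(c for c, _ in components.values())
    ram_kb = sum(r for _, r in components.values())
    return cpu, ram_kb

def fits(components, cpu_limit=80, ram_limit_kb=6 * 1024):
    """Leave ~20% CPU headroom; 6MB app RAM per the resource list above."""
    cpu, ram = totals(components)
    return cpu <= cpu_limit and ram <= ram_limit_kb

base_cpu, base_ram = totals(BASE)  # 35% CPU, 2048 KB
```

Adding a candidate feature is then one dictionary entry, e.g. a spectrum display at `(25, 500)` lands at 60% CPU and still fits.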
### With LCD Features

| Display Mode | CPU % | RAM (KB) | Total CPU | Total RAM |
|--------------|-------|----------|-----------|-----------|
| **None** | 0% | 0 | 35% | 2MB |
| **Status Only** | 2-5% | 200 | 37-40% | 2.2MB |
| **Waveform** | 10-15% | 300 | 45-50% | 2.3MB |
| **Spectrum** | 20-30% | 500 | 55-65% | 2.5MB |

### With Camera Features

| Feature | CPU % | RAM (KB) | Feasible? |
|---------|-------|----------|-----------|
| Person Detection | 30-40% | 1500 | ⚠️ Tight |
| Gesture Control | 30-40% | 1500 | ⚠️ Tight |
| Visual Context | 40-60% | 2500 | ❌ Too much |

---

## Recommendations

### ✅ IMPLEMENT NOW: Basic Status Display

- **Why:** Very low overhead, huge UX improvement
- **Overhead:** 2-5% CPU, 200KB RAM
- **Benefit:** Users know what's happening at a glance
- **Difficulty:** Easy (MaixPy has good LCD support)

### ✅ IMPLEMENT SOON: Waveform Visualizer

- **Why:** Cool factor, moderate overhead
- **Overhead:** 10-15% CPU, 300KB RAM
- **Benefit:** Engaging, confirms the mic is working, looks professional
- **Difficulty:** Moderate (simple drawing code)

### 🤔 CONSIDER LATER: Spectrum Analyzer

- **Why:** Higher overhead, diminishing returns
- **Overhead:** 20-30% CPU, 500KB RAM
- **Benefit:** Looks cool but not essential
- **Difficulty:** Moderate-High (FFT required)

### ❌ SKIP FOR NOW: Camera Features

- **Why:** High overhead, complex, privacy concerns
- **Overhead:** 30-60% CPU, 1.5-2.5MB RAM
- **Benefit:** Novel but not core functionality
- **Difficulty:** High (model integration, privacy handling)

---

## Implementation Priority

### Phase 1 (Week 1): Core Functionality

- [x] Audio capture and streaming
- [x] Server integration
- [ ] Basic LCD status display
  - Idle/Listening/Processing states
  - WiFi status
  - Connection indicator

### Phase 2 (Week 2-3): Visual Enhancement

- [ ] Audio waveform visualizer
  - Input (microphone) waveform
  - Output (TTS) waveform
  - Level meters
  - Clean, minimal design

### Phase 3 (Month 2): Polish

- [ ] Spectrum analyzer option
- [ ] Animated transitions
- [ ] Settings display
- [ ] Network configuration UI (optional)

### Phase 4 (Month 3+): Advanced Features

- [ ] Camera person detection (privacy mode)
- [ ] Gesture control experiments
- [ ] Visual wake word alternative

---

## Code Structure Recommendation

```python
# main.py structure with modular display

import lcd, audio, network
from display_manager import DisplayManager
from audio_processor import AudioProcessor
from voice_client import VoiceClient

# Initialize
lcd.init()
display = DisplayManager(mode='waveform')  # or 'status' or 'spectrum'
voice_client = VoiceClient()               # connection to the voice server

# Main loop
while True:
    # Audio processing
    audio_buffer = audio.capture()

    # Update display (non-blocking)
    if display.mode == 'status':
        display.show_status(state='listening', wifi_level=75)
    elif display.mode == 'waveform':
        display.show_waveform(audio_buffer)
    elif display.mode == 'spectrum':
        display.show_spectrum(audio_buffer)

    # Network communication
    voice_client.stream_audio(audio_buffer)
```
---

## Estimated Overhead

### Status Display Only

- **CPU:** 38% total (3% for display)
- **RAM:** 2.2MB total (200KB for display)
- **Battery Life:** -2% (minimal impact)
- **WiFi Latency:** No impact
- **Verdict:** ✅ Negligible impact, worth it!

### Waveform Visualizer

- **CPU:** 48% total (13% for display)
- **RAM:** 2.3MB total (300KB for display)
- **Battery Life:** -5% (minor impact)
- **WiFi Latency:** No impact (still <200ms)
- **Verdict:** ✅ Acceptable, looks great!

### Spectrum Analyzer

- **CPU:** 60% total (25% for display)
- **RAM:** 2.5MB total (500KB for display)
- **Battery Life:** -8% (noticeable)
- **WiFi Latency:** Possible minor impact
- **Verdict:** ⚠️ Usable but pushing limits

---

## Camera: Should You Use It?

### Pros

- ✅ Already have the hardware (free!)
- ✅ Novel features (person detection, gestures)
- ✅ Privacy enhancement potential
- ✅ Future-proofing

### Cons

- ❌ High resource usage (30-60% CPU, 1.5-2.5MB RAM)
- ❌ Complex implementation
- ❌ Privacy concerns (camera always on)
- ❌ Not core to a voice assistant
- ❌ Competes with audio processing for resources

### Recommendation

**Skip the camera for the initial implementation.** Focus on core voice assistant functionality. Revisit in Phase 3+ when:

1. Core features are stable
2. You want to experiment
3. You have time for optimization
4. You want to differentiate from commercial assistants

---

## Final Recommendations

### Start With (NOW):

```python
# Simple status display
# - State indicator
# - WiFi status
# - Connection status
# - Time/date
# Overhead: ~3% CPU, 200KB RAM
```

### Add Next (Week 2):

```python
# Waveform visualizer
# - Real-time audio waveform
# - Level meter
# - Clean design
# Overhead: +10% CPU, +100KB RAM
```

### Maybe Later (Month 2+):

```python
# Spectrum analyzer
# - 8-16 frequency bands
# - FFT visualization
# - Optional mode
# Overhead: +15% CPU, +200KB RAM
```

### Skip (For Now):

```python
# Camera features
# - Person detection
# - Gestures
# - Visual context
# Too complex, revisit later
```

---

## Example: Combined Status + Waveform Display

```
┌───────────────────────────────┐
│ Voice Assistant   [LISTENING] │
├───────────────────────────────┤
│                               │
│   ╱╲    ╱╲    ╱╲   ╱╲   ╱╲    │
│  ╱  ╲  ╱  ╲╱  ╲   ╱  ╲╱  ╲    │
│        ╲╱          ╲╱         │
│                               │
│ Vol: [████████░░] WiFi: ▂▃▅█  │
│                               │
│ Server: 10.1.10.71 ●    14:23 │
└───────────────────────────────┘
```

**Total Overhead:** ~15% CPU, 300KB RAM
**Impact:** Minimal, excellent UX improvement
**Coolness Factor:** 9/10

---

## Conclusion

### LCD: YES! Definitely Use It! ✅

- **Status display:** Low overhead, huge benefit
- **Waveform:** Moderate overhead, looks amazing
- **Spectrum:** Higher overhead, nice-to-have

**Recommendation:** Start with status, add waveform, consider spectrum later.

### Camera: Skip For Now ❌

- High overhead
- Complex implementation
- Not core functionality
- Revisit in Phase 3+

**Focus on nailing the voice assistant first, then add visual features incrementally!**

---

**TL;DR:** Use the LCD for status + waveform visualization (~15% overhead total). Skip the camera for now. Your K210 can easily handle this! 🎉
---

**docs/MYCROFT_PRECISE_GUIDE.md** (new executable file, +638 lines)
|
# Mycroft Precise Wake Word Training Guide
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Mycroft Precise is a neural network-based wake word detector that you can train on custom wake words. This guide covers two deployment approaches for your Maix Duino voice assistant:
|
||||||
|
|
||||||
|
1. **Server-side detection** (Recommended to start) - Run Precise on Heimdall
|
||||||
|
2. **Edge detection** (Advanced) - Convert model for K210 on Maix Duino
|
||||||
|
|
||||||
|
## Architecture Options
|
||||||
|
|
||||||
|
### Option A: Server-Side Wake Word Detection (Recommended)
|
||||||
|
|
||||||
|
```
|
||||||
|
Maix Duino Heimdall
|
||||||
|
┌─────────────────┐ ┌──────────────────────┐
|
||||||
|
│ Continuous │ Audio Stream │ Mycroft Precise │
|
||||||
|
│ Audio Capture │───────────────>│ Wake Word Detection │
|
||||||
|
│ │ │ │
|
||||||
|
│ LED Feedback │<───────────────│ Whisper STT │
|
||||||
|
│ Speaker Output │ Response │ HA Integration │
|
||||||
|
│ │ │ Piper TTS │
|
||||||
|
└─────────────────┘ └──────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
**Pros:**
|
||||||
|
- Easier setup and debugging
|
||||||
|
- Better accuracy (more compute available)
|
||||||
|
- Easy to retrain and update models
|
||||||
|
- Can use ensemble models
|
||||||
|
|
||||||
|
**Cons:**
|
||||||
|
- Continuous audio streaming (bandwidth)
|
||||||
|
- Slightly higher latency (~100-200ms)
|
||||||
|
- Requires stable network
|
||||||
|
|
||||||
|
### Option B: Edge Detection on Maix Duino (Advanced)
|
||||||
|
|
||||||
|
```
|
||||||
|
Maix Duino Heimdall
|
||||||
|
┌─────────────────┐ ┌──────────────────────┐
|
||||||
|
│ Precise Model │ │ │
|
||||||
|
│ (K210 KPU) │ │ │
|
||||||
|
│ Wake Detection │ Audio (on wake)│ Whisper STT │
|
||||||
|
│ │───────────────>│ HA Integration │
|
||||||
|
│ Audio Capture │ │ Piper TTS │
|
||||||
|
│ LED Feedback │<───────────────│ │
|
||||||
|
└─────────────────┘ Response └──────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
**Pros:**
|
||||||
|
- Lower latency (~50ms wake detection)
|
||||||
|
- Less network traffic
|
||||||
|
- Works even if server is down
|
||||||
|
- Better privacy (no continuous streaming)
|
||||||
|
|
||||||
|
**Cons:**
|
||||||
|
- Complex model conversion (TensorFlow → ONNX → KMODEL)
|
||||||
|
- Limited by K210 compute
|
||||||
|
- Harder to update models
|
||||||
|
- Requires careful optimization
|
||||||
|
|
||||||
|
## Recommended Approach: Start with Server-Side
|
||||||
|
|
||||||
|
Begin with server-side detection on Heimdall, then optimize to edge detection once everything works.
|
||||||
|
|
||||||
|
## Phase 1: Mycroft Precise Setup on Heimdall
|
||||||
|
|
||||||
|
### Install Mycroft Precise
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# SSH to Heimdall
|
||||||
|
ssh alan@10.1.10.71
|
||||||
|
|
||||||
|
# Create conda environment for Precise
|
||||||
|
conda create -n precise python=3.7 -y
|
||||||
|
conda activate precise
|
||||||
|
|
||||||
|
# Install TensorFlow 1.x (Precise requires this)
|
||||||
|
pip install tensorflow==1.15.5 --break-system-packages
|
||||||
|
|
||||||
|
# Install Precise
|
||||||
|
pip install mycroft-precise --break-system-packages
|
||||||
|
|
||||||
|
# Install audio dependencies
|
||||||
|
sudo apt-get install -y portaudio19-dev sox libatlas-base-dev
|
||||||
|
|
||||||
|
# Install precise-engine (for faster inference)
|
||||||
|
wget https://github.com/MycroftAI/mycroft-precise/releases/download/v0.3.0/precise-engine_0.3.0_x86_64.tar.gz
|
||||||
|
tar xvf precise-engine_0.3.0_x86_64.tar.gz
|
||||||
|
sudo cp precise-engine/precise-engine /usr/local/bin/
|
||||||
|
sudo chmod +x /usr/local/bin/precise-engine
|
||||||
|
```
|
||||||
|
|
||||||
|
### Verify Installation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
precise-engine --version
|
||||||
|
# Should output: Precise v0.3.0
|
||||||
|
|
||||||
|
precise-listen --help
|
||||||
|
# Should show help text
|
||||||
|
```
|
||||||
|
|
||||||
|
## Phase 2: Training Your Custom Wake Word
|
||||||
|
|
||||||
|
### Step 1: Collect Wake Word Samples
|
||||||
|
|
||||||
|
You'll need ~50-100 samples of your wake word. Choose something:
|
||||||
|
- 2-3 syllables long
|
||||||
|
- Easy to pronounce
|
||||||
|
- Unlikely to occur in normal speech
|
||||||
|
|
||||||
|
Example wake words:
|
||||||
|
- "Hey Computer" (recommended - similar to commercial products)
|
||||||
|
- "Okay Jarvis"
|
||||||
|
- "Hello Assistant"
|
||||||
|
- "Activate Assistant"
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Create project directory
|
||||||
|
mkdir -p ~/precise-models/hey-computer
|
||||||
|
cd ~/precise-models/hey-computer
|
||||||
|
|
||||||
|
# Record wake word samples
|
||||||
|
precise-collect
|
||||||
|
```
|
||||||
|
|
||||||
|
When prompted:
|
||||||
|
1. Type your wake word ("hey computer")
|
||||||
|
2. Press SPACE to record
|
||||||
|
3. Say the wake word clearly
|
||||||
|
4. Press SPACE to stop
|
||||||
|
5. Repeat 50-100 times
|
||||||
|
|
||||||
|
**Tips for good samples:**
|
||||||
|
- Vary your tone and speed
|
||||||
|
- Different distances from mic
|
||||||
|
- Different background noise levels
|
||||||
|
- Different pronunciations
|
||||||
|
- Have family members record too
|
||||||
|
|
||||||
|
### Step 2: Collect "Not Wake Word" Samples
|
||||||
|
|
||||||
|
Record background audio and similar-sounding phrases:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Create not-wake-word directory
|
||||||
|
mkdir -p not-wake-word
|
||||||
|
|
||||||
|
# Record random speech, music, TV, etc.
|
||||||
|
# These help the model learn what NOT to trigger on
|
||||||
|
precise-collect -f not-wake-word/random.wav
|
||||||
|
```
|
||||||
|
|
||||||
|
Collect ~200-500 samples of:
|
||||||
|
- Normal conversation
|
||||||
|
- TV/music in background
|
||||||
|
- Similar sounding phrases ("hey commuter", "they computed", etc.)
|
||||||
|
- Ambient noise
|
||||||
|
- Other household sounds
|
||||||
|
|
||||||
|
### Step 3: Generate Training Data

```bash
# Organize samples
mkdir -p hey-computer/{wake-word,not-wake-word,test/wake-word,test/not-wake-word}

# Split samples (80% train, 20% test):
# - 80% of wake-word samples   -> hey-computer/wake-word/
# - 20% of wake-word samples   -> hey-computer/test/wake-word/
# - 80% of not-wake-word       -> hey-computer/not-wake-word/
# - 20% of not-wake-word       -> hey-computer/test/not-wake-word/

# Train incrementally (generates additional negatives as it goes)
precise-train-incremental hey-computer.net hey-computer/
```
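The 80/20 split described above can be scripted rather than done by hand. A minimal sketch using only the standard library; the `raw-wake-word/` and `raw-not-wake-word/` source directories are illustrative names for wherever `precise-collect` saved your recordings:

```python
import random
import shutil
from pathlib import Path

def split_samples(src_dir, train_dir, test_dir, test_fraction=0.2, seed=0):
    """Move a random test_fraction of .wav files from src_dir into
    test_dir and the remainder into train_dir."""
    wavs = sorted(Path(src_dir).glob("*.wav"))
    random.Random(seed).shuffle(wavs)  # fixed seed for a reproducible split
    n_test = int(len(wavs) * test_fraction)
    for dest, files in ((test_dir, wavs[:n_test]), (train_dir, wavs[n_test:])):
        Path(dest).mkdir(parents=True, exist_ok=True)
        for wav in files:
            shutil.move(str(wav), str(Path(dest) / wav.name))

# Example layout (source directory names are assumptions):
split_samples("raw-wake-word", "hey-computer/wake-word",
              "hey-computer/test/wake-word")
split_samples("raw-not-wake-word", "hey-computer/not-wake-word",
              "hey-computer/test/not-wake-word")
```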

### Step 4: Train the Model

```bash
# Basic training (takes roughly 30-60 minutes)
precise-train -e 60 hey-computer.net hey-computer/

# For better accuracy, train longer
precise-train -e 120 hey-computer.net hey-computer/

# Watch for overfitting: validation loss should keep decreasing.
# Stop if validation loss starts increasing.
```

Training output will look like:

```
Epoch 1/60
loss: 0.4523 - val_loss: 0.3891
Epoch 2/60
loss: 0.3102 - val_loss: 0.2845
...
```

### Step 5: Test the Model

```bash
# Test with a microphone
precise-listen hey-computer.net

# Speak your wake word - you should see "!" on each detection
# Speak other phrases - the model should not trigger

# Test with the held-out audio files
precise-test hey-computer.net hey-computer/test/

# Target metrics:
#   Wake word accuracy: 95%+
#   False positive rate: <5%
```

### Step 6: Optimize Sensitivity

```bash
# Adjust the detection sensitivity (-s, range 0.0-1.0)
precise-listen hey-computer.net -s 0.5  # Default
precise-listen hey-computer.net -s 0.7  # More conservative
precise-listen hey-computer.net -s 0.3  # More aggressive

# Find the optimal value for your environment:
# Higher = fewer false positives, more missed detections
# Lower  = more false positives, fewer missed detections
```
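The trade-off above can be made concrete by sweeping candidate thresholds over per-clip model scores. A stdlib-only sketch; the score lists are hypothetical model outputs, not produced by any Precise tool:

```python
def sweep_thresholds(pos_scores, neg_scores, steps=9):
    """For each candidate threshold, report the miss (false-negative)
    rate on wake-word clips and the false-positive rate on negatives."""
    results = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)                                  # 0.1 ... 0.9
        fn = sum(s < t for s in pos_scores) / len(pos_scores)
        fp = sum(s >= t for s in neg_scores) / len(neg_scores)
        results.append((round(t, 2), fn, fp))
    return results

# Hypothetical scores: wake-word clips score high, negatives score low
pos = [0.92, 0.85, 0.77, 0.64, 0.95]
neg = [0.10, 0.35, 0.55, 0.05, 0.20]
for t, fn, fp in sweep_thresholds(pos, neg):
    print(f"threshold {t}: miss rate {fn:.0%}, false positive rate {fp:.0%}")
```

Pick the threshold whose miss/false-positive balance matches your tolerance, then set it via `-s` (or the server's sensitivity option).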

## Phase 3: Integration with Voice Server

### Update voice_server.py

Add Mycroft Precise support to the server:

```python
# Add to imports
import os

from precise_runner import PreciseEngine, PreciseRunner
import pyaudio  # used internally by PreciseRunner for microphone capture

# Add to configuration
PRECISE_MODEL = os.getenv(
    "PRECISE_MODEL", "/home/alan/precise-models/hey-computer.net")
PRECISE_SENSITIVITY = float(os.getenv("PRECISE_SENSITIVITY", "0.5"))

# Global Precise runner
precise_runner = None

def on_activation():
    """Called when the wake word is detected."""
    print("Wake word detected!")
    # Trigger recording and processing here
    # (implementation depends on your audio streaming setup)

def start_precise_listener():
    """Start Mycroft Precise wake word detection."""
    global precise_runner

    engine = PreciseEngine('/usr/local/bin/precise-engine', PRECISE_MODEL)
    precise_runner = PreciseRunner(
        engine,
        sensitivity=PRECISE_SENSITIVITY,
        on_activation=on_activation,
    )
    precise_runner.start()
    print(f"Precise listening with model: {PRECISE_MODEL}")
```

### Server-Side Wake Word Detection Architecture

For server-side detection, you need continuous audio streaming from the Maix Duino:

```python
# New endpoint for audio streaming
@app.route('/stream', methods=['POST'])
def stream_audio():
    """
    Receive a continuous audio stream for wake word detection.

    This endpoint processes incoming audio chunks and runs them
    through Mycroft Precise for wake word detection.
    """
    # Implementation here
    pass
```
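The endpoint stub above needs somewhere to park incoming chunks before a detector loop consumes them. One way to do that is a thread-safe rolling buffer between the HTTP handler (producer) and the detection thread (consumer); a stdlib-only sketch, where the class name and sizes are illustrative and not part of `precise_runner`:

```python
import threading

class AudioChunkBuffer:
    """Thread-safe rolling byte buffer between the HTTP handler
    (producer) and the wake-word detector loop (consumer)."""

    def __init__(self, max_bytes=16000 * 2 * 10):  # ~10 s of 16 kHz 16-bit mono
        self._buf = bytearray()
        self._lock = threading.Lock()
        self._max = max_bytes

    def write(self, chunk: bytes):
        with self._lock:
            self._buf.extend(chunk)
            if len(self._buf) > self._max:          # drop the oldest audio
                del self._buf[:len(self._buf) - self._max]

    def read(self, n: int) -> bytes:
        """Return up to n buffered bytes and remove them."""
        with self._lock:
            out = bytes(self._buf[:n])
            del self._buf[:n]
            return out

buf = AudioChunkBuffer()
buf.write(b"\x00\x01" * 1024)  # the HTTP handler pushes each POST body
frame = buf.read(2048)         # the detector loop pulls fixed-size frames
```

Bounding the buffer means a stalled consumer loses old audio instead of growing memory without limit.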

## Phase 4: Maix Duino Integration (Server-Side Detection)

### Update maix_voice_client.py

For server-side detection, stream audio continuously:

```python
# Add to configuration
STREAM_ENDPOINT = "/stream"
WAKE_WORD_CHECK_INTERVAL = 0.1  # Check every 100 ms

def stream_audio_continuous():
    """
    Stream audio to the server for wake word detection.

    The server notifies us when the wake word is detected.
    """
    import socket
    import struct
    import time

    # Create the socket connection
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_addr = (VOICE_SERVER_URL.replace('http://', '').split(':')[0], 8888)

    try:
        sock.connect(server_addr)
        print("Connected to wake word server")

        while True:
            # Capture an audio chunk
            chunk = i2s_dev.record(CHUNK_SIZE)

            if chunk:
                # Send the chunk length first, then the chunk itself
                sock.sendall(struct.pack('>I', len(chunk)))
                sock.sendall(chunk)

            # Check for a wake word detection signal
            # (simplified - a real implementation needs a non-blocking socket)

            time.sleep(0.01)

    except Exception as e:
        print(f"Streaming error: {e}")
    finally:
        sock.close()
```
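The client above frames each chunk with a 4-byte big-endian length prefix, so the server side must reassemble chunks from the TCP byte stream. A minimal, stdlib-only sketch of both halves of that framing (function names are illustrative):

```python
import io
import struct

def frame_chunk(chunk: bytes) -> bytes:
    """Prefix a chunk with its 4-byte big-endian length (matches the
    struct.pack('>I', ...) framing used by the client)."""
    return struct.pack(">I", len(chunk)) + chunk

def read_frames(stream):
    """Yield the original chunks back out of a length-prefixed stream."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return                        # clean end of stream
        (length,) = struct.unpack(">I", header)
        chunk = stream.read(length)
        if len(chunk) < length:
            return                        # truncated frame; stop
        yield chunk

# Round-trip two chunks through an in-memory "socket"
wire = frame_chunk(b"audio-1") + frame_chunk(b"audio-22")
chunks = list(read_frames(io.BytesIO(wire)))
print(chunks)  # [b'audio-1', b'audio-22']
```

On the real server, `stream.read` would wrap the accepted socket (e.g. `sock.makefile('rb')`), and each recovered chunk would be appended to the detector's audio buffer.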

## Phase 5: Edge Detection on Maix Duino (Advanced)

### Convert Precise Model to KMODEL

This is complex and requires several conversion steps:

```bash
# Step 1: Convert the Keras model to ONNX
# (Precise .net files are Keras HDF5 models, so use tf2onnx's --keras input)
pip install tf2onnx --break-system-packages

python -m tf2onnx.convert \
    --keras hey-computer.net \
    --output hey-computer.onnx

# Step 2: Optimize the ONNX model
# (the optimizer moved out of the onnx package into onnxoptimizer)
pip install onnx onnxoptimizer --break-system-packages

python -c "
import onnx
import onnxoptimizer

model = onnx.load('hey-computer.onnx')
passes = ['eliminate_deadend', 'eliminate_identity',
          'eliminate_nop_dropout', 'eliminate_nop_pad']
optimized = onnxoptimizer.optimize(model, passes)
onnx.save(optimized, 'hey-computer-opt.onnx')
"

# Step 3: Convert ONNX to KMODEL (for the K210)
# Use nncase (https://github.com/kendryte/nncase)
# This step is hardware-specific and complex

# Install nncase
pip install nncase --break-system-packages

# Convert (adjust parameters to match your model)
ncc compile hey-computer-opt.onnx \
    -i onnx \
    --dataset calibration_data \
    -o hey-computer.kmodel \
    --target k210
```

**Note:** KMODEL conversion is non-trivial and may require model architecture adjustments. The K210 has limitations:

- Max model size: ~6 MB
- Limited operator support
- Quantization required for acceptable performance

### Testing KMODEL on Maix Duino

```python
# Load the model in maix_voice_client.py
import KPU as kpu

def load_wake_word_model_kmodel():
    """Load the converted KMODEL for wake word detection."""
    global kpu_task

    try:
        kpu_task = kpu.load("/sd/models/hey-computer.kmodel")
        print("Wake word model loaded on K210")
        return True
    except Exception as e:
        print(f"Failed to load model: {e}")
        return False

def detect_wake_word_kmodel():
    """Run wake word detection on the K210 KPU."""
    global kpu_task

    # Capture audio
    audio_chunk = i2s_dev.record(CHUNK_SIZE)

    # Preprocess for the model (depends on its input format)
    # This is model-specific - adjust to match your training pipeline
    features = preprocess_audio(audio_chunk)

    # Run inference (kpu.forward is MaixPy's generic KPU inference call;
    # how you read the output depends on your model's head)
    output = kpu.forward(kpu_task, features)

    # Check confidence
    if output[0] > WAKE_WORD_THRESHOLD:
        return True
    return False
```

## Recommended Wake Words

Based on testing and community feedback:

**Best performers:**
1. "Hey Computer" - clear and distinct, with hard consonants
2. "Okay Jarvis" - pop culture reference, easy to say
3. "Hey Mycroft" - the original Mycroft wake word (lots of training data available)

**Avoid:**
- Single-syllable words (too easy to trigger)
- Common phrases ("okay", "hey there")
- Names of people in your household
- Words that sound like common speech patterns

## Training Tips

### For Best Accuracy

1. **Diverse training data:**
   - Multiple speakers
   - Various distances (1 ft to 15 ft)
   - Different noise conditions
   - Accent variations

2. **Quality over quantity:**
   - 50 good samples beat 200 poor ones
   - Clear pronunciation
   - Consistent volume

3. **Hard negatives:**
   - Include similar-sounding phrases
   - Include partial wake words
   - Include common false triggers you notice

4. **Regular retraining:**
   - Add false positives to the training set
   - Add missed detections
   - Retrain every few weeks initially

### Collecting Hard Negatives

```bash
# Run precise-listen and save audio for each activation so you can
# review false positives (-d/--save-dir; check `precise-listen --help`
# for the exact options in your version)
precise-listen hey-computer.net -d false-activations/

# Add clips that triggered incorrectly to your not-wake-word set,
# then retrain to reduce false positives
```

## Performance Benchmarks

### Server-Side Detection (Heimdall)
- **Latency:** 100-200 ms from utterance to detection
- **Accuracy:** 95%+ with good training
- **False positive rate:** <1 per hour with tuning
- **CPU usage:** ~5-10% (single core)
- **Network:** ~128 kbps continuous stream

### Edge Detection (Maix Duino)
- **Latency:** 50-100 ms
- **Accuracy:** 80-90% (limited by K210 quantization)
- **False positive rate:** varies with model optimization
- **CPU usage:** ~30% of the K210 (leaves room for other tasks)
- **Network:** none until the wake word is detected

## Monitoring and Debugging

### Log Wake Word Detections

```python
# Add to voice_server.py
import datetime

def log_wake_word(confidence, timestamp=None):
    """Log wake word detections for later analysis."""
    if timestamp is None:
        timestamp = datetime.datetime.now()

    log_file = "/home/alan/voice-assistant/logs/wake_words.log"
    with open(log_file, 'a') as f:
        f.write(f"{timestamp.isoformat()},{confidence}\n")
```

### Analyze False Positives

```bash
# Follow the wake word log
tail -f ~/voice-assistant/logs/wake_words.log

# Histogram of detection confidences
# (log lines are "timestamp,confidence", so field 2 is the confidence)
awk -F',' '{print $2}' ~/voice-assistant/logs/wake_words.log | \
    sort -n | uniq -c
```

## Production Deployment

### Systemd Service with Precise

Update the systemd service to include Precise:

```ini
[Unit]
Description=Voice Assistant with Wake Word Detection
After=network.target

[Service]
Type=simple
User=alan
WorkingDirectory=/home/alan/voice-assistant
Environment="PATH=/home/alan/miniconda3/envs/precise/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/home/alan/voice-assistant/config/.env
ExecStart=/home/alan/miniconda3/envs/precise/bin/python voice_server.py --enable-precise
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```

## Troubleshooting

### Precise Won't Start

```bash
# Check the TensorFlow version
python -c "import tensorflow as tf; print(tf.__version__)"
# Should be 1.15.x

# Check the model file (.net models are Keras HDF5 files)
file hey-computer.net
# Should report: Hierarchical Data Format (version 5) data

# Test the model directly
precise-engine hey-computer.net
# Should load without errors
```

### Low Accuracy

1. **Collect more training data** - especially hard negatives
2. **Increase training epochs** - try 200-300 epochs
3. **Verify the train/test split** - should be 80/20
4. **Check audio quality** - the sample rate must match (16 kHz)
5. **Try a different wake word** - some are easier to detect

### High False Positive Rate

1. **Increase the threshold** - try 0.6, 0.7, 0.8
2. **Add false positives to training** - retrain with the false triggers
3. **Collect more negative samples** - expand the not-wake-word set
4. **Use ensemble models** - run multiple models and require agreement

### KMODEL Conversion Fails

This is expected - K210 conversion is complex:

1. **Simplify the model architecture** - reduce the layer count
2. **Use quantization-aware training** - train with quantization in mind
3. **Check operator support** - the K210 does not support all TF ops
4. **Consider alternatives:**
   - Use pre-trained models for the K210
   - Stick with server-side detection
   - Use Porcupine instead (it has K210 support)

## Alternative: Use Pre-trained Models

Mycroft provides some pre-trained models:

```bash
# Download the Hey Mycroft model
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz
tar xzf hey-mycroft.tar.gz

# Test it
precise-listen hey-mycroft.net
```

Then train your own wake word starting from this base:

```bash
# Fine-tune from the pre-trained model: precise-train resumes training
# when the model file already exists, so copy it to the new name first
cp hey-mycroft.net my-wake-word.net
precise-train -e 60 my-wake-word.net my-wake-word/
```

## Next Steps

1. **Start with server-side** - get it working on Heimdall first
2. **Collect good training data** - quality samples are key
3. **Test and tune the threshold** - find the sweet spot for your environment
4. **Monitor performance** - track false positives and misses
5. **Iterate on training** - add hard examples and retrain
6. **Consider edge deployment** - once server-side is solid

## Resources

- Mycroft Precise docs: https://github.com/MycroftAI/mycroft-precise
- Training guide: https://mycroft-ai.gitbook.io/docs/mycroft-technologies/precise
- Community models: https://github.com/MycroftAI/precise-data
- K210 docs: https://canaan-creative.com/developer
- nncase: https://github.com/kendryte/nncase

## Conclusion

Mycroft Precise gives you full control over wake word detection with complete privacy. Start with server-side detection for easier development, then move to edge detection once you have a well-trained model.

The key to success is good training data - invest the time to collect diverse, high-quality samples!

---

**File: `docs/PRECISE_DEPLOYMENT.md`** (new executable file, 577 lines)

# Mycroft Precise Deployment Guide

## Quick Reference: Server vs Edge Detection

### Server-Side Detection (Recommended for Start)

**Setup:**
```bash
# 1. On Heimdall: set up Precise
./setup_precise.sh --wake-word "hey computer"

# 2. Train your model (follow the scripts in ~/precise-models/hey-computer/)
cd ~/precise-models/hey-computer
./1-record-wake-word.sh
./2-record-not-wake-word.sh
# Organize samples, then:
./3-train-model.sh
./4-test-model.sh

# 3. Start the voice server with Precise
cd ~/voice-assistant
conda activate precise
python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/hey-computer/hey-computer.net \
    --precise-sensitivity 0.5
```

**Architecture:**
- Maix Duino → continuous audio stream → Heimdall
- Heimdall runs Precise on the audio stream
- On wake word: process the command with Whisper
- Response → TTS → stream back to the Maix Duino

**Pros:** easier setup, better accuracy, simple updates
**Cons:** more network traffic, requires a stable connection

### Edge Detection (Advanced - Future Phase)

**Setup:**
```bash
# 1. Train the model on Heimdall (same as above)
# 2. Convert it to KMODEL for the K210
# 3. Deploy to the Maix Duino
# (See MYCROFT_PRECISE_GUIDE.md for detailed conversion steps)
```

**Architecture:**
- The Maix Duino runs Precise locally on the K210
- Audio is only sent after the wake word is detected
- Lower latency, less network traffic

**Pros:** lower latency, less bandwidth, works offline
**Cons:** complex conversion, lower accuracy, harder updates

## Phase-by-Phase Deployment

### Phase 1: Server Setup (Day 1)

```bash
# On Heimdall
ssh alan@10.1.10.71

# 1. Set up the voice assistant base
./setup_voice_assistant.sh

# 2. Set up Mycroft Precise
./setup_precise.sh --wake-word "hey computer"

# 3. Configure the environment
vim ~/voice-assistant/config/.env
```

Update `.env`:
```bash
HA_URL=http://your-home-assistant:8123
HA_TOKEN=your_token_here
PRECISE_MODEL=/home/alan/precise-models/hey-computer/hey-computer.net
PRECISE_SENSITIVITY=0.5
```

### Phase 2: Wake Word Training (Day 1-2)

```bash
# Navigate to the training directory
cd ~/precise-models/hey-computer
conda activate precise

# Record samples (30-60 minutes)
./1-record-wake-word.sh      # Record 50-100 wake word samples
./2-record-not-wake-word.sh  # Record 200-500 negative samples

# Organize samples (80/20 train/test split):
# - 80% of wake-word recordings     -> wake-word/
# - 20% of wake-word recordings     -> test/wake-word/
# - 80% of not-wake-word recordings -> not-wake-word/
# - 20% of not-wake-word recordings -> test/not-wake-word/

# Train the model (30-60 minutes)
./3-train-model.sh

# Test the model
./4-test-model.sh

# Evaluate on the test set
./5-evaluate-model.sh

# Tune the threshold
./6-tune-threshold.sh
```

### Phase 3: Server Integration (Day 2)

#### Option A: Manual Testing

```bash
cd ~/voice-assistant
conda activate precise

# Start the server with Precise enabled
python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/hey-computer/hey-computer.net \
    --precise-sensitivity 0.5 \
    --ha-url http://your-ha:8123 \
    --ha-token your_token
```

#### Option B: Systemd Service

Update the systemd service to use the Precise environment:

```bash
sudo vim /etc/systemd/system/voice-assistant.service
```

```ini
[Unit]
Description=Voice Assistant with Wake Word Detection
After=network.target

[Service]
Type=simple
User=alan
WorkingDirectory=/home/alan/voice-assistant
Environment="PATH=/home/alan/miniconda3/envs/precise/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/home/alan/voice-assistant/config/.env
ExecStart=/home/alan/miniconda3/envs/precise/bin/python voice_server.py \
    --enable-precise \
    --precise-model /home/alan/precise-models/hey-computer/hey-computer.net \
    --precise-sensitivity 0.5
Restart=on-failure
RestartSec=10
StandardOutput=append:/home/alan/voice-assistant/logs/voice_assistant.log
StandardError=append:/home/alan/voice-assistant/logs/voice_assistant_error.log

[Install]
WantedBy=multi-user.target
```

Enable and start:
```bash
sudo systemctl daemon-reload
sudo systemctl enable voice-assistant
sudo systemctl start voice-assistant
sudo systemctl status voice-assistant
```

### Phase 4: Maix Duino Setup (Day 2-3)

For server-side wake word detection, the Maix Duino streams audio.

Update `maix_voice_client.py`:

```python
# Use the simplified mode - just stream audio;
# the server handles wake word detection
CONTINUOUS_STREAM = True      # Enable continuous streaming
WAKE_WORD_CHECK_INTERVAL = 0  # 0 = server-side detection
```

Flash and test:
1. Copy the updated script to the SD card
2. Boot the Maix Duino
3. Check the serial console for the connection
4. Speak the wake word
5. Verify the server logs show the detection

### Phase 5: Testing & Tuning (Day 3-7)

#### Test Wake Word Detection

```bash
# Monitor server logs
journalctl -u voice-assistant -f

# Or check detections via the API
curl http://10.1.10.71:5000/wake-word/detections
```

#### Test the End-to-End Flow

1. Say the wake word: "Hey Computer"
2. Wait for the LED/beep on the Maix Duino
3. Say a command: "Turn on the living room lights"
4. Verify the HA command executes
5. Hear the TTS response

#### Monitor Performance

```bash
# Follow the wake word log
tail -f ~/voice-assistant/logs/wake_words.log

# Count logged detections (the log has one "timestamp,confidence" line each)
wc -l < ~/voice-assistant/logs/wake_words.log

# Sanity check:
# You should see detections when you say the wake word,
# and none during normal conversation
```
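Since the log is one `timestamp,confidence` line per detection (the format written by the `log_wake_word` snippet in these docs), a short script can turn it into detections per hour for spotting false-positive bursts. A stdlib-only sketch; the log path in the usage comment is the example path used throughout this guide:

```python
import datetime
from collections import Counter

def detections_per_hour(log_path):
    """Count wake word detections per hour from a
    'timestamp,confidence' CSV log."""
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            ts_str, _, _conf = line.strip().partition(",")
            ts = datetime.datetime.fromisoformat(ts_str)
            counts[ts.strftime("%Y-%m-%d %H:00")] += 1
    return counts

# Example usage:
# for hour, n in sorted(detections_per_hour(
#         "/home/alan/voice-assistant/logs/wake_words.log").items()):
#     print(hour, n)
```

An hour with many detections while nobody was speaking the wake word is a clear signal to raise the sensitivity threshold or collect hard negatives.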

#### Tune Sensitivity

If you get too many false positives:
```bash
# Increase the threshold (more conservative)
# Edit the systemd service, or restart with:
python voice_server.py --precise-sensitivity 0.7
```

If wake words are being missed:
```bash
# Decrease the threshold (more aggressive)
python voice_server.py --precise-sensitivity 0.3
```

#### Collect Hard Examples

```bash
# When you notice a false positive, record it
cd ~/precise-models/hey-computer
precise-collect -f not-wake-word/false-positive-$(date +%s).wav

# When a wake word is missed, record it
precise-collect -f wake-word/missed-$(date +%s).wav

# After collecting 10-20 examples, retrain
./3-train-model.sh
```

## Monitoring Commands

### Check System Status

```bash
# Service status
sudo systemctl status voice-assistant

# Server health
curl http://10.1.10.71:5000/health

# Wake word status
curl http://10.1.10.71:5000/wake-word/status

# Recent detections
curl http://10.1.10.71:5000/wake-word/detections
```

### View Logs

```bash
# Real-time server logs
journalctl -u voice-assistant -f

# Last 50 lines
journalctl -u voice-assistant -n 50

# Specific log file
tail -f ~/voice-assistant/logs/voice_assistant.log

# Wake word detections
tail -f ~/voice-assistant/logs/wake_words.log

# Maix Duino serial console
screen /dev/ttyUSB0 115200
```

### Performance Metrics

```bash
# CPU usage (should be ~5-10% idle, with spikes during processing)
top -p $(pgrep -f voice_server.py)

# Memory usage
ps aux | grep voice_server.py

# Network traffic (if streaming audio)
iftop -i eth0  # or your network interface
```

## Troubleshooting

### Wake Word Not Detecting

**Check that the model is loaded:**
```bash
curl http://10.1.10.71:5000/wake-word/status
# Should show: "enabled": true
```

**Test the model directly:**
```bash
conda activate precise
precise-listen ~/precise-models/hey-computer/hey-computer.net
# Speak the wake word - you should see "!"
```

**Check the sensitivity:**
```bash
# Try a lower threshold
precise-listen ~/precise-models/hey-computer/hey-computer.net -s 0.3
```

**Verify the audio input:**
```bash
# Test the microphone
arecord -d 5 test.wav
aplay test.wav
```

### Too Many False Positives

**Increase the threshold:**
```bash
# Edit the service, or restart with a higher threshold
python voice_server.py --precise-sensitivity 0.7
```

**Retrain with the false positives:**
```bash
cd ~/precise-models/hey-computer
# Record the false triggers into not-wake-word/
precise-collect -f not-wake-word/false-triggers.wav
# Then retrain with the expanded not-wake-word set
./3-train-model.sh
```

### Server Won't Start with Precise

**Check the Precise installation:**
```bash
conda activate precise
python -c "from precise_runner import PreciseRunner; print('OK')"
```

**Check the engine binary:**
```bash
command -v precise-engine
# Should print the path to the precise-engine binary
```

**Check the model file:**
```bash
ls -lh ~/precise-models/hey-computer/hey-computer.net
file ~/precise-models/hey-computer/hey-computer.net
```

**Check permissions:**
```bash
chmod +x /usr/local/bin/precise-engine
chmod 644 ~/precise-models/hey-computer/hey-computer.net
```

### Audio Quality Issues

**Test the audio path:**
```bash
# Record a test clip on the server
arecord -f S16_LE -r 16000 -c 1 -d 5 test.wav

# Transcribe it with Whisper
conda activate voice-assistant
python -c "
import whisper
model = whisper.load_model('base')
result = model.transcribe('test.wav')
print(result['text'])
"
```

**If the quality is poor:**
- Check the microphone connection
- Verify the sample rate (16 kHz)
- Test with a USB microphone
- Check for interference/noise

### Maix Duino Connection Issues

**Check WiFi:**
```python
# In the Maix Duino serial console
import network
wlan = network.WLAN(network.STA_IF)
print(wlan.isconnected())
print(wlan.ifconfig())
```

**Check server reachability:**
```python
# From the Maix Duino
import urequests
response = urequests.get('http://10.1.10.71:5000/health')
print(response.json())
```

**Check audio streaming:**
```bash
# On Heimdall, monitor the network
sudo tcpdump -i any -n host <maix-duino-ip>
# You should see a continuous packet flow while streaming
```

## Optimization Tips

### Reduce Latency

1. **Use a smaller Whisper model:**
   ```bash
   # Edit .env
   WHISPER_MODEL=base  # or tiny
   ```

2. **Optimize the Precise sensitivity:**
   ```bash
   # Find the sweet spot between false positives and latency:
   # a lower threshold triggers faster but misfires more often
   ```

3. **Pre-load models:**
   ```python
   # Load models on startup rather than on the first request.
   # This adds ~30 s to startup but eliminates the first-request delay.
   ```

### Improve Accuracy

1. **Use a larger Whisper model:**
   ```bash
   WHISPER_MODEL=large
   ```

2. **Record more wake word samples:**
   ```bash
   # Aim for 100+ high-quality samples with
   # diverse speakers, conditions, and distances
   ```

3. **Increase the training epochs:**
   ```bash
   # In 3-train-model.sh
   precise-train -e 120 hey-computer.net .  # vs. the default 60
   ```

### Reduce False Positives

1. **Collect hard negatives:**
   ```bash
   # Record TV, music, and similar-sounding phrases,
   # then add them to the not-wake-word training set
   ```

2. **Increase the threshold:**
   ```bash
   --precise-sensitivity 0.7  # vs. the default 0.5
   ```

3. **Use an ensemble:**
   ```python
   # Run multiple models and require agreement.
   # Advanced - requires code modification.
   ```
|
||||||
|
|
||||||
|
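The ensemble idea in item 3 boils down to a voting rule. This is a hedged sketch only: the probability inputs are placeholders for the activation scores each loaded Precise model would report for the same audio chunk.

```python
def ensemble_detect(probabilities, threshold=0.5, min_agree=2):
    """Trigger only when at least `min_agree` models exceed the threshold."""
    votes = sum(1 for p in probabilities if p >= threshold)
    return votes >= min_agree

# Two of three models agree -> accept the wake word
print(ensemble_detect([0.8, 0.6, 0.2]))  # True
# Only one model fires -> reject (likely a false positive)
print(ensemble_detect([0.8, 0.3, 0.2]))  # False
```

Requiring agreement trades a little recall for a sharp drop in false positives, which is usually the right trade in a noisy living room.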
## Production Checklist

- [ ] Wake word model trained with 50+ samples
- [ ] Model tested with a <5% false positive rate
- [ ] Server service enabled and auto-starting
- [ ] Home Assistant token configured
- [ ] Maix Duino WiFi configured
- [ ] End-to-end test successful
- [ ] Logs rotating properly
- [ ] Monitoring in place
- [ ] Backup of the trained model
- [ ] Documentation updated

## Backup and Recovery

### Backup Trained Model

```bash
# Back up the model locally
cp ~/precise-models/hey-computer/hey-computer.net \
   ~/precise-models/hey-computer/hey-computer.net.backup

# Back up to another host
scp ~/precise-models/hey-computer/hey-computer.net \
    user@backup-host:/path/to/backups/
```

### Restore from Backup

```bash
# Restore the model
cp ~/precise-models/hey-computer/hey-computer.net.backup \
   ~/precise-models/hey-computer/hey-computer.net

# Restart the service
sudo systemctl restart voice-assistant
```

## Next Steps

Once basic server-side detection is working:

1. **Add more intents** - expand Home Assistant control
2. **Implement TTS playback** - complete the audio response loop
3. **Multi-room support** - deploy multiple Maix Duino units
4. **Voice profiles** - train the model on family members
5. **Edge deployment** - convert the model for the K210 (advanced)

## Resources

- Main guide: MYCROFT_PRECISE_GUIDE.md
- Quick start: QUICKSTART.md
- Architecture: maix-voice-assistant-architecture.md
- Mycroft Precise: https://github.com/MycroftAI/mycroft-precise
- Community: https://community.mycroft.ai/

## Support

### Log an Issue

```bash
# Collect debug info
echo "=== System Info ===" > debug.log
uname -a >> debug.log
conda list >> debug.log

echo "=== Service Status ===" >> debug.log
systemctl status voice-assistant >> debug.log

echo "=== Recent Logs ===" >> debug.log
journalctl -u voice-assistant -n 100 >> debug.log

echo "=== Wake Word Status ===" >> debug.log
curl http://10.1.10.71:5000/wake-word/status >> debug.log
```

Then share `debug.log` when asking for help.

### Common Issues Database

| Symptom | Likely Cause | Solution |
|---------|--------------|----------|
| No wake detection | Model not loaded | Check `/wake-word/status` |
| Service won't start | Missing dependencies | Reinstall Precise |
| High false positives | Threshold too low | Increase to 0.7+ |
| Missed wake words | Threshold too high | Decrease to 0.3-0.4 |
| Poor transcription | Bad audio quality | Check the microphone |
| HA commands fail | Wrong token | Update .env |
| High CPU usage | Large Whisper model | Use a smaller model |

## Conclusion

With Mycroft Precise, you have complete control over your wake word detection. Start with server-side detection for easier debugging, collect good training data, and tune the threshold for your environment. Once it's working well, you can optionally move detection to the edge for lower latency.

The key to success: **quality of training data beats quantity**.

Happy voice assisting! 🎙️

470  docs/QUESTIONS_ANSWERED.md  Executable file
@@ -0,0 +1,470 @@

# Your Questions Answered - Quick Reference

## TL;DR: Yes, Yes, and Multiple Options!

### Q1: Pre-trained "Hey Mycroft" Model?

**Answer: YES! ✅**

Download and use it immediately:

```bash
./quick_start_hey_mycroft.sh
# Done in 5 minutes - no training!
```

The pre-trained model works well and saves you 1-2 hours of training time.

### Q2: Multiple Wake Words?

**Answer: YES! ✅ (with considerations)**

**Server-side (Heimdall):** easy; run 3-5 wake words:

```bash
python voice_server_enhanced.py \
    --enable-precise \
    --multi-wake-word
```

**Edge (K210):** feasible for 1-2 wake words, challenging for 3+.

### Q3: Adopting New Users' Voices?

**Answer: Multiple approaches ✅**

**Best option:** train one model with everyone's voices upfront
**Alternative:** incremental retraining as new users join
**Advanced:** speaker identification with personalization

---

## Detailed Answers

### 1. Pre-trained "Hey Mycroft" Model

#### Where to Get It

```bash
# The quick start script does this for you
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz
tar xzf hey-mycroft.tar.gz
```

#### How to Use

**Instant deployment:**

```bash
python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/pretrained/hey-mycroft.net
```

**Fine-tune with your voice:**

```bash
# Record 20-30 samples of your voice saying "Hey Mycroft"
precise-collect

# Fine-tune from the pre-trained checkpoint
precise-train -e 30 my-hey-mycroft.net . \
    --from-checkpoint ~/precise-models/pretrained/hey-mycroft.net
```

#### Advantages

✅ **Zero training time** - works immediately
✅ **Proven accuracy** - tested by thousands of users
✅ **Good baseline** - already trained on diverse voices
✅ **Easy fine-tuning** - add your voice in ~30 minutes vs 60+ minutes from scratch

#### When to Use Pre-trained vs Custom

**Use pre-trained "Hey Mycroft" when:**
- You want to test quickly
- "Hey Mycroft" is an acceptable wake word
- You want proven accuracy out of the box

**Train a custom model when:**
- You want a different wake word ("Hey Computer", "Jarvis", etc.)
- You need maximum accuracy for your specific environment
- You want a family-specific wake word

**Hybrid (recommended):**
- Start with pre-trained "Hey Mycroft"
- Test and learn the system
- Fine-tune with your own samples
- Or add a custom wake word later

---

### 2. Multiple Wake Words

#### Can You Have Multiple?

**Yes!** Your options:

#### Option A: Server-Side (Recommended)

**Easy implementation:**

```bash
# Use the enhanced server
python voice_server_enhanced.py \
    --enable-precise \
    --multi-wake-word
```

**Configured wake words:**
- "Hey Mycroft" (pre-trained)
- "Hey Computer" (custom)
- "Jarvis" (custom)

**Resource impact:**
- 3 models ≈ 15-30% CPU (Heimdall handles this easily)
- ~300-600MB RAM
- Each model runs independently

**Example use cases:**

```python
"Hey Mycroft, what's the time?"   # -> general assistant
"Jarvis, run diagnostics"         # -> personal assistant mode
"Emergency, call help"            # -> priority/emergency mode
```

#### Option B: Edge (K210)

**Feasible for 1-2 wake words:**

```python
# Sequential checking
def check_wake_words():
    for model in ['hey-mycroft.kmodel', 'emergency.kmodel']:
        if detect_wake_word(model):
            return model
```

**Limitations:**
- +50-100ms latency per additional model
- Memory constraints (6MB total for all models)
- More models = more power consumption

**Recommendation:**
- K210 with 1 wake word: optimal
- K210 with 2 wake words: acceptable
- K210 with 3+ wake words: not recommended

#### Option C: Contextual Wake Words

Different wake words for different purposes:

```python
wake_word_contexts = {
    'hey_mycroft': 'general_assistant',
    'emergency': 'priority_emergency',
    'goodnight': 'bedtime_routine',
}
```

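Routing a detected wake word to its context can then be a plain dictionary lookup. A minimal sketch using the mapping above; the fallback to the general assistant for unknown wake words is an assumption, not existing server behavior:

```python
wake_word_contexts = {
    'hey_mycroft': 'general_assistant',
    'emergency': 'priority_emergency',
    'goodnight': 'bedtime_routine',
}

def context_for(wake_word):
    # Fall back to the general assistant for unrecognized wake words.
    return wake_word_contexts.get(wake_word, 'general_assistant')

print(context_for('emergency'))  # priority_emergency
print(context_for('jarvis'))     # general_assistant (fallback)
```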
#### Should You Use Multiple?

**One wake word is usually enough!**

Commercial products (Alexa, Google) use a single wake word, and they work fine.

**Use multiple when:**
- Different family members want different wake words
- You want context-specific behaviors (emergency vs. general)
- You enjoy the flexibility

**Start with one, add more later if needed.**

---

### 3. Adopting New Users' Voices

#### Challenge

Same wake word, different voices:
- Mom says "Hey Mycroft" (soprano)
- Dad says "Hey Mycroft" (bass)
- The kids say "Hey Mycroft" (high-pitched)

All of them need to work!

#### Solution 1: Diverse Training (Recommended)

**During initial training, have everyone record samples:**

```bash
cd ~/precise-models/family-hey-mycroft

# Session 1: Mom records 30 samples
precise-collect  # Mom says "Hey Mycroft" 30 times

# Session 2: Dad records 30 samples
precise-collect  # Dad says "Hey Mycroft" 30 times

# Session 3: the kids record 20 samples each
precise-collect  # Kids say "Hey Mycroft" 40 times total

# Train one model with all voices
precise-train -e 60 family-hey-mycroft.net .

# Deploy
python voice_server.py \
    --enable-precise \
    --precise-model family-hey-mycroft.net
```

**Pros:**
✅ One model works for everyone
✅ Simple deployment
✅ No switching needed
✅ Works from day one

**Cons:**
❌ Needs everyone's time upfront
❌ Slightly lower per-person accuracy than individual models

#### Solution 2: Incremental Training

**Start with one person, add others over time:**

```bash
# Week 1: train with Dad's voice
precise-train -e 60 hey-mycroft.net .

# Week 2: Mom wants to use it
# Collect Mom's samples
precise-collect  # Mom records 20-30 samples

# Add them to the training set
cp mom-samples/* wake-word/

# Retrain from the checkpoint (faster!)
precise-train -e 30 hey-mycroft.net . \
    --from-checkpoint hey-mycroft.net

# Now it works for both Dad and Mom!

# Week 3: the kids want in
# Repeat the process...
```

**Pros:**
✅ No need for everyone upfront
✅ Easy to add new users
✅ The model improves gradually

**Cons:**
❌ New users may have issues initially
❌ Requires periodic retraining

#### Solution 3: Speaker Identification (Advanced)

**Identify who's speaking and use a personalized model/settings:**

```bash
# Install speaker ID dependencies
pip install pyannote.audio scipy --break-system-packages

# Use the enhanced server
python voice_server_enhanced.py \
    --enable-precise \
    --enable-speaker-id \
    --hf-token YOUR_HF_TOKEN
```

**Enroll users:**

```bash
# Record a 30-second voice sample from each person,
# then POST it to /speakers/enroll with the audio + name

curl -F "name=alan" \
    -F "audio=@alan_voice.wav" \
    http://localhost:5000/speakers/enroll

curl -F "name=sarah" \
    -F "audio=@sarah_voice.wav" \
    http://localhost:5000/speakers/enroll
```

**Benefits:**

```python
# Different responses per user
if speaker == 'alan':
    turn_on('light.alan_office')
elif speaker == 'sarah':
    turn_on('light.sarah_office')

# Different permissions
if speaker == 'kids' and command.startswith('buy'):
    return "Sorry, kids can't make purchases"
```

**Pros:**
✅ Personalized responses
✅ User-specific settings
✅ Better accuracy (optimized per voice)
✅ Can track who said what

**Cons:**
❌ More complex
❌ Privacy considerations
❌ Additional CPU/RAM (~10% + 200MB)
❌ Requires voice enrollment

#### Solution 4: Pre-trained Model (Easiest)

**"Hey Mycroft" already includes diverse voices!**

```bash
# Just use it - it's already trained on many voices
./quick_start_hey_mycroft.sh
```

The community model was trained with:
- Male and female voices
- Different accents
- Different ages
- Various environments

**It should work for most family members out of the box!**

Then fine-tune if needed.

---

## Recommended Path for Your Situation

### Scenario: Family of 3-4 People

**Week 1: Quick Start**

```bash
# Use the pre-trained "Hey Mycroft"
./quick_start_hey_mycroft.sh

# Test with all family members -
# it likely works for everyone already!
```

**Week 2: Fine-tune if Needed**

```bash
# If someone has issues, have them record 20 samples
# and fine-tune the model:

precise-train -e 30 family-hey-mycroft.net . \
    --from-checkpoint ~/precise-models/pretrained/hey-mycroft.net
```

**Week 3: Add Features**

```bash
# If you want personalization:
python voice_server_enhanced.py \
    --enable-speaker-id

# Then enroll each family member
```

### Scenario: Just You (or 1-2 People)

**Option 1: Pre-trained**

```bash
./quick_start_hey_mycroft.sh
# Done!
```

**Option 2: Custom Wake Word**

```bash
# Train a custom "Hey Computer"
cd ~/precise-models/hey-computer
./1-record-wake-word.sh      # 50 samples
./2-record-not-wake-word.sh  # 200 samples
./3-train-model.sh
```

### Scenario: Multiple People + Multiple Wake Words

**Full setup:**

```bash
# Pre-trained for the family
./quick_start_hey_mycroft.sh

# Personal wake word for Dad
cd ~/precise-models/jarvis
# Train the custom wake word

# Emergency wake word
cd ~/precise-models/emergency
# Train the emergency wake word

# Run the multi-wake-word server
python voice_server_enhanced.py \
    --enable-precise \
    --multi-wake-word \
    --enable-speaker-id
```

---

## Quick Decision Matrix

| Your Situation | Recommendation |
|----------------|----------------|
| **Just getting started** | Pre-trained "Hey Mycroft" |
| **Want a different wake word** | Train a custom model |
| **Family of 3-4** | Pre-trained + fine-tune if needed |
| **Want personalization** | Add speaker ID |
| **Multiple purposes** | Multiple wake words (server-side) |
| **Deploying to K210** | 1 wake word, no speaker ID |

---

## Files to Use

**Quick start with the pre-trained model:**
- `quick_start_hey_mycroft.sh` - zero training, 5 minutes!

**Multiple wake words:**
- `voice_server_enhanced.py` - multi-wake-word + speaker ID support

**Training a custom model:**
- `setup_precise.sh` - sets up the training environment
- Scripts in `~/precise-models/your-wake-word/`

**Documentation:**
- `WAKE_WORD_ADVANCED.md` - detailed, comprehensive guide
- `PRECISE_DEPLOYMENT.md` - production deployment

---

## Summary

✅ **Yes**, a pre-trained "Hey Mycroft" model exists and works well
✅ **Yes**, you can have multiple wake words (server-side is easy)
✅ **Yes**, there are multiple approaches to multi-user support

**Recommended approach:**
1. Start with `./quick_start_hey_mycroft.sh` (5 minutes)
2. Test with all family members
3. Fine-tune if anyone has issues
4. Add speaker ID later if you want personalization
5. Consider multiple wake words only if you have specific use cases

**Keep it simple!** One pre-trained wake word works for most people.

---

## Next Actions

**Ready to start?**

```bash
# 5-minute quick start
./quick_start_hey_mycroft.sh

# Or read more first
cat WAKE_WORD_ADVANCED.md
```

**Questions?**
- Pre-trained models: see WAKE_WORD_ADVANCED.md § Pre-trained
- Multiple wake words: see WAKE_WORD_ADVANCED.md § Multiple Wake Words
- Voice adaptation: see WAKE_WORD_ADVANCED.md § Voice Adaptation

**Happy voice assisting! 🎙️**

421  docs/QUICKSTART.md  Executable file
@@ -0,0 +1,421 @@

# Maix Duino Voice Assistant - Quick Start Guide

## Overview

This guide walks you through setting up a local, privacy-focused voice assistant using your Maix Duino board and a Home Assistant integration. All processing happens on your local network - no cloud services required.

## What You'll Build

- Wake word detection on the Maix Duino (edge device)
- Speech-to-text using Whisper on Heimdall
- Home Assistant integration for smart home control
- Text-to-speech responses using Piper
- All processing local to your 10.1.10.0/24 network

## Hardware Requirements

- [x] Sipeed Maix Duino board (you have this!)
- [ ] I2S MEMS microphone (or microphone array)
- [ ] Small speaker (3-5W) or audio output
- [ ] MicroSD card (4GB+) formatted as FAT32
- [ ] USB-C cable for power and programming

## Network Prerequisites

- The Maix Duino needs WiFi access to your 10.1.10.0/24 network
- Heimdall (10.1.10.71) for AI processing
- A Home Assistant instance (configure its URL during setup)

## Setup Process

### Phase 1: Server Setup (Heimdall)

#### Step 1: Run the setup script

```bash
# Transfer files to Heimdall
scp setup_voice_assistant.sh voice_server.py alan@10.1.10.71:~/

# SSH to Heimdall
ssh alan@10.1.10.71

# Make the setup script executable and run it
chmod +x setup_voice_assistant.sh
./setup_voice_assistant.sh
```

#### Step 2: Configure Home Assistant access

```bash
# Edit the config file
vim ~/voice-assistant/config/.env
```

Update these values:

```env
HA_URL=http://your-home-assistant:8123
HA_TOKEN=your_long_lived_access_token_here
```

To get a long-lived access token:
1. Open Home Assistant
2. Click your profile (bottom left)
3. Scroll to "Long-Lived Access Tokens"
4. Click "Create Token"
5. Copy the token and paste it into .env

#### Step 3: Test the server

```bash
cd ~/voice-assistant
./test_server.sh
```

You should see:

```
Loading Whisper model: medium
Whisper model loaded successfully
Starting voice processing server on 0.0.0.0:5000
```

#### Step 4: Test with curl (from another terminal)

```bash
# Test the health endpoint
curl http://10.1.10.71:5000/health

# Should return:
# {"status":"healthy","whisper_loaded":true,"ha_connected":true}
```

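If you script this check, a small validator for the response shape shown above can fail fast on a half-started server. The key names are taken from the example response; this is a hedged sketch, not part of the shipped code:

```python
def healthy(payload):
    # Expect the exact shape from the example:
    # {"status":"healthy","whisper_loaded":true,"ha_connected":true}
    return (payload.get("status") == "healthy"
            and payload.get("whisper_loaded") is True
            and payload.get("ha_connected") is True)

print(healthy({"status": "healthy", "whisper_loaded": True, "ha_connected": True}))  # True
print(healthy({"status": "healthy", "whisper_loaded": False, "ha_connected": True}))  # False
```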
### Phase 2: Maix Duino Setup

#### Step 1: Flash MaixPy firmware

1. Download the latest MaixPy firmware from: https://dl.sipeed.com/MAIX/MaixPy/release/
2. Download the Kflash GUI: https://github.com/sipeed/kflash_gui
3. Connect the Maix Duino via USB
4. Flash the firmware using the Kflash GUI

#### Step 2: Prepare the SD card

```bash
# Format the SD card as FAT32,
# then create the directory structure:
mkdir -p /path/to/sdcard/models

# Copy the client script
cp maix_voice_client.py /path/to/sdcard/main.py
```

#### Step 3: Configure WiFi settings

Edit `/path/to/sdcard/main.py`:

```python
# WiFi settings
WIFI_SSID = "YourNetworkName"
WIFI_PASSWORD = "YourPassword"

# Server settings
VOICE_SERVER_URL = "http://10.1.10.71:5000"
```

#### Step 4: Test the board

1. Insert the SD card into the Maix Duino
2. Connect to the serial console (115200 baud):

```bash
screen /dev/ttyUSB0 115200
# or
minicom -D /dev/ttyUSB0 -b 115200
```

3. Power on the board
4. Watch the serial output for connection status

### Phase 3: Integration & Testing

#### Test 1: Basic connectivity

1. The Maix Duino should connect to WiFi and display its IP on the LCD
2. The server logs should show when the Maix connects

#### Test 2: Audio capture

The current implementation uses amplitude-based wake word detection as a placeholder. To test:

1. Clap loudly near the microphone
2. Speak a command (e.g., "turn on the living room lights")
3. Watch the LCD for the transcription and response

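The placeholder detection amounts to a peak-amplitude threshold. This sketch is illustrative rather than the client's actual code, and `THRESHOLD` is a made-up value you would tune for your microphone:

```python
THRESHOLD = 20000  # hypothetical peak for 16-bit PCM samples; tune per mic

def amplitude_triggered(samples, threshold=THRESHOLD):
    # "Wake" when the chunk's loudest sample crosses the threshold.
    return max(abs(s) for s in samples) >= threshold

print(amplitude_triggered([100, -250, 30000]))  # loud clap -> True
print(amplitude_triggered([100, -250, 500]))    # quiet room -> False
```

This explains why a loud clap works as a stand-in wake word: any sufficiently loud sound triggers it, which is also why it must eventually be replaced by a trained model.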
#### Test 3: Home Assistant control

Supported commands (add more in voice_server.py):
- "Turn on the living room lights"
- "Turn off the bedroom lights"
- "What's the temperature?"
- "Toggle the kitchen lights"

### Phase 4: Wake Word Training (Advanced)

The placeholder wake word detection uses simple amplitude triggering. For production use:

#### Option A: Use Porcupine (easiest)

1. Sign up at: https://console.picovoice.ai/
2. Train a custom wake word
3. Download the .ppn model
4. Convert it to .kmodel for the K210

#### Option B: Use Mycroft Precise (FOSS)

```bash
# On a machine with a GPU
conda create -n precise python=3.6
conda activate precise
pip install precise-runner

# Record wake word samples
precise-collect

# Train the model
precise-train -e 60 my-wake-word.net my-wake-word/

# Convert to .kmodel
# (requires additional tools - see the MaixPy docs)
```

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────┐
│              Your Home Network (10.1.10.0/24)               │
│                                                             │
│  ┌──────────────┐          ┌──────────────┐                 │
│  │  Maix Duino  │─────────>│   Heimdall   │                 │
│  │  10.1.10.xxx │   Audio  │  10.1.10.71  │                 │
│  │              │<─────────│              │                 │
│  │ - Wake Word  │  Response│ - Whisper    │                 │
│  │ - Mic Input  │          │ - Piper TTS  │                 │
│  │ - Speaker    │          │ - Flask API  │                 │
│  └──────────────┘          └──────┬───────┘                 │
│                                   │                         │
│                                   │ REST API                │
│                                   v                         │
│                            ┌──────────────┐                 │
│                            │  Home Asst.  │                 │
│                            │ homeassistant│                 │
│                            │              │                 │
│                            │ - Devices    │                 │
│                            │ - Automation │                 │
│                            └──────────────┘                 │
└─────────────────────────────────────────────────────────────┘
```

## Troubleshooting

### Maix Duino won't connect to WiFi

```python
# Check the serial output for errors.
# Common issues:
# - Wrong SSID/password
# - WPA3 not supported (use WPA2)
# - 5GHz network (use 2.4GHz)
```

### Whisper transcription is slow

```bash
# Use a smaller model on Heimdall.
# Edit ~/voice-assistant/config/.env:
WHISPER_MODEL=base  # or tiny for the fastest
```

### Home Assistant commands don't work

```bash
# Check the server logs
journalctl -u voice-assistant -f

# Test the HA connection manually
curl -H "Authorization: Bearer YOUR_TOKEN" \
    http://your-ha:8123/api/states
```

### Audio quality is poor

1. Check the microphone connections
2. Adjust `SAMPLE_RATE` in maix_voice_client.py
3. Test with a USB microphone first
4. Consider a microphone array for better pickup

### Out of memory on the Maix Duino

```python
# In main_loop(), collect garbage more often:
if gc.mem_free() < 200000:  # increase the threshold
    gc.collect()
```

## Adding New Intents

Edit `voice_server.py` and add patterns to `IntentParser.PATTERNS`:

```python
PATTERNS = {
    # Existing patterns...

    'set_temperature': [
        r'set (?:the )?temperature to (\d+)',
        r'make it (\d+) degrees',
    ],
}
```

Then add the handler in `execute_intent()`:

```python
elif intent == 'set_temperature':
    temp = params.get('temperature')
    success = ha_client.call_service(
        'climate', 'set_temperature',
        entity_id, temperature=temp
    )
    return f"Set temperature to {temp} degrees"
```

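Before wiring a new pattern into the server, it helps to sanity-check it against the phrasings you expect. A standalone sketch using the two patterns above; the helper name and the lowercasing step are illustrative assumptions, not the parser's actual behavior:

```python
import re

# Patterns copied from the 'set_temperature' example above.
patterns = [
    r'set (?:the )?temperature to (\d+)',
    r'make it (\d+) degrees',
]

def parse_temperature(text):
    # Lowercase the input, assuming transcripts are normalized before matching.
    for pattern in patterns:
        match = re.search(pattern, text.lower())
        if match:
            return int(match.group(1))
    return None

print(parse_temperature("Set the temperature to 72"))  # 72
print(parse_temperature("make it 68 degrees"))         # 68
print(parse_temperature("turn on the lights"))         # None
```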
## Entity Mapping

Add your Home Assistant entities to `IntentParser.ENTITY_MAP`:

```python
ENTITY_MAP = {
    # Lights
    'living room light': 'light.living_room',
    'bedroom light': 'light.bedroom',

    # Climate
    'thermostat': 'climate.main_floor',
    'temperature': 'sensor.main_floor_temperature',

    # Switches
    'coffee maker': 'switch.coffee_maker',
    'fan': 'switch.bedroom_fan',

    # Media
    'tv': 'media_player.living_room_tv',
    'music': 'media_player.whole_house',
}
```

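A hypothetical sketch of how a transcribed command might be resolved against this map; the real `IntentParser` may differ. Matching the longest friendly names first prevents a short name like "fan" from shadowing a longer one that contains it:

```python
ENTITY_MAP = {
    'living room light': 'light.living_room',
    'bedroom light': 'light.bedroom',
    'coffee maker': 'switch.coffee_maker',
}

def resolve_entity(command):
    # Longest match first, so "living room light" wins over shorter substrings.
    for name in sorted(ENTITY_MAP, key=len, reverse=True):
        if name in command.lower():
            return ENTITY_MAP[name]
    return None

print(resolve_entity("Turn on the living room light"))  # light.living_room
print(resolve_entity("open the garage"))                # None
```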
## Performance Tuning

### Reduce latency

1. Use the Whisper `tiny` or `base` model
2. Implement streaming audio (currently batch)
3. Pre-load the TTS models
4. Use a faster TTS engine (e.g., espeak)

### Improve accuracy

1. Use the Whisper `large` model (slower)
2. Train a custom wake word
3. Add an NLU layer (Rasa, spaCy)
4. Collect samples and fine-tune on your voice

## Next Steps

### Short term

- [ ] Add more Home Assistant entity mappings
- [ ] Implement Piper TTS playback on the Maix Duino
- [ ] Train a custom wake word model
- [ ] Add LED animations for better feedback
- [ ] Implement conversation context

### Medium term

- [ ] Multi-room support (multiple Maix Duino units)
- [ ] Voice profiles for different users
- [ ] Integration with Plex for media control
- [ ] Calendar and reminder functionality
- [ ] Weather updates from a local weather station

### Long term

- [ ] Custom skills/plugins system
- [ ] Integration with other services (Nextcloud, Matrix)
- [ ] Sound event detection (doorbell, smoke alarm)
- [ ] Intercom functionality between rooms
- [ ] Voice-controlled automation creation

## Alternatives & Fallbacks

If the Maix Duino proves limiting:

### Raspberry Pi Zero 2 W

- More processing power
- Better software support
- USB audio support
- Cost: ~$15

### ESP32-S3

- Better WiFi
- More RAM (8MB)
- Cheaper (~$10)
- Good community support

### Orange Pi Zero 2

- Quad-core ARM Cortex-A53
- 512MB-1GB RAM
- Full Linux support
- Cost: ~$20

## Resources
|
||||||
|
|
||||||
|
### Documentation
|
||||||
|
- Maix Duino: https://wiki.sipeed.com/hardware/en/maix/
|
||||||
|
- MaixPy: https://maixpy.sipeed.com/
|
||||||
|
- Whisper: https://github.com/openai/whisper
|
||||||
|
- Piper TTS: https://github.com/rhasspy/piper
|
||||||
|
- Home Assistant API: https://developers.home-assistant.io/
|
||||||
|
|
||||||
|
### Community Projects
|
||||||
|
- Rhasspy: https://rhasspy.readthedocs.io/
|
||||||
|
- Willow: https://github.com/toverainc/willow
|
||||||
|
- Mycroft: https://mycroft.ai/
|
||||||
|
|
||||||
|
### Wake Word Tools
|
||||||
|
- Porcupine: https://picovoice.ai/platform/porcupine/
|
||||||
|
- Mycroft Precise: https://github.com/MycroftAI/mycroft-precise
|
||||||
|
- Snowboy (archived): https://github.com/Kitt-AI/snowboy
|
||||||
|
|
||||||
|
## Getting Help
|
||||||
|
|
||||||
|
### Check logs
|
||||||
|
```bash
|
||||||
|
# Server logs (if using systemd)
|
||||||
|
sudo journalctl -u voice-assistant -f
|
||||||
|
|
||||||
|
# Or manual log file
|
||||||
|
tail -f ~/voice-assistant/logs/voice_assistant.log
|
||||||
|
|
||||||
|
# Maix Duino serial console
|
||||||
|
screen /dev/ttyUSB0 115200
|
||||||
|
```
|
||||||
|
|
||||||
|
### Common issues and solutions
|
||||||
|
See the Troubleshooting section above
|
||||||
|
|
||||||
|
### Useful commands
|
||||||
|
```bash
|
||||||
|
# Restart service
|
||||||
|
sudo systemctl restart voice-assistant
|
||||||
|
|
||||||
|
# Check service status
|
||||||
|
sudo systemctl status voice-assistant
|
||||||
|
|
||||||
|
# Test HA connection
|
||||||
|
curl http://10.1.10.71:5000/health
|
||||||
|
|
||||||
|
# Monitor Maix Duino
|
||||||
|
minicom -D /dev/ttyUSB0 -b 115200
|
||||||
|
```
|
||||||
|
|
||||||
|
## Cost Breakdown
|
||||||
|
|
||||||
|
| Item | Cost | Status |
|
||||||
|
|------|------|--------|
|
||||||
|
| Maix Duino | $30 | Have it! |
|
||||||
|
| I2S Microphone | $5-10 | Need |
|
||||||
|
| Speaker | $10 | Need (or use existing) |
|
||||||
|
| MicroSD Card | $5 | Have it? |
|
||||||
|
| **Total** | **$15-25** | (vs $50+ commercial) |
|
||||||
|
|
||||||
|
**Benefits of local solution:**
|
||||||
|
- No subscription fees
|
||||||
|
- Complete privacy (no cloud)
|
||||||
|
- Customizable to your needs
|
||||||
|
- Integration with existing infrastructure
|
||||||
|
- Learning experience!
|
||||||
|
|
||||||
|
## Conclusion
|
||||||
|
|
||||||
|
You now have everything you need to build a local, privacy-focused voice assistant! The setup leverages your existing infrastructure (Heimdall for processing, Home Assistant for automation) while keeping costs minimal.
|
||||||
|
|
||||||
|
Start with the basic setup, test each component, then iterate and improve. The beauty of this approach is you can enhance it over time without being locked into a commercial platform.
|
||||||
|
|
||||||
|
Good luck, and enjoy your new voice assistant! 🎙️
|
723 docs/WAKE_WORD_ADVANCED.md Executable file
@ -0,0 +1,723 @@
# Wake Word Models: Pre-trained, Multiple, and Voice Adaptation

## Pre-trained Wake Word Models

### Yes! "Hey Mycroft" Already Exists

Mycroft provides several pre-trained models that you can use immediately:

#### Available Pre-trained Models

**Hey Mycroft** (Official)

```bash
# Download from Mycroft's model repository
cd ~/precise-models/pretrained
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz
tar xzf hey-mycroft.tar.gz

# Test immediately
conda activate precise
precise-listen hey-mycroft.net

# Should detect "Hey Mycroft" right away!
```

**Other Available Models:**

- **Hey Mycroft** - Best tested, most reliable
- **Christopher** - Alternative wake word
- **Hey Jarvis** - Community contributed
- **Computer** - Star Trek style

#### Using Pre-trained Models

**Option 1: Use as-is**

```bash
# Just point your server to the pre-trained model
python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/pretrained/hey-mycroft.net \
    --precise-sensitivity 0.5
```

**Option 2: Fine-tune for your voice**

```bash
# Use pre-trained as starting point, add your samples
cd ~/precise-models/my-hey-mycroft

# Record additional samples
precise-collect

# Train from checkpoint (much faster than from scratch!)
precise-train -e 30 my-hey-mycroft.net . \
    --from-checkpoint ~/precise-models/pretrained/hey-mycroft.net

# This adds your voice/environment while keeping the base model
```

**Option 3: Ensemble with custom**

```python
# Use both pre-trained and custom model
# Require both to agree (reduces false positives)
# See implementation below
```
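The ensemble idea can be sketched in a few lines, assuming each model exposes a confidence score per audio window (function and threshold names are illustrative, not an existing API):

```python
def ensemble_detect(pretrained_score: float,
                    custom_score: float,
                    threshold: float = 0.5) -> bool:
    """Accept a detection only when both models agree.

    Requiring agreement trades a little sensitivity for far fewer
    false positives.
    """
    return pretrained_score >= threshold and custom_score >= threshold

print(ensemble_detect(0.9, 0.7))  # True: both models fire
print(ensemble_detect(0.9, 0.2))  # False: custom model disagrees
```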

### Advantages of Pre-trained Models

✅ **Instant deployment** - No training required
✅ **Proven accuracy** - Tested by thousands of users
✅ **Good starting point** - Fine-tune rather than train from scratch
✅ **Multiple speakers** - Already includes diverse voices
✅ **Save time** - Skip 1-2 hours of training

### Disadvantages

❌ **Generic** - Not optimized for your voice/environment
❌ **May need tuning** - Threshold adjustment required
❌ **Limited choice** - Only a few wake words available

### Recommendation

**Start with the "Hey Mycroft"** pre-trained model:

1. Deploy immediately (zero training time)
2. Test in your environment
3. Collect false positives/negatives
4. Fine-tune with your examples
5. Best of both worlds!

## Multiple Wake Words

### Can You Have Multiple Wake Words?

**Short answer:** Yes, but with tradeoffs.

### Implementation Approaches

#### Approach 1: Server-Side Multiple Models (Recommended)

Run multiple Precise models in parallel on Heimdall:

```python
# In voice_server.py
import os
import queue
import time

from precise_runner import PreciseEngine, PreciseRunner

# Queue consumed by the command-processing loop
wake_word_queue = queue.Queue()

# Global runners for each wake word
precise_runners = {}
wake_word_configs = {
    'hey_mycroft': {
        'model': '~/precise-models/pretrained/hey-mycroft.net',
        'sensitivity': 0.5,
        'response': 'Yes?'
    },
    'hey_computer': {
        'model': '~/precise-models/hey-computer/hey-computer.net',
        'sensitivity': 0.5,
        'response': "I'm listening"
    },
    'jarvis': {
        'model': '~/precise-models/jarvis/jarvis.net',
        'sensitivity': 0.6,
        'response': 'At your service, sir'
    }
}

def on_wake_word_detected(wake_word_name):
    """Build a callback that tags detections with the wake word's name"""
    def callback():
        print(f"Wake word detected: {wake_word_name}")
        wake_word_queue.put({
            'timestamp': time.time(),
            'wake_word': wake_word_name,
            'response': wake_word_configs[wake_word_name]['response']
        })
    return callback

def start_multiple_wake_words():
    """Start one Precise listener per configured wake word"""
    for name, config in wake_word_configs.items():
        engine = PreciseEngine(
            '/usr/local/bin/precise-engine',
            os.path.expanduser(config['model'])
        )

        runner = PreciseRunner(
            engine,
            sensitivity=config['sensitivity'],
            on_activation=on_wake_word_detected(name)
        )

        runner.start()
        precise_runners[name] = runner
        print(f"Started wake word listener: {name}")
```

**Resource Usage:**

- CPU: ~5-10% per model (3 models = ~15-30%)
- RAM: ~100-200MB per model
- Still very manageable on Heimdall

**Pros:**

✅ Different wake words for different purposes
✅ Family members can choose preferred wake word
✅ Context-aware responses
✅ Easy to add/remove models

**Cons:**

❌ Higher CPU usage (scales linearly)
❌ Increased false positive risk (3x models = 3x chance)
❌ More complex configuration

#### Approach 2: Edge Multiple Models (K210)

**Challenge:** The K210 has limited resources.

**Option A: Sequential checking** (Feasible)

```python
# Check each model in sequence
models = ['hey-mycroft.kmodel', 'hey-computer.kmodel']

for model in models:
    kpu_task = kpu.load(f"/sd/models/{model}")
    result = kpu.run(kpu_task, audio_features)
    if result > threshold:
        return model  # Wake word detected
```

**Resource impact:**

- Latency: +50-100ms per additional model
- Memory: Models must fit in 6MB total
- CPU: ~30% per model check

**Option B: Combined model** (Advanced)

```python
# Train a single model that recognizes multiple phrases
# Each phrase maps to a different output class
# More complex training but a single inference per audio window
```
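The decode step for such a combined model can be sketched as follows, with made-up class scores and a hypothetical `none` background class:

```python
from typing import Optional

# One output class per phrase, plus a background class
PHRASES = ["hey-mycroft", "hey-computer", "none"]

def decode(scores: list, threshold: float = 0.6) -> Optional[str]:
    """Return the detected phrase, or None if nothing clears the threshold."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    if PHRASES[best] == "none" or scores[best] < threshold:
        return None
    return PHRASES[best]

print(decode([0.1, 0.85, 0.05]))  # hey-computer
print(decode([0.2, 0.3, 0.5]))    # None (background class wins)
```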

**Recommendation for edge:**

- **1-2 wake words max** on K210
- **Server-side** for 3+ wake words

#### Approach 3: Contextual Wake Words

Different wake words trigger different behaviors:

```python
wake_word_contexts = {
    'hey_mycroft': 'general',    # General commands
    'hey_assistant': 'general',  # Alternative general
    'emergency': 'priority',     # High priority
    'goodnight': 'bedtime',      # Bedtime routine
}

def handle_wake_word(wake_word, command):
    context = wake_word_contexts[wake_word]

    if context == 'priority':
        # Skip queue, process immediately
        # Maybe call emergency contact
        pass
    elif context == 'bedtime':
        # Trigger bedtime automation
        # Lower volume for responses
        pass
    else:
        # Normal processing
        pass
```

### Best Practices for Multiple Wake Words

1. **Start with one** - Get it working well first
2. **Add gradually** - One at a time, test thoroughly
3. **Different purposes** - Each wake word should have a reason
4. **Monitor performance** - Track false positives per wake word
5. **User preference** - Let family members choose their favorite
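Point 4 (tracking false positives per wake word) can be as simple as a counter. A hypothetical monitoring helper:

```python
from collections import Counter

# Count false positives per wake word so you can raise the
# sensitivity threshold of the noisy ones.
false_positives = Counter()

def record_false_positive(wake_word: str) -> None:
    false_positives[wake_word] += 1

def noisiest(n: int = 1):
    """Return the n wake words with the most false positives."""
    return false_positives.most_common(n)

record_false_positive("jarvis")
record_false_positive("jarvis")
record_false_positive("hey_mycroft")
print(noisiest())  # [('jarvis', 2)]
```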
### Recommended Configuration

**For most users:**

```python
wake_words = {
    'hey_mycroft': 'primary',      # Main wake word (pre-trained)
    'hey_computer': 'alternative'  # Custom trained for your voice
}
```

**For power users:**

```python
wake_words = {
    'hey_mycroft': 'general',
    'jarvis': 'personal_assistant',   # Custom responses
    'computer': 'technical_queries',  # Different intent parser
}
```

**For families:**

```python
wake_words = {
    'hey_mycroft': 'shared',  # Everyone can use
    'dad': 'user_alan',       # Personalized
    'mom': 'user_sarah',      # Personalized
    'kids': 'user_children',  # Kid-safe responses
}
```

## Voice Adaptation and Multi-User Support

### Challenge: Different Voices, Same Wake Word

When multiple people use the system:

- Different accents
- Different speech patterns
- Different pronunciations
- Different vocal characteristics

### Solution Approaches

#### Approach 1: Diverse Training Data (Recommended)

**During initial training:**

```bash
# Have everyone in the household record samples
cd ~/precise-models/hey-computer

# Alan records 30 samples
precise-collect  # Record as user 1

# Sarah records 30 samples
precise-collect  # Record as user 2

# Kids record 20 samples
precise-collect  # Record as user 3

# Combine all samples in the training set and
# train one model that works for everyone
./3-train-model.sh
```

**Pros:**

✅ Single model for everyone
✅ No user switching needed
✅ Simple to maintain
✅ Works immediately for all users

**Cons:**

❌ May have lower per-person accuracy
❌ Requires upfront time from everyone
❌ Hard to add new users later

#### Approach 2: Incremental Training

Start with your voice, add others over time:

```bash
# Week 1: Train with Alan's voice
cd ~/precise-models/hey-computer
# Record and train with Alan's samples
precise-train -e 60 hey-computer.net .

# Week 2: Sarah wants to use it
# Collect Sarah's samples
mkdir -p sarah-samples/wake-word
precise-collect  # Sarah records 20-30 samples

# Add to existing training set
cp sarah-samples/wake-word/* wake-word/

# Retrain (continue from checkpoint)
precise-train -e 30 hey-computer.net . \
    --from-checkpoint hey-computer.net

# Now works for both Alan and Sarah!
```

**Pros:**

✅ Gradual improvement
✅ Don't need everyone upfront
✅ Easy to add new users
✅ Maintains accuracy for existing users

**Cons:**

❌ May not work well for new users initially
❌ Requires retraining periodically

#### Approach 3: Per-User Models with Speaker Identification

Train separate models + identify who's speaking:

**Step 1: Train per-user wake word models**

```bash
# Alan's model
~/precise-models/hey-computer-alan/

# Sarah's model
~/precise-models/hey-computer-sarah/

# Kids' model
~/precise-models/hey-computer-kids/
```

**Step 2: Use speaker identification**

```python
# Pseudo-code for speaker identification
def identify_speaker(audio):
    """
    Identify the speaker from voice characteristics
    using speaker embeddings (x-vectors, d-vectors)
    """
    # Extract a speaker embedding
    embedding = speaker_encoder.encode(audio)

    # Compare to known users
    similarities = {
        'alan': cosine_similarity(embedding, alan_embedding),
        'sarah': cosine_similarity(embedding, sarah_embedding),
        'kids': cosine_similarity(embedding, kids_embedding),
    }

    # Return the most similar
    return max(similarities, key=similarities.get)

def process_command(audio):
    # Detect the wake word with all models
    wake_detected = check_all_models(audio)

    if wake_detected:
        # Identify the speaker
        speaker = identify_speaker(audio)

        # Use the speaker-specific model for better accuracy
        model = f'~/precise-models/hey-computer-{speaker}/'

        # Continue with speaker context
        process_with_context(audio, speaker)
```

**Speaker identification libraries:**

- **Resemblyzer** - Simple speaker verification
- **speechbrain** - Complete toolkit
- **pyannote.audio** - You already use this for diarization!

**Implementation:**

```bash
# You already have pyannote for diarization!
conda activate voice-assistant
pip install pyannote.audio --break-system-packages

# You can use its speaker embeddings for identification
```

```python
from pyannote.audio import Inference

# Load the speaker embedding model
# (hf_token is your Hugging Face access token)
inference = Inference(
    "pyannote/embedding",
    use_auth_token=hf_token
)

# Extract embeddings for known users
alan_embedding = inference("alan_voice_sample.wav")
sarah_embedding = inference("sarah_voice_sample.wav")

# Compare with incoming audio
unknown_embedding = inference(audio_buffer)

from scipy.spatial.distance import cosine
alan_similarity = 1 - cosine(unknown_embedding, alan_embedding)
sarah_similarity = 1 - cosine(unknown_embedding, sarah_embedding)

if alan_similarity > 0.8:
    user = 'alan'
elif sarah_similarity > 0.8:
    user = 'sarah'
else:
    user = 'unknown'
```

**Pros:**

✅ Personalized responses per user
✅ Better accuracy (model optimized for each voice)
✅ User-specific preferences/permissions
✅ Can track who said what

**Cons:**

❌ More complex setup
❌ Higher resource usage
❌ Requires voice samples from each user
❌ Privacy considerations

#### Approach 4: Adaptive/Online Learning

The model improves automatically based on usage:

```python
class AdaptiveWakeWord:
    def __init__(self, base_model):
        self.base_model = base_model
        self.user_samples = []
        self.retrain_threshold = 50  # Retrain after N samples

    def on_detection(self, audio, user_confirmed=True):
        """User confirms this was a correct detection"""
        if user_confirmed:
            self.user_samples.append(audio)

        # Periodically retrain
        if len(self.user_samples) >= self.retrain_threshold:
            self.retrain_with_samples()
            self.user_samples = []

    def retrain_with_samples(self):
        """Background retraining with collected samples"""
        # Add samples to the training set
        # Retrain the model
        # Swap in the new model
        pass
```

**Pros:**

✅ Automatic improvement
✅ Adapts to the user's voice over time
✅ No manual retraining
✅ Gets better with use

**Cons:**

❌ Complex implementation
❌ Requires a user feedback mechanism
❌ Risk of drift/degradation
❌ Background training overhead

## Recommended Strategy

### Phase 1: Single Wake Word, Single Model

```bash
# Week 1-2
# Use pre-trained "Hey Mycroft"
# OR train custom "Hey Computer" with all family members' voices
# Keep it simple, get it working
```

### Phase 2: Add Fine-tuning

```bash
# Week 3-4
# Collect false positives/negatives
# Retrain with household-specific data
# Optimize threshold
```

### Phase 3: Consider Multiple Wake Words

```bash
# Month 2
# If needed, add a second wake word:
# "Hey Mycroft" for general use
# "Jarvis" for personal assistant tasks
```

### Phase 4: Personalization

```bash
# Month 3+
# If desired, add speaker identification,
# personalized responses, and
# user-specific preferences
```
## Practical Examples

### Example 1: Family of 4, Single Model

```bash
# Training session with everyone
cd ~/precise-models/hey-mycroft-family

# Dad records 25 samples
precise-collect

# Mom records 25 samples
precise-collect

# Kid 1 records 15 samples
precise-collect

# Kid 2 records 15 samples
precise-collect

# Collect shared negative samples (200+):
# TV, music, conversation, etc.
precise-collect -f not-wake-word/household.wav

# Train a single model for everyone
precise-train -e 60 hey-mycroft-family.net .

# Deploy
python voice_server.py \
    --enable-precise \
    --precise-model hey-mycroft-family.net
```

**Result:** Everyone can use it, one model, simple.

### Example 2: Two Wake Words, Different Purposes

```python
# voice_server.py configuration
wake_words = {
    'hey_mycroft': {
        'model': 'hey-mycroft.net',
        'sensitivity': 0.5,
        'intent_parser': 'general',  # All commands
        'response': 'Yes?'
    },
    'emergency': {
        'model': 'emergency.net',
        'sensitivity': 0.7,  # Higher threshold
        'intent_parser': 'emergency',  # Limited commands
        'response': 'Emergency mode activated'
    }
}

# "Hey Mycroft, turn on the lights" - works
# "Emergency, call for help" - triggers emergency protocol
```

### Example 3: Speaker Identification + Personalization

```python
# Enhanced processing with speaker ID
def process_with_speaker_id(audio, speaker):
    # Different HA entity based on the speaker
    entity_maps = {
        'alan': {
            'bedroom_light': 'light.master_bedroom',
            'office_light': 'light.alan_office',
        },
        'sarah': {
            'bedroom_light': 'light.master_bedroom',
            'office_light': 'light.sarah_office',
        },
        'kids': {
            'bedroom_light': 'light.kids_bedroom',
            'tv': None,  # Kids can't control the TV
        }
    }

    # Transcribe the command
    text = whisper_transcribe(audio)

    # "Turn on bedroom light"
    if 'bedroom light' in text:
        entity = entity_maps[speaker]['bedroom_light']
        ha_client.turn_on(entity)

        response = "Turned on your bedroom light"

    return response
```

## Resource Requirements

### Single Wake Word

- **CPU:** 5-10% (Heimdall)
- **RAM:** 100-200MB
- **Model size:** 1-3MB
- **Training time:** 30-60 min

### Multiple Wake Words (3 models)

- **CPU:** 15-30% (Heimdall)
- **RAM:** 300-600MB
- **Model size:** 3-9MB total
- **Training time:** 90-180 min

### With Speaker Identification

- **CPU:** +5-10% for speaker ID
- **RAM:** +200-300MB for the embedding model
- **Model size:** +50MB for the speaker model
- **Setup time:** +30-60 min for voice enrollment

### K210 Edge (Maix Duino)

- **Single model:** Feasible, ~30% CPU
- **2 models:** Feasible, ~60% CPU, higher latency
- **3+ models:** Not recommended
- **Speaker ID:** Not feasible (limited RAM/compute)

## Quick Decision Guide

**Just getting started?**
→ Use pre-trained "Hey Mycroft"

**Want a custom wake word?**
→ Train one model with all family voices

**Need multiple wake words?**
→ Start server-side with 2-3 models

**Want personalization?**
→ Add speaker identification

**Deploying to edge (K210)?**
→ Stick to 1-2 wake words maximum

**Family of 4+ people?**
→ Train a single model with everyone's voice

**Privacy is paramount?**
→ Skip speaker ID, use a single universal model

## Testing Multiple Wake Words

```bash
# Test all wake words quickly
conda activate precise

# Terminal 1: Hey Mycroft
precise-listen hey-mycroft.net

# Terminal 2: Hey Computer
precise-listen hey-computer.net

# Terminal 3: Emergency
precise-listen emergency.net

# Say each wake word, verify correct detection
```
## Conclusion

### For Your Maix Duino Project:

**Recommended approach:**

1. **Start with "Hey Mycroft"** - Use the pre-trained model
2. **Fine-tune if needed** - Add your household's voices
3. **Consider a 2nd wake word** - Only if you have a specific use case
4. **Speaker ID** - Phase 2/3 enhancement, not critical for the MVP
5. **Keep it simple** - One wake word works great for most users

**The pre-trained "Hey Mycroft" model saves you 1-2 hours** and works immediately. You can always fine-tune or add custom wake words later!

**Multiple wake words are cool but not necessary** - Most commercial products use just one. Focus on making one wake word work really well before adding more.

**Voice adaptation** - Training with multiple voices upfront is simpler than per-user models. Save speaker ID for later if you need personalization.

## Quick Start with Pre-trained

```bash
# On Heimdall
cd ~/precise-models/pretrained
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz
tar xzf hey-mycroft.tar.gz

# Test it
conda activate precise
precise-listen hey-mycroft.net

# Deploy
cd ~/voice-assistant
python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/pretrained/hey-mycroft.net

# You're done! No training needed!
```

**That's it - you have a working wake word in 5 minutes!** 🎉
411 docs/WAKE_WORD_QUICK_REF.md Executable file
@ -0,0 +1,411 @@
# Wake Word Quick Reference Card

## 🎯 TL;DR: What Should I Do?

### Recommendation for Your Setup

**Week 1:** Use pre-trained "Hey Mycroft"

```bash
./download_pretrained_models.sh --model hey-mycroft
precise-listen ~/precise-models/pretrained/hey-mycroft.net
```

**Week 2-3:** Fine-tune with all family members' voices

```bash
cd ~/precise-models/hey-mycroft-family
precise-train -e 30 custom.net . --from-checkpoint ../pretrained/hey-mycroft.net
```

**Week 4+:** Add speaker identification

```bash
pip install resemblyzer
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name [Family] --duration 20
```

**Month 2+:** Add a second wake word (Hey Jarvis for Plex?)

```bash
./download_pretrained_models.sh --model hey-jarvis
# Run both in parallel on the server
```

---

## 📋 Pre-trained Models

### Available Models (Ready to Use!)

| Wake Word | Download | Best For |
|-----------|----------|----------|
| **Hey Mycroft** ⭐ | `--model hey-mycroft` | Default choice, most data |
| **Hey Jarvis** | `--model hey-jarvis` | Pop culture, media control |
| **Christopher** | `--model christopher` | Unique, less common |
| **Hey Ezra** | `--model hey-ezra` | Alternative option |

### Quick Download

```bash
# Download one
./download_pretrained_models.sh --model hey-mycroft

# Download all
./download_pretrained_models.sh --test-all

# Test immediately
precise-listen ~/precise-models/pretrained/hey-mycroft.net
```

---

## 🔢 Multiple Wake Words

### Option 1: Multiple Models (Server-Side) ⭐ RECOMMENDED

**What:** Run 2-3 different wake word models simultaneously
**Where:** Heimdall (server)
**Performance:** ~15-30% CPU for 3 models

```bash
# Start with multiple wake words
python voice_server.py \
    --enable-precise \
    --precise-models "\
hey-mycroft:~/models/hey-mycroft.net:0.5,\
hey-jarvis:~/models/hey-jarvis.net:0.5"
```
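The spec string above uses a `name:path:sensitivity` format, which the server would need to parse (note this flag format is project-specific, not a stock Precise option); a minimal parser sketch:

```python
def parse_model_spec(spec: str) -> dict:
    """Parse a comma-separated "name:path:sensitivity" model spec."""
    models = {}
    for entry in spec.split(","):
        name, path, sensitivity = entry.strip().split(":")
        models[name] = {"path": path, "sensitivity": float(sensitivity)}
    return models

spec = ("hey-mycroft:~/models/hey-mycroft.net:0.5,"
        "hey-jarvis:~/models/hey-jarvis.net:0.5")
print(parse_model_spec(spec)["hey-jarvis"]["sensitivity"])  # 0.5
```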
|
||||||
|
|
||||||
|
**Pros:**
|
||||||
|
- ✅ Can identify which wake word was used
|
||||||
|
- ✅ Different contexts (Mycroft=commands, Jarvis=media)
|
||||||
|
- ✅ Easy to add/remove wake words
|
||||||
|
- ✅ Each can have different sensitivity
|
||||||
|
|
||||||
|
**Cons:**
|
||||||
|
- ❌ Only works server-side (not on Maix Duino)
|
||||||
|
- ❌ Higher CPU usage (but still reasonable)
|
||||||
|
|
||||||
|
**Use When:**
|
||||||
|
- You want different wake words for different purposes
|
||||||
|
- Server has CPU to spare (yours does!)
|
||||||
|
- Want flexibility to add wake words later
|
||||||
|
|
||||||
|
### Option 2: Single Multi-Phrase Model (Edge-Compatible)
|
||||||
|
|
||||||
|
**What:** One model responds to multiple phrases
|
||||||
|
**Where:** Server OR Maix Duino
|
||||||
|
**Performance:** Same as single model
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Train on multiple phrases
|
||||||
|
cd ~/precise-models/multi-wake
|
||||||
|
# Record "Hey Mycroft" samples → wake-word/
|
||||||
|
# Record "Hey Computer" samples → wake-word/
|
||||||
|
# Record negatives → not-wake-word/
|
||||||
|
precise-train -e 60 multi-wake.net .
|
||||||
|
```
|
||||||
|
|
||||||
|
**Pros:**
|
||||||
|
- ✅ Single model = less compute
|
||||||
|
- ✅ Works on edge (K210)
|
||||||
|
- ✅ Simple deployment
|
||||||
|
|
||||||
|
**Cons:**
|
||||||
|
- ❌ Can't tell which wake word was used
|
||||||
|
- ❌ May reduce accuracy
|
||||||
|
- ❌ Higher false positive risk
|
||||||
|
|
||||||
|
**Use When:**
|
||||||
|
- Deploying to Maix Duino (edge)
|
||||||
|
- Want backup wake words
|
||||||
|
- Don't care which was used
|
||||||
|
|
||||||
|
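The `--precise-models` value above packs each model into a comma-separated `name:path:sensitivity` triple. How `voice_server.py` actually parses that string is an assumption; a minimal sketch of such a parser:

```python
def parse_model_spec(spec):
    """Parse a comma-separated list of name:path:sensitivity triples.

    Hypothetical helper -- illustrates the --precise-models format,
    not the actual voice_server.py implementation.
    """
    models = {}
    for entry in spec.split(","):
        name, path, sensitivity = entry.strip().split(":")
        models[name] = {"path": path, "sensitivity": float(sensitivity)}
    return models

spec = "hey-mycroft:~/models/hey-mycroft.net:0.5,hey-jarvis:~/models/hey-jarvis.net:0.5"
print(parse_model_spec(spec))
```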
---
## 👥 Multi-User Support

### Option 1: Inclusive Training ⭐ START HERE

**What:** One model, all voices
**How:** All family members record samples

```bash
cd ~/precise-models/family-wake
# Alice records 30 samples
# Bob records 30 samples
# You record 30 samples
precise-train -e 60 family-wake.net .
```

**Pros:**
- ✅ Everyone can use it
- ✅ Simple deployment
- ✅ Single model

**Cons:**
- ❌ Can't identify who spoke
- ❌ No personalization

**Use When:**
- Just getting started
- Don't need to know who spoke
- Want simplicity

### Option 2: Speaker Identification (Week 4+)

**What:** Detect wake word, then identify speaker
**How:** Voice embeddings (resemblyzer or pyannote)

```bash
# Install
pip install resemblyzer

# Enroll users
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name Alice --duration 20
python enroll_speaker.py --name Bob --duration 20

# Server identifies speaker automatically
```

**Pros:**
- ✅ Personalized responses
- ✅ User-specific permissions
- ✅ Better privacy
- ✅ Track preferences

**Cons:**
- ❌ More complex
- ❌ Requires enrollment
- ❌ +100-200ms latency
- ❌ May fail with similar voices

**Use When:**
- Want personalization
- Need user-specific commands
- Ready for advanced features

### Option 3: Per-User Wake Words (Advanced)

**What:** Each person has their own wake word
**How:** Multiple models, one per person

```bash
# Alice: "Hey Mycroft"
# Bob: "Hey Jarvis"
# You: "Hey Computer"

# Run all 3 models in parallel
```

**Pros:**
- ✅ Automatic user ID
- ✅ Highest accuracy per user
- ✅ Clear separation

**Cons:**
- ❌ 3x models = 3x CPU
- ❌ Users must remember their word
- ❌ Server-only (not edge)

**Use When:**
- Need automatic user ID
- Have CPU to spare
- Users want their own wake word
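Both resemblyzer and pyannote reduce a voice clip to a fixed-length embedding vector; speaker identification then reduces to nearest-neighbour matching against the enrolled embeddings. A dependency-free sketch of that matching step (the 3-element vectors stand in for real ~256-dimensional embeddings, and the 0.75 threshold is an illustrative starting point, not a tuned value):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def identify(embedding, enrolled, threshold=0.75):
    """Return the best-matching enrolled name, or None if nothing beats the threshold."""
    best_name, best_score = None, threshold
    for name, ref in enrolled.items():
        score = cosine(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Toy enrollment database -- real embeddings come from the enrollment step.
enrolled = {"Alice": [0.9, 0.1, 0.0], "Bob": [0.1, 0.9, 0.1]}
print(identify([0.85, 0.15, 0.05], enrolled))  # → Alice
```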
---
## 🎯 Decision Tree

```
START: Want to use voice assistant
│
├─ Single user or don't care who spoke?
│   └─ Use: Inclusive Training (Option 1)
│       └─ Download: Hey Mycroft (pre-trained)
│
├─ Multiple users AND need to know who spoke?
│   └─ Use: Speaker Identification (Option 2)
│       └─ Start with: Hey Mycroft + resemblyzer
│
├─ Want different wake words for different purposes?
│   └─ Use: Multiple Models (Option 1)
│       └─ Download: Hey Mycroft + Hey Jarvis
│
└─ Deploying to Maix Duino (edge)?
    └─ Use: Single Multi-Phrase Model (Option 2)
        └─ Train: Custom model with 2-3 phrases
```

---
## 📊 Comparison Table

| Feature | Inclusive | Speaker ID | Per-User Wake | Multiple Wake |
|---------|-----------|------------|---------------|---------------|
| **Setup Time** | 2 hours | 4 hours | 6 hours | 3 hours |
| **Complexity** | ⭐ Easy | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ Hard | ⭐⭐ Easy |
| **CPU Usage** | 5-10% | 10-15% | 15-30% | 15-30% |
| **Latency** | 100ms | 300ms | 100ms | 100ms |
| **User ID** | ❌ No | ✅ Yes | ✅ Yes | ❌ No |
| **Edge Deploy** | ✅ Yes | ⚠️ Maybe | ❌ No | ⚠️ Partial |
| **Personalize** | ❌ No | ✅ Yes | ✅ Yes | ⚠️ Partial |

---
## 🚀 Recommended Timeline

### Week 1: Get It Working
```bash
# Use pre-trained Hey Mycroft
./download_pretrained_models.sh --model hey-mycroft

# Test it
precise-listen ~/precise-models/pretrained/hey-mycroft.net

# Deploy to server
python voice_server.py --enable-precise \
  --precise-model ~/precise-models/pretrained/hey-mycroft.net
```

### Week 2-3: Make It Yours
```bash
# Fine-tune with your family's voices
cd ~/precise-models/hey-mycroft-family

# Have everyone record 20-30 samples
precise-collect  # Alice
precise-collect  # Bob
precise-collect  # You

# Train
precise-train -e 30 custom.net . \
  --from-checkpoint ../pretrained/hey-mycroft.net
```

### Week 4+: Add Intelligence
```bash
# Speaker identification
pip install resemblyzer
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name Alice --duration 20

# Now server knows who's speaking!
```

### Month 2+: Expand Features
```bash
# Add second wake word for media control
./download_pretrained_models.sh --model hey-jarvis

# Run both: Mycroft for commands, Jarvis for Plex
python voice_server.py --enable-precise \
  --precise-models "mycroft:hey-mycroft.net:0.5,jarvis:hey-jarvis.net:0.5"
```

---
## 💡 Pro Tips

### Wake Word Selection
- ✅ **DO:** Choose clear, distinct wake words
- ✅ **DO:** Test in your environment
- ❌ **DON'T:** Use similar-sounding words
- ❌ **DON'T:** Use common phrases

### Training
- ✅ **DO:** Include all intended users
- ✅ **DO:** Record in various conditions
- ✅ **DO:** Add false positives to training
- ❌ **DON'T:** Rush the training process

### Deployment
- ✅ **DO:** Start simple (one wake word)
- ✅ **DO:** Test thoroughly before adding features
- ✅ **DO:** Monitor false positive rate
- ❌ **DON'T:** Deploy too many wake words at once

### Speaker ID
- ✅ **DO:** Use 20+ seconds for enrollment
- ✅ **DO:** Re-enroll if accuracy drops
- ✅ **DO:** Test threshold values
- ❌ **DON'T:** Expect 100% accuracy

---
## 🔧 Quick Commands

```bash
# Download pre-trained model
./download_pretrained_models.sh --model hey-mycroft

# Test model
precise-listen ~/precise-models/pretrained/hey-mycroft.net

# Fine-tune from pre-trained
precise-train -e 30 custom.net . \
  --from-checkpoint ~/precise-models/pretrained/hey-mycroft.net

# Enroll speaker
python enroll_speaker.py --name Alan --duration 20

# Start with single wake word
python voice_server.py --enable-precise \
  --precise-model hey-mycroft.net

# Start with multiple wake words
python voice_server.py --enable-precise \
  --precise-models "mycroft:hey-mycroft.net:0.5,jarvis:hey-jarvis.net:0.5"

# Check status
curl http://10.1.10.71:5000/wake-word/status

# Monitor detections
curl http://10.1.10.71:5000/wake-word/detections
```

---
## 📚 See Also

- **Full guide:** [ADVANCED_WAKE_WORD_TOPICS.md](ADVANCED_WAKE_WORD_TOPICS.md)
- **Training:** [MYCROFT_PRECISE_GUIDE.md](MYCROFT_PRECISE_GUIDE.md)
- **Deployment:** [PRECISE_DEPLOYMENT.md](PRECISE_DEPLOYMENT.md)
- **Getting started:** [QUICKSTART.md](QUICKSTART.md)

---
## ❓ FAQ

**Q: Can I use "Hey Mycroft" right away?**
A: Yes! Download it with `./download_pretrained_models.sh --model hey-mycroft`.

**Q: How many wake words can I run at once?**
A: 2-3 comfortably on the server. The Maix Duino can handle 1.

**Q: Can I train my own custom wake word?**
A: Yes! See MYCROFT_PRECISE_GUIDE.md Phase 2.

**Q: Does speaker ID work with multiple wake words?**
A: Yes! Wake word detected → speaker identified → personalized response.

**Q: Can I use this on Maix Duino?**
A: Start server-side, then convert the model to KMODEL for on-device use (advanced).

**Q: How accurate is speaker identification?**
A: 85-95% with good enrollment. Re-enroll if accuracy drops.

**Q: What if someone has a cold?**
A: Accuracy may drop temporarily. The system should recover when their voice returns to normal.

**Q: Can kids use it?**
A: Yes! Include their voices in training or enroll them separately.

---
**Quick Decision:** Start with pre-trained Hey Mycroft. Add features later!

```bash
./download_pretrained_models.sh --model hey-mycroft
precise-listen ~/precise-models/pretrained/hey-mycroft.net
# It just works! ✨
```
347
docs/maix-voice-assistant-architecture.md
Executable file
# Maix Duino Voice Assistant - System Architecture

## Overview
Local voice assistant using a Sipeed Maix Duino board integrated with Home Assistant, leveraging existing home lab infrastructure for AI processing.

## Hardware Components

### Maix Duino Board
- **Processor**: K210 dual-core RISC-V @ 400MHz
- **AI Accelerator**: KPU for neural network inference
- **Audio**: I2S microphone + speaker output
- **Connectivity**: ESP32 for WiFi/BLE
- **Programming**: MaixPy (MicroPython)

### Recommended Accessories
- I2S MEMS microphone (or microphone array for better pickup)
- Small speaker (3-5W) or audio output to existing speakers
- USB-C power supply (5V/2A minimum)
## Software Architecture

### Edge Layer (Maix Duino)
```
┌─────────────────────────────────────┐
│ Maix Duino (MaixPy)                 │
├─────────────────────────────────────┤
│ • Wake Word Detection (KPU)         │
│ • Audio Capture (I2S)               │
│ • Audio Streaming → Heimdall        │
│ • Audio Playback ← Heimdall         │
│ • LED Feedback (listening status)   │
└─────────────────────────────────────┘
          ↕ WiFi/HTTP/WebSocket
┌─────────────────────────────────────┐
│ Voice Processing Server             │
│ (Heimdall - 10.1.10.71)             │
├─────────────────────────────────────┤
│ • Whisper STT (existing setup!)     │
│ • Intent Recognition (Rasa/custom)  │
│ • Piper TTS                         │
│ • Home Assistant API Client         │
└─────────────────────────────────────┘
          ↕ REST API/MQTT
┌─────────────────────────────────────┐
│ Home Assistant                      │
│ (Your HA instance)                  │
├─────────────────────────────────────┤
│ • Device Control                    │
│ • State Management                  │
│ • Automation Triggers               │
└─────────────────────────────────────┘
```
## Communication Flow

### 1. Wake Word Detection (Local)
```
User says "Hey Assistant"
    ↓
Maix Duino KPU detects wake word
    ↓
LED turns on (listening mode)
    ↓
Start audio streaming to Heimdall
```

### 2. Speech Processing (Heimdall)
```
Audio stream received
    ↓
Whisper transcribes to text
    ↓
Intent parser extracts command
    ↓
Query Home Assistant API
    ↓
Generate response text
    ↓
Piper TTS creates audio
    ↓
Stream audio back to Maix Duino
```

### 3. Playback & Feedback
```
Receive audio stream
    ↓
Play through speaker
    ↓
LED indicates completion
    ↓
Return to wake word detection
```
## Network Configuration

### Maix Duino Network Settings
- **IP**: 10.1.10.xxx (assign static via DHCP reservation)
- **Gateway**: 10.1.10.1
- **DNS**: 10.1.10.4 (Pi-hole)

### Service Endpoints
- **Voice Processing Server**: http://10.1.10.71:5000
- **Home Assistant**: (your existing HA URL)
- **MQTT Broker**: (optional, if using MQTT)

### Caddy Reverse Proxy Entry
Add to `/mnt/project/epona_-_Caddyfile`:
```caddy
# Voice Assistant API
handle /voice-assistant* {
    uri strip_prefix /voice-assistant
    reverse_proxy http://10.1.10.71:5000
}
```
## Software Stack

### Maix Duino (MaixPy)
- **Firmware**: Latest MaixPy release
- **Libraries**:
  - `Maix.KPU` - Neural network inference
  - `Maix.I2S` - Audio capture/playback
  - `socket` - Network communication
  - `ujson` - JSON handling

### Heimdall Server (Python)
- **Environment**: Create a new conda env
  ```bash
  conda create -n voice-assistant python=3.10
  conda activate voice-assistant
  ```
- **Dependencies**:
  - `openai-whisper` (already installed!)
  - `piper-tts` - Text-to-speech
  - `flask` - REST API server
  - `requests` - HTTP client
  - `pyaudio` - Audio handling
  - `websockets` - Real-time streaming

### Optional: Intent Recognition
- **Rasa** - Full NLU framework (heavier but powerful)
- **Simple pattern matching** - Lightweight; start here
- **LLM-based** - Use your existing LLM setup on Heimdall
## Data Flow Examples

### Example 1: Turn on lights
```
User: "Hey Assistant, turn on the living room lights"
    ↓
Wake word detected → Start recording
    ↓
Whisper STT: "turn on the living room lights"
    ↓
Intent Parser: {
    "action": "turn_on",
    "entity": "light.living_room"
}
    ↓
Home Assistant API:
    POST /api/services/light/turn_on
    {"entity_id": "light.living_room"}
    ↓
Response: "Living room lights turned on"
    ↓
Piper TTS → Audio playback
```

### Example 2: Get status
```
User: "What's the temperature?"
    ↓
Whisper STT: "what's the temperature"
    ↓
Intent Parser: {
    "action": "get_state",
    "entity": "sensor.temperature"
}
    ↓
Home Assistant API:
    GET /api/states/sensor.temperature
    ↓
Response: "The temperature is 72 degrees"
    ↓
Piper TTS → Audio playback
```
|
||||||
|
|
||||||
|
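The intent-parser step shown in both examples can start as the "simple pattern matching" option from the software stack. A sketch of that approach — the rules and entity ids here are illustrative, not a full grammar, and a real deployment would map phrases to your actual Home Assistant entity ids:

```python
import re

# Hypothetical rule table: (pattern, builder) pairs tried in order.
RULES = [
    (re.compile(r"turn (on|off) the (.+?) lights?"),
     lambda m: {"action": "turn_" + m.group(1),
                "entity": "light." + m.group(2).replace(" ", "_")}),
    (re.compile(r"what'?s the temperature"),
     lambda m: {"action": "get_state", "entity": "sensor.temperature"}),
]

def parse_intent(text):
    """Return the first matching intent dict, or None when nothing matches."""
    text = text.lower().strip()
    for pattern, build in RULES:
        m = pattern.search(text)
        if m:
            return build(m)
    return None

print(parse_intent("turn on the living room lights"))
# → {'action': 'turn_on', 'entity': 'light.living_room'}
```

Falling through to `None` is the hook for escalating unmatched utterances to a heavier NLU backend (Rasa or an LLM) later.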
## Phase 1 Implementation Plan

### Step 1: Maix Duino Setup (Week 1)
- [ ] Flash latest MaixPy firmware
- [ ] Test audio input/output
- [ ] Implement basic network communication
- [ ] Test streaming audio to server

### Step 2: Server Setup (Week 1-2)
- [ ] Create conda environment on Heimdall
- [ ] Set up Flask API server
- [ ] Integrate Whisper (already have this!)
- [ ] Install and test Piper TTS
- [ ] Create basic Home Assistant API client

### Step 3: Wake Word Training (Week 2)
- [ ] Record wake word samples
- [ ] Train custom wake word model
- [ ] Convert model for K210 KPU
- [ ] Test on-device detection

### Step 4: Integration (Week 3)
- [ ] Connect all components
- [ ] Test end-to-end flow
- [ ] Add error handling
- [ ] Implement fallbacks

### Step 5: Enhancement (Week 4+)
- [ ] Add more intents
- [ ] Improve NLU accuracy
- [ ] Add multi-room support
- [ ] Implement conversation context
## Development Tools

### Testing Wake Word
```bash
# Use existing diarization.py for testing audio quality
python3 /path/to/diarization.py test_audio.wav \
    --format vtt \
    --model medium
```

### Monitoring
- Heimdall logs: `/var/log/voice-assistant/`
- Maix Duino serial console: 115200 baud
- Home Assistant logs: Standard HA logging
## Security Considerations

1. **No external cloud services** - Everything local
2. **Network isolation** - Keep on 10.1.10.0/24
3. **Authentication** - Use HA long-lived tokens
4. **Rate limiting** - Prevent abuse
5. **Audio privacy** - Only stream after wake word
## Resource Requirements

### Heimdall
- **CPU**: Minimal (< 5% idle, spikes during STT)
- **RAM**: ~2GB for Whisper medium model
- **Storage**: ~5GB for models
- **Network**: Low bandwidth (16kHz audio stream)

### Maix Duino
- **Power**: ~1-2W typical
- **Storage**: 16MB flash (plenty for wake word model)
- **RAM**: 8MB SRAM (sufficient for audio buffering)
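The "low bandwidth" claim is easy to sanity-check: 16 kHz, 16-bit mono PCM works out to 32 KB/s (256 kbps) before any compression:

```python
sample_rate = 16_000   # Hz
sample_width = 2       # bytes per sample (16-bit PCM)
channels = 1           # mono

bytes_per_second = sample_rate * sample_width * channels
kbps = bytes_per_second * 8 / 1000
print(bytes_per_second, "bytes/s =", kbps, "kbps")  # 32000 bytes/s = 256.0 kbps
```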
## Alternative Architectures

### Option A: Fully On-Device (Limited)
- Everything on Maix Duino
- Very limited vocabulary
- No internet required
- Lower accuracy

### Option B: Hybrid (Recommended)
- Wake word on Maix Duino
- Processing on Heimdall
- Best balance of speed/accuracy

### Option C: Raspberry Pi Alternative
- If K210 proves limiting
- More processing power
- Still local/FOSS
- Higher cost
## Expansion Ideas

### Future Enhancements
1. **Multi-room**: Deploy multiple Maix Duino units
2. **Music playback**: Integrate with Plex
3. **Timers/Reminders**: Local scheduling
4. **Weather**: Pull from local weather station
5. **Calendar**: Sync with Nextcloud
6. **Intercom**: Room-to-room communication
7. **Sound events**: Doorbell, smoke alarm detection

### Integration with Existing Infrastructure
- **Plex**: Voice control for media playback
- **qBittorrent**: Status queries, torrent management
- **Nextcloud**: Calendar/contact queries
- **Matrix**: Send messages via voice
## Cost Estimate

- Maix Duino board: ~$20-30 (already have!)
- Microphone: ~$5-10 (if not included)
- Speaker: ~$10-15 (or use existing)
- **Total**: $0-55 (mostly already have)

Compare to commercial solutions:
- Google Home Mini: $50 (requires cloud)
- Amazon Echo Dot: $50 (requires cloud)
- Apple HomePod Mini: $99 (requires cloud)
## Success Criteria

### Minimum Viable Product (MVP)
- ✓ Wake word detection < 1 second
- ✓ Speech-to-text accuracy > 90%
- ✓ Home Assistant command execution
- ✓ Response time < 3 seconds total
- ✓ All processing local (no cloud)

### Enhanced Version
- ✓ Multi-intent conversations
- ✓ Context awareness
- ✓ Multiple wake words
- ✓ Room-aware responses
- ✓ Custom voice training
## Resources & Documentation

### Official Documentation
- Maix Duino: https://wiki.sipeed.com/hardware/en/maix/
- MaixPy: https://maixpy.sipeed.com/
- Home Assistant API: https://developers.home-assistant.io/

### Wake Word Tools
- Mycroft Precise: https://github.com/MycroftAI/mycroft-precise
- Porcupine: https://github.com/Picovoice/porcupine

### TTS Options
- Piper: https://github.com/rhasspy/piper
- Coqui TTS: https://github.com/coqui-ai/TTS

### Community Projects
- Rhasspy: https://rhasspy.readthedocs.io/ (full voice assistant framework)
- Willow: https://github.com/toverainc/willow (ESP32-based alternative)
## Next Steps

1. **Test current setup**: Verify the Maix Duino boots and can connect to WiFi
2. **Audio test**: Record and play back a test clip on the board
3. **Server setup**: Create the conda environment and install dependencies
4. **Simple prototype**: Wake word → beep (no processing yet)
5. **Iterate**: Add complexity step by step
348
hardware/maixduino/MICROPYTHON_QUIRKS.md
Executable file
# MicroPython/MaixPy Quirks and Compatibility Notes

**Date:** 2025-12-03
**MicroPython Version:** v0.6.2-89-gd8901fd22 on 2024-06-17
**Hardware:** Sipeed Maixduino (K210)

This document captures the compatibility issues and workarounds discovered while developing the voice assistant client for the Maixduino.

---
## String Formatting

### ❌ F-strings NOT supported
```python
# WRONG - SyntaxError
message = f"IP: {ip}"
temperature = f"Temp: {temp}°C"
```

### ✅ Use string concatenation
```python
# CORRECT
message = "IP: " + str(ip)
temperature = "Temp: " + str(temp) + "°C"
```

---
## Conditional Expressions (Ternary Operator)

### ❌ Inline ternary expressions NOT supported
```python
# WRONG - SyntaxError
plural = "s" if count > 1 else ""
message = "Found " + str(count) + " item" + ("s" if count > 1 else "")
```

### ✅ Use explicit if/else blocks
```python
# CORRECT
if count > 1:
    plural = "s"
else:
    plural = ""
message = "Found " + str(count) + " item" + plural
```

---
## String Methods

### ❌ decode() doesn't accept keyword arguments
```python
# WRONG - TypeError: function doesn't take keyword arguments
text = response.decode('utf-8', errors='ignore')
```

### ✅ Use positional arguments only (or catch exceptions)
```python
# CORRECT
try:
    text = response.decode('utf-8')
except:
    text = str(response)
```

---
## Display/LCD Color Format

### ❌ RGB tuples NOT accepted
```python
# WRONG - TypeError: can't convert tuple to int
COLOR_RED = (255, 0, 0)
lcd.draw_string(10, 50, "Hello", COLOR_RED, 0)
```

### ✅ Use bit-packed integers
```python
# CORRECT - Pack RGB into a 24-bit integer
def rgb_to_int(r, g, b):
    return (r << 16) | (g << 8) | b

COLOR_RED = rgb_to_int(255, 0, 0)
lcd.draw_string(10, 50, "Hello", COLOR_RED, 0)
```

---
## Network - WiFi Module

### ❌ Standard network.WLAN NOT available
```python
# WRONG - AttributeError: 'module' object has no attribute 'WLAN'
import network
nic = network.WLAN(network.STA_IF)
```

### ✅ Use network.ESP32_SPI for Maixduino
```python
# CORRECT - Requires full pin configuration
from network import ESP32_SPI
from fpioa_manager import fm

# Register all 6 SPI pins
fm.register(25, fm.fpioa.GPIOHS10, force=True)  # CS
fm.register(8, fm.fpioa.GPIOHS11, force=True)   # RST
fm.register(9, fm.fpioa.GPIOHS12, force=True)   # RDY
fm.register(28, fm.fpioa.GPIOHS13, force=True)  # MOSI
fm.register(26, fm.fpioa.GPIOHS14, force=True)  # MISO
fm.register(27, fm.fpioa.GPIOHS15, force=True)  # SCLK

nic = ESP32_SPI(
    cs=fm.fpioa.GPIOHS10,
    rst=fm.fpioa.GPIOHS11,
    rdy=fm.fpioa.GPIOHS12,
    mosi=fm.fpioa.GPIOHS13,
    miso=fm.fpioa.GPIOHS14,
    sclk=fm.fpioa.GPIOHS15
)

nic.connect(SSID, PASSWORD)
```

### ❌ active() method NOT available
```python
# WRONG - AttributeError: 'ESP32_SPI' object has no attribute 'active'
nic.active(True)
```

### ✅ Just use connect() directly
```python
# CORRECT
nic.connect(SSID, PASSWORD)
```

---
## I2S Audio

### ❌ record() doesn't accept a size parameter only
```python
# WRONG - TypeError: object with buffer protocol required
chunk = i2s_dev.record(1024)
```

### ✅ Returns an Audio object; use to_bytes()
```python
# CORRECT
audio_obj = i2s_dev.record(total_bytes)
audio_data = audio_obj.to_bytes()
```

**Note:** Audio data often comes in unexpected formats:
- Expected: 16-bit mono PCM
- Reality: Often 32-bit or stereo (4x expected size)
- Solution: Implement format detection and conversion
---
## Memory Management

### Memory is VERY limited (~6MB total, much less available)

**Problems encountered:**
- Creating large bytearrays fails (>100KB can fail)
- Multiple allocations cause fragmentation
- In-place operations preferred over creating new buffers

### ❌ Creating new buffers
```python
# WRONG - MemoryError on large data
compressed = bytearray()
for i in range(0, len(data), 4):
    compressed.extend(data[i:i+2])  # Allocates new memory
```

### ✅ Work with smaller chunks or compress during transmission
```python
# CORRECT - Process in smaller pieces
chunk_size = 512
for i in range(0, len(data), chunk_size):
    chunk = data[i:i+chunk_size]
    process_chunk(chunk)  # Handle incrementally
```

**Solutions implemented:**
1. Reduce recording duration (3s → 1s)
2. Compress audio (μ-law: 50% size reduction)
3. Stream transmission in small chunks (512 bytes)
4. Add delays between sends to prevent buffer overflow
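Solution 2 halves the payload by mapping each 16-bit sample to 8 bits on a logarithmic scale. A host-side sketch of standard G.711 μ-law encoding (whether the project's own compressor matches G.711 exactly is an assumption; on the K210 the loop would run over small chunks, per the notes above):

```python
def ulaw_encode(sample):
    """Encode one signed 16-bit PCM sample to 8-bit mu-law (G.711)."""
    BIAS, CLIP = 0x84, 32635
    sign = 0x80 if sample < 0 else 0x00
    if sample < 0:
        sample = -sample
    sample = min(sample, CLIP) + BIAS
    # Find the exponent: position of the highest set bit above bit 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (sample & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (sample >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # bits are inverted on the wire

def compress(pcm16):
    """Halve a buffer of little-endian 16-bit samples to 8-bit mu-law."""
    out = bytearray(len(pcm16) // 2)
    for i in range(0, len(pcm16), 2):
        sample = int.from_bytes(pcm16[i:i+2], "little", signed=True)
        out[i // 2] = ulaw_encode(sample)
    return bytes(out)
```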
---
## String Operations

### ❌ Arithmetic in string concatenation
```python
# WRONG - SyntaxError (sometimes)
message = "Count: #" + str(count + 1)
```

### ✅ Separate arithmetic from concatenation
```python
# CORRECT
next_count = count + 1
message = "Count: #" + str(next_count)
```

---
## Bytearray Operations

### ❌ Item deletion NOT supported
```python
# WRONG - TypeError: 'bytearray' object doesn't support item deletion
del audio_data[expected_size:]
```

### ✅ Create a new bytearray with a slice
```python
# CORRECT
audio_data = audio_data[:expected_size]
# Or create a new buffer
trimmed = bytearray(expected_size)
trimmed[:] = audio_data[:expected_size]
```

---
## HTTP Requests
|
||||||
|
|
||||||
|
### ❌ urequests module NOT available
|
||||||
|
```python
|
||||||
|
# WRONG - ImportError: no module named 'urequests'
|
||||||
|
import urequests
|
||||||
|
response = urequests.post(url, data=data)
|
||||||
|
```
|
||||||
|
|
||||||
|
### ✅ Use raw socket HTTP
|
||||||
|
```python
|
||||||
|
# CORRECT
|
||||||
|
import socket
|
||||||
|
|
||||||
|
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
|
||||||
|
s.connect((host, port))
|
||||||
|
|
||||||
|
# Manual HTTP headers
|
||||||
|
headers = "POST /path HTTP/1.1\r\n"
|
||||||
|
headers += "Host: " + host + "\r\n"
|
||||||
|
headers += "Content-Type: audio/wav\r\n"
|
||||||
|
headers += "Content-Length: " + str(len(data)) + "\r\n"
|
||||||
|
headers += "Connection: close\r\n\r\n"
|
||||||
|
|
||||||
|
s.send(headers.encode())
|
||||||
|
s.send(data)
|
||||||
|
|
||||||
|
response = s.recv(1024)
|
||||||
|
s.close()
|
||||||
|
```
|
||||||
|
|
||||||
|
**Socket I/O errors common:**
|
||||||
|
- `[Errno 5] EIO` - Buffer overflow or disconnect
|
||||||
|
- Solutions:
|
||||||
|
- Send smaller chunks (512-1024 bytes)
|
||||||
|
- Add delays between sends (`time.sleep_ms(10)`)
|
||||||
|
- Enable keepalive if supported
|
||||||
|
|
||||||
|
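The chunked-send pattern those bullets describe can be sketched as follows; the `sleep_ms` shim keeps it runnable on both MicroPython and CPython, and the names are illustrative:

```python
import time

try:
    sleep_ms = time.sleep_ms          # MicroPython
except AttributeError:
    def sleep_ms(ms):                 # CPython fallback
        time.sleep(ms / 1000)

def send_chunked(sock, data, chunk_size=512, delay_ms=10):
    # Send in small pieces with a pause between them, to avoid
    # overrunning the ESP32 SPI buffer ([Errno 5] EIO)
    for i in range(0, len(data), chunk_size):
        sock.send(data[i:i + chunk_size])
        sleep_ms(delay_ms)
```

For a 16KB payload this issues 32 sends of 512 bytes with a 10ms gap, about 0.3s of pacing overhead.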
---

## Best Practices for MaixPy

1. **Avoid complex expressions** - Break them into simple steps
2. **Pre-allocate when possible** - Reduces fragmentation
3. **Use small buffers** - 512-1024 byte chunks work well
4. **Add delays in loops** - Prevents watchdog/buffer issues
5. **Explicit type conversions** - Always use `str()`, `int()`, etc.
6. **Test incrementally** - Memory errors appear suddenly
7. **Monitor serial output** - Errors often give hints
8. **Simplify, simplify** - Complexity = bugs in MicroPython

---

## Testing Methodology

When porting Python code to MaixPy:

1. Start with the simplest version (hardcoded values)
2. Test each function individually via the REPL
3. Add features incrementally
4. Watch for memory errors (usually allocation failures)
5. If an error occurs, simplify the last change
6. Use print statements liberally (no debugger available)

---

## Hardware-Specific Notes

### Maixduino ESP32 WiFi
- Requires manual pin registration
- 6 pins must be configured (CS, RST, RDY, MOSI, MISO, SCLK)
- Connection can be slow (20+ seconds)
- Stability improves with smaller packet sizes

### I2S Microphone
- Returns Audio objects, not raw bytes
- The actual format often differs from the configured one
- May return stereo when mono is requested
- May return 32-bit when 16-bit is requested
- Always implement format detection/conversion
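The detection/conversion the last bullet recommends can look like this. A sketch only: it assumes interleaved little-endian 32-bit stereo frames and keeps the top 16 bits of the left channel (a 4x size reduction):

```python
import struct

def stereo32_to_mono16(raw):
    # Each frame is 8 bytes: 32-bit left sample followed by 32-bit right.
    # Keep the left channel and its most significant 16 bits.
    n_frames = len(raw) // 8
    out = bytearray(n_frames * 2)
    for i in range(n_frames):
        left = struct.unpack_from("<i", raw, i * 8)[0]
        struct.pack_into("<h", out, i * 2, left >> 16)
    return out
```

In practice you would first compare `len(raw)` against the expected mono 16-bit size and only convert when the 4x mismatch is detected.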
### BOOT Button (GPIO 16)
- Active low (0 = pressed, 1 = released)
- Requires pull-up configuration
- Debounce by waiting for release
- Can be used without interrupts (polling is fine)
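A polling sketch of that press-and-release handling, with pin access abstracted as a callable since the exact GPIO API varies by firmware (names are illustrative):

```python
def wait_for_press(read_pin, sleep_ms):
    # BOOT button is active low: 0 = pressed, 1 = released.
    while read_pin() == 1:   # wait until pressed
        sleep_ms(10)
    while read_pin() == 0:   # debounce by waiting for release
        sleep_ms(10)
```

On the Maixduino, `read_pin` would wrap `GPIO.value()` on the pull-up-configured pin 16 and `sleep_ms` would be `time.sleep_ms`.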
---

## Resources

- **MaixPy Documentation:** https://maixpy.sipeed.com/
- **K210 Datasheet:** https://canaan.io/product/kendryteai
- **ESP32 SPI Firmware:** https://github.com/sipeed/MaixPy_scripts/tree/master/network

---

## Summary of Successful Patterns

```python
# Audio recording and transmission pipeline
1. Record audio → Audio object (128KB for 1 second)
2. Convert to bytes → to_bytes() (still 128KB)
3. Detect format → Check size vs expected
4. Convert to mono 16-bit → In-place copy (32KB)
5. Compress with μ-law → 50% reduction (16KB)
6. Send in chunks → 512 bytes at a time with delays
7. Parse response → Simple string operations

# Total: ~85% size reduction, fits in memory!
```
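Step 7's "simple string operations" can be as small as this: a naive extraction of one string value from a flat JSON body, for firmware builds where the `json` module is unavailable (function name is illustrative):

```python
def extract_json_field(body, key):
    # Find '"key":' and return the quoted string value after it.
    # Naive: no escapes, no nesting - fine for a flat {"text": "..."} reply.
    marker = '"' + key + '":'
    start = body.find(marker)
    if start < 0:
        return None
    start = body.find('"', start + len(marker))
    if start < 0:
        return None
    end = body.find('"', start + 1)
    return body[start + 1:end]
```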
This approach works reliably on the K210 with ~6MB RAM.

---

**Last Updated:** 2025-12-03
**Status:** Fully tested and working
hardware/maixduino/README.md (Executable file, 184 lines)
@@ -0,0 +1,184 @@
|
||||||
|
# Maixduino Scripts
|
||||||
|
|
||||||
|
Scripts to copy/paste into MaixPy IDE for running on the Maix Duino board.
|
||||||
|
|
||||||
|
## Files
|
||||||
|
|
||||||
|
### 1. maix_test_simple.py
|
||||||
|
**Purpose:** Hardware and connectivity test
|
||||||
|
**Use:** Copy/paste into MaixPy IDE to test before deploying full application
|
||||||
|
|
||||||
|
**Tests:**
|
||||||
|
- LCD display functionality
|
||||||
|
- WiFi connection
|
||||||
|
- Network connection to Heimdall server (port 3006)
|
||||||
|
- I2S audio hardware initialization
|
||||||
|
|
||||||
|
**Before running:**
|
||||||
|
1. Edit WiFi credentials (lines 16-17):
|
||||||
|
```python
|
||||||
|
WIFI_SSID = "YourNetworkName"
|
||||||
|
WIFI_PASSWORD = "YourPassword"
|
||||||
|
```
|
||||||
|
2. Verify server URL is correct (line 18):
|
||||||
|
```python
|
||||||
|
SERVER_URL = "http://10.1.10.71:3006"
|
||||||
|
```
|
||||||
|
3. Copy entire file contents
|
||||||
|
4. Paste into MaixPy IDE
|
||||||
|
5. Click RUN button
|
||||||
|
|
||||||
|
**Expected output:**
|
||||||
|
- Display will show test results
|
||||||
|
- Serial console will print detailed progress
|
||||||
|
- Will report OK/FAIL for each test
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 2. maix_voice_client.py
|
||||||
|
**Purpose:** Full voice assistant client
|
||||||
|
**Use:** Copy/paste into MaixPy IDE after test passes
|
||||||
|
|
||||||
|
**Features:**
|
||||||
|
- Wake word detection (placeholder - uses amplitude trigger)
|
||||||
|
- Audio recording after wake word
|
||||||
|
- Sends audio to Heimdall server for processing
|
||||||
|
- Displays transcription and response on LCD
|
||||||
|
- LED feedback for status
|
||||||
|
|
||||||
|
**Before running:**
|
||||||
|
1. Edit WiFi credentials (lines 38-39)
|
||||||
|
2. Verify server URL (line 42)
|
||||||
|
3. Adjust audio settings if needed (lines 45-62)
|
||||||
|
|
||||||
|
**For SD card deployment:**
|
||||||
|
1. Copy this file to SD card as `main.py`
|
||||||
|
2. Board will auto-run on boot
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Deployment Workflow
|
||||||
|
|
||||||
|
### Step 1: Test Hardware (maix_test_simple.py)
|
||||||
|
```
|
||||||
|
1. Edit WiFi settings
|
||||||
|
2. Paste into MaixPy IDE
|
||||||
|
3. Click RUN
|
||||||
|
4. Verify all tests pass
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 2: Deploy Full Client (maix_voice_client.py)
|
||||||
|
**Option A - IDE Testing:**
|
||||||
|
```
|
||||||
|
1. Edit WiFi settings
|
||||||
|
2. Paste into MaixPy IDE
|
||||||
|
3. Click RUN for testing
|
||||||
|
```
|
||||||
|
|
||||||
|
**Option B - Permanent SD Card:**
|
||||||
|
```
|
||||||
|
1. Edit WiFi settings
|
||||||
|
2. Save to SD card as: /sd/main.py
|
||||||
|
3. Reboot board - auto-runs on boot
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Hardware Requirements
|
||||||
|
|
||||||
|
### Maix Duino Board
|
||||||
|
- K210 processor with KPU
|
||||||
|
- LCD display (built-in)
|
||||||
|
- I2S microphone (check connections)
|
||||||
|
- ESP32 WiFi module (built-in)
|
||||||
|
|
||||||
|
### I2S Pin Configuration (Default)
|
||||||
|
```python
|
||||||
|
Pin 20: I2S0_IN_D0 (Data)
|
||||||
|
Pin 19: I2S0_WS (Word Select)
|
||||||
|
Pin 18: I2S0_SCLK (Clock)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Note:** If your microphone uses different pins, edit the pin assignments in the scripts.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### WiFi Won't Connect
|
||||||
|
- Verify SSID and password are correct
|
||||||
|
- Ensure WiFi is 2.4GHz (not 5GHz - Maix doesn't support 5GHz)
|
||||||
|
- Check signal strength
|
||||||
|
- Try moving closer to router
|
||||||
|
|
||||||
|
### Server Connection Fails
|
||||||
|
- Verify Heimdall server is running on port 3006
|
||||||
|
- Check firewall allows port 3006
|
||||||
|
- Ensure Maix is on same network (10.1.10.0/24)
|
||||||
|
- Test from another device: `curl http://10.1.10.71:3006/health`
|
||||||
|
|
||||||
|
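The curl check has a Python equivalent using only raw sockets, which is also roughly what the Maix client itself can do; `check_health` is an illustrative name, and the host/port defaults are the ones from this document:

```python
import socket

def check_health(host="10.1.10.71", port=3006):
    # Minimal raw-socket GET /health; returns the start of the response
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(5)
    try:
        s.connect((host, port))
        s.send(b"GET /health HTTP/1.1\r\nHost: " + host.encode() +
               b"\r\nConnection: close\r\n\r\n")
        return s.recv(1024).decode()
    finally:
        s.close()
```

A healthy server answers with an `HTTP/1.1 200` status line; anything else (or a timeout) points at the firewall or the server process.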
### Audio Initialization Fails
- Check the microphone is properly connected
- Verify the I2S pins match your hardware
- Try an alternate pin configuration if needed
- Check the microphone is powered at 3.3V (not 5V)

### Script Errors in MaixPy IDE
- Ensure you are using the latest MaixPy firmware
- Check for typos when editing WiFi credentials
- Verify the entire script was copied (check for truncation)
- Look at the serial console for detailed error messages

---

## MaixPy IDE Tips

### Running Scripts
1. Connect the board via USB
2. Select the correct board model: Tools → Select Board
3. Click the connect button (turns red when connected)
4. Paste code into the editor
5. Click the run button (red triangle)
6. Watch the serial console and LCD for output

### Stopping Scripts
- Click the run button again to stop
- Or press the reset button on the board

### Serial Console
- Shows detailed debug output
- Useful for troubleshooting
- Errors can be copied out for debugging

---

## Network Configuration

- **Heimdall Server:** 10.1.10.71:3006
- **Maix Duino:** Gets an IP via DHCP (shown on LCD during the test)
- **Network:** 10.1.10.0/24

---

## Next Steps

After both scripts work:
1. Verify the Heimdall server is processing audio
2. Test wake word detection
3. Integrate with Home Assistant (optional)
4. Train a custom wake word (optional)
5. Deploy to the SD card for permanent installation

---

## Related Documentation

- **Project overview:** `../PROJECT_SUMMARY.md`
- **Heimdall setup:** `../QUICKSTART.md`
- **Wake word training:** `../MYCROFT_PRECISE_GUIDE.md`
- **Server deployment:** `../docs/PRECISE_DEPLOYMENT.md`

---

**Last Updated:** 2025-12-03
**Location:** `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/`
hardware/maixduino/SESSION_PROGRESS_2025-12-03.md (Executable file, 376 lines)
@@ -0,0 +1,376 @@
|
||||||
|
# Maixduino Voice Assistant - Session Progress
|
||||||
|
|
||||||
|
**Date:** 2025-12-03
|
||||||
|
**Session Duration:** ~4 hours
|
||||||
|
**Goal:** Get audio recording and transcription working on Maixduino → Heimdall server
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🎉 Major Achievements
|
||||||
|
|
||||||
|
### ✅ Full Audio Pipeline Working!
|
||||||
|
We successfully built and tested the complete audio capture → compression → transmission → transcription pipeline:
|
||||||
|
|
||||||
|
1. **WiFi Connection** - Maixduino connects to network (10.1.10.98)
|
||||||
|
2. **Audio Recording** - I2S microphone captures audio (MSM261S4030H0 MEMS mic)
|
||||||
|
3. **Format Conversion** - Converts 32-bit stereo to 16-bit mono (4x size reduction)
|
||||||
|
4. **μ-law Compression** - Compresses PCM audio by 50%
|
||||||
|
5. **HTTP Transmission** - Sends compressed WAV to Heimdall server
|
||||||
|
6. **Whisper Transcription** - Server transcribes and returns text
|
||||||
|
7. **LCD Display** - Shows transcription on Maixduino screen
|
||||||
|
8. **Button Loop** - Press BOOT button for repeated recordings
|
||||||
|
|
||||||
|
**Total size reduction:** 128KB → 32KB (mono) → 16KB (compressed) = **87.5% reduction!**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔧 Technical Accomplishments
|
||||||
|
|
||||||
|
### Audio Recording Pipeline
|
||||||
|
- **Initial Problem:** `i2s_dev.record()` returned immediately (1ms instead of 1000ms)
|
||||||
|
- **Root Cause:** Recording API is asynchronous/non-blocking
|
||||||
|
- **Solution:** Use chunked recording with `wait_record()` blocking calls
|
||||||
|
- **Pattern:**
|
||||||
|
```python
|
||||||
|
for i in range(frame_cnt):
|
||||||
|
audio_chunk = i2s_dev.record(chunk_size)
|
||||||
|
i2s_dev.wait_record() # CRITICAL: blocks until complete
|
||||||
|
chunks.append(audio_chunk.to_bytes())
|
||||||
|
```
|
||||||
|
|
||||||
|
### Memory Management
|
||||||
|
- **K210 has very limited RAM** (~6MB total, much less available)
|
||||||
|
- Successfully handled 128KB → 16KB data transformation without OOM errors
|
||||||
|
- Techniques used:
|
||||||
|
- Record in small chunks (2048 samples)
|
||||||
|
- Stream HTTP transmission (512-byte chunks with delays)
|
||||||
|
- In-place data conversion where possible
|
||||||
|
- Explicit garbage collection hints (`audio_data = None`)
|
||||||
|
|
||||||
|
### Network Communication
|
||||||
|
- **Raw socket HTTP** (no urequests library available)
|
||||||
|
- **Chunked streaming** with flow control (10ms delays)
|
||||||
|
- **Simple WAV format** with μ-law compression (format code 7)
|
||||||
|
- **Robust error handling** with serial output debugging
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🐛 MicroPython/MaixPy Quirks Discovered
|
||||||
|
|
||||||
|
### String Operations
|
||||||
|
- ❌ **F-strings NOT supported** - Must use `"text " + str(var)` concatenation
|
||||||
|
- ❌ **Ternary operators fail** - Use explicit `if/else` blocks instead
|
||||||
|
- ❌ **`split()` needs explicit delimiter** - `text.split(" ")` not `text.split()`
|
||||||
|
- ❌ **Escape sequences problematic** - Avoid `\n` in strings, causes syntax errors
|
||||||
|
|
||||||
|
### Data Types & Methods
|
||||||
|
- ❌ **`decode()` doesn't accept kwargs** - Use `decode('utf-8')` not `decode('utf-8', errors='ignore')`
|
||||||
|
- ❌ **RGB tuples not accepted** - Must convert to packed integers: `(r << 16) | (g << 8) | b`
|
||||||
|
- ❌ **Bytearray item deletion unsupported** - `del arr[n:]` fails, use slicing instead
|
||||||
|
- ❌ **Arithmetic in string concat** - Separate calculations: `next = count + 1; "text" + str(next)`
|
||||||
|
|
||||||
|
### I2S Audio Specific
|
||||||
|
- ❌ **`record()` is non-blocking** - Returns immediately, must use `wait_record()`
|
||||||
|
- ❌ **Audio object not directly iterable** - Must call `.to_bytes()` first
|
||||||
|
- ⚠️ **Data format mismatch** - Hardware returns 32-bit stereo even when configured for 16-bit mono (4x expected size)
|
||||||
|
|
||||||
|
### Network/WiFi
|
||||||
|
- ❌ **`network.WLAN` not available** - Must use `network.ESP32_SPI` with full pin config
|
||||||
|
- ❌ **`active()` method doesn't exist** - Just call `connect()` directly
|
||||||
|
- ⚠️ **Requires ALL 6 pins configured** - CS, RST, RDY, MOSI, MISO, SCLK
|
||||||
|
|
||||||
|
### General Syntax
|
||||||
|
- ⚠️ **`if __name__ == "__main__"` sometimes causes syntax errors** - Safer to just call `main()` directly
|
||||||
|
- ⚠️ **Import statements mid-function can cause syntax errors** - Keep imports at top of file
|
||||||
|
- ⚠️ **Some valid Python causes "invalid syntax" for unknown reasons** - Simplify complex expressions
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📊 Current Status
|
||||||
|
|
||||||
|
### ✅ Working
|
||||||
|
- WiFi connectivity (ESP32 SPI)
|
||||||
|
- I2S audio initialization
|
||||||
|
- Chunked audio recording with `wait_record()`
|
||||||
|
- Audio format detection and conversion (32-bit stereo → 16-bit mono)
|
||||||
|
- μ-law compression (50% size reduction)
|
||||||
|
- HTTP transmission to server (chunked streaming)
|
||||||
|
- Whisper transcription (server-side)
|
||||||
|
- JSON response parsing
|
||||||
|
- LCD display (with word wrapping)
|
||||||
|
- Button-triggered recording loop
|
||||||
|
- Countdown timer before recording
|
||||||
|
|
||||||
|
### ⚠️ Partially Working
|
||||||
|
- **Recording duration** - Currently getting ~0.9 seconds instead of full 1 second
|
||||||
|
- Formula: `frame_cnt = seconds * sample_rate // chunk_size`
|
||||||
|
- Current: `7 frames × (2048/16000) = 0.896s`
|
||||||
|
- May need to increase `frame_cnt` or adjust chunk size
|
||||||
|
|
||||||
|
### ❌ Not Yet Implemented
|
||||||
|
- Mycroft Precise wake word detection
|
||||||
|
- Full voice assistant loop
|
||||||
|
- Command processing
|
||||||
|
- Home Assistant integration
|
||||||
|
- Multi-second recording support
|
||||||
|
- Real-time audio streaming
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔬 Technical Details
|
||||||
|
|
||||||
|
### Hardware Configuration
|
||||||
|
|
||||||
|
**Maixduino Board:**
|
||||||
|
- Processor: K210 dual-core RISC-V @ 400MHz
|
||||||
|
- RAM: ~6MB total (limited available memory)
|
||||||
|
- WiFi: ESP32 module via SPI
|
||||||
|
- Microphone: MSM261S4030H0 MEMS (onboard)
|
||||||
|
- IP Address: 10.1.10.98
|
||||||
|
|
||||||
|
**I2S Pins:**
|
||||||
|
- Pin 20: I2S0_IN_D0 (data)
|
||||||
|
- Pin 19: I2S0_WS (word select)
|
||||||
|
- Pin 18: I2S0_SCLK (clock)
|
||||||
|
|
||||||
|
**ESP32 SPI Pins:**
|
||||||
|
- Pin 25: CS (chip select)
|
||||||
|
- Pin 8: RST (reset)
|
||||||
|
- Pin 9: RDY (ready)
|
||||||
|
- Pin 28: MOSI (master out)
|
||||||
|
- Pin 26: MISO (master in)
|
||||||
|
- Pin 27: SCLK (clock)
|
||||||
|
|
||||||
|
**GPIO:**
|
||||||
|
- Pin 16: BOOT button (active low, pull-up)
|
||||||
|
|
||||||
|
### Server Configuration
|
||||||
|
|
||||||
|
**Heimdall Server:**
|
||||||
|
- IP: 10.1.10.71
|
||||||
|
- Port: 3006
|
||||||
|
- Framework: Flask
|
||||||
|
- Model: Whisper base
|
||||||
|
- Environment: Conda `whisper_cli`
|
||||||
|
|
||||||
|
**Endpoints:**
|
||||||
|
- `/health` - Health check
|
||||||
|
- `/transcribe` - POST audio for transcription
|
||||||
|
|
||||||
|
### Audio Format
|
||||||
|
|
||||||
|
**Recording:**
|
||||||
|
- Sample Rate: 16kHz
|
||||||
|
- Hardware Output: 32-bit stereo (128KB for 1 second)
|
||||||
|
- After Conversion: 16-bit mono (32KB for 1 second)
|
||||||
|
- After Compression: 8-bit μ-law (16KB for 1 second)
|
||||||
|
|
||||||
|
**WAV Header:**
|
||||||
|
- Format Code: 7 (μ-law)
|
||||||
|
- Channels: 1 (mono)
|
||||||
|
- Sample Rate: 16000 Hz
|
||||||
|
- Bits per Sample: 8
|
||||||
|
- Includes `fact` chunk (required for μ-law)
|
||||||
|
|
||||||
|
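That header layout can be sketched with `struct` (format code 7, mono, 16 kHz, 8 bits, plus the `fact` chunk non-PCM formats require). This is illustrative, not the project's exact `create_wav_header`:

```python
import struct

def ulaw_wav_header(num_samples, sample_rate=16000):
    # mu-law is 1 byte per sample, mono
    data_size = num_samples
    # RIFF size = "WAVE" + fmt chunk (8+18) + fact chunk (8+4) + data chunk (8+n)
    riff_size = 4 + (8 + 18) + (8 + 4) + (8 + data_size)
    h = b"RIFF" + struct.pack("<I", riff_size) + b"WAVE"
    # fmt chunk is 18 bytes: non-PCM formats carry a trailing cbSize field (0)
    h += b"fmt " + struct.pack("<IHHIIHHH", 18, 7, 1, sample_rate,
                               sample_rate, 1, 8, 0)
    h += b"fact" + struct.pack("<II", 4, num_samples)
    h += b"data" + struct.pack("<I", data_size)
    return h
```

The resulting header is 58 bytes; the compressed μ-law bytes are appended directly after it.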
---

## 📝 Code Files

### Main Script
**File:** `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/maix_simple_record_test.py`

**Key Functions:**
- `init_wifi()` - ESP32 SPI WiFi connection
- `init_audio()` - I2S microphone setup
- `record_audio()` - Chunked recording with `wait_record()`
- `convert_to_mono_16bit()` - Format conversion (32-bit stereo → 16-bit mono)
- `compress_ulaw()` - μ-law compression
- `create_wav_header()` - WAV file header generation
- `send_to_server()` - HTTP POST with chunked streaming
- `display_transcription()` - LCD output with word wrapping
- `main()` - Button loop for repeated recordings

### Server Script
**File:** `/devl/voice-assistant/simple_transcribe_server.py`

**Features:**
- Accepts raw WAV or multipart uploads
- Whisper base model transcription
- JSON response with the transcription text
- Handles μ-law compressed audio

### Documentation
**File:** `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/MICROPYTHON_QUIRKS.md`

A complete reference of all MicroPython compatibility issues discovered during development.

---

## 🎯 Next Steps

### Immediate (Tonight)
1. ✅ Switch to the Linux laptop with direct serial access
2. ⏭️ Tune the recording duration to get the full 1 second
   - Try `frame_cnt = 8` instead of 7
   - Or adjust the chunk size to get exact timing
3. ⏭️ Test transcription quality with proper-length recordings

### Short Term (This Week)
1. Increase the recording duration to 2-3 seconds for better transcription
2. Test memory limits with longer recordings
3. Optimize compression/transmission for speed
4. Add visual feedback during transmission

### Medium Term (Next Week)
1. Install Mycroft Precise in the `whisper_cli` environment
2. Test "hey mycroft" wake word detection on the server
3. Integrate the wake word into the recording loop
4. Add command processing and Home Assistant integration

### Long Term (Future)
1. Explore edge wake word detection (Precise on the K210)
2. Multi-device deployment
3. Continuous listening mode
4. Voice profiles and speaker identification

---

## 🐛 Known Issues

### Recording Duration
- **Issue:** Recording is ~0.9 seconds instead of 1.0 seconds
- **Cause:** Integer division `16000 // 2048` truncates 7.8125 down to 7 frames
- **Impact:** Minor - transcription still works
- **Fix:** Increase `frame_cnt` to 8 or adjust the chunk size
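The proposed fix is just ceiling division instead of floor division; the arithmetic works out as:

```python
chunk_size = 2048
sample_rate = 16000
seconds = 1

target = seconds * sample_rate                          # 16000 samples wanted
frames_floor = target // chunk_size                     # 7 frames  -> 14336 samples (~0.896 s)
frames_ceil = (target + chunk_size - 1) // chunk_size   # 8 frames  -> 16384 samples (~1.024 s)
```

Rounding up overshoots slightly (~1.024 s), which is harmless for transcription; the extra samples can be trimmed by slicing the buffer to `target * 2` bytes.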
### Data Format Mismatch
- **Issue:** Hardware returns 4x the expected data (128KB vs 32KB)
- **Cause:** I2S outputs 32-bit stereo despite the 16-bit mono config
- **Impact:** None - the conversion function handles it
- **Status:** Working as intended

### Syntax Error Sensitivity
- **Issue:** Some valid Python causes "invalid syntax" in MicroPython
- **Patterns:** Import statements mid-function, certain arithmetic expressions
- **Workaround:** Simplify code, avoid complex expressions
- **Status:** Documented in MICROPYTHON_QUIRKS.md

---

## 💡 Key Learnings

### I2S Recording Pattern
The correct pattern for MaixPy I2S recording:
```python
chunk_size = 2048
frame_cnt = seconds * sample_rate // chunk_size

for i in range(frame_cnt):
    audio_chunk = i2s_dev.record(chunk_size)
    i2s_dev.wait_record()  # BLOCKS until recording complete
    data.append(audio_chunk.to_bytes())
```

**Critical:** `wait_record()` is REQUIRED or the recording returns immediately!

### Memory Management
The K210 has very limited RAM. Successful strategies:
- Work in small chunks (512-2048 bytes)
- Stream data instead of buffering
- Free variables explicitly when done
- Avoid creating large intermediate buffers

### MicroPython Compatibility
MicroPython is NOT Python. Many standard features are missing:
- F-strings, ternary operators, keyword arguments
- Some string methods, complex expressions
- Standard libraries (urequests, json parsing)

**Rule:** Test incrementally, simplify everything, check the quirks doc.

---

## 📚 Resources Used

### Documentation
- [MaixPy I2S API Reference](https://wiki.sipeed.com/soft/maixpy/en/api_reference/Maix/i2s.html)
- [MaixPy I2S Usage Guide](https://wiki.sipeed.com/soft/maixpy/en/modules/on_chip/i2s.html)
- [Maixduino Hardware Wiki](https://wiki.sipeed.com/hardware/en/maix/maixpy_develop_kit_board/maix_duino.html)

### Code Examples
- [Official record_wav.py](https://github.com/sipeed/MaixPy-v1_scripts/blob/master/multimedia/audio/record_wav.py)
- [MaixPy Scripts Repository](https://github.com/sipeed/MaixPy-v1_scripts)

### Tools
- MaixPy IDE (copy/paste to board)
- Serial monitor (debugging)
- Heimdall server (Whisper transcription)

---

## 🔄 Ready for Next Session

### Current State
- ✅ Code is working and stable
- ✅ Can record, compress, transmit, transcribe, display
- ✅ Button loop allows repeated testing
- ⚠️ Recording duration slightly short (~0.9s)

### Files Ready
- `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/maix_simple_record_test.py`
- `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/MICROPYTHON_QUIRKS.md`
- `/devl/voice-assistant/simple_transcribe_server.py`

### For Serial Access Session
1. Connect the Maixduino via USB to the Linux laptop
2. Install pyserial: `pip install pyserial`
3. Find the device: `ls /dev/ttyUSB*` or `/dev/ttyACM*`
4. Connect: `screen /dev/ttyUSB0 115200` or use the MaixPy IDE
5. Modify code directly, test immediately, and watch the serial output

### Quick Test Commands
```python
# Test WiFi
from network import ESP32_SPI
# ... (full init code in maix_test_simple.py)

# Test I2S
from Maix import I2S
rx = I2S(I2S.DEVICE_0)
# ...

# Test recording
audio = rx.record(2048)
rx.wait_record()
print(len(audio.to_bytes()))
```

---

## 🎊 Success Metrics

Today we achieved:
- ✅ WiFi connection working
- ✅ Audio recording working (with proper blocking)
- ✅ Format conversion working (4x reduction)
- ✅ Compression working (2x reduction)
- ✅ Network transmission working (chunked streaming)
- ✅ Server transcription working
- ✅ Display output working
- ✅ Button loop working
- ✅ End-to-end pipeline complete!

**Total:** 9/9 core features working! 🚀

Minor tuning is still needed, but the foundation is solid and ready for wake word integration.

---

**Session Summary:** Massive progress! From zero to a working audio transcription pipeline in one session. Overcame significant MicroPython compatibility challenges and memory limitations. Ready for the next phase: wake word detection.

**Status:** ✅ Ready for Linux serial access and fine-tuning
**Next Session:** Tune the recording duration, then integrate Mycroft Precise wake word detection

---

*End of Session Report - 2025-12-03*
hardware/maixduino/maix_debug_wifi.py (Executable file, 41 lines)
@@ -0,0 +1,41 @@
|
||||||
|
# Debug script to discover WiFi module methods
|
||||||
|
# This will help us figure out the correct API
|
||||||
|
|
||||||
|
import lcd
|
||||||
|
|
||||||
|
lcd.init()
|
||||||
|
lcd.clear()
|
||||||
|
|
||||||
|
print("=" * 40)
|
||||||
|
print("WiFi Module Debug")
|
||||||
|
print("=" * 40)
|
||||||
|
|
||||||
|
# Try to import WiFi module
|
||||||
|
try:
|
||||||
|
from network_esp32 import wifi
|
||||||
|
print("SUCCESS: Imported network_esp32.wifi")
|
||||||
|
lcd.draw_string(10, 10, "WiFi module found!", 0xFFFF, 0x0000)
|
||||||
|
|
||||||
|
# List all attributes/methods
|
||||||
|
print("\nAvailable methods:")
|
||||||
|
lcd.draw_string(10, 30, "Checking methods...", 0xFFFF, 0x0000)
|
||||||
|
|
||||||
|
attrs = dir(wifi)
|
||||||
|
y = 50
|
||||||
|
for i, attr in enumerate(attrs):
|
||||||
|
if not attr.startswith('_'):
|
||||||
|
print(" - " + attr)
|
||||||
|
if i < 10: # Only show first 10 on screen
|
||||||
|
lcd.draw_string(10, y, attr[:20], 0x07E0, 0x0000)
|
||||||
|
y += 15
|
||||||
|
|
||||||
|
print("\nTotal methods: " + str(len(attrs)))
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print("ERROR importing wifi: " + str(e))
|
||||||
|
lcd.draw_string(10, 10, "WiFi import failed!", 0xF800, 0x0000)
|
||||||
|
lcd.draw_string(10, 30, str(e)[:30], 0xF800, 0x0000)
|
||||||
|
|
||||||
|
print("\n" + "=" * 40)
|
||||||
|
print("Debug complete - check serial output")
|
||||||
|
print("=" * 40)
|
||||||
hardware/maixduino/maix_discover_modules.py (Executable file, 51 lines)
@@ -0,0 +1,51 @@
|
||||||
|
# Discover what network/WiFi modules are actually available
|
||||||
|
import lcd
|
||||||
|
import sys
|
||||||
|
|
||||||
|
lcd.init()
|
||||||
|
lcd.clear()
|
||||||
|
|
||||||
|
print("=" * 40)
|
||||||
|
print("Module Discovery")
|
||||||
|
print("=" * 40)
|
||||||
|
|
||||||
|
# Try different possible module names
|
||||||
|
modules_to_try = [
|
||||||
|
"network",
|
||||||
|
"network_esp32",
|
||||||
|
"network_esp8285",
|
||||||
|
"esp32_spi",
|
||||||
|
"esp8285",
|
||||||
|
"wifi",
|
||||||
|
"ESP32_SPI",
|
||||||
|
"WIFI"
|
||||||
|
]
|
||||||
|
|
||||||
|
found = []
|
||||||
|
y = 10
|
||||||
|
|
||||||
|
for module_name in modules_to_try:
|
||||||
|
try:
|
||||||
|
mod = __import__(module_name)
|
||||||
|
msg = "FOUND: " + module_name
|
||||||
|
print(msg)
|
||||||
|
lcd.draw_string(10, y, msg[:25], 0x07E0, 0x0000) # Green
|
||||||
|
y += 15
|
||||||
|
found.append(module_name)
|
||||||
|
|
||||||
|
# Show methods
|
||||||
|
print(" Methods: " + str(dir(mod)))
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
msg = "NONE: " + module_name
|
||||||
|
print(msg + " (" + str(e) + ")")
|
||||||
|
|
||||||
|
print("\n" + "=" * 40)
|
||||||
|
if found:
|
||||||
|
print("Found modules: " + str(found))
|
||||||
|
lcd.draw_string(10, y + 20, "Found: " + str(len(found)), 0xFFFF, 0x0000)
|
||||||
|
else:
|
||||||
|
print("No WiFi modules found!")
|
||||||
|
lcd.draw_string(10, y + 20, "No WiFi found!", 0xF800, 0x0000)
|
||||||
|
|
||||||
|
print("=" * 40)
|
||||||
hardware/maixduino/maix_simple_record_test.py (Normal file, 461 lines)
@@ -0,0 +1,461 @@
|
||||||
|
# Simple Audio Recording and Transcription Test
|
||||||
|
# Record audio for 3 seconds, send to server, display transcription
|
||||||
|
#
|
||||||
|
# This tests the full audio pipeline without wake word detection
|
||||||
|
|
||||||
|
import time
|
||||||
|
import lcd
|
||||||
|
import socket
|
||||||
|
import struct
|
||||||
|
from Maix import GPIO, I2S
|
||||||
|
from fpioa_manager import fm
|
||||||
|
|
||||||
|
# ===== CONFIGURATION =====
|
||||||
|
# Load credentials from secrets.py (gitignored)
|
||||||
|
try:
|
||||||
|
from secrets import SECRETS
|
||||||
|
except ImportError:
|
||||||
|
SECRETS = {}
|
||||||
|
|
||||||
|
WIFI_SSID = "Tell My WiFi Love Her"
|
||||||
|
WIFI_PASSWORD = SECRETS.get("wifi_password", "") # set in secrets.py
|
||||||
|
SERVER_HOST = "10.1.10.71"
|
||||||
|
SERVER_PORT = 3006
|
||||||
|
RECORD_SECONDS = 1 # Reduced to 1 second to save memory
|
||||||
|
SAMPLE_RATE = 16000
|
||||||
|
# ==========================
|
||||||
|
|
||||||
|
# Colors
|
||||||
|
def rgb_to_int(r, g, b):
|
||||||
|
return (r << 16) | (g << 8) | b
|
||||||
|
|
||||||
|
COLOR_BLACK = 0
|
||||||
|
COLOR_WHITE = rgb_to_int(255, 255, 255)
|
||||||
|
COLOR_RED = rgb_to_int(255, 0, 0)
|
||||||
|
COLOR_GREEN = rgb_to_int(0, 255, 0)
|
||||||
|
COLOR_BLUE = rgb_to_int(0, 0, 255)
|
||||||
|
COLOR_YELLOW = rgb_to_int(255, 255, 0)
|
||||||
|
COLOR_CYAN = 0x00FFFF # Cyan: rgb_to_int(0, 255, 255)
|
||||||
|
|
||||||
|
def display_msg(msg, color=COLOR_WHITE, y=50, clear=False):
    """Display message on LCD"""
    if clear:
        lcd.clear(COLOR_BLACK)
    lcd.draw_string(10, y, msg[:30], color, COLOR_BLACK)
    print(msg)


def init_wifi():
    """Initialize WiFi connection"""
    from network import ESP32_SPI

    lcd.init()
    lcd.clear(COLOR_BLACK)
    display_msg("Connecting WiFi...", COLOR_BLUE, 10)

    # Register ESP32 SPI pins
    fm.register(25, fm.fpioa.GPIOHS10, force=True)  # CS
    fm.register(8, fm.fpioa.GPIOHS11, force=True)   # RST
    fm.register(9, fm.fpioa.GPIOHS12, force=True)   # RDY
    fm.register(28, fm.fpioa.GPIOHS13, force=True)  # MOSI
    fm.register(26, fm.fpioa.GPIOHS14, force=True)  # MISO
    fm.register(27, fm.fpioa.GPIOHS15, force=True)  # SCLK

    nic = ESP32_SPI(
        cs=fm.fpioa.GPIOHS10, rst=fm.fpioa.GPIOHS11, rdy=fm.fpioa.GPIOHS12,
        mosi=fm.fpioa.GPIOHS13, miso=fm.fpioa.GPIOHS14, sclk=fm.fpioa.GPIOHS15
    )

    nic.connect(WIFI_SSID, WIFI_PASSWORD)

    # Wait for connection
    timeout = 20
    while timeout > 0:
        time.sleep(1)
        if nic.isconnected():
            ip = nic.ifconfig()[0]
            display_msg("WiFi OK: " + str(ip), COLOR_GREEN, 30)
            return nic
        timeout -= 1

    display_msg("WiFi FAILED!", COLOR_RED, 30)
    return None
def init_audio():
    """Initialize I2S audio"""
    display_msg("Init audio...", COLOR_BLUE, 50)

    # Register I2S pins
    fm.register(20, fm.fpioa.I2S0_IN_D0, force=True)
    fm.register(19, fm.fpioa.I2S0_WS, force=True)
    fm.register(18, fm.fpioa.I2S0_SCLK, force=True)

    # Initialize I2S
    rx = I2S(I2S.DEVICE_0)
    rx.channel_config(rx.CHANNEL_0, rx.RECEIVER, align_mode=I2S.STANDARD_MODE)
    rx.set_sample_rate(SAMPLE_RATE)

    display_msg("Audio OK!", COLOR_GREEN, 70)
    return rx


def convert_to_mono_16bit(audio_data):
    """Convert audio to mono 16-bit by copying the first 2 bytes of each frame"""
    expected_size = SAMPLE_RATE * RECORD_SECONDS * 2  # 16-bit mono
    actual_size = len(audio_data)

    print("Expected size: " + str(expected_size) + ", Actual: " + str(actual_size))

    # If we got 4x the expected data (stereo/32-bit), extract mono 16-bit
    if actual_size == expected_size * 4:
        print("Extracting mono from stereo/32-bit...")
        # Create a new buffer with only the data we need (first 2 bytes of every 4)
        mono_data = bytearray(expected_size)
        write_pos = 0
        for read_pos in range(0, actual_size, 4):
            if write_pos + 1 < expected_size and read_pos + 1 < actual_size:
                mono_data[write_pos] = audio_data[read_pos]
                mono_data[write_pos + 1] = audio_data[read_pos + 1]
                write_pos += 2

        # Free original buffer explicitly
        audio_data = None
        return mono_data

    # If we got 2x the expected data (16-bit stereo), extract mono
    elif actual_size == expected_size * 2:
        print("Extracting mono from stereo...")
        mono_data = bytearray(expected_size)
        write_pos = 0
        for read_pos in range(0, actual_size, 4):
            if write_pos + 1 < expected_size and read_pos + 1 < actual_size:
                mono_data[write_pos] = audio_data[read_pos]
                mono_data[write_pos + 1] = audio_data[read_pos + 1]
                write_pos += 2

        # Free original
        audio_data = None
        return mono_data

    # Otherwise assume it's already in the correct format
    print("Audio data appears to be in the correct format")
    return audio_data
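The 4x branch above keeps the first 16-bit word of every 4-byte frame. That slicing logic can be verified on the host with a condensed copy (`extract_mono` is our name for this test helper, not a function in the repo):

```python
def extract_mono(audio_data, expected_size):
    """Keep the first 2 bytes of every 4-byte frame (mirrors the 4x branch)."""
    mono = bytearray(expected_size)
    write_pos = 0
    for read_pos in range(0, len(audio_data), 4):
        if write_pos + 1 < expected_size and read_pos + 1 < len(audio_data):
            mono[write_pos] = audio_data[read_pos]
            mono[write_pos + 1] = audio_data[read_pos + 1]
            write_pos += 2
    return mono

# Two 4-byte frames; only the leading 16-bit word of each should survive
raw = bytes([1, 2, 0, 0, 3, 4, 0, 0])
print(bytes(extract_mono(raw, 4)))  # -> b'\x01\x02\x03\x04'
```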
def record_audio(i2s_dev, seconds):
    """Record audio for the specified number of seconds using chunked recording"""
    # Clear screen and show big recording indicator
    lcd.clear(COLOR_BLACK)

    # Show large "RECORDING" text
    display_msg("*** RECORDING ***", COLOR_RED, 60)
    display_msg("Speak now!", COLOR_YELLOW, 100)
    display_msg("(listening...)", COLOR_WHITE, 130)

    chunk_size = 2048
    channels = 1

    # Calculate number of chunks needed
    frame_cnt = seconds * SAMPLE_RATE // chunk_size
    print("Recording " + str(frame_cnt) + " frames...")

    # Recording loop with wait
    all_chunks = []
    for i in range(frame_cnt):
        # Start recording this chunk
        audio_chunk = i2s_dev.record(chunk_size * channels)

        # CRITICAL: Wait for recording to complete
        i2s_dev.wait_record()

        # Convert to bytes and store
        chunk_bytes = audio_chunk.to_bytes()
        all_chunks.append(chunk_bytes)

    # Combine all chunks
    print("Combining " + str(len(all_chunks)) + " chunks...")
    audio_data = bytearray()
    for chunk in all_chunks:
        audio_data.extend(chunk)

    print("Recorded " + str(len(audio_data)) + " bytes")

    # Convert to mono 16-bit if needed
    audio_data = convert_to_mono_16bit(audio_data)
    print("Final size: " + str(len(audio_data)) + " bytes")

    return audio_data
def compress_ulaw(data):
    """Compress 16-bit PCM to 8-bit μ-law (50% size reduction)"""
    # Standard G.711 μ-law encoding constants
    BIAS = 0x84
    CLIP = 32635

    compressed = bytearray()

    # Process 16-bit samples (2 bytes each)
    for i in range(0, len(data), 2):
        # Get 16-bit sample (little endian)
        sample = struct.unpack('<h', data[i:i+2])[0]

        # Get sign and magnitude
        sign = 0x80 if sample < 0 else 0x00
        if sample < 0:
            sample = -sample
        if sample > CLIP:
            sample = CLIP

        # Add bias
        sample = sample + BIAS

        # Find exponent (position of highest bit)
        exponent = 7
        for exp in range(7, -1, -1):
            if sample & (1 << (exp + 7)):
                exponent = exp
                break

        # Get mantissa (top 4 bits after the exponent)
        mantissa = (sample >> (exponent + 3)) & 0x0F

        # Combine: sign (1 bit) + exponent (3 bits) + mantissa (4 bits)
        ulaw_byte = sign | (exponent << 4) | mantissa

        # Invert bits (μ-law standard)
        compressed.append(ulaw_byte ^ 0xFF)

    return compressed
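The encoder above only compresses; for host-side debugging of the stream the server receives, a matching expander can be sketched from the same G.711 rules (`decode_ulaw` is our name, not part of this repo):

```python
import struct

def decode_ulaw(compressed):
    """Expand 8-bit μ-law bytes back to little-endian signed 16-bit PCM."""
    BIAS = 0x84  # Same bias the encoder added
    pcm = bytearray()
    for b in compressed:
        b = b ^ 0xFF                        # undo the bit inversion
        sign = b & 0x80
        exponent = (b >> 4) & 0x07
        mantissa = b & 0x0F
        # Rebuild the biased magnitude, then remove the bias
        sample = ((mantissa << 3) + BIAS) << exponent
        sample -= BIAS
        if sign:
            sample = -sample
        pcm.extend(struct.pack('<h', sample))
    return bytes(pcm)

# μ-law 0xFF is +0; 0x80 decodes to the maximum magnitude 32124
print(struct.unpack('<h', decode_ulaw(b'\x80'))[0])  # -> 32124
```

Round-tripping a sample through `compress_ulaw` and this expander should reproduce it to within half of the segment's step size, which is a quick way to validate the device-side encoder.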
def create_wav_header(data_size, sample_rate=16000, is_ulaw=False):
    """Create a WAV file header (PCM, or μ-law with a fact chunk)"""
    header = bytearray()

    # RIFF header (μ-law header is 58 bytes, PCM header is 46)
    header.extend(b'RIFF')
    header.extend(struct.pack('<I', (50 if is_ulaw else 38) + data_size))
    header.extend(b'WAVE')

    # fmt chunk
    header.extend(b'fmt ')
    header.extend(struct.pack('<I', 18))                   # Chunk size (with extension)
    header.extend(struct.pack('<H', 7 if is_ulaw else 1))  # 7 = μ-law, 1 = PCM
    header.extend(struct.pack('<H', 1))                    # Mono
    header.extend(struct.pack('<I', sample_rate))
    header.extend(struct.pack('<I', sample_rate * (1 if is_ulaw else 2)))  # Byte rate
    header.extend(struct.pack('<H', 1 if is_ulaw else 2))  # Block align
    header.extend(struct.pack('<H', 8 if is_ulaw else 16)) # Bits per sample
    header.extend(struct.pack('<H', 0))                    # Extension size

    # fact chunk (required for μ-law)
    if is_ulaw:
        header.extend(b'fact')
        header.extend(struct.pack('<I', 4))
        header.extend(struct.pack('<I', data_size))  # Sample count (1 byte per sample)

    # data chunk
    header.extend(b'data')
    header.extend(struct.pack('<I', data_size))

    return header
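The header layout can be sanity-checked on the host. This sketch rebuilds the μ-law header with the same field order and then reads the fields back at their RIFF offsets (`build_ulaw_header` and `parse_wav_header` are condensed test helpers, not functions in the repo):

```python
import struct

def build_ulaw_header(data_size, sample_rate=16000):
    """Condensed copy of create_wav_header(..., is_ulaw=True) for testing."""
    h = bytearray()
    h += b'RIFF' + struct.pack('<I', 50 + data_size) + b'WAVE'
    # fmt chunk: size 18, format 7 (mu-law), mono, rate, byte rate, align 1, 8-bit, ext 0
    h += b'fmt ' + struct.pack('<IHHIIHHH', 18, 7, 1, sample_rate,
                               sample_rate, 1, 8, 0)
    h += b'fact' + struct.pack('<II', 4, data_size)
    h += b'data' + struct.pack('<I', data_size)
    return h

def parse_wav_header(h):
    """Read back the fields a decoder would check (fixed RIFF offsets)."""
    fmt_code, channels, rate = struct.unpack_from('<HHI', h, 20)
    data_size = struct.unpack_from('<I', h, len(h) - 4)[0]
    return {'format': fmt_code, 'channels': channels,
            'rate': rate, 'data_size': data_size}
```

For one second of 8-bit μ-law at 16 kHz, the header is 58 bytes, the format code is 7, and the data chunk announces 16000 bytes.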
def send_to_server(audio_data):
    """Send audio to the server and get a transcription"""
    lcd.clear(COLOR_BLACK)
    display_msg("Processing...", COLOR_BLUE, 60)
    display_msg("Compressing audio", COLOR_WHITE, 100)
    print("Sending to server...")

    try:
        # Compress audio using μ-law (50% size reduction)
        print("Compressing audio...")
        compressed_data = compress_ulaw(audio_data)
        print("Compressed: " + str(len(audio_data)) + " -> " + str(len(compressed_data)) + " bytes")

        # Update display
        display_msg("Sending to server", COLOR_WHITE, 130)

        # Create WAV file with μ-law format
        wav_header = create_wav_header(len(compressed_data), is_ulaw=True)
        wav_size = len(wav_header) + len(compressed_data)

        # Simple HTTP POST with raw WAV data
        headers = "POST /transcribe HTTP/1.1\r\n"
        headers += "Host: " + SERVER_HOST + "\r\n"
        headers += "Content-Type: audio/wav\r\n"
        headers += "Content-Length: " + str(wav_size) + "\r\n"
        headers += "Connection: close\r\n\r\n"

        # Connect with better socket settings
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(30)

        # Try to set socket options for better stability
        try:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        except Exception:
            pass  # Some MicroPython builds don't support this

        print("Connecting to " + SERVER_HOST + ":" + str(SERVER_PORT))
        s.connect((SERVER_HOST, SERVER_PORT))

        # Send headers
        print("Sending headers...")
        sent = s.send(headers.encode())
        print("Sent " + str(sent) + " bytes of headers")

        # Send WAV header
        print("Sending WAV header...")
        sent = s.send(wav_header)
        print("Sent " + str(sent) + " bytes of WAV header")

        # Send audio data in small chunks with a delay
        print("Sending audio data (" + str(len(compressed_data)) + " bytes)...")
        chunk_size = 512  # Small chunks for stability

        bytes_sent = 0
        for i in range(0, len(compressed_data), chunk_size):
            chunk = compressed_data[i:i+chunk_size]
            try:
                sent = s.send(chunk)
                bytes_sent += sent
                chunk_num = i // chunk_size + 1
                if chunk_num % 10 == 0:  # Progress update every 10 chunks
                    print("Sent " + str(bytes_sent) + "/" + str(len(compressed_data)) + " bytes")
                # Small delay to let the socket buffer drain
                time.sleep_ms(10)
            except Exception as e:
                print("Send error at byte " + str(bytes_sent) + ": " + str(e))
                raise

        print("All data sent! Total: " + str(bytes_sent) + " bytes")

        # Update display for waiting
        lcd.clear(COLOR_BLACK)
        display_msg("Transcribing...", COLOR_CYAN, 60)
        display_msg("Please wait", COLOR_WHITE, 100)

        # Read response
        response = b""
        while True:
            chunk = s.recv(1024)
            if not chunk:
                break
            response += chunk

        s.close()

        # Parse response (MicroPython decode doesn't accept keyword args)
        try:
            response_str = response.decode('utf-8')
        except Exception:
            response_str = str(response)
        print("Response: " + response_str[:200])

        # Extract JSON from response
        if '{"' in response_str:
            json_start = response_str.index('{"')
            json_str = response_str[json_start:]

            # Minimal hand-rolled JSON string extraction (avoids a json module dependency)
            if '"text":' in json_str:
                text_start = json_str.index('"text":') + 7
                text_str = json_str[text_start:]
                # Find the value between quotes
                if '"' in text_str:
                    quote_start = text_str.index('"') + 1
                    quote_end = text_str.index('"', quote_start)
                    transcription = text_str[quote_start:quote_end]
                    return transcription

        return "Error parsing response"

    except Exception as e:
        print("Error: " + str(e))
        return "Error: " + str(e)
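The hand-rolled `"text"` extraction in `send_to_server` can be exercised on the host. The helper below mirrors that logic (`extract_text` is our name; the HTTP body shape is an assumption about what the transcription server returns). Note it does not handle escaped quotes inside the transcription; that is a known limitation of the minimal parser:

```python
def extract_text(response_str):
    """Mirror of the minimal '"text"' extraction used on the device."""
    if '{"' in response_str:
        json_str = response_str[response_str.index('{"'):]
        if '"text":' in json_str:
            text_str = json_str[json_str.index('"text":') + 7:]
            if '"' in text_str:
                start = text_str.index('"') + 1
                end = text_str.index('"', start)
                return text_str[start:end]
    return None

# Assumed response shape: HTTP headers, blank line, then a small JSON object
resp = 'HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n\r\n{"text": "turn on the lights"}'
print(extract_text(resp))  # -> turn on the lights
```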
def display_transcription(text):
    """Display transcription on LCD"""
    lcd.clear(COLOR_BLACK)
    display_msg("TRANSCRIPTION:", COLOR_GREEN, 10)

    # Simple line wrapping every 20 chars (keep the full text for logging,
    # since the loop consumes `text`)
    full_text = text
    y = 40
    while len(text) > 0:
        chunk = text[:20]
        display_msg(chunk, COLOR_WHITE, y)
        text = text[20:]
        y += 20
        if y > 200:
            break

    print("Transcription: " + full_text)
def main():
    """Main program with a loop for multiple recordings"""
    print("=" * 40)
    print("Simple Audio Recording Test")
    print("=" * 40)

    # Initialize
    nic = init_wifi()
    if not nic:
        return

    i2s = init_audio()

    # Setup button (boot button on GPIO 16)
    fm.register(16, fm.fpioa.GPIOHS0, force=True)
    button = GPIO(GPIO.GPIOHS0, GPIO.IN, GPIO.PULL_UP)

    display_msg("Ready!", COLOR_GREEN, 110, clear=True)
    display_msg("Press BOOT button", COLOR_WHITE, 130)
    display_msg("to record", COLOR_WHITE, 150)
    print("Press BOOT button to record, or Ctrl+C to exit")

    recording_count = 0

    # Main loop
    while True:
        # Wait for button press (button is active low)
        if button.value() == 0:
            recording_count += 1
            print("\n--- Recording #" + str(recording_count) + " ---")

            # Debounce - wait for button release
            while button.value() == 0:
                time.sleep_ms(10)

            # Give the user time to prepare (countdown)
            lcd.clear(COLOR_BLACK)
            display_msg("GET READY!", COLOR_YELLOW, 80)
            display_msg("3...", COLOR_WHITE, 120)
            time.sleep(1)
            display_msg("2...", COLOR_WHITE, 140)
            time.sleep(1)
            display_msg("1...", COLOR_WHITE, 160)
            time.sleep(1)

            # Record
            audio_data = record_audio(i2s, RECORD_SECONDS)

            # Send to server
            transcription = send_to_server(audio_data)

            # Display result
            display_transcription(transcription)

            # Wait a bit before showing ready again
            time.sleep(2)

            # Show ready for next recording
            display_msg("Ready!", COLOR_GREEN, 110, clear=True)
            display_msg("Press BOOT button", COLOR_WHITE, 130)
            next_count = recording_count + 1
            display_msg("to record (#" + str(next_count) + ")", COLOR_WHITE, 150)
            print("Ready for next recording. Press BOOT button.")

        time.sleep_ms(50)  # Small delay to reduce CPU usage


# Run main
main()
252
hardware/maixduino/maix_test_simple.py
Normal file
@@ -0,0 +1,252 @@
# Maix Duino - Simple Test Script
# Copy/paste this into MaixPy IDE and click RUN
#
# This script tests:
# 1. LCD display
# 2. WiFi connectivity
# 3. Network connection to the Heimdall server
# 4. I2S audio initialization (without recording yet)

import time
import lcd
from Maix import GPIO, I2S
from fpioa_manager import fm

# Import the correct network module
try:
    import network
    # ESP32_SPI instance (for Maix Duino with ESP32) is created in test_wifi
    nic = None
except Exception as e:
    print("Network module import error: " + str(e))
    nic = None

# ===== CONFIGURATION - EDIT THESE =====
# Load credentials from secrets.py (gitignored)
try:
    from secrets import SECRETS
except ImportError:
    SECRETS = {}

WIFI_SSID = "Tell My WiFi Love Her"               # <<< CHANGE THIS
WIFI_PASSWORD = SECRETS.get("wifi_password", "")  # set in secrets.py
SERVER_URL = "http://10.1.10.71:3006"             # Heimdall voice server
# =======================================

# Colors (as tuples for easy reference)
COLOR_BLACK = (0, 0, 0)
COLOR_WHITE = (255, 255, 255)
COLOR_RED = (255, 0, 0)
COLOR_GREEN = (0, 255, 0)
COLOR_BLUE = (0, 0, 255)
COLOR_YELLOW = (255, 255, 0)
def display_msg(msg, color=COLOR_WHITE, y=50):
    """Display message on LCD"""
    # lcd.draw_string takes colors as single ints: lcd.draw_string(x, y, text, color_int, bg_int)
    # Convert the RGB tuple to one integer: (R << 16) | (G << 8) | B
    color_int = (color[0] << 16) | (color[1] << 8) | color[2]
    bg_int = 0  # Black background
    lcd.draw_string(10, y, msg, color_int, bg_int)
    print(msg)


def test_lcd():
    """Test LCD display"""
    lcd.init()
    lcd.clear(COLOR_BLACK)
    display_msg("MaixDuino Test", COLOR_YELLOW, 10)
    display_msg("Initializing...", COLOR_WHITE, 30)
    time.sleep(1)
    return True
def test_wifi():
    """Test WiFi connection"""
    global nic
    display_msg("Connecting WiFi...", COLOR_BLUE, 50)

    try:
        # Initialize ESP32_SPI network interface
        print("Initializing ESP32_SPI...")

        # Create network interface instance with Maix Duino pins
        # Maix Duino ESP32 default pins:
        # CS=25, RST=8, RDY=9, MOSI=28, MISO=26, SCLK=27
        from network import ESP32_SPI

        # Register pins for ESP32 SPI communication
        fm.register(25, fm.fpioa.GPIOHS10, force=True)  # CS
        fm.register(8, fm.fpioa.GPIOHS11, force=True)   # RST
        fm.register(9, fm.fpioa.GPIOHS12, force=True)   # RDY
        fm.register(28, fm.fpioa.GPIOHS13, force=True)  # MOSI
        fm.register(26, fm.fpioa.GPIOHS14, force=True)  # MISO
        fm.register(27, fm.fpioa.GPIOHS15, force=True)  # SCLK

        nic = ESP32_SPI(
            cs=fm.fpioa.GPIOHS10,
            rst=fm.fpioa.GPIOHS11,
            rdy=fm.fpioa.GPIOHS12,
            mosi=fm.fpioa.GPIOHS13,
            miso=fm.fpioa.GPIOHS14,
            sclk=fm.fpioa.GPIOHS15
        )

        print("Connecting to " + WIFI_SSID + "...")

        # Connect to WiFi (no need to call active() first)
        nic.connect(WIFI_SSID, WIFI_PASSWORD)

        # Wait for connection
        timeout = 20
        while timeout > 0:
            time.sleep(1)
            timeout -= 1

            if nic.isconnected():
                # Successfully connected!
                ip_info = nic.ifconfig()
                ip = ip_info[0] if ip_info else "Unknown"
                display_msg("WiFi OK!", COLOR_GREEN, 70)
                display_msg("IP: " + str(ip), COLOR_WHITE, 90)
                print("Connected! IP: " + str(ip))
                time.sleep(2)
                return True
            else:
                print("Waiting... " + str(timeout) + "s")

        # Timeout reached
        display_msg("WiFi FAILED!", COLOR_RED, 70)
        print("Connection timeout")
        return False

    except Exception as e:
        display_msg("WiFi error!", COLOR_RED, 70)
        print("WiFi error: " + str(e))
        import sys
        sys.print_exception(e)
        return False
def test_server():
    """Test connection to the Heimdall server"""
    display_msg("Testing server...", COLOR_BLUE, 110)

    try:
        # Try a raw socket connection to the server
        import socket

        url = SERVER_URL + "/health"
        print("Trying: " + url)

        # Host and port (hardcoded to match SERVER_URL)
        host = "10.1.10.71"
        port = 3006

        # Create socket
        s = socket.socket()
        s.settimeout(5)

        print("Connecting to " + host + ":" + str(port))
        s.connect((host, port))

        # Send HTTP GET request
        request = "GET /health HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n"
        s.send(request.encode())

        # Read response
        response = s.recv(1024).decode()
        s.close()

        print("Server response received")

        if "200" in response or "OK" in response:
            display_msg("Server OK!", COLOR_GREEN, 130)
            print("Server is reachable!")
            time.sleep(2)
            return True
        else:
            display_msg("Server responded", COLOR_YELLOW, 130)
            print("Response: " + response[:100])
            return True  # Still counts as success if we got a response

    except Exception as e:
        display_msg("Server FAILED!", COLOR_RED, 130)
        error_msg = str(e)[:30]
        display_msg(error_msg, COLOR_RED, 150)
        print("Server connection failed: " + str(e))
        return False
def test_audio():
    """Test I2S audio initialization"""
    display_msg("Testing audio...", COLOR_BLUE, 170)

    try:
        # Register I2S pins (Maix Duino pinout)
        fm.register(20, fm.fpioa.I2S0_IN_D0, force=True)
        fm.register(19, fm.fpioa.I2S0_WS, force=True)
        fm.register(18, fm.fpioa.I2S0_SCLK, force=True)

        # Initialize I2S
        rx = I2S(I2S.DEVICE_0)
        rx.channel_config(rx.CHANNEL_0, rx.RECEIVER, align_mode=I2S.STANDARD_MODE)
        rx.set_sample_rate(16000)

        display_msg("Audio OK!", COLOR_GREEN, 190)
        print("I2S initialized: " + str(rx))
        time.sleep(2)
        return True
    except Exception as e:
        display_msg("Audio FAILED!", COLOR_RED, 190)
        print("Audio init failed: " + str(e))
        return False
def main():
    """Run all tests"""
    print("=" * 40)
    print("MaixDuino Voice Assistant Test")
    print("=" * 40)

    # Test LCD
    if not test_lcd():
        print("LCD test failed!")
        return

    # Test WiFi
    if not test_wifi():
        print("WiFi test failed!")
        red_int = (255 << 16) | (0 << 8) | 0  # Red color
        lcd.draw_string(10, 210, "STOPPED - Check WiFi", red_int, 0)
        return

    # Test server connection
    server_ok = test_server()

    # Test audio
    audio_ok = test_audio()

    # Summary
    lcd.clear(COLOR_BLACK)
    display_msg("=== TEST RESULTS ===", COLOR_YELLOW, 10)
    display_msg("LCD: OK", COLOR_GREEN, 40)
    display_msg("WiFi: OK", COLOR_GREEN, 60)

    if server_ok:
        display_msg("Server: OK", COLOR_GREEN, 80)
    else:
        display_msg("Server: FAIL", COLOR_RED, 80)

    if audio_ok:
        display_msg("Audio: OK", COLOR_GREEN, 100)
    else:
        display_msg("Audio: FAIL", COLOR_RED, 100)

    if server_ok and audio_ok:
        display_msg("Ready for voice app!", COLOR_GREEN, 140)
    else:
        display_msg("Fix errors first", COLOR_YELLOW, 140)

    print("\nTest complete!")


# Run the test
if __name__ == "__main__":
    main()
465
hardware/maixduino/maix_voice_client.py
Executable file
@@ -0,0 +1,465 @@
# Maix Duino Voice Assistant Client
# Path: maix_voice_client.py (upload to Maix Duino SD card)
#
# Purpose and usage:
# This script runs on the Maix Duino board and handles:
# - Wake word detection using KPU
# - Audio capture from I2S microphone
# - Streaming audio to voice processing server
# - Playing back TTS responses
# - LED feedback for user interaction
#
# Requirements:
# - MaixPy firmware (latest version)
# - I2S microphone connected
# - Speaker or audio output connected
# - WiFi configured (see config below)
#
# Upload to board:
# 1. Copy this file to the SD card as boot.py or main.py
# 2. Update WiFi credentials below
# 3. Update the server URL to your Heimdall IP
# 4. Power cycle the board

import time
import audio
import image
from Maix import GPIO
from fpioa_manager import fm
from machine import I2S
import KPU as kpu
import sensor
import lcd
import gc

# ----- Configuration -----

# WiFi Settings
WIFI_SSID = "YourSSID"
WIFI_PASSWORD = "YourPassword"

# Server Settings
VOICE_SERVER_URL = "http://10.1.10.71:5000"
PROCESS_ENDPOINT = "/process"

# Audio Settings
SAMPLE_RATE = 16000  # 16kHz for Whisper
CHANNELS = 1         # Mono
SAMPLE_WIDTH = 2     # 16-bit
CHUNK_SIZE = 1024

# Wake Word Settings
WAKE_WORD_THRESHOLD = 0.7  # Confidence threshold (0.0-1.0)
WAKE_WORD_MODEL = "/sd/models/wake_word.kmodel"  # Path to wake word model

# LED Pin for feedback
LED_PIN = 13  # Onboard LED (adjust if needed)

# Recording Settings
MAX_RECORD_TIME = 10     # Maximum seconds to record after wake word
SILENCE_THRESHOLD = 500  # Amplitude threshold for silence detection
SILENCE_DURATION = 2     # Seconds of silence before stopping recording

# ----- Color definitions for LCD -----
COLOR_RED = (255, 0, 0)
COLOR_GREEN = (0, 255, 0)
COLOR_BLUE = (0, 0, 255)
COLOR_YELLOW = (255, 255, 0)
COLOR_BLACK = (0, 0, 0)
COLOR_WHITE = (255, 255, 255)

# ----- Global Variables -----
led = None
i2s_dev = None
kpu_task = None
listening = False
def init_hardware():
    """Initialize hardware components"""
    global led, i2s_dev

    # Initialize LED
    fm.register(LED_PIN, fm.fpioa.GPIO0)
    led = GPIO(GPIO.GPIO0, GPIO.OUT)
    led.value(0)  # Turn off initially

    # Initialize LCD
    lcd.init()
    lcd.clear(COLOR_BLACK)
    lcd.draw_string(lcd.width()//2 - 50, lcd.height()//2,
                    "Initializing...",
                    lcd.WHITE, lcd.BLACK)

    # Initialize I2S for audio (microphone)
    # Note: Pin configuration may vary based on your specific hardware
    fm.register(20, fm.fpioa.I2S0_IN_D0)
    fm.register(19, fm.fpioa.I2S0_WS)
    fm.register(18, fm.fpioa.I2S0_SCLK)

    i2s_dev = I2S(I2S.DEVICE_0)
    i2s_dev.channel_config(I2S.CHANNEL_0, I2S.RECEIVER,
                           align_mode=I2S.STANDARD_MODE,
                           data_width=I2S.RESOLUTION_16_BIT)
    i2s_dev.set_sample_rate(SAMPLE_RATE)

    print("Hardware initialized")
def init_network():
    """Initialize WiFi connection"""
    import network

    lcd.clear(COLOR_BLACK)
    lcd.draw_string(10, 50, "Connecting to WiFi...", COLOR_WHITE, COLOR_BLACK)

    wlan = network.WLAN(network.STA_IF)
    wlan.active(True)

    if not wlan.isconnected():
        print(f"Connecting to {WIFI_SSID}...")
        wlan.connect(WIFI_SSID, WIFI_PASSWORD)

        # Wait for connection
        timeout = 20
        while not wlan.isconnected() and timeout > 0:
            time.sleep(1)
            timeout -= 1
            print(f"Waiting for connection... {timeout}s")

    if not wlan.isconnected():
        print("Failed to connect to WiFi")
        lcd.clear(COLOR_BLACK)
        lcd.draw_string(10, 50, "WiFi Failed!", COLOR_RED, COLOR_BLACK)
        return False

    print("Network connected:", wlan.ifconfig())
    lcd.clear(COLOR_BLACK)
    lcd.draw_string(10, 50, "WiFi Connected", COLOR_GREEN, COLOR_BLACK)
    lcd.draw_string(10, 70, f"IP: {wlan.ifconfig()[0]}", COLOR_WHITE, COLOR_BLACK)
    time.sleep(2)

    return True
def load_wake_word_model():
    """Load wake word detection model"""
    global kpu_task

    try:
        # This is a placeholder - you'll need to train and convert a wake word model
        # For now, we skip KPU wake word detection and use a simpler approach
        print("Wake word model loading skipped (implement after model training)")
        return True
    except Exception as e:
        print(f"Failed to load wake word model: {e}")
        return False


def detect_wake_word():
    """
    Detect wake word in the audio stream

    Returns:
        True if wake word detected, False otherwise

    Note: This is a simplified version. For production, you should:
    1. Train a wake word model using Mycroft Precise or similar
    2. Convert the model to .kmodel format for the K210
    3. Load and run inference using the KPU

    For now, we use a simple amplitude-based trigger.
    """
    # Simple amplitude-based detection (placeholder)
    # Replace with actual KPU inference

    audio_data = i2s_dev.record(CHUNK_SIZE)

    if audio_data:
        # Calculate mean absolute amplitude
        amplitude = 0
        for i in range(0, len(audio_data), 2):
            sample = int.from_bytes(audio_data[i:i+2], 'little')
            if sample >= 0x8000:  # Interpret as signed 16-bit
                sample -= 0x10000
            amplitude += abs(sample)

        amplitude = amplitude / (len(audio_data) // 2)

        # Simple threshold detection (replace with KPU inference)
        if amplitude > 3000:  # Adjust threshold based on your microphone
            return True

    return False
def record_audio(max_duration=MAX_RECORD_TIME):
|
||||||
|
"""
|
||||||
|
Record audio until silence or max duration
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
bytes: Recorded audio data in WAV format
|
||||||
|
"""
|
||||||
|
print(f"Recording audio (max {max_duration}s)...")
|
||||||
|
|
||||||
|
audio_buffer = bytearray()
|
||||||
|
start_time = time.time()
|
||||||
|
silence_start = None
|
||||||
|
|
||||||
|
# Record in chunks
|
||||||
|
while True:
|
||||||
|
elapsed = time.time() - start_time
|
||||||
|
|
||||||
|
# Check max duration
|
||||||
|
if elapsed > max_duration:
|
||||||
|
print("Max recording duration reached")
|
||||||
|
break
|
||||||
|
|
||||||
|
# Record chunk
|
||||||
|
chunk = i2s_dev.record(CHUNK_SIZE)
|
||||||
|
|
||||||
|
if chunk:
|
||||||
|
audio_buffer.extend(chunk)
|
||||||
|
|
||||||
|
# Calculate amplitude for silence detection
|
||||||
|
amplitude = 0
|
||||||
|
for i in range(0, len(chunk), 2):
|
||||||
|
sample = int.from_bytes(chunk[i:i+2], 'little', True)
|
||||||
|
amplitude += abs(sample)
|
||||||
|
|
||||||
|
amplitude = amplitude / (len(chunk) // 2)
|
||||||
|
|
||||||
|
# Silence detection
|
||||||
|
if amplitude < SILENCE_THRESHOLD:
|
||||||
|
if silence_start is None:
|
||||||
|
silence_start = time.time()
|
||||||
|
elif time.time() - silence_start > SILENCE_DURATION:
|
||||||
|
print("Silence detected, stopping recording")
|
||||||
|
break
|
||||||
|
else:
|
||||||
|
silence_start = None
|
||||||
|
|
||||||
|
# Update LCD with recording time
|
||||||
|
if int(elapsed) % 1 == 0:
|
||||||
|
lcd.clear(COLOR_BLACK)
|
||||||
|
lcd.draw_string(10, 50, f"Recording... {int(elapsed)}s",
|
||||||
|
COLOR_RED, COLOR_BLACK)
|
||||||
|
|
||||||
|
print(f"Recorded {len(audio_buffer)} bytes")
|
||||||
|
|
||||||
|
# Convert to WAV format
|
||||||
|
return create_wav(audio_buffer)
|
||||||
|
|
||||||
|
|
||||||
|
def create_wav(audio_data):
|
||||||
|
"""Create WAV file header and combine with audio data"""
|
||||||
|
import struct
|
||||||
|
|
||||||
|
# WAV header
|
||||||
|
sample_rate = SAMPLE_RATE
|
||||||
|
channels = CHANNELS
|
||||||
|
sample_width = SAMPLE_WIDTH
|
||||||
|
data_size = len(audio_data)
|
||||||
|
|
||||||
|
# RIFF header
|
||||||
|
wav = bytearray(b'RIFF')
|
||||||
|
wav.extend(struct.pack('<I', 36 + data_size)) # File size - 8
|
||||||
|
wav.extend(b'WAVE')
|
||||||
|
|
||||||
|
# fmt chunk
|
||||||
|
wav.extend(b'fmt ')
|
||||||
|
wav.extend(struct.pack('<I', 16)) # fmt chunk size
|
||||||
|
wav.extend(struct.pack('<H', 1)) # PCM format
|
||||||
|
wav.extend(struct.pack('<H', channels))
|
||||||
|
wav.extend(struct.pack('<I', sample_rate))
|
||||||
|
wav.extend(struct.pack('<I', sample_rate * channels * sample_width))
|
||||||
|
wav.extend(struct.pack('<H', channels * sample_width))
|
||||||
|
wav.extend(struct.pack('<H', sample_width * 8))
|
||||||
|
|
||||||
|
# data chunk
|
||||||
|
wav.extend(b'data')
|
||||||
|
wav.extend(struct.pack('<I', data_size))
|
||||||
|
wav.extend(audio_data)
|
||||||
|
|
||||||
|
return bytes(wav)
|
||||||
|
|
||||||
|
|
||||||
|
def send_audio_to_server(audio_data):
|
||||||
|
"""
|
||||||
|
Send audio to voice processing server and get response
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
dict: Response from server or None on failure
|
||||||
|
"""
|
||||||
|
import urequests
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Prepare multipart form data
|
||||||
|
url = f"{VOICE_SERVER_URL}{PROCESS_ENDPOINT}"
|
||||||
|
|
||||||
|
print(f"Sending audio to {url}...")
|
||||||
|
lcd.clear(COLOR_BLACK)
|
||||||
|
lcd.draw_string(10, 50, "Processing...", COLOR_YELLOW, COLOR_BLACK)
|
||||||
|
|
||||||
|
# Send POST request with audio file
|
||||||
|
# Note: MaixPy's urequests doesn't support multipart, so we need a workaround
|
||||||
|
# For now, send raw audio with appropriate headers
|
||||||
|
headers = {
|
||||||
|
'Content-Type': 'audio/wav',
|
||||||
|
}
|
||||||
|
|
||||||
|
response = urequests.post(url, data=audio_data, headers=headers)
|
||||||
|
|
||||||
|
if response.status_code == 200:
|
||||||
|
result = response.json()
|
||||||
|
response.close()
|
||||||
|
return result
|
||||||
|
else:
|
||||||
|
print(f"Server error: {response.status_code}")
|
||||||
|
response.close()
|
||||||
|
return None
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"Error sending audio: {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def display_response(response_text):
|
||||||
|
"""Display response on LCD"""
|
||||||
|
lcd.clear(COLOR_BLACK)
|
||||||
|
|
||||||
|
# Word wrap for LCD
|
||||||
|
words = response_text.split()
|
||||||
|
lines = []
|
||||||
|
current_line = ""
|
||||||
|
|
||||||
|
for word in words:
|
||||||
|
test_line = current_line + word + " "
|
||||||
|
if len(test_line) * 8 > lcd.width() - 20: # Rough character width
|
||||||
|
if current_line:
|
||||||
|
lines.append(current_line.strip())
|
||||||
|
current_line = word + " "
|
||||||
|
else:
|
||||||
|
current_line = test_line
|
||||||
|
|
||||||
|
if current_line:
|
||||||
|
lines.append(current_line.strip())
|
||||||
|
|
||||||
|
# Display lines
|
||||||
|
y = 30
|
||||||
|
for line in lines[:5]: # Max 5 lines
|
||||||
|
lcd.draw_string(10, y, line, COLOR_GREEN, COLOR_BLACK)
|
||||||
|
y += 20
|
||||||
|
|
||||||
|
|
||||||
|
def set_led(state):
|
||||||
|
"""Control LED state"""
|
||||||
|
if led:
|
||||||
|
led.value(1 if state else 0)
|
||||||
|
|
||||||
|
|
||||||
|
def main_loop():
|
||||||
|
"""Main voice assistant loop"""
|
||||||
|
global listening
|
||||||
|
|
||||||
|
# Show ready status
|
||||||
|
lcd.clear(COLOR_BLACK)
|
||||||
|
lcd.draw_string(10, lcd.height()//2 - 10, "Say wake word...",
|
||||||
|
COLOR_BLUE, COLOR_BLACK)
|
||||||
|
|
||||||
|
print("Voice assistant ready. Listening for wake word...")
|
||||||
|
|
||||||
|
while True:
|
||||||
|
try:
|
||||||
|
# Listen for wake word
|
||||||
|
if detect_wake_word():
|
||||||
|
print("Wake word detected!")
|
||||||
|
|
||||||
|
# Visual feedback
|
||||||
|
set_led(True)
|
||||||
|
lcd.clear(COLOR_BLACK)
|
||||||
|
lcd.draw_string(10, 50, "Listening...", COLOR_RED, COLOR_BLACK)
|
||||||
|
|
||||||
|
# Small delay to skip the wake word itself
|
||||||
|
time.sleep(0.5)
|
||||||
|
|
||||||
|
# Record command
|
||||||
|
audio_data = record_audio()
|
||||||
|
|
||||||
|
# Send to server
|
||||||
|
response = send_audio_to_server(audio_data)
|
||||||
|
|
||||||
|
if response and response.get('success'):
|
||||||
|
transcription = response.get('transcription', '')
|
||||||
|
response_text = response.get('response', 'No response')
|
||||||
|
|
||||||
|
print(f"You said: {transcription}")
|
||||||
|
print(f"Response: {response_text}")
|
||||||
|
|
||||||
|
# Display response
|
||||||
|
display_response(response_text)
|
||||||
|
|
||||||
|
# TODO: Play TTS audio response
|
||||||
|
|
||||||
|
else:
|
||||||
|
lcd.clear(COLOR_BLACK)
|
||||||
|
lcd.draw_string(10, 50, "Error processing",
|
||||||
|
COLOR_RED, COLOR_BLACK)
|
||||||
|
|
||||||
|
# Turn off LED
|
||||||
|
set_led(False)
|
||||||
|
|
||||||
|
# Pause before listening again
|
||||||
|
time.sleep(2)
|
||||||
|
|
||||||
|
# Reset display
|
||||||
|
lcd.clear(COLOR_BLACK)
|
||||||
|
lcd.draw_string(10, lcd.height()//2 - 10, "Say wake word...",
|
||||||
|
COLOR_BLUE, COLOR_BLACK)
|
||||||
|
|
||||||
|
# Small delay to prevent tight loop
|
||||||
|
time.sleep(0.1)
|
||||||
|
|
||||||
|
# Garbage collection
|
||||||
|
if gc.mem_free() < 100000: # If free memory < 100KB
|
||||||
|
gc.collect()
|
||||||
|
|
||||||
|
except KeyboardInterrupt:
|
||||||
|
print("Exiting...")
|
||||||
|
break
|
||||||
|
except Exception as e:
|
||||||
|
print(f"Error in main loop: {e}")
|
||||||
|
time.sleep(1)
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""Main entry point"""
|
||||||
|
print("=" * 40)
|
||||||
|
print("Maix Duino Voice Assistant")
|
||||||
|
print("=" * 40)
|
||||||
|
|
||||||
|
# Initialize hardware
|
||||||
|
init_hardware()
|
||||||
|
|
||||||
|
# Connect to network
|
||||||
|
if not init_network():
|
||||||
|
print("Failed to initialize network. Exiting.")
|
||||||
|
return
|
||||||
|
|
||||||
|
# Load wake word model (optional)
|
||||||
|
load_wake_word_model()
|
||||||
|
|
||||||
|
# Start main loop
|
||||||
|
try:
|
||||||
|
main_loop()
|
||||||
|
except Exception as e:
|
||||||
|
print(f"Fatal error: {e}")
|
||||||
|
finally:
|
||||||
|
# Cleanup
|
||||||
|
set_led(False)
|
||||||
|
lcd.clear(COLOR_BLACK)
|
||||||
|
lcd.draw_string(10, lcd.height()//2, "Stopped",
|
||||||
|
COLOR_RED, COLOR_BLACK)
|
||||||
|
|
||||||
|
|
||||||
|
# Run main program
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
7
hardware/maixduino/secrets.py.example
Normal file
7
hardware/maixduino/secrets.py.example
Normal file
@ -0,0 +1,7 @@
# Copy this file to secrets.py and fill in your values
# secrets.py is gitignored — never commit it
SECRETS = {
    "wifi_ssid": "YourNetworkName",
    "wifi_password": "YourWiFiPassword",
    "voice_server_url": "http://10.1.10.71:5000",  # replace with your Minerva server IP
}
409
scripts/download_pretrained_models.sh
Executable file
409
scripts/download_pretrained_models.sh
Executable file
@ -0,0 +1,409 @@
#!/usr/bin/env bash
#
# Path: download_pretrained_models.sh
#
# Purpose and usage:
#   Downloads and sets up pre-trained Mycroft Precise wake word models
#   - Downloads Hey Mycroft, Hey Jarvis, and other available models
#   - Tests each model with microphone
#   - Configures voice server to use them
#
# Requirements:
#   - Mycroft Precise installed (run setup_precise.sh first)
#   - Internet connection for downloads
#   - Microphone for testing
#
# Usage:
#   ./download_pretrained_models.sh [--test-all] [--model MODEL_NAME]
#
# Author: PRbL Library
# Created: $(date +"%Y-%m-%d")

# ----- PRbL Color and output functions -----
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
BLUE='\033[0;34m'
PURPLE='\033[0;35m'
CYAN='\033[0;36m'
NC='\033[0m' # No Color

print_status() {
    local level="$1"
    shift
    case "$level" in
        "info")    echo -e "${BLUE}[INFO]${NC} $*" >&2 ;;
        "success") echo -e "${GREEN}[SUCCESS]${NC} $*" >&2 ;;
        "warning") echo -e "${YELLOW}[WARNING]${NC} $*" >&2 ;;
        "error")   echo -e "${RED}[ERROR]${NC} $*" >&2 ;;
        "debug")   [[ "$VERBOSE" == "true" ]] && echo -e "${PURPLE}[DEBUG]${NC} $*" >&2 ;;
        *)         echo -e "$*" >&2 ;;
    esac
}

# ----- Configuration -----
MODELS_DIR="$HOME/precise-models/pretrained"
TEST_ALL=false
SPECIFIC_MODEL=""
VERBOSE=false

# Available pre-trained models
declare -A MODELS=(
    ["hey-mycroft"]="https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz"
    ["hey-jarvis"]="https://github.com/MycroftAI/precise-data/raw/models-dev/hey-jarvis.tar.gz"
    ["christopher"]="https://github.com/MycroftAI/precise-data/raw/models-dev/christopher.tar.gz"
    ["hey-ezra"]="https://github.com/MycroftAI/precise-data/raw/models-dev/hey-ezra.tar.gz"
)

# ----- Dependency checking -----
command_exists() {
    command -v "$1" &> /dev/null
}

check_dependencies() {
    local missing=()

    if ! command_exists wget; then
        missing+=("wget")
    fi

    if ! command_exists precise-listen; then
        missing+=("precise-listen (run setup_precise.sh first)")
    fi

    if [[ ${#missing[@]} -gt 0 ]]; then
        print_status error "Missing dependencies: ${missing[*]}"
        return 1
    fi

    return 0
}

# ----- Parse arguments -----
parse_args() {
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --test-all)
                TEST_ALL=true
                shift
                ;;
            --model)
                SPECIFIC_MODEL="$2"
                shift 2
                ;;
            -v|--verbose)
                VERBOSE=true
                shift
                ;;
            -h|--help)
                cat << EOF
Usage: $(basename "$0") [OPTIONS]

Download and test pre-trained Mycroft Precise wake word models

Options:
  --test-all       Download and test all available models
  --model NAME     Download and test specific model
  -v, --verbose    Enable verbose output
  -h, --help       Show this help message

Available models:
  hey-mycroft      Original Mycroft wake word (most data)
  hey-jarvis       Popular alternative
  christopher      Alternative wake word
  hey-ezra         Another option

Examples:
  $(basename "$0") --model hey-mycroft
  $(basename "$0") --test-all

EOF
                exit 0
                ;;
            *)
                print_status error "Unknown option: $1"
                exit 1
                ;;
        esac
    done
}

# ----- Functions -----

create_models_directory() {
    print_status info "Creating models directory: $MODELS_DIR"
    mkdir -p "$MODELS_DIR" || {
        print_status error "Failed to create directory"
        return 1
    }
    return 0
}

download_model() {
    local model_name="$1"
    local model_url="${MODELS[${model_name}]}"

    if [[ -z "$model_url" ]]; then
        print_status error "Unknown model: $model_name"
        return 1
    fi

    # Check if already downloaded
    if [[ -f "$MODELS_DIR/${model_name}.net" ]]; then
        print_status info "Model already exists: $model_name"
        return 0
    fi

    print_status info "Downloading $model_name..."

    local temp_file="/tmp/${model_name}-$$.tar.gz"

    wget -q --show-progress -O "$temp_file" "$model_url" || {
        print_status error "Failed to download $model_name"
        rm -f "$temp_file"
        return 1
    }

    # Extract
    print_status info "Extracting $model_name..."
    tar xzf "$temp_file" -C "$MODELS_DIR" || {
        print_status error "Failed to extract $model_name"
        rm -f "$temp_file"
        return 1
    }

    rm -f "$temp_file"

    # Verify extraction
    if [[ -f "$MODELS_DIR/${model_name}.net" ]]; then
        print_status success "Downloaded: $model_name"
        return 0
    else
        print_status error "Extraction failed for $model_name"
        return 1
    fi
}

test_model() {
    local model_name="$1"
    local model_file="$MODELS_DIR/${model_name}.net"

    if [[ ! -f "$model_file" ]]; then
        print_status error "Model file not found: $model_file"
        return 1
    fi

    print_status info "Testing model: $model_name"
    echo ""
    echo -e "${CYAN}Instructions:${NC}"
    echo "  - Speak the wake word: '$model_name'"
    echo "  - You should see '!' when detected"
    echo "  - Press Ctrl+C to stop testing"
    echo ""
    read -p "Press Enter to start test..."

    # Activate conda environment if needed
    if command_exists conda; then
        eval "$(conda shell.bash hook)"
        conda activate precise 2>/dev/null || true
    fi

    precise-listen "$model_file" || {
        print_status warning "Test interrupted or failed"
        return 1
    }

    return 0
}

create_multi_wake_config() {
    print_status info "Creating multi-wake-word configuration..."

    local config_file="$MODELS_DIR/multi-wake-config.sh"

    cat > "$config_file" << 'EOF'
#!/bin/bash
# Multi-wake-word configuration
# Generated by download_pretrained_models.sh

# Start voice server with multiple wake words
cd ~/voice-assistant

# List of wake word models
MODELS=""

EOF

    # Add each downloaded model to config
    for model_name in "${!MODELS[@]}"; do
        if [[ -f "$MODELS_DIR/${model_name}.net" ]]; then
            echo "# Found: $model_name" >> "$config_file"
            echo "MODELS=\"\${MODELS}${model_name}:$MODELS_DIR/${model_name}.net:0.5,\"" >> "$config_file"
        fi
    done

    cat >> "$config_file" << 'EOF'

# Remove trailing comma
MODELS="${MODELS%,}"

# Activate environment
eval "$(conda shell.bash hook)"
conda activate precise

# Start server
python voice_server.py \
    --enable-precise \
    --precise-models "$MODELS" \
    --ha-token "$HA_TOKEN"

EOF

    chmod +x "$config_file"

    print_status success "Created: $config_file"
    echo ""
    print_status info "To use multiple wake words, run:"
    print_status info "  $config_file"

    return 0
}

list_downloaded_models() {
    print_status info "Downloaded models in $MODELS_DIR:"
    echo ""

    local count=0
    for model_name in "${!MODELS[@]}"; do
        if [[ -f "$MODELS_DIR/${model_name}.net" ]]; then
            local size=$(du -h "$MODELS_DIR/${model_name}.net" | cut -f1)
            echo -e "  ${GREEN}✓${NC} ${model_name}.net (${size})"
            ((count++))
        else
            echo -e "  ${YELLOW}○${NC} ${model_name}.net (not downloaded)"
        fi
    done

    echo ""
    print_status success "Total downloaded: $count"

    return 0
}

compare_models() {
    print_status info "Model comparison:"
    echo ""

    cat << 'EOF'
┌─────────────────┬──────────────┬─────────────┬─────────────────┐
│ Wake Word       │ Popularity   │ Difficulty  │ Recommended For │
├─────────────────┼──────────────┼─────────────┼─────────────────┤
│ Hey Mycroft     │ ★★★★★        │ Easy        │ Default choice  │
│ Hey Jarvis      │ ★★★★☆        │ Easy        │ Pop culture     │
│ Christopher     │ ★★☆☆☆        │ Medium      │ Unique name     │
│ Hey Ezra        │ ★★☆☆☆        │ Medium      │ Alternative     │
└─────────────────┴──────────────┴─────────────┴─────────────────┘

Recommendations:
  - Start with: Hey Mycroft (most training data)
  - For media: Hey Jarvis (Plex/entertainment)
  - For uniqueness: Christopher or Hey Ezra

Multiple wake words:
  - Use different wake words for different contexts
  - Example: "Hey Mycroft" for commands, "Hey Jarvis" for media
  - Server can run 2-3 models simultaneously

EOF
}

# ----- Main -----
main() {
    print_status info "Mycroft Precise Pre-trained Model Downloader"
    echo ""

    # Parse arguments
    parse_args "$@"

    # Check dependencies
    check_dependencies || exit 1

    # Create directory
    create_models_directory || exit 1

    # Show comparison
    if [[ -z "$SPECIFIC_MODEL" && "$TEST_ALL" != "true" ]]; then
        compare_models
        echo ""
        print_status info "Use --model <name> to download a specific model"
        print_status info "Use --test-all to download all models"
        echo ""
        list_downloaded_models
        exit 0
    fi

    # Download models
    if [[ -n "$SPECIFIC_MODEL" ]]; then
        # Download specific model
        download_model "$SPECIFIC_MODEL" || exit 1

        # Offer to test
        echo ""
        read -p "Test this model now? (y/N): " -n 1 -r
        echo
        if [[ $REPLY =~ ^[Yy]$ ]]; then
            test_model "$SPECIFIC_MODEL"
        fi

    elif [[ "$TEST_ALL" == "true" ]]; then
        # Download all models
        for model_name in "${!MODELS[@]}"; do
            download_model "$model_name"
            echo ""
        done

        # Offer to test each
        echo ""
        print_status success "All models downloaded"
        echo ""
        read -p "Test each model? (y/N): " -n 1 -r
        echo
        if [[ $REPLY =~ ^[Yy]$ ]]; then
            for model_name in "${!MODELS[@]}"; do
                if [[ -f "$MODELS_DIR/${model_name}.net" ]]; then
                    echo ""
                    test_model "$model_name"
                fi
            done
        fi
    fi

    # List results
    echo ""
    list_downloaded_models

    # Create multi-wake config if multiple models
    local model_count=$(find "$MODELS_DIR" -name "*.net" | wc -l)
    if [[ $model_count -gt 1 ]]; then
        echo ""
        create_multi_wake_config
    fi

    # Final instructions
    echo ""
    print_status success "Setup complete!"
    echo ""
    print_status info "Next steps:"
    print_status info "1. Test a model: precise-listen $MODELS_DIR/hey-mycroft.net"
    print_status info "2. Use in server: python voice_server.py --enable-precise --precise-model $MODELS_DIR/hey-mycroft.net"
    print_status info "3. Fine-tune: precise-train -e 30 custom.net . --from-checkpoint $MODELS_DIR/hey-mycroft.net"

    if [[ $model_count -gt 1 ]]; then
        echo ""
        print_status info "For multiple wake words:"
        print_status info "  $MODELS_DIR/multi-wake-config.sh"
    fi
}

# Run main
main "$@"
456
scripts/quick_start_hey_mycroft.sh
Executable file
456
scripts/quick_start_hey_mycroft.sh
Executable file
@ -0,0 +1,456 @@
#!/usr/bin/env bash
#
# Path: quick_start_hey_mycroft.sh
#
# Purpose and usage:
#   Zero-training quick start using pre-trained "Hey Mycroft" model
#   Gets you a working voice assistant in 5 minutes!
#
# Requirements:
#   - Heimdall already setup (ran setup_voice_assistant.sh)
#   - Mycroft Precise installed (ran setup_precise.sh)
#
# Usage:
#   ./quick_start_hey_mycroft.sh [--test-only]
#
# Author: PRbL Library

# ----- PRbL Color and output functions -----
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
BLUE='\033[0;34m'
PURPLE='\033[0;35m'
CYAN='\033[0;36m'
NC='\033[0m'

print_status() {
    local level="$1"
    shift
    case "$level" in
        "info")    echo -e "${BLUE}[INFO]${NC} $*" >&2 ;;
        "success") echo -e "${GREEN}[SUCCESS]${NC} $*" >&2 ;;
        "warning") echo -e "${YELLOW}[WARNING]${NC} $*" >&2 ;;
        "error")   echo -e "${RED}[ERROR]${NC} $*" >&2 ;;
        *)         echo -e "$*" >&2 ;;
    esac
}

# ----- Configuration -----
MODELS_DIR="$HOME/precise-models/pretrained"
MODEL_URL="https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz"
MODEL_NAME="hey-mycroft"
TEST_ONLY=false

# ----- Parse arguments -----
parse_args() {
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --test-only)
                TEST_ONLY=true
                shift
                ;;
            -h|--help)
                cat << EOF
Usage: $(basename "$0") [OPTIONS]

Quick start with pre-trained "Hey Mycroft" wake word model.
No training required!

Options:
  --test-only    Just test the model, don't start server
  -h, --help     Show this help

Examples:
  $(basename "$0")              # Download, test, and run server
  $(basename "$0") --test-only  # Just download and test

EOF
                exit 0
                ;;
            *)
                print_status error "Unknown option: $1"
                exit 1
                ;;
        esac
    done
}

# ----- Functions -----

check_prerequisites() {
    print_status info "Checking prerequisites..."

    # Check conda
    if ! command -v conda &> /dev/null; then
        print_status error "conda not found"
        return 1
    fi

    # Check precise environment
    if ! conda env list | grep -q "^precise\s"; then
        print_status error "Precise environment not found"
        print_status info "Run: ./setup_precise.sh first"
        return 1
    fi

    # Check voice-assistant directory
    if [[ ! -d "$HOME/voice-assistant" ]]; then
        print_status error "Voice assistant not setup"
        print_status info "Run: ./setup_voice_assistant.sh first"
        return 1
    fi

    print_status success "Prerequisites OK"
    return 0
}

download_pretrained_model() {
    print_status info "Downloading pre-trained 'Hey Mycroft' model..."

    # Create directory
    mkdir -p "$MODELS_DIR"

    # Check if already downloaded
    if [[ -f "$MODELS_DIR/${MODEL_NAME}.net" ]]; then
        print_status info "Model already downloaded"
        return 0
    fi

    # Download
    cd "$MODELS_DIR" || return 1

    print_status info "Fetching from GitHub..."
    wget -q --show-progress "$MODEL_URL" || {
        print_status error "Failed to download model"
        return 1
    }

    # Extract
    print_status info "Extracting model..."
    tar xzf hey-mycroft.tar.gz || {
        print_status error "Failed to extract model"
        return 1
    }

    # Verify
    if [[ ! -f "${MODEL_NAME}.net" ]]; then
        print_status error "Model file not found after extraction"
        return 1
    fi

    print_status success "Model downloaded: $MODELS_DIR/${MODEL_NAME}.net"
    return 0
}

test_model() {
    print_status info "Testing wake word model..."

    cd "$MODELS_DIR" || return 1

    # Activate conda
    eval "$(conda shell.bash hook)"
    conda activate precise || {
        print_status error "Failed to activate precise environment"
        return 1
    }

    cat << EOF

${CYAN}═══════════════════════════════════════════════════${NC}
${CYAN}  Wake Word Test: "Hey Mycroft"${NC}
${CYAN}═══════════════════════════════════════════════════${NC}

${YELLOW}Instructions:${NC}
  1. Speak "Hey Mycroft" into your microphone
  2. You should see ${GREEN}"!"${NC} when detected
  3. Try other phrases - should ${RED}not${NC} trigger
  4. Press ${RED}Ctrl+C${NC} when done testing

${CYAN}Starting in 3 seconds...${NC}

EOF

    sleep 3

    # Test the model
    precise-listen "${MODEL_NAME}.net" || {
        print_status error "Model test failed"
        return 1
    }

    print_status success "Model test complete!"
    return 0
}

update_config() {
    print_status info "Updating voice assistant configuration..."

    local config_file="$HOME/voice-assistant/config/.env"

    if [[ ! -f "$config_file" ]]; then
        print_status error "Config file not found: $config_file"
        return 1
    fi

    # Update PRECISE_MODEL if exists, otherwise add it
    if grep -q "^PRECISE_MODEL=" "$config_file"; then
        sed -i "s|^PRECISE_MODEL=.*|PRECISE_MODEL=$MODELS_DIR/${MODEL_NAME}.net|" "$config_file"
    else
        echo "PRECISE_MODEL=$MODELS_DIR/${MODEL_NAME}.net" >> "$config_file"
    fi

    # Update sensitivity if not set
    if ! grep -q "^PRECISE_SENSITIVITY=" "$config_file"; then
        echo "PRECISE_SENSITIVITY=0.5" >> "$config_file"
    fi

    print_status success "Configuration updated"
    return 0
}

start_server() {
    print_status info "Starting voice assistant server..."

    cd "$HOME/voice-assistant" || return 1

    # Activate conda
    eval "$(conda shell.bash hook)"
    conda activate precise || {
        print_status error "Failed to activate environment"
        return 1
    }

    cat << EOF

${CYAN}═══════════════════════════════════════════════════${NC}
${GREEN}  Starting Voice Assistant Server${NC}
${CYAN}═══════════════════════════════════════════════════${NC}

${YELLOW}Configuration:${NC}
  Wake word: ${GREEN}Hey Mycroft${NC}
  Model:     ${MODEL_NAME}.net
  Server:    http://0.0.0.0:5000

${YELLOW}What to do next:${NC}
  1. Wait for "Precise listening started" message
  2. Say ${GREEN}"Hey Mycroft"${NC} to test wake word
  3. Say a command like ${GREEN}"turn on the lights"${NC}
  4. Check server logs for activity

${YELLOW}Press Ctrl+C to stop the server${NC}

${CYAN}Starting server...${NC}

EOF

    # Check if HA token is set
    if ! grep -q "^HA_TOKEN=..*" config/.env; then
        print_status warning "Home Assistant token not set!"
        print_status warning "Commands won't execute without it."
        print_status info "Edit config/.env and add your HA token"
        echo
        read -p "Continue anyway? (y/N): " -n 1 -r
        echo
        if [[ ! $REPLY =~ ^[Yy]$ ]]; then
            return 1
        fi
    fi

    # Start server
    python voice_server.py \
        --enable-precise \
        --precise-model "$MODELS_DIR/${MODEL_NAME}.net" \
        --precise-sensitivity 0.5

    return $?
}

create_systemd_service() {
    print_status info "Creating systemd service..."

    local service_file="/etc/systemd/system/voice-assistant.service"

    # Check if we should update existing service
    if [[ -f "$service_file" ]]; then
        print_status warning "Service file already exists"
        read -p "Update with Hey Mycroft configuration? (y/N): " -n 1 -r
        echo
        if [[ ! $REPLY =~ ^[Yy]$ ]]; then
            return 0
        fi
    fi

    # Create service file
    sudo tee "$service_file" > /dev/null << EOF
[Unit]
Description=Voice Assistant with Hey Mycroft Wake Word
After=network.target

[Service]
Type=simple
User=$USER
WorkingDirectory=$HOME/voice-assistant
Environment="PATH=$HOME/miniconda3/envs/precise/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=$HOME/voice-assistant/config/.env
ExecStart=$HOME/miniconda3/envs/precise/bin/python voice_server.py \\
    --enable-precise \\
    --precise-model $MODELS_DIR/${MODEL_NAME}.net \\
    --precise-sensitivity 0.5
Restart=on-failure
RestartSec=10
StandardOutput=append:$HOME/voice-assistant/logs/voice_assistant.log
StandardError=append:$HOME/voice-assistant/logs/voice_assistant_error.log

[Install]
WantedBy=multi-user.target
EOF

    # Reload systemd
    sudo systemctl daemon-reload

    print_status success "Systemd service created"
|
||||||
|
|
||||||
|
cat << EOF
|
||||||
|
|
||||||
|
${CYAN}To enable and start the service:${NC}
|
||||||
|
sudo systemctl enable voice-assistant
|
||||||
|
sudo systemctl start voice-assistant
|
||||||
|
sudo systemctl status voice-assistant
|
||||||
|
|
||||||
|
${CYAN}To view logs:${NC}
|
||||||
|
journalctl -u voice-assistant -f
|
||||||
|
|
||||||
|
EOF
|
||||||
|
|
||||||
|
read -p "Enable service now? (y/N): " -n 1 -r
|
||||||
|
echo
|
||||||
|
if [[ $REPLY =~ ^[Yy]$ ]]; then
|
||||||
|
sudo systemctl enable voice-assistant
|
||||||
|
sudo systemctl start voice-assistant
|
||||||
|
sleep 2
|
||||||
|
sudo systemctl status voice-assistant
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
print_next_steps() {
|
||||||
|
cat << EOF
|
||||||
|
|
||||||
|
${GREEN}═══════════════════════════════════════════════════${NC}
|
||||||
|
${GREEN} Success! Your voice assistant is ready!${NC}
|
||||||
|
${GREEN}═══════════════════════════════════════════════════${NC}
|
||||||
|
|
||||||
|
${CYAN}What you have:${NC}
|
||||||
|
✓ Pre-trained "Hey Mycroft" wake word
|
||||||
|
✓ Voice assistant server configured
|
||||||
|
✓ Ready to control Home Assistant
|
||||||
|
|
||||||
|
${CYAN}Quick test:${NC}
|
||||||
|
1. Say: ${GREEN}"Hey Mycroft"${NC}
|
||||||
|
2. Say: ${GREEN}"Turn on the living room lights"${NC}
|
||||||
|
3. Check if command executed
|
||||||
|
|
||||||
|
${CYAN}Next steps:${NC}
|
||||||
|
1. ${YELLOW}Configure Home Assistant entities${NC}
|
||||||
|
Edit: ~/voice-assistant/config/.env
|
||||||
|
Add: HA_TOKEN=your_token_here
|
||||||
|
|
||||||
|
2. ${YELLOW}Add more entity mappings${NC}
|
||||||
|
Edit: voice_server.py
|
||||||
|
Update: IntentParser.ENTITY_MAP
|
||||||
|
|
||||||
|
3. ${YELLOW}Fine-tune for your voice (optional)${NC}
|
||||||
|
cd ~/precise-models/hey-mycroft-custom
|
||||||
|
./1-record-wake-word.sh
|
||||||
|
# Record 20-30 samples
|
||||||
|
precise-train -e 30 hey-mycroft-custom.net . \\
|
||||||
|
--from-checkpoint $MODELS_DIR/${MODEL_NAME}.net
|
||||||
|
|
||||||
|
4. ${YELLOW}Setup Maix Duino${NC}
|
||||||
|
See: QUICKSTART.md Phase 2
|
||||||
|
|
||||||
|
${CYAN}Useful commands:${NC}
|
||||||
|
# Test wake word only
|
||||||
|
cd $MODELS_DIR && conda activate precise
|
||||||
|
precise-listen ${MODEL_NAME}.net
|
||||||
|
|
||||||
|
# Check server health
|
||||||
|
curl http://localhost:5000/health
|
||||||
|
|
||||||
|
# Monitor logs
|
||||||
|
journalctl -u voice-assistant -f
|
||||||
|
|
||||||
|
${CYAN}Documentation:${NC}
|
||||||
|
README.md - Project overview
|
||||||
|
WAKE_WORD_ADVANCED.md - Multiple wake words guide
|
||||||
|
QUICKSTART.md - Complete setup guide
|
||||||
|
|
||||||
|
${GREEN}Happy voice assisting! 🎙️${NC}
|
||||||
|
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
# ----- Main -----
|
||||||
|
main() {
|
||||||
|
cat << EOF
|
||||||
|
${CYAN}═══════════════════════════════════════════════════${NC}
|
||||||
|
${CYAN} Quick Start: Hey Mycroft Wake Word${NC}
|
||||||
|
${CYAN}═══════════════════════════════════════════════════${NC}
|
||||||
|
|
||||||
|
${YELLOW}This script will:${NC}
|
||||||
|
1. Download pre-trained "Hey Mycroft" model
|
||||||
|
2. Test wake word detection
|
||||||
|
3. Configure voice assistant server
|
||||||
|
4. Start the server (optional)
|
||||||
|
|
||||||
|
${YELLOW}Total time: ~5 minutes (no training!)${NC}
|
||||||
|
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Parse arguments
|
||||||
|
parse_args "$@"
|
||||||
|
|
||||||
|
# Check prerequisites
|
||||||
|
check_prerequisites || exit 1
|
||||||
|
|
||||||
|
# Download model
|
||||||
|
download_pretrained_model || exit 1
|
||||||
|
|
||||||
|
# Test model
|
||||||
|
print_status info "Ready to test wake word"
|
||||||
|
read -p "Test now? (Y/n): " -n 1 -r
|
||||||
|
echo
|
||||||
|
if [[ ! $REPLY =~ ^[Nn]$ ]]; then
|
||||||
|
test_model
|
||||||
|
fi
|
||||||
|
|
||||||
|
# If test-only mode, stop here
|
||||||
|
if [[ "$TEST_ONLY" == "true" ]]; then
|
||||||
|
print_status success "Test complete!"
|
||||||
|
print_status info "Model location: $MODELS_DIR/${MODEL_NAME}.net"
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Update configuration
|
||||||
|
update_config || exit 1
|
||||||
|
|
||||||
|
# Start server
|
||||||
|
read -p "Start voice assistant server now? (Y/n): " -n 1 -r
|
||||||
|
echo
|
||||||
|
if [[ ! $REPLY =~ ^[Nn]$ ]]; then
|
||||||
|
start_server
|
||||||
|
else
|
||||||
|
# Offer to create systemd service
|
||||||
|
read -p "Create systemd service instead? (y/N): " -n 1 -r
|
||||||
|
echo
|
||||||
|
if [[ $REPLY =~ ^[Yy]$ ]]; then
|
||||||
|
create_systemd_service
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Print next steps
|
||||||
|
print_next_steps
|
||||||
|
}
|
||||||
|
|
||||||
|
# Run main
|
||||||
|
main "$@"
|
||||||
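The `update_config` logic above stays idempotent by checking `grep -q "^KEY="` before appending, so reruns never duplicate or clobber settings the user has already edited. A minimal, self-contained sketch of that pattern (the `ensure_setting` helper name and the values are illustrative, not part of the script above):

```shell
#!/usr/bin/env bash
# Sketch of the append-if-missing pattern used by update_config.
set -eu

config_file=$(mktemp)

ensure_setting() {
    local key="$1" value="$2"
    # Only append if the key is not already present in the file
    if ! grep -q "^${key}=" "$config_file"; then
        echo "${key}=${value}" >> "$config_file"
    fi
}

ensure_setting PRECISE_SENSITIVITY 0.5
ensure_setting PRECISE_SENSITIVITY 0.9   # no-op: key already set
ensure_setting PRECISE_MODEL "/tmp/hey-mycroft.net"

cat "$config_file"
```

Because existing keys are never overwritten, running the setup script a second time preserves any manual tuning in `config/.env`.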
scripts/setup_precise.sh (executable file, 630 lines)
@@ -0,0 +1,630 @@
#!/usr/bin/env bash
#
# Path: setup_precise.sh
#
# Purpose and usage:
#   Sets up Mycroft Precise wake word detection on Heimdall
#   - Creates conda environment for Precise
#   - Installs TensorFlow 1.x and dependencies
#   - Downloads precise-engine
#   - Sets up training directories
#   - Provides helper scripts for training
#
# Requirements:
#   - conda/miniconda installed
#   - Internet connection for downloads
#   - Microphone for recording samples
#
# Usage:
#   ./setup_precise.sh [--wake-word "phrase"] [--env-name NAME]
#
# Author: PRbL Library
# Created: $(date +"%Y-%m-%d")

# ----- PRbL Color and output functions -----
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
BLUE='\033[0;34m'
PURPLE='\033[0;35m'
CYAN='\033[0;36m'
NC='\033[0m' # No Color

print_status() {
    local level="$1"
    shift
    case "$level" in
        "info")    echo -e "${BLUE}[INFO]${NC} $*" >&2 ;;
        "success") echo -e "${GREEN}[SUCCESS]${NC} $*" >&2 ;;
        "warning") echo -e "${YELLOW}[WARNING]${NC} $*" >&2 ;;
        "error")   echo -e "${RED}[ERROR]${NC} $*" >&2 ;;
        "debug")   [[ "$VERBOSE" == "true" ]] && echo -e "${PURPLE}[DEBUG]${NC} $*" >&2 ;;
        *)         echo -e "$*" >&2 ;;
    esac
}

# ----- Configuration -----
CONDA_ENV_NAME="precise"
WAKE_WORD="hey computer"
MODELS_DIR="$HOME/precise-models"
VERBOSE=false

# ----- Dependency checking -----
command_exists() {
    command -v "$1" &> /dev/null
}

check_conda() {
    if ! command_exists conda; then
        print_status error "conda not found. Please install miniconda first."
        return 1
    fi
    return 0
}

# ----- Parse arguments -----
parse_args() {
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --wake-word)
                WAKE_WORD="$2"
                shift 2
                ;;
            --env-name)
                CONDA_ENV_NAME="$2"
                shift 2
                ;;
            -v|--verbose)
                VERBOSE=true
                shift
                ;;
            -h|--help)
                cat << EOF
Usage: $(basename "$0") [OPTIONS]

Options:
  --wake-word "phrase"   Wake word to train (default: "hey computer")
  --env-name NAME        Custom conda environment name (default: precise)
  -v, --verbose          Enable verbose output
  -h, --help             Show this help message

Examples:
  $(basename "$0") --wake-word "hey jarvis"
  $(basename "$0") --env-name mycroft-precise

EOF
                exit 0
                ;;
            *)
                print_status error "Unknown option: $1"
                exit 1
                ;;
        esac
    done
}

# ----- Setup functions -----

create_conda_environment() {
    print_status info "Creating conda environment: $CONDA_ENV_NAME"

    # Check if environment already exists
    if conda env list | grep -q "^${CONDA_ENV_NAME}\s"; then
        print_status warning "Environment $CONDA_ENV_NAME already exists"
        read -p "Remove and recreate? (y/N): " -n 1 -r
        echo
        if [[ $REPLY =~ ^[Yy]$ ]]; then
            print_status info "Removing existing environment..."
            conda env remove -n "$CONDA_ENV_NAME" -y
        else
            print_status info "Using existing environment"
            return 0
        fi
    fi

    # Create new environment with Python 3.7 (required for TF 1.15)
    print_status info "Creating Python 3.7 environment..."
    conda create -n "$CONDA_ENV_NAME" python=3.7 -y || {
        print_status error "Failed to create conda environment"
        return 1
    }

    print_status success "Conda environment created"
    return 0
}

install_tensorflow() {
    print_status info "Installing TensorFlow 1.15..."

    # Activate conda environment
    eval "$(conda shell.bash hook)"
    conda activate "$CONDA_ENV_NAME" || {
        print_status error "Failed to activate conda environment"
        return 1
    }

    # Install TensorFlow 1.15 (last 1.x version).
    # No --break-system-packages: conda environments are not
    # externally managed, and the py3.7 pip rejects the flag.
    pip install tensorflow==1.15.5 || {
        print_status error "Failed to install TensorFlow"
        return 1
    }

    # Verify installation
    python -c "import tensorflow as tf; print(f'TensorFlow {tf.__version__} installed')" || {
        print_status error "TensorFlow installation verification failed"
        return 1
    }

    print_status success "TensorFlow 1.15 installed"
    return 0
}

install_precise() {
    print_status info "Installing Mycroft Precise..."

    # Activate conda environment
    eval "$(conda shell.bash hook)"
    conda activate "$CONDA_ENV_NAME" || {
        print_status error "Failed to activate conda environment"
        return 1
    }

    # Install audio dependencies
    print_status info "Installing system audio dependencies..."
    if command_exists apt-get; then
        sudo apt-get update
        sudo apt-get install -y portaudio19-dev sox libatlas-base-dev || {
            print_status warning "Some audio dependencies failed to install"
        }
    fi

    # Install Python audio libraries
    pip install pyaudio || {
        print_status warning "PyAudio installation failed (may need manual installation)"
    }

    # Install Precise
    pip install mycroft-precise || {
        print_status error "Failed to install Mycroft Precise"
        return 1
    }

    # Verify installation
    python -c "import precise_runner; print('Precise installed successfully')" || {
        print_status error "Precise installation verification failed"
        return 1
    }

    print_status success "Mycroft Precise installed"
    return 0
}

download_precise_engine() {
    print_status info "Downloading precise-engine..."

    local engine_version="0.3.0"
    local engine_url="https://github.com/MycroftAI/mycroft-precise/releases/download/v${engine_version}/precise-engine_${engine_version}_x86_64.tar.gz"
    local temp_dir=$(mktemp -d)

    # Download engine
    wget -q --show-progress -O "$temp_dir/precise-engine.tar.gz" "$engine_url" || {
        print_status error "Failed to download precise-engine"
        rm -rf "$temp_dir"
        return 1
    }

    # Extract
    tar xzf "$temp_dir/precise-engine.tar.gz" -C "$temp_dir" || {
        print_status error "Failed to extract precise-engine"
        rm -rf "$temp_dir"
        return 1
    }

    # Install to /usr/local/bin
    sudo cp "$temp_dir/precise-engine/precise-engine" /usr/local/bin/ || {
        print_status error "Failed to install precise-engine"
        rm -rf "$temp_dir"
        return 1
    }

    sudo chmod +x /usr/local/bin/precise-engine

    # Clean up
    rm -rf "$temp_dir"

    # Verify installation
    precise-engine --version || {
        print_status error "precise-engine installation verification failed"
        return 1
    }

    print_status success "precise-engine installed"
    return 0
}

create_training_directory() {
    print_status info "Creating training directory structure..."

    # Sanitize wake word for directory name
    local wake_word_dir=$(echo "$WAKE_WORD" | tr ' ' '-' | tr '[:upper:]' '[:lower:]')
    local project_dir="$MODELS_DIR/$wake_word_dir"

    mkdir -p "$project_dir"/{wake-word,not-wake-word,test/wake-word,test/not-wake-word}

    print_status success "Training directory created: $project_dir"

    # Store project path for later use
    echo "$project_dir" > "$MODELS_DIR/.current_project"

    return 0
}

create_training_scripts() {
    print_status info "Creating training helper scripts..."

    local wake_word_dir=$(echo "$WAKE_WORD" | tr ' ' '-' | tr '[:upper:]' '[:lower:]')
    local project_dir="$MODELS_DIR/$wake_word_dir"

    # Create recording script
    cat > "$project_dir/1-record-wake-word.sh" << 'EOF'
#!/bin/bash
# Step 1: Record wake word samples
# Run this script and follow the prompts to record ~50-100 samples

eval "$(conda shell.bash hook)"
conda activate precise

echo "Recording wake word samples..."
echo "Press SPACE to start/stop recording"
echo "Press Ctrl+C when done (aim for 50-100 samples)"
echo ""

precise-collect
EOF

    # Create not-wake-word recording script
    cat > "$project_dir/2-record-not-wake-word.sh" << 'EOF'
#!/bin/bash
# Step 2: Record "not wake word" samples
# Record random speech, TV, music, similar-sounding phrases

eval "$(conda shell.bash hook)"
conda activate precise

echo "Recording not-wake-word samples..."
echo "Record:"
echo "  - Normal conversation"
echo "  - TV/music background"
echo "  - Similar sounding phrases"
echo "  - Ambient noise"
echo ""
echo "Press SPACE to start/stop recording"
echo "Press Ctrl+C when done (aim for 200-500 samples)"
echo ""

precise-collect -f not-wake-word/samples.wav
EOF

    # Create training script
    cat > "$project_dir/3-train-model.sh" << EOF
#!/bin/bash
# Step 3: Train the model
# This will train for 60 epochs (adjust the -e parameter for more/less)

eval "\$(conda shell.bash hook)"
conda activate precise

echo "Training wake word model..."
echo "This will take 30-60 minutes..."
echo ""

# Train model
precise-train -e 60 ${wake_word_dir}.net .

echo ""
echo "Training complete!"
echo "Test with: precise-listen ${wake_word_dir}.net"
EOF

    # Create testing script
    cat > "$project_dir/4-test-model.sh" << EOF
#!/bin/bash
# Step 4: Test the model with a live microphone

eval "\$(conda shell.bash hook)"
conda activate precise

echo "Testing wake word model..."
echo "Speak your wake word - you should see '!' when detected"
echo "Speak other phrases - they should not trigger"
echo ""
echo "Press Ctrl+C to exit"
echo ""

precise-listen ${wake_word_dir}.net
EOF

    # Create evaluation script
    cat > "$project_dir/5-evaluate-model.sh" << EOF
#!/bin/bash
# Step 5: Evaluate the model on the test set

eval "\$(conda shell.bash hook)"
conda activate precise

echo "Evaluating wake word model on test set..."
echo ""

precise-test ${wake_word_dir}.net test/

echo ""
echo "Check metrics above:"
echo "  - Wake word accuracy should be >95%"
echo "  - False positive rate should be <5%"
EOF

    # Create tuning script
    cat > "$project_dir/6-tune-threshold.sh" << EOF
#!/bin/bash
# Step 6: Tune the activation threshold

eval "\$(conda shell.bash hook)"
conda activate precise

echo "Testing different thresholds..."
echo ""
echo "Default threshold: 0.5"
echo "Higher = fewer false positives, may miss some wake words"
echo "Lower  = catch more wake words, more false positives"
echo ""

for threshold in 0.3 0.5 0.7; do
    echo "Testing threshold: \$threshold"
    echo "Press Ctrl+C to try next threshold"
    precise-listen ${wake_word_dir}.net -t \$threshold
done
EOF

    # Make all scripts executable
    chmod +x "$project_dir"/*.sh

    print_status success "Training scripts created in $project_dir"
    return 0
}

create_readme() {
    print_status info "Creating README..."

    local wake_word_dir=$(echo "$WAKE_WORD" | tr ' ' '-' | tr '[:upper:]' '[:lower:]')
    local project_dir="$MODELS_DIR/$wake_word_dir"

    cat > "$project_dir/README.md" << EOF
# Wake Word Training: "$WAKE_WORD"

## Quick Start

Follow these steps in order:

### 1. Record Wake Word Samples
\`\`\`bash
./1-record-wake-word.sh
\`\`\`

Record 50-100 samples:
- Vary your tone and speed
- Different distances from microphone
- Different background noise levels
- Have family members record too

### 2. Record Not-Wake-Word Samples
\`\`\`bash
./2-record-not-wake-word.sh
\`\`\`

Record 200-500 samples of:
- Normal conversation
- TV/music in background
- Similar sounding phrases
- Ambient household noise

### 3. Organize Samples

Split each sample set roughly 80/20 between training and test.
Use a glob that selects ~80% of the files first - the same
pattern cannot be reused, or the first \`mv\` takes everything:
\`\`\`bash
# ~80% of wake-word samples go to training:
mv wake-word-samples-[0-7]* wake-word/

# The remaining ~20% go to test:
mv wake-word-samples-* test/wake-word/

# ~80% of not-wake-word samples go to training:
mv not-wake-word-samples-[0-7]* not-wake-word/

# The remaining ~20% go to test:
mv not-wake-word-samples-* test/not-wake-word/
\`\`\`

### 4. Train Model
\`\`\`bash
./3-train-model.sh
\`\`\`

Wait 30-60 minutes for training to complete.

### 5. Test Model
\`\`\`bash
./4-test-model.sh
\`\`\`

Speak your wake word and verify detection.

### 6. Evaluate Model
\`\`\`bash
./5-evaluate-model.sh
\`\`\`

Check accuracy metrics on the test set.

### 7. Tune Threshold
\`\`\`bash
./6-tune-threshold.sh
\`\`\`

Find the best threshold for your environment.

## Tips for Good Training

1. **Quality over quantity** - Clear samples are better than many poor ones
2. **Diverse conditions** - Different noise levels, distances, speakers
3. **Hard negatives** - Include similar-sounding phrases in the not-wake-word set
4. **Regular updates** - Add false positives/negatives and retrain

## Next Steps

Once trained and tested:

1. Copy the model to the voice assistant server:
\`\`\`bash
cp ${wake_word_dir}.net ~/voice-assistant/models/
\`\`\`

2. Update the voice assistant config:
\`\`\`bash
vim ~/voice-assistant/config/.env
# Set: PRECISE_MODEL=~/voice-assistant/models/${wake_word_dir}.net
\`\`\`

3. Restart the voice assistant service:
\`\`\`bash
sudo systemctl restart voice-assistant
\`\`\`

## Troubleshooting

**Low accuracy?**
- Collect more training samples
- Increase training epochs (edit 3-train-model.sh, change -e 60 to -e 120)
- Verify the 80/20 train/test split

**Too many false positives?**
- Increase the threshold (use 6-tune-threshold.sh)
- Add false-trigger audio to the not-wake-word set
- Retrain with more diverse negative samples

**Misses wake words?**
- Lower the threshold
- Add missed samples to the training set
- Ensure good audio quality

## Resources

- Mycroft Precise Docs: https://github.com/MycroftAI/mycroft-precise
- Training Guide: https://mycroft-ai.gitbook.io/docs/mycroft-technologies/precise
- Community Models: https://github.com/MycroftAI/precise-data
EOF

    print_status success "README created in $project_dir"
    return 0
}

download_pretrained_models() {
    print_status info "Downloading pre-trained models..."

    # Create models directory
    mkdir -p "$MODELS_DIR/pretrained"

    # Download Hey Mycroft model (as example/base)
    local model_url="https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz"

    if [[ ! -f "$MODELS_DIR/pretrained/hey-mycroft.net" ]]; then
        print_status info "Downloading Hey Mycroft model..."
        wget -q --show-progress -O "$MODELS_DIR/pretrained/hey-mycroft.tar.gz" "$model_url" || {
            print_status warning "Failed to download pre-trained model (optional)"
            return 0
        }

        tar xzf "$MODELS_DIR/pretrained/hey-mycroft.tar.gz" -C "$MODELS_DIR/pretrained/" || {
            print_status warning "Failed to extract pre-trained model"
            return 0
        }

        print_status success "Pre-trained model downloaded"
    else
        print_status info "Pre-trained model already exists"
    fi

    return 0
}

print_next_steps() {
    local wake_word_dir=$(echo "$WAKE_WORD" | tr ' ' '-' | tr '[:upper:]' '[:lower:]')
    local project_dir="$MODELS_DIR/$wake_word_dir"

    cat << EOF

${GREEN}Setup complete!${NC}

Wake word: "$WAKE_WORD"
Project directory: $project_dir

${BLUE}Next steps:${NC}

1. ${CYAN}Activate conda environment:${NC}
   conda activate $CONDA_ENV_NAME

2. ${CYAN}Navigate to project directory:${NC}
   cd $project_dir

3. ${CYAN}Follow the README or run scripts in order:${NC}
   ./1-record-wake-word.sh      # Record wake word samples
   ./2-record-not-wake-word.sh  # Record negative samples
   # Organize samples into train/test directories
   ./3-train-model.sh           # Train the model (30-60 min)
   ./4-test-model.sh            # Test with microphone
   ./5-evaluate-model.sh        # Check accuracy metrics
   ./6-tune-threshold.sh        # Find best threshold

${BLUE}Helpful commands:${NC}

Test pre-trained model:
  conda activate $CONDA_ENV_NAME
  precise-listen $MODELS_DIR/pretrained/hey-mycroft.net

Check precise-engine:
  precise-engine --version

${BLUE}Resources:${NC}

  Full guide:     See MYCROFT_PRECISE_GUIDE.md
  Project README: $project_dir/README.md
  Mycroft Docs:   https://github.com/MycroftAI/mycroft-precise

EOF
}

# ----- Main -----
main() {
    print_status info "Starting Mycroft Precise setup..."

    # Parse arguments
    parse_args "$@"

    # Check dependencies
    check_conda || exit 1

    # Setup steps
    create_conda_environment || exit 1
    install_tensorflow || exit 1
    install_precise || exit 1
    download_precise_engine || exit 1
    create_training_directory || exit 1
    create_training_scripts || exit 1
    create_readme || exit 1
    download_pretrained_models || exit 1

    # Print next steps
    print_next_steps
}

# Run main
main "$@"
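The generated README tells the user to split recordings roughly 80/20 between the training and test directories but leaves the exact `mv` commands to the reader. A self-contained sketch of one way to do that split mechanically; the `sample-NN.wav` names are placeholders for whatever `precise-collect` actually produced:

```shell
#!/usr/bin/env bash
# Sketch: split recorded samples ~80/20 into train and test sets.
set -eu

work_dir=$(mktemp -d)
cd "$work_dir"
mkdir -p wake-word test/wake-word

# Stand-in for ten recorded samples
for i in $(seq -w 1 10); do
    touch "sample-${i}.wav"
done

total=$(ls sample-*.wav | wc -l)
train_count=$(( total * 80 / 100 ))

# Shuffle so the test set is not biased toward the last recordings,
# then move the first ~80% into the training directory
ls sample-*.wav | shuf | head -n "$train_count" \
    | xargs -I{} mv {} wake-word/

# Whatever remains becomes the test set
mv sample-*.wav test/wake-word/

echo "train: $(ls wake-word | wc -l)  test: $(ls test/wake-word | wc -l)"
```

Each file is moved exactly once, so the same glob can safely be used for the second `mv` - by then it only matches the leftover 20%.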
scripts/setup_voice_assistant.sh (executable file, 429 lines)
@@ -0,0 +1,429 @@
#!/usr/bin/env bash
#
# Path: setup_voice_assistant.sh
#
# Purpose and usage:
#   Sets up the voice assistant server environment on Heimdall
#   - Creates conda environment
#   - Installs dependencies (Whisper, Flask, Piper TTS)
#   - Downloads and configures TTS models
#   - Sets up systemd service (optional)
#   - Configures environment variables
#
# Requirements:
#   - conda/miniconda installed
#   - Internet connection for downloads
#   - Sudo access (for systemd service setup)
#
# Usage:
#   ./setup_voice_assistant.sh [--no-service] [--env-name NAME]
#
# Author: PRbL Library
# Created: $(date +"%Y-%m-%d")

# ----- PRbL Color and output functions -----
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
BLUE='\033[0;34m'
PURPLE='\033[0;35m'
CYAN='\033[0;36m'
NC='\033[0m' # No Color

print_status() {
    local level="$1"
    shift
    case "$level" in
        "info")    echo -e "${BLUE}[INFO]${NC} $*" >&2 ;;
        "success") echo -e "${GREEN}[SUCCESS]${NC} $*" >&2 ;;
        "warning") echo -e "${YELLOW}[WARNING]${NC} $*" >&2 ;;
        "error")   echo -e "${RED}[ERROR]${NC} $*" >&2 ;;
        "debug")   [[ "$VERBOSE" == "true" ]] && echo -e "${PURPLE}[DEBUG]${NC} $*" >&2 ;;
        *)         echo -e "$*" >&2 ;;
    esac
}

# ----- Configuration -----
CONDA_ENV_NAME="voice-assistant"
PROJECT_DIR="$HOME/voice-assistant"
INSTALL_SYSTEMD=true
VERBOSE=false

# ----- Dependency checking -----
command_exists() {
    command -v "$1" &> /dev/null
}

check_conda() {
    if ! command_exists conda; then
        print_status error "conda not found. Please install miniconda first."
        print_status info "Install with: wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"
        print_status info "              bash Miniconda3-latest-Linux-x86_64.sh"
        return 1
    fi
    return 0
}

# ----- Parse arguments -----
parse_args() {
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --no-service)
                INSTALL_SYSTEMD=false
                shift
                ;;
            --env-name)
                CONDA_ENV_NAME="$2"
                shift 2
                ;;
            -v|--verbose)
                VERBOSE=true
                shift
                ;;
            -h|--help)
                cat << EOF
Usage: $(basename "$0") [OPTIONS]

Options:
  --no-service     Don't install systemd service
  --env-name NAME  Custom conda environment name (default: voice-assistant)
  -v, --verbose    Enable verbose output
  -h, --help       Show this help message

EOF
                exit 0
                ;;
            *)
                print_status error "Unknown option: $1"
                exit 1
                ;;
        esac
    done
}

# ----- Setup functions -----

create_project_directory() {
    print_status info "Creating project directory: $PROJECT_DIR"

    if [[ ! -d "$PROJECT_DIR" ]]; then
        mkdir -p "$PROJECT_DIR" || {
            print_status error "Failed to create project directory"
            return 1
        }
    fi

    # Create subdirectories
    mkdir -p "$PROJECT_DIR"/{logs,models,config}

    print_status success "Project directory created"
    return 0
}

create_conda_environment() {
    print_status info "Creating conda environment: $CONDA_ENV_NAME"

    # Check if environment already exists
    if conda env list | grep -q "^${CONDA_ENV_NAME}\s"; then
        print_status warning "Environment $CONDA_ENV_NAME already exists"
        read -p "Remove and recreate? (y/N): " -n 1 -r
        echo
        if [[ $REPLY =~ ^[Yy]$ ]]; then
            print_status info "Removing existing environment..."
            conda env remove -n "$CONDA_ENV_NAME" -y
        else
            print_status info "Using existing environment"
            return 0
        fi
    fi

    # Create new environment
    print_status info "Creating Python 3.10 environment..."
    conda create -n "$CONDA_ENV_NAME" python=3.10 -y || {
        print_status error "Failed to create conda environment"
        return 1
    }

    print_status success "Conda environment created"
    return 0
}

install_python_dependencies() {
    print_status info "Installing Python dependencies..."
|
# Activate conda environment
|
||||||
|
eval "$(conda shell.bash hook)"
|
||||||
|
conda activate "$CONDA_ENV_NAME" || {
|
||||||
|
print_status error "Failed to activate conda environment"
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
# Install base dependencies
|
||||||
|
print_status info "Installing base packages..."
|
||||||
|
pip install --upgrade pip --break-system-packages || true
|
||||||
|
|
||||||
|
# Install Whisper (OpenAI)
|
||||||
|
print_status info "Installing OpenAI Whisper..."
|
||||||
|
pip install -U openai-whisper --break-system-packages || {
|
||||||
|
print_status error "Failed to install Whisper"
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
# Install Flask
|
||||||
|
print_status info "Installing Flask..."
|
||||||
|
pip install flask --break-system-packages || {
|
||||||
|
print_status error "Failed to install Flask"
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
# Install requests
|
||||||
|
print_status info "Installing requests..."
|
||||||
|
pip install requests --break-system-packages || {
|
||||||
|
print_status error "Failed to install requests"
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
# Install python-dotenv
|
||||||
|
print_status info "Installing python-dotenv..."
|
||||||
|
pip install python-dotenv --break-system-packages || {
|
||||||
|
print_status warning "Failed to install python-dotenv (optional)"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Install Piper TTS
|
||||||
|
print_status info "Installing Piper TTS..."
|
||||||
|
# Note: Piper TTS installation method varies, adjust as needed
|
||||||
|
# For now, we'll install the Python package if available
|
||||||
|
pip install piper-tts --break-system-packages || {
|
||||||
|
print_status warning "Piper TTS pip package not found"
|
||||||
|
print_status info "You may need to install Piper manually from: https://github.com/rhasspy/piper"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Install PyAudio for audio handling
|
||||||
|
print_status info "Installing PyAudio dependencies..."
|
||||||
|
if command_exists apt-get; then
|
||||||
|
sudo apt-get install -y portaudio19-dev python3-pyaudio || {
|
||||||
|
print_status warning "Failed to install portaudio dev packages"
|
||||||
|
}
|
||||||
|
fi
|
||||||
|
|
||||||
|
pip install pyaudio --break-system-packages || {
|
||||||
|
print_status warning "Failed to install PyAudio (may need manual installation)"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_status success "Python dependencies installed"
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
download_piper_models() {
|
||||||
|
print_status info "Downloading Piper TTS models..."
|
||||||
|
|
||||||
|
local models_dir="$PROJECT_DIR/models/piper"
|
||||||
|
mkdir -p "$models_dir"
|
||||||
|
|
||||||
|
# Download a default voice model
|
||||||
|
# Example: en_US-lessac-medium
|
||||||
|
local model_url="https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx"
|
||||||
|
local config_url="https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json"
|
||||||
|
|
||||||
|
if [[ ! -f "$models_dir/en_US-lessac-medium.onnx" ]]; then
|
||||||
|
print_status info "Downloading voice model..."
|
||||||
|
wget -q --show-progress -O "$models_dir/en_US-lessac-medium.onnx" "$model_url" || {
|
||||||
|
print_status warning "Failed to download Piper model (manual download may be needed)"
|
||||||
|
}
|
||||||
|
|
||||||
|
wget -q --show-progress -O "$models_dir/en_US-lessac-medium.onnx.json" "$config_url" || {
|
||||||
|
print_status warning "Failed to download Piper config"
|
||||||
|
}
|
||||||
|
else
|
||||||
|
print_status info "Piper model already downloaded"
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_status success "Piper models ready"
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
create_config_file() {
|
||||||
|
print_status info "Creating configuration file..."
|
||||||
|
|
||||||
|
local config_file="$PROJECT_DIR/config/.env"
|
||||||
|
|
||||||
|
if [[ -f "$config_file" ]]; then
|
||||||
|
print_status warning "Config file already exists: $config_file"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
cat > "$config_file" << 'EOF'
|
||||||
|
# Voice Assistant Configuration
|
||||||
|
# Path: ~/voice-assistant/config/.env
|
||||||
|
|
||||||
|
# Home Assistant Configuration
|
||||||
|
HA_URL=http://homeassistant.local:8123
|
||||||
|
HA_TOKEN=your_long_lived_access_token_here
|
||||||
|
|
||||||
|
# Server Configuration
|
||||||
|
SERVER_HOST=0.0.0.0
|
||||||
|
SERVER_PORT=5000
|
||||||
|
|
||||||
|
# Whisper Configuration
|
||||||
|
WHISPER_MODEL=medium
|
||||||
|
|
||||||
|
# Piper TTS Configuration
|
||||||
|
PIPER_MODEL=/path/to/piper/model.onnx
|
||||||
|
PIPER_CONFIG=/path/to/piper/model.onnx.json
|
||||||
|
|
||||||
|
# Logging
|
||||||
|
LOG_LEVEL=INFO
|
||||||
|
LOG_FILE=/home/$USER/voice-assistant/logs/voice_assistant.log
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Update paths in config
|
||||||
|
sed -i "s|/path/to/piper/model.onnx|$PROJECT_DIR/models/piper/en_US-lessac-medium.onnx|g" "$config_file"
|
||||||
|
sed -i "s|/path/to/piper/model.onnx.json|$PROJECT_DIR/models/piper/en_US-lessac-medium.onnx.json|g" "$config_file"
|
||||||
|
sed -i "s|/home/\$USER|$HOME|g" "$config_file"
|
||||||
|
|
||||||
|
chmod 600 "$config_file"
|
||||||
|
|
||||||
|
print_status success "Config file created: $config_file"
|
||||||
|
print_status warning "Please edit $config_file and add your Home Assistant token"
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
create_systemd_service() {
|
||||||
|
if [[ "$INSTALL_SYSTEMD" != "true" ]]; then
|
||||||
|
print_status info "Skipping systemd service installation"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_status info "Creating systemd service..."
|
||||||
|
|
||||||
|
local service_file="/etc/systemd/system/voice-assistant.service"
|
||||||
|
|
||||||
|
# Create service file
|
||||||
|
sudo tee "$service_file" > /dev/null << EOF
|
||||||
|
[Unit]
|
||||||
|
Description=Voice Assistant Server
|
||||||
|
After=network.target
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
User=$USER
|
||||||
|
WorkingDirectory=$PROJECT_DIR
|
||||||
|
Environment="PATH=$HOME/miniconda3/envs/$CONDA_ENV_NAME/bin:/usr/local/bin:/usr/bin:/bin"
|
||||||
|
EnvironmentFile=$PROJECT_DIR/config/.env
|
||||||
|
ExecStart=$HOME/miniconda3/envs/$CONDA_ENV_NAME/bin/python $PROJECT_DIR/voice_server.py
|
||||||
|
Restart=on-failure
|
||||||
|
RestartSec=10
|
||||||
|
StandardOutput=append:$PROJECT_DIR/logs/voice_assistant.log
|
||||||
|
StandardError=append:$PROJECT_DIR/logs/voice_assistant_error.log
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Reload systemd
|
||||||
|
sudo systemctl daemon-reload
|
||||||
|
|
||||||
|
print_status success "Systemd service created"
|
||||||
|
print_status info "To enable and start the service:"
|
||||||
|
print_status info " sudo systemctl enable voice-assistant"
|
||||||
|
print_status info " sudo systemctl start voice-assistant"
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
create_test_script() {
|
||||||
|
print_status info "Creating test script..."
|
||||||
|
|
||||||
|
local test_script="$PROJECT_DIR/test_server.sh"
|
||||||
|
|
||||||
|
cat > "$test_script" << 'EOF'
|
||||||
|
#!/bin/bash
|
||||||
|
# Test script for voice assistant server
|
||||||
|
|
||||||
|
# Activate conda environment
|
||||||
|
eval "$(conda shell.bash hook)"
|
||||||
|
conda activate voice-assistant
|
||||||
|
|
||||||
|
# Load environment variables
|
||||||
|
if [[ -f ~/voice-assistant/config/.env ]]; then
|
||||||
|
export $(grep -v '^#' ~/voice-assistant/config/.env | xargs)
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Run server
|
||||||
|
cd ~/voice-assistant
|
||||||
|
python voice_server.py --verbose
|
||||||
|
EOF
|
||||||
|
|
||||||
|
chmod +x "$test_script"
|
||||||
|
|
||||||
|
print_status success "Test script created: $test_script"
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
install_voice_server_script() {
|
||||||
|
print_status info "Installing voice_server.py..."
|
||||||
|
|
||||||
|
# Check if voice_server.py exists in outputs
|
||||||
|
if [[ -f "$HOME/voice_server.py" ]]; then
|
||||||
|
cp "$HOME/voice_server.py" "$PROJECT_DIR/voice_server.py"
|
||||||
|
print_status success "voice_server.py installed"
|
||||||
|
elif [[ -f "./voice_server.py" ]]; then
|
||||||
|
cp "./voice_server.py" "$PROJECT_DIR/voice_server.py"
|
||||||
|
print_status success "voice_server.py installed"
|
||||||
|
else
|
||||||
|
print_status warning "voice_server.py not found in current directory"
|
||||||
|
print_status info "Please copy voice_server.py to $PROJECT_DIR manually"
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
# ----- Main -----
|
||||||
|
main() {
|
||||||
|
print_status info "Starting voice assistant setup..."
|
||||||
|
|
||||||
|
# Parse arguments
|
||||||
|
parse_args "$@"
|
||||||
|
|
||||||
|
# Check dependencies
|
||||||
|
check_conda || exit 1
|
||||||
|
|
||||||
|
# Setup steps
|
||||||
|
create_project_directory || exit 1
|
||||||
|
create_conda_environment || exit 1
|
||||||
|
install_python_dependencies || exit 1
|
||||||
|
download_piper_models || exit 1
|
||||||
|
create_config_file || exit 1
|
||||||
|
install_voice_server_script || exit 1
|
||||||
|
create_test_script || exit 1
|
||||||
|
|
||||||
|
if [[ "$INSTALL_SYSTEMD" == "true" ]]; then
|
||||||
|
create_systemd_service || exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Final instructions
|
||||||
|
print_status success "Setup complete!"
|
||||||
|
echo
|
||||||
|
print_status info "Next steps:"
|
||||||
|
print_status info "1. Edit config file: vim $PROJECT_DIR/config/.env"
|
||||||
|
print_status info "2. Add your Home Assistant long-lived access token"
|
||||||
|
print_status info "3. Test the server: $PROJECT_DIR/test_server.sh"
|
||||||
|
print_status info "4. Configure your Maix Duino device"
|
||||||
|
|
||||||
|
if [[ "$INSTALL_SYSTEMD" == "true" ]]; then
|
||||||
|
echo
|
||||||
|
print_status info "To run as a service:"
|
||||||
|
print_status info " sudo systemctl enable voice-assistant"
|
||||||
|
print_status info " sudo systemctl start voice-assistant"
|
||||||
|
print_status info " sudo systemctl status voice-assistant"
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo
|
||||||
|
print_status info "Project directory: $PROJECT_DIR"
|
||||||
|
print_status info "Conda environment: $CONDA_ENV_NAME"
|
||||||
|
print_status info "Activate with: conda activate $CONDA_ENV_NAME"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Run main
|
||||||
|
main "$@"
|
||||||
700  scripts/voice_server.py  (Executable file)

@@ -0,0 +1,700 @@
#!/usr/bin/env python3
"""
Voice Processing Server for Maix Duino Voice Assistant

Purpose and usage:
This server runs on Heimdall (10.1.10.71) and handles:
- Audio stream reception from Maix Duino
- Speech-to-text using Whisper
- Intent recognition and Home Assistant API calls
- Text-to-speech using Piper
- Audio response streaming back to device

Path: /home/alan/voice-assistant/voice_server.py

Requirements:
- whisper (already installed)
- piper-tts
- flask
- requests
- python-dotenv

Usage:
    python3 voice_server.py [--host HOST] [--port PORT] [--ha-url URL]
"""

import os
import sys
import argparse
import tempfile
import wave
import io
import re
import threading
import queue
from pathlib import Path
from typing import Optional, Dict, Any, Tuple

import whisper
import requests
from flask import Flask, request, jsonify, send_file
from werkzeug.exceptions import BadRequest

# Try to load environment variables
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    print("Warning: python-dotenv not installed. Using environment variables only.")

# Try to import Mycroft Precise
PRECISE_AVAILABLE = False
try:
    from precise_runner import PreciseEngine, PreciseRunner
    import pyaudio
    PRECISE_AVAILABLE = True
except ImportError:
    print("Warning: Mycroft Precise not installed. Wake word detection disabled.")
    print("Install with: pip install mycroft-precise pyaudio")

# Configuration
DEFAULT_HOST = "0.0.0.0"
DEFAULT_PORT = 5000
DEFAULT_WHISPER_MODEL = "medium"
DEFAULT_HA_URL = os.getenv("HA_URL", "http://homeassistant.local:8123")
DEFAULT_HA_TOKEN = os.getenv("HA_TOKEN", "")
DEFAULT_PRECISE_MODEL = os.getenv("PRECISE_MODEL", "")
DEFAULT_PRECISE_SENSITIVITY = float(os.getenv("PRECISE_SENSITIVITY", "0.5"))
DEFAULT_PRECISE_ENGINE = "/usr/local/bin/precise-engine"

# Initialize Flask app
app = Flask(__name__)
app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024  # 16MB max audio file

# Global variables for loaded models
whisper_model = None
ha_client = None
precise_runner = None
precise_enabled = False
wake_word_queue = queue.Queue()  # Queue for wake word detections


class HomeAssistantClient:
    """Client for interacting with the Home Assistant API"""

    def __init__(self, base_url: str, token: str):
        self.base_url = base_url.rstrip('/')
        self.token = token
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': f'Bearer {token}',
            'Content-Type': 'application/json'
        })

    def get_state(self, entity_id: str) -> Optional[Dict[str, Any]]:
        """Get the state of an entity"""
        try:
            response = self.session.get(f'{self.base_url}/api/states/{entity_id}')
            response.raise_for_status()
            return response.json()
        except requests.RequestException as e:
            print(f"Error getting state for {entity_id}: {e}")
            return None

    def call_service(self, domain: str, service: str, entity_id: str,
                     **kwargs) -> bool:
        """Call a Home Assistant service"""
        try:
            data = {'entity_id': entity_id}
            data.update(kwargs)

            response = self.session.post(
                f'{self.base_url}/api/services/{domain}/{service}',
                json=data
            )
            response.raise_for_status()
            return True
        except requests.RequestException as e:
            print(f"Error calling service {domain}.{service}: {e}")
            return False

    def turn_on(self, entity_id: str, **kwargs) -> bool:
        """Turn on an entity"""
        domain = entity_id.split('.')[0]
        return self.call_service(domain, 'turn_on', entity_id, **kwargs)

    def turn_off(self, entity_id: str, **kwargs) -> bool:
        """Turn off an entity"""
        domain = entity_id.split('.')[0]
        return self.call_service(domain, 'turn_off', entity_id, **kwargs)

    def toggle(self, entity_id: str, **kwargs) -> bool:
        """Toggle an entity"""
        domain = entity_id.split('.')[0]
        return self.call_service(domain, 'toggle', entity_id, **kwargs)


class IntentParser:
    """Simple pattern-based intent recognition"""

    # Intent patterns (can be expanded or replaced with ML-based NLU)
    PATTERNS = {
        'turn_on': [
            r'turn on (the )?(.+)',
            r'switch on (the )?(.+)',
            r'enable (the )?(.+)',
        ],
        'turn_off': [
            r'turn off (the )?(.+)',
            r'switch off (the )?(.+)',
            r'disable (the )?(.+)',
        ],
        'toggle': [
            r'toggle (the )?(.+)',
        ],
        'get_state': [
            r'what(?:\'s| is) (the )?(.+)',
            r'how is (the )?(.+)',
            r'status of (the )?(.+)',
        ],
        'get_temperature': [
            r'what(?:\'s| is) the temperature',
            r'how (?:warm|cold|hot) is it',
        ],
    }

    # Entity name mapping (friendly names to entity IDs)
    ENTITY_MAP = {
        'living room light': 'light.living_room',
        'living room lights': 'light.living_room',
        'bedroom light': 'light.bedroom',
        'bedroom lights': 'light.bedroom',
        'kitchen light': 'light.kitchen',
        'kitchen lights': 'light.kitchen',
        'all lights': 'group.all_lights',
        'temperature': 'sensor.temperature',
        'thermostat': 'climate.thermostat',
    }

    def parse(self, text: str) -> Optional[Tuple[str, str, Dict[str, Any]]]:
        """
        Parse text into intent, entity, and parameters

        Returns:
            (intent, entity_id, params) or None if no match
        """
        text = text.lower().strip()

        for intent, patterns in self.PATTERNS.items():
            for pattern in patterns:
                match = re.match(pattern, text, re.IGNORECASE)
                if match:
                    # Extract entity name from match groups.
                    # Strip before comparing: the '(the )?' group captures a
                    # trailing space, so an unstripped check would treat
                    # 'the ' as the entity name and break the lookup.
                    entity_name = None
                    for group in match.groups():
                        if group and group.lower().strip() not in ('the', 'a', 'an'):
                            entity_name = group.lower().strip()
                            break

                    # Map entity name to entity ID
                    entity_id = None
                    if entity_name:
                        entity_id = self.ENTITY_MAP.get(entity_name)

                    # For get_temperature, use the default sensor
                    if intent == 'get_temperature':
                        entity_id = self.ENTITY_MAP.get('temperature')

                    if entity_id:
                        return (intent, entity_id, {})

        return None
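The pattern-and-map approach above can be exercised in isolation. The following self-contained sketch copies one `turn_on` regex and one `ENTITY_MAP` entry from the class; `match_command` is a hypothetical helper for illustration, not part of the server:

```python
import re

# Mirrors IntentParser: one 'turn_on' pattern plus one ENTITY_MAP entry.
ENTITY_MAP = {'kitchen lights': 'light.kitchen'}

def match_command(text):
    """Return (intent, entity_id) for a 'turn on ...' command, else None."""
    match = re.match(r'turn on (the )?(.+)', text.lower().strip(), re.IGNORECASE)
    if not match:
        return None
    # Skip filler groups like 'the ' and strip whitespace before the lookup,
    # as IntentParser.parse does.
    for group in match.groups():
        if group and group.lower().strip() not in ('the', 'a', 'an'):
            entity_id = ENTITY_MAP.get(group.lower().strip())
            if entity_id:
                return ('turn_on', entity_id)
    return None

print(match_command("Turn on the kitchen lights"))  # ('turn_on', 'light.kitchen')
```

Note that without stripping, the optional `(the )?` group (which captures `'the '` with a trailing space) would be taken as the entity name and every lookup would miss.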
def load_whisper_model(model_name: str = DEFAULT_WHISPER_MODEL):
    """Load the Whisper model (cached after first load)"""
    global whisper_model

    if whisper_model is None:
        print(f"Loading Whisper model: {model_name}")
        whisper_model = whisper.load_model(model_name)
        print("Whisper model loaded successfully")

    return whisper_model


def transcribe_audio(audio_file_path: str) -> Optional[str]:
    """Transcribe an audio file using Whisper"""
    try:
        model = load_whisper_model()
        result = model.transcribe(audio_file_path)
        return result['text'].strip()
    except Exception as e:
        print(f"Error transcribing audio: {e}")
        return None


def generate_tts(text: str) -> Optional[bytes]:
    """
    Generate speech from text using Piper TTS

    TODO: Implement Piper TTS integration
    For now, returns None - implement based on Piper installation
    """
    # Placeholder for TTS implementation
    print(f"TTS requested for: {text}")

    # You'll need to add Piper TTS integration here
    # Example command: piper --model <model> --output_file <file> < text

    return None
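One possible shape for the TODO above, hedged as a sketch: it assumes the `piper` CLI from rhasspy/piper is on `PATH` and accepts the `--model` and `--output_file` flags named in the comment above, reading text on stdin. `build_piper_command` and `piper_tts` are illustrative names, not part of the server yet.

```python
import subprocess
import tempfile
from pathlib import Path

def build_piper_command(model_path, output_path):
    """Assemble the piper CLI invocation suggested in the comment above."""
    return ["piper", "--model", str(model_path), "--output_file", str(output_path)]

def piper_tts(text, model_path):
    """Run piper, feeding text on stdin; returns WAV bytes or None on failure."""
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        out_path = Path(tmp.name)
    try:
        result = subprocess.run(
            build_piper_command(model_path, out_path),
            input=text.encode("utf-8"),
            capture_output=True,
        )
        if result.returncode != 0:
            return None
        return out_path.read_bytes()
    finally:
        out_path.unlink(missing_ok=True)
```

Swapping this into `generate_tts` would let the `/tts` and `/process` routes return real audio once a voice model is downloaded.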
def on_wake_word_detected():
    """
    Callback when Mycroft Precise detects the wake word.

    This function is called by the Precise runner when the wake word
    is detected. It signals the main application to start recording
    and processing the user's command.
    """
    print("Wake word detected by Precise!")
    wake_word_queue.put({
        'timestamp': time.time(),
        'source': 'precise'
    })


def start_precise_listener(model_path: str, sensitivity: float = 0.5,
                           engine_path: str = DEFAULT_PRECISE_ENGINE):
    """
    Start Mycroft Precise wake word detection

    Args:
        model_path: Path to .net model file
        sensitivity: Detection threshold (0.0-1.0, default 0.5)
        engine_path: Path to precise-engine binary

    Returns:
        PreciseRunner instance if successful, None otherwise
    """
    global precise_runner, precise_enabled

    if not PRECISE_AVAILABLE:
        print("Error: Mycroft Precise not available")
        return None

    # Verify model exists
    if not os.path.exists(model_path):
        print(f"Error: Precise model not found: {model_path}")
        return None

    # Verify engine exists
    if not os.path.exists(engine_path):
        print(f"Error: precise-engine not found: {engine_path}")
        print("Download from: https://github.com/MycroftAI/mycroft-precise/releases")
        return None

    try:
        # Create Precise engine
        engine = PreciseEngine(engine_path, model_path)

        # Create runner with callback
        precise_runner = PreciseRunner(
            engine,
            sensitivity=sensitivity,
            on_activation=on_wake_word_detected
        )

        # Start listening
        precise_runner.start()
        precise_enabled = True

        print("Precise listening started:")
        print(f"  Model: {model_path}")
        print(f"  Sensitivity: {sensitivity}")
        print(f"  Engine: {engine_path}")

        return precise_runner

    except Exception as e:
        print(f"Error starting Precise: {e}")
        return None


def stop_precise_listener():
    """Stop Mycroft Precise wake word detection"""
    global precise_runner, precise_enabled

    if precise_runner:
        try:
            precise_runner.stop()
            precise_enabled = False
            print("Precise listener stopped")
        except Exception as e:
            print(f"Error stopping Precise: {e}")


def record_audio_after_wake(duration: int = 5) -> Optional[bytes]:
    """
    Record audio after the wake word is detected

    Args:
        duration: Maximum recording duration in seconds

    Returns:
        WAV audio data or None

    Note: This is for server-side wake word detection where
    the server is also doing audio capture. For Maix Duino
    client-side wake detection, audio comes from the client.
    """
    if not PRECISE_AVAILABLE:
        return None

    try:
        # Audio settings
        CHUNK = 1024
        FORMAT = pyaudio.paInt16
        CHANNELS = 1
        RATE = 16000

        p = pyaudio.PyAudio()

        # Open stream
        stream = p.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            frames_per_buffer=CHUNK
        )

        print(f"Recording for {duration} seconds...")

        frames = []
        for _ in range(0, int(RATE / CHUNK * duration)):
            data = stream.read(CHUNK)
            frames.append(data)

        # Stop and close stream
        stream.stop_stream()
        stream.close()
        p.terminate()

        # Convert to WAV
        wav_buffer = io.BytesIO()
        with wave.open(wav_buffer, 'wb') as wf:
            wf.setnchannels(CHANNELS)
            wf.setsampwidth(p.get_sample_size(FORMAT))
            wf.setframerate(RATE)
            wf.writeframes(b''.join(frames))

        return wav_buffer.getvalue()

    except Exception as e:
        print(f"Error recording audio: {e}")
        return None
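The frames-to-WAV step above can be exercised without a microphone. This self-contained sketch packs raw PCM frames at the same settings (16 kHz, mono, 16-bit) into an in-memory WAV container; `frames_to_wav` is an illustrative helper, not part of the server:

```python
import io
import wave

def frames_to_wav(frames, channels=1, sample_width=2, rate=16000):
    """Pack raw PCM frames into an in-memory WAV container,
    mirroring the conversion in record_audio_after_wake()."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)   # 2 bytes == pyaudio.paInt16
        wf.setframerate(rate)
        wf.writeframes(b''.join(frames))
    return buf.getvalue()

# Roughly one second of 16 kHz mono silence: 16 chunks of 1024 zero samples.
wav_bytes = frames_to_wav([b'\x00\x00' * 1024] * 16)
```

Feeding such synthetic WAV bytes to the `/transcribe` endpoint is a cheap way to smoke-test the upload path end to end.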
import time # Add this import at the top if not already there
|
||||||
|
|
||||||
|
|
||||||
|
def execute_intent(intent: str, entity_id: str, params: Dict[str, Any]) -> str:
|
||||||
|
"""Execute an intent and return response text"""
|
||||||
|
|
||||||
|
if intent == 'turn_on':
|
||||||
|
success = ha_client.turn_on(entity_id)
|
||||||
|
if success:
|
||||||
|
entity_name = entity_id.split('.')[-1].replace('_', ' ')
|
||||||
|
return f"Turned on {entity_name}"
|
||||||
|
else:
|
||||||
|
return "Sorry, I couldn't turn that on"
|
||||||
|
|
||||||
|
elif intent == 'turn_off':
|
||||||
|
success = ha_client.turn_off(entity_id)
|
||||||
|
if success:
|
||||||
|
entity_name = entity_id.split('.')[-1].replace('_', ' ')
|
||||||
|
return f"Turned off {entity_name}"
|
||||||
|
else:
|
||||||
|
return "Sorry, I couldn't turn that off"
|
||||||
|
|
||||||
|
elif intent == 'toggle':
|
||||||
|
success = ha_client.toggle(entity_id)
|
||||||
|
if success:
|
||||||
|
entity_name = entity_id.split('.')[-1].replace('_', ' ')
|
||||||
|
return f"Toggled {entity_name}"
|
||||||
|
else:
|
||||||
|
return "Sorry, I couldn't toggle that"
|
||||||
|
|
||||||
|
elif intent in ['get_state', 'get_temperature']:
|
||||||
|
state = ha_client.get_state(entity_id)
|
||||||
|
if state:
|
||||||
|
entity_name = entity_id.split('.')[-1].replace('_', ' ')
|
||||||
|
value = state.get('state', 'unknown')
|
||||||
|
unit = state.get('attributes', {}).get('unit_of_measurement', '')
|
||||||
|
|
||||||
|
return f"The {entity_name} is {value} {unit}".strip()
|
||||||
|
else:
|
||||||
|
return "Sorry, I couldn't get that information"
|
||||||
|
|
||||||
|
return "I didn't understand that command"
|
||||||
|
|
||||||
|
|
||||||
|
# Flask routes
|
||||||
|
|
||||||
|
@app.route('/health', methods=['GET'])
|
||||||
|
def health():
|
||||||
|
"""Health check endpoint"""
|
||||||
|
return jsonify({
|
||||||
|
'status': 'healthy',
|
||||||
|
'whisper_loaded': whisper_model is not None,
|
||||||
|
'ha_connected': ha_client is not None,
|
||||||
|
'precise_enabled': precise_enabled,
|
||||||
|
'precise_available': PRECISE_AVAILABLE
|
||||||
|
})
|
||||||
|
|
||||||
|
|
||||||
|
@app.route('/wake-word/status', methods=['GET'])
|
||||||
|
def wake_word_status():
|
||||||
|
"""Get wake word detection status"""
|
||||||
|
return jsonify({
|
||||||
|
'enabled': precise_enabled,
|
||||||
|
'available': PRECISE_AVAILABLE,
|
||||||
|
'model': DEFAULT_PRECISE_MODEL if precise_enabled else None,
|
||||||
|
'sensitivity': DEFAULT_PRECISE_SENSITIVITY if precise_enabled else None
|
||||||
|
})
|
||||||
|
|
||||||
|
|
||||||
|
@app.route('/wake-word/detections', methods=['GET'])
|
||||||
|
def wake_word_detections():
|
||||||
|
"""
|
||||||
|
Get recent wake word detections (non-blocking)
|
||||||
|
|
||||||
|
Returns any wake word detections in the queue.
|
||||||
|
Used for testing and monitoring.
|
||||||
|
"""
|
||||||
|
detections = []
|
||||||
|
|
||||||
|
try:
|
||||||
|
while not wake_word_queue.empty():
|
||||||
|
detections.append(wake_word_queue.get_nowait())
|
||||||
|
except queue.Empty:
|
||||||
|
pass
|
||||||
|
|
||||||
|
return jsonify({
|
||||||
|
'detections': detections,
|
||||||
|
'count': len(detections)
|
||||||
|
})
|
||||||
|
|
||||||
|
|
||||||
|
@app.route('/transcribe', methods=['POST'])
|
||||||
|
def transcribe():
|
||||||
|
"""
|
||||||
|
Transcribe audio file
|
||||||
|
|
||||||
|
Expects: WAV audio file in request body
|
||||||
|
Returns: JSON with transcribed text
|
||||||
|
"""
|
||||||
|
if 'audio' not in request.files:
|
||||||
|
raise BadRequest('No audio file provided')
|
||||||
|
|
||||||
|
audio_file = request.files['audio']
|
||||||
|
|
||||||
|
# Save to temporary file
|
||||||
|
with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as temp_file:
|
||||||
|
audio_file.save(temp_file.name)
|
||||||
|
temp_path = temp_file.name
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Transcribe
|
||||||
|
text = transcribe_audio(temp_path)
|
||||||
|
|
||||||
|
if text:
|
||||||
|
return jsonify({
|
||||||
|
'success': True,
|
||||||
|
'text': text
|
||||||
|
})
|
||||||
|
else:
|
||||||
|
return jsonify({
|
||||||
|
'success': False,
|
||||||
|
'error': 'Transcription failed'
|
||||||
|
}), 500
|
||||||
|
|
||||||
|
finally:
|
||||||
|
# Clean up temp file
|
||||||
|
if os.path.exists(temp_path):
|
||||||
|
os.remove(temp_path)
|
||||||
|
|
||||||
|
|
||||||
|
@app.route('/process', methods=['POST'])
def process():
    """
    Process complete voice command

    Expects: WAV audio file in request body
    Returns: JSON with response and audio file
    """
    if 'audio' not in request.files:
        raise BadRequest('No audio file provided')

    audio_file = request.files['audio']

    # Save to temporary file
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as temp_file:
        audio_file.save(temp_file.name)
        temp_path = temp_file.name

    try:
        # Step 1: Transcribe
        text = transcribe_audio(temp_path)

        if not text:
            return jsonify({
                'success': False,
                'error': 'Transcription failed'
            }), 500

        print(f"Transcribed: {text}")

        # Step 2: Parse intent
        parser = IntentParser()
        intent_result = parser.parse(text)

        if not intent_result:
            response_text = "I didn't understand that command"
        else:
            intent, entity_id, params = intent_result
            print(f"Intent: {intent}, Entity: {entity_id}")

            # Step 3: Execute intent
            response_text = execute_intent(intent, entity_id, params)

        print(f"Response: {response_text}")

        # Step 4: Generate TTS (placeholder for now)
        # audio_response = generate_tts(response_text)

        return jsonify({
            'success': True,
            'transcription': text,
            'response': response_text,
            # 'audio_available': audio_response is not None
        })

    finally:
        # Clean up temp file
        if os.path.exists(temp_path):
            os.remove(temp_path)

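`IntentParser` and `execute_intent` are defined earlier in `voice_server.py`, outside this excerpt. Purely to illustrate the parse step of the transcribe → parse → execute pipeline, a keyword matcher in the same spirit might look like this (all names here are hypothetical, not the real implementation):

```python
import re
from typing import Optional, Tuple

def parse_command(text: str) -> Optional[Tuple[str, str]]:
    """Tiny stand-in for IntentParser.parse(): keyword matching only."""
    text = text.lower().strip()
    match = re.search(r'\b(turn on|turn off)\b (?:the )?(.+)', text)
    if not match:
        return None
    action, device = match.groups()
    intent = 'turn_on' if action == 'turn on' else 'turn_off'
    # Map a spoken name to a hypothetical Home Assistant entity_id
    entity_id = 'light.' + device.replace(' ', '_')
    return intent, entity_id

print(parse_command("Turn on the kitchen light"))
# ('turn_on', 'light.kitchen_light')
print(parse_command("what time is it"))  # None
```

Returning `None` for unmatched text is what lets the route above fall back to the "I didn't understand that command" response.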
@app.route('/tts', methods=['POST'])
def tts():
    """
    Generate TTS audio

    Expects: JSON with 'text' field
    Returns: WAV audio file
    """
    data = request.get_json()

    if not data or 'text' not in data:
        raise BadRequest('No text provided')

    text = data['text']

    # Generate TTS
    audio_data = generate_tts(text)

    if audio_data:
        return send_file(
            io.BytesIO(audio_data),
            mimetype='audio/wav',
            as_attachment=True,
            download_name='response.wav'
        )
    else:
        return jsonify({
            'success': False,
            'error': 'TTS generation not implemented yet'
        }), 501

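`generate_tts()` is still a placeholder (the route returns 501), but whatever backend eventually fills it in needs to hand back raw WAV bytes that `send_file` can wrap in `io.BytesIO`. A sketch of producing bytes of that shape with the stdlib `wave` module, here just a stretch of 16 kHz mono silence:

```python
import io
import wave

def silent_wav(seconds: float = 1.0, rate: int = 16000) -> bytes:
    """Build a minimal mono 16-bit WAV entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b'\x00\x00' * int(rate * seconds))
    return buf.getvalue()

audio = silent_wav(0.1)
print(audio[:4])    # b'RIFF'
print(audio[8:12])  # b'WAVE'
```

Any real TTS engine's output can be returned the same way as long as it yields a complete RIFF/WAVE byte string.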
def main():
    parser = argparse.ArgumentParser(
        description="Voice Processing Server for Maix Duino Voice Assistant"
    )
    parser.add_argument('--host', default=DEFAULT_HOST,
                        help=f'Server host (default: {DEFAULT_HOST})')
    parser.add_argument('--port', type=int, default=DEFAULT_PORT,
                        help=f'Server port (default: {DEFAULT_PORT})')
    parser.add_argument('--whisper-model', default=DEFAULT_WHISPER_MODEL,
                        help=f'Whisper model to use (default: {DEFAULT_WHISPER_MODEL})')
    parser.add_argument('--ha-url', default=DEFAULT_HA_URL,
                        help=f'Home Assistant URL (default: {DEFAULT_HA_URL})')
    parser.add_argument('--ha-token', default=DEFAULT_HA_TOKEN,
                        help='Home Assistant long-lived access token')
    parser.add_argument('--enable-precise', action='store_true',
                        help='Enable Mycroft Precise wake word detection')
    parser.add_argument('--precise-model', default=DEFAULT_PRECISE_MODEL,
                        help='Path to Precise .net model file')
    parser.add_argument('--precise-sensitivity', type=float,
                        default=DEFAULT_PRECISE_SENSITIVITY,
                        help='Precise sensitivity threshold (0.0-1.0, default: 0.5)')
    parser.add_argument('--precise-engine', default=DEFAULT_PRECISE_ENGINE,
                        help=f'Path to precise-engine binary (default: {DEFAULT_PRECISE_ENGINE})')

    args = parser.parse_args()

    # Validate HA configuration
    if not args.ha_token:
        print("Warning: No Home Assistant token provided!")
        print("Set HA_TOKEN environment variable or use --ha-token")
        print("Commands will not execute without authentication.")

    # Initialize global clients
    global ha_client
    ha_client = HomeAssistantClient(args.ha_url, args.ha_token)

    # Load Whisper model
    print(f"Starting voice processing server on {args.host}:{args.port}")
    load_whisper_model(args.whisper_model)

    # Start Precise if enabled
    if args.enable_precise:
        if not PRECISE_AVAILABLE:
            print("Error: --enable-precise specified but Mycroft Precise not installed")
            print("Install with: pip install mycroft-precise pyaudio")
            sys.exit(1)

        if not args.precise_model:
            print("Error: --enable-precise requires --precise-model")
            sys.exit(1)

        print("\nStarting Mycroft Precise wake word detection...")
        precise_result = start_precise_listener(
            args.precise_model,
            args.precise_sensitivity,
            args.precise_engine
        )

        if not precise_result:
            print("Error: Failed to start Precise listener")
            sys.exit(1)

        print("\nWake word detection active!")
        print("The server will detect wake words and queue them for processing.")
        print("Use /wake-word/detections endpoint to check for detections.\n")

    # Start Flask server
    try:
        app.run(host=args.host, port=args.port, debug=False)
    except KeyboardInterrupt:
        print("\nShutting down...")
        if args.enable_precise:
            stop_precise_listener()
        sys.exit(0)


if __name__ == '__main__':
    main()
580 scripts/voice_server_enhanced.py Executable file

@@ -0,0 +1,580 @@
#!/usr/bin/env python3
"""
Enhanced Voice Server with Multiple Wake Words and Speaker Identification

Path: /home/alan/voice-assistant/voice_server_enhanced.py

This enhanced version adds:
- Multiple wake word support
- Speaker identification using pyannote.audio
- Per-user customization
- Wake word-specific responses

Usage:
    python3 voice_server_enhanced.py \
        --enable-precise \
        --multi-wake-word \
        --enable-speaker-id
"""

import os
import sys
import json
import argparse
import tempfile
import wave
import io
import re
import threading
import queue
import time
from pathlib import Path
from typing import Optional, Dict, Any, Tuple, List

import whisper
import requests
from flask import Flask, request, jsonify, send_file
from werkzeug.exceptions import BadRequest

try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

# Mycroft Precise
PRECISE_AVAILABLE = False
try:
    from precise_runner import PreciseEngine, PreciseRunner
    import pyaudio
    PRECISE_AVAILABLE = True
except ImportError:
    print("Warning: Mycroft Precise not installed")

# Speaker identification
SPEAKER_ID_AVAILABLE = False
try:
    from pyannote.audio import Inference
    from scipy.spatial.distance import cosine
    import numpy as np
    SPEAKER_ID_AVAILABLE = True
except ImportError:
    print("Warning: Speaker ID not available. Install: pip install pyannote.audio scipy")

# Configuration
DEFAULT_HOST = "0.0.0.0"
DEFAULT_PORT = 5000
DEFAULT_WHISPER_MODEL = "medium"
DEFAULT_HA_URL = os.getenv("HA_URL", "http://homeassistant.local:8123")
DEFAULT_HA_TOKEN = os.getenv("HA_TOKEN", "")
DEFAULT_PRECISE_ENGINE = "/usr/local/bin/precise-engine"
DEFAULT_HF_TOKEN = os.getenv("HF_TOKEN", "")

# Wake word configurations
WAKE_WORD_CONFIGS = {
    'hey_mycroft': {
        'model': os.path.expanduser('~/precise-models/pretrained/hey-mycroft.net'),
        'sensitivity': 0.5,
        'response': 'Yes?',
        'enabled': True,
        'context': 'general'
    },
    'hey_computer': {
        'model': os.path.expanduser('~/precise-models/hey-computer/hey-computer.net'),
        'sensitivity': 0.5,
        'response': 'I\'m listening',
        'enabled': False,  # Disabled by default (requires training)
        'context': 'general'
    },
    'jarvis': {
        'model': os.path.expanduser('~/precise-models/jarvis/jarvis.net'),
        'sensitivity': 0.6,
        'response': 'At your service',
        'enabled': False,
        'context': 'personal'
    },
}

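`start_multiple_wake_words()` further down skips any entry that is disabled or whose `.net` model file is missing on disk. That selection step in isolation (using `os.devnull` as a stand-in for an existing model file, purely for demonstration):

```python
import os
from typing import Dict

def select_wake_words(configs: Dict[str, dict]) -> Dict[str, dict]:
    """Keep only enabled wake words whose model file exists on disk."""
    return {
        name: cfg for name, cfg in configs.items()
        if cfg.get('enabled', False) and os.path.exists(cfg['model'])
    }

demo = {
    'hey_mycroft': {'model': os.devnull, 'enabled': True},           # existing path stands in for a .net model
    'jarvis': {'model': '/nonexistent/jarvis.net', 'enabled': True}, # enabled but model missing
    'hey_computer': {'model': os.devnull, 'enabled': False},         # model present but disabled
}
print(sorted(select_wake_words(demo)))  # ['hey_mycroft']
```

Checking the file up front keeps one missing model from aborting the whole startup; the runner simply logs a warning and moves on.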
# Speaker profiles (stored in JSON file)
SPEAKER_PROFILES_FILE = os.path.expanduser('~/voice-assistant/config/speaker_profiles.json')

# Flask app
app = Flask(__name__)
app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024

# Global state
whisper_model = None
ha_client = None
precise_runners = {}
precise_enabled = False
speaker_id_enabled = False
speaker_inference = None
speaker_profiles = {}
wake_word_queue = queue.Queue()


class HomeAssistantClient:
    """Client for Home Assistant API"""

    def __init__(self, base_url: str, token: str):
        self.base_url = base_url.rstrip('/')
        self.token = token
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': f'Bearer {token}',
            'Content-Type': 'application/json'
        })

    def get_state(self, entity_id: str) -> Optional[Dict[str, Any]]:
        try:
            response = self.session.get(f'{self.base_url}/api/states/{entity_id}')
            response.raise_for_status()
            return response.json()
        except requests.RequestException as e:
            print(f"Error getting state for {entity_id}: {e}")
            return None

    def call_service(self, domain: str, service: str, entity_id: str, **kwargs) -> bool:
        try:
            data = {'entity_id': entity_id}
            data.update(kwargs)
            response = self.session.post(
                f'{self.base_url}/api/services/{domain}/{service}',
                json=data
            )
            response.raise_for_status()
            return True
        except requests.RequestException as e:
            print(f"Error calling service {domain}.{service}: {e}")
            return False

    def turn_on(self, entity_id: str, **kwargs) -> bool:
        domain = entity_id.split('.')[0]
        return self.call_service(domain, 'turn_on', entity_id, **kwargs)

    def turn_off(self, entity_id: str, **kwargs) -> bool:
        domain = entity_id.split('.')[0]
        return self.call_service(domain, 'turn_off', entity_id, **kwargs)


class SpeakerIdentification:
    """Speaker identification using pyannote.audio"""

    def __init__(self, hf_token: str):
        if not SPEAKER_ID_AVAILABLE:
            raise ImportError("Speaker ID dependencies not available")

        self.inference = Inference(
            "pyannote/embedding",
            use_auth_token=hf_token
        )
        self.profiles = {}

    def enroll_speaker(self, name: str, audio_file: str):
        """Enroll a speaker from audio file"""
        embedding = self.inference(audio_file)
        self.profiles[name] = {
            'embedding': embedding.tolist(),  # Convert to list for JSON
            'enrolled': time.time()
        }
        print(f"Enrolled speaker: {name}")

    def identify_speaker(self, audio_file: str, threshold: float = 0.7) -> Optional[str]:
        """Identify speaker from audio file"""
        if not self.profiles:
            return None

        unknown_embedding = self.inference(audio_file)

        best_match = None
        best_similarity = 0.0

        for name, profile in self.profiles.items():
            known_embedding = np.array(profile['embedding'])
            similarity = 1 - cosine(unknown_embedding, known_embedding)

            if similarity > best_similarity:
                best_similarity = similarity
                best_match = name

        if best_similarity >= threshold:
            return best_match

        return 'unknown'

    def load_profiles(self, filepath: str):
        """Load speaker profiles from JSON"""
        if os.path.exists(filepath):
            with open(filepath, 'r') as f:
                self.profiles = json.load(f)
            print(f"Loaded {len(self.profiles)} speaker profiles")

    def save_profiles(self, filepath: str):
        """Save speaker profiles to JSON"""
        os.makedirs(os.path.dirname(filepath), exist_ok=True)
        with open(filepath, 'w') as f:
            json.dump(self.profiles, f, indent=2)
        print(f"Saved {len(self.profiles)} speaker profiles")

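`identify_speaker()` above compares pyannote embeddings with scipy's cosine distance and picks the nearest enrolled profile, falling back to `'unknown'` below the threshold. The same matching rule on plain Python lists, with a hand-rolled cosine so the logic is visible without numpy/scipy (toy 3-dimensional embeddings, purely illustrative):

```python
import math
from typing import Dict, List, Optional

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def best_speaker(embedding: List[float],
                 profiles: Dict[str, List[float]],
                 threshold: float = 0.7) -> Optional[str]:
    """Return the closest enrolled speaker, or 'unknown' below the threshold."""
    if not profiles:
        return None
    name, score = max(
        ((n, cosine_similarity(embedding, e)) for n, e in profiles.items()),
        key=lambda pair: pair[1],
    )
    return name if score >= threshold else 'unknown'

profiles = {'alan': [1.0, 0.0, 0.0], 'dana': [0.0, 1.0, 0.0]}
print(best_speaker([0.9, 0.1, 0.0], profiles))  # alan
print(best_speaker([0.5, 0.5, 0.7], profiles))  # unknown
```

Real pyannote embeddings are high-dimensional float vectors, but the decision rule is exactly this: highest cosine similarity wins if it clears the threshold.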
def load_whisper_model(model_name: str = DEFAULT_WHISPER_MODEL):
    """Load Whisper model"""
    global whisper_model
    if whisper_model is None:
        print(f"Loading Whisper model: {model_name}")
        whisper_model = whisper.load_model(model_name)
        print("Whisper model loaded")
    return whisper_model


def transcribe_audio(audio_file_path: str) -> Optional[str]:
    """Transcribe audio file"""
    try:
        model = load_whisper_model()
        result = model.transcribe(audio_file_path)
        return result['text'].strip()
    except Exception as e:
        print(f"Error transcribing: {e}")
        return None


def on_wake_word_detected(wake_word_name: str):
    """Callback factory for wake word detection"""
    def callback():
        config = WAKE_WORD_CONFIGS.get(wake_word_name, {})
        print(f"Wake word detected: {wake_word_name}")

        wake_word_queue.put({
            'timestamp': time.time(),
            'wake_word': wake_word_name,
            'response': config.get('response', 'Yes?'),
            'context': config.get('context', 'general')
        })

    return callback

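`on_wake_word_detected()` is a callback factory: each Precise runner gets a zero-argument closure that remembers which wake word it belongs to and drops an event onto the shared queue. The pattern in miniature, with the assistant-specific details stripped away:

```python
import queue
import time

events: "queue.Queue[dict]" = queue.Queue()

def make_callback(name: str, response: str):
    """Return a zero-argument callback bound to one wake word's settings."""
    def callback():
        events.put({'timestamp': time.time(),
                    'wake_word': name,
                    'response': response})
    return callback

# One callback per configured wake word, matching PreciseRunner's
# on_activation signature (no arguments)
on_mycroft = make_callback('hey_mycroft', 'Yes?')
on_jarvis = make_callback('jarvis', 'At your service')

on_jarvis()
on_mycroft()
print(events.get()['wake_word'])  # jarvis
print(events.get()['response'])   # Yes?
```

The closure is what lets a no-argument activation callback still know which of several concurrent listeners fired.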
def start_multiple_wake_words(configs: Dict[str, Dict], engine_path: str):
    """Start multiple Precise wake word listeners"""
    global precise_runners, precise_enabled

    if not PRECISE_AVAILABLE:
        print("Error: Precise not available")
        return False

    active_count = 0

    for name, config in configs.items():
        if not config.get('enabled', False):
            continue

        model_path = config['model']
        if not os.path.exists(model_path):
            print(f"Warning: Model not found: {model_path} (skipping {name})")
            continue

        try:
            engine = PreciseEngine(engine_path, model_path)
            runner = PreciseRunner(
                engine,
                sensitivity=config.get('sensitivity', 0.5),
                on_activation=on_wake_word_detected(name)
            )
            runner.start()
            precise_runners[name] = runner
            active_count += 1

            print(f"✓ Started wake word: {name}")
            print(f"  Model: {model_path}")
            print(f"  Sensitivity: {config.get('sensitivity', 0.5)}")

        except Exception as e:
            print(f"✗ Failed to start {name}: {e}")

    if active_count > 0:
        precise_enabled = True
        print(f"\nTotal active wake words: {active_count}")
        return True

    return False


def stop_all_wake_words():
    """Stop all wake word listeners"""
    global precise_runners, precise_enabled

    for name, runner in precise_runners.items():
        try:
            runner.stop()
            print(f"Stopped wake word: {name}")
        except Exception as e:
            print(f"Error stopping {name}: {e}")

    precise_runners = {}
    precise_enabled = False


def init_speaker_identification(hf_token: str) -> Optional[SpeakerIdentification]:
    """Initialize speaker identification"""
    global speaker_inference, speaker_id_enabled

    if not SPEAKER_ID_AVAILABLE:
        print("Speaker ID not available")
        return None

    try:
        speaker_inference = SpeakerIdentification(hf_token)

        # Load existing profiles
        if os.path.exists(SPEAKER_PROFILES_FILE):
            speaker_inference.load_profiles(SPEAKER_PROFILES_FILE)

        speaker_id_enabled = True
        print("Speaker identification initialized")
        return speaker_inference

    except Exception as e:
        print(f"Error initializing speaker ID: {e}")
        return None


# Flask routes
@app.route('/health', methods=['GET'])
def health():
    """Health check"""
    return jsonify({
        'status': 'healthy',
        'whisper_loaded': whisper_model is not None,
        'ha_connected': ha_client is not None,
        'precise_enabled': precise_enabled,
        'active_wake_words': list(precise_runners.keys()),
        'speaker_id_enabled': speaker_id_enabled,
        'enrolled_speakers': list(speaker_inference.profiles.keys()) if speaker_inference else []
    })


@app.route('/wake-words', methods=['GET'])
def list_wake_words():
    """List all configured wake words"""
    wake_words = []

    for name, config in WAKE_WORD_CONFIGS.items():
        wake_words.append({
            'name': name,
            'enabled': config.get('enabled', False),
            'active': name in precise_runners,
            'model': config['model'],
            'sensitivity': config.get('sensitivity', 0.5),
            'response': config.get('response', ''),
            'context': config.get('context', 'general')
        })

    return jsonify({
        'wake_words': wake_words,
        'total': len(wake_words),
        'active': len(precise_runners)
    })


@app.route('/wake-words/<name>/enable', methods=['POST'])
def enable_wake_word(name):
    """Enable a wake word"""
    if name not in WAKE_WORD_CONFIGS:
        return jsonify({'error': 'Wake word not found'}), 404

    config = WAKE_WORD_CONFIGS[name]
    config['enabled'] = True

    # Start the wake word if not already running
    if name not in precise_runners:
        # Restart all wake words to pick up changes
        # (simpler than starting individual ones)
        return jsonify({
            'message': f'Enabled {name}. Restart server to activate.'
        })

    return jsonify({'message': f'Wake word {name} enabled'})


@app.route('/speakers/enroll', methods=['POST'])
def enroll_speaker():
    """Enroll a new speaker"""
    if not speaker_id_enabled or not speaker_inference:
        return jsonify({'error': 'Speaker ID not enabled'}), 400

    if 'audio' not in request.files:
        return jsonify({'error': 'No audio file'}), 400

    name = request.form.get('name')
    if not name:
        return jsonify({'error': 'No speaker name provided'}), 400

    audio_file = request.files['audio']

    # Save temporarily
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as temp:
        audio_file.save(temp.name)
        temp_path = temp.name

    try:
        speaker_inference.enroll_speaker(name, temp_path)
        speaker_inference.save_profiles(SPEAKER_PROFILES_FILE)

        return jsonify({
            'message': f'Enrolled speaker: {name}',
            'total_speakers': len(speaker_inference.profiles)
        })

    except Exception as e:
        return jsonify({'error': str(e)}), 500

    finally:
        if os.path.exists(temp_path):
            os.remove(temp_path)

@app.route('/speakers', methods=['GET'])
def list_speakers():
    """List enrolled speakers"""
    if not speaker_id_enabled or not speaker_inference:
        return jsonify({'error': 'Speaker ID not enabled'}), 400

    speakers = []
    for name, profile in speaker_inference.profiles.items():
        speakers.append({
            'name': name,
            'enrolled': profile.get('enrolled', 0)
        })

    return jsonify({
        'speakers': speakers,
        'total': len(speakers)
    })


@app.route('/process-enhanced', methods=['POST'])
def process_enhanced():
    """
    Enhanced processing with speaker ID and wake word context
    """
    if 'audio' not in request.files:
        return jsonify({'error': 'No audio file'}), 400

    wake_word = request.form.get('wake_word', 'unknown')

    audio_file = request.files['audio']

    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as temp:
        audio_file.save(temp.name)
        temp_path = temp.name

    try:
        # Identify speaker (if enabled)
        speaker = 'unknown'
        if speaker_id_enabled and speaker_inference:
            speaker = speaker_inference.identify_speaker(temp_path)
            print(f"Identified speaker: {speaker}")

        # Transcribe
        text = transcribe_audio(temp_path)
        if not text:
            return jsonify({'error': 'Transcription failed'}), 500

        print(f"[{speaker}] via [{wake_word}]: {text}")

        # Get wake word config
        config = WAKE_WORD_CONFIGS.get(wake_word, {})
        context = config.get('context', 'general')

        # Process based on context and speaker
        response = f"Heard via {wake_word}: {text}"

        return jsonify({
            'success': True,
            'transcription': text,
            'speaker': speaker,
            'wake_word': wake_word,
            'context': context,
            'response': response
        })

    finally:
        if os.path.exists(temp_path):
            os.remove(temp_path)


def main():
    parser = argparse.ArgumentParser(
        description="Enhanced Voice Server with Multi-Wake-Word and Speaker ID"
    )
    parser.add_argument('--host', default=DEFAULT_HOST)
    parser.add_argument('--port', type=int, default=DEFAULT_PORT)
    parser.add_argument('--whisper-model', default=DEFAULT_WHISPER_MODEL)
    parser.add_argument('--ha-url', default=DEFAULT_HA_URL)
    parser.add_argument('--ha-token', default=DEFAULT_HA_TOKEN)
    parser.add_argument('--enable-precise', action='store_true',
                        help='Enable wake word detection')
    parser.add_argument('--multi-wake-word', action='store_true',
                        help='Enable multiple wake words')
    parser.add_argument('--precise-engine', default=DEFAULT_PRECISE_ENGINE)
    parser.add_argument('--enable-speaker-id', action='store_true',
                        help='Enable speaker identification')
    parser.add_argument('--hf-token', default=DEFAULT_HF_TOKEN,
                        help='HuggingFace token for speaker ID')

    args = parser.parse_args()

    # Initialize HA client
    global ha_client
    ha_client = HomeAssistantClient(args.ha_url, args.ha_token)

    # Load Whisper
    print(f"Starting enhanced voice server on {args.host}:{args.port}")
    load_whisper_model(args.whisper_model)

    # Start Precise (multiple wake words)
    if args.enable_precise:
        if not PRECISE_AVAILABLE:
            print("Error: Precise not available")
            sys.exit(1)

        # Enable all or just first wake word
        if args.multi_wake_word:
            # Enable all configured wake words
            enabled_count = sum(1 for c in WAKE_WORD_CONFIGS.values() if c.get('enabled'))
            print(f"\nStarting {enabled_count} wake words...")
        else:
            # Enable only first wake word
            first_key = list(WAKE_WORD_CONFIGS.keys())[0]
            WAKE_WORD_CONFIGS[first_key]['enabled'] = True
            for key in list(WAKE_WORD_CONFIGS.keys())[1:]:
                WAKE_WORD_CONFIGS[key]['enabled'] = False

        if not start_multiple_wake_words(WAKE_WORD_CONFIGS, args.precise_engine):
            print("Error: No wake words started")
            sys.exit(1)

    # Initialize speaker ID
    if args.enable_speaker_id:
        if not args.hf_token:
            print("Error: --hf-token required for speaker ID")
            sys.exit(1)

        if not init_speaker_identification(args.hf_token):
            print("Warning: Speaker ID initialization failed")

    # Start server
    try:
        print("\n" + "="*50)
        print("Server ready!")
        print("="*50 + "\n")
        app.run(host=args.host, port=args.port, debug=False)
    except KeyboardInterrupt:
        print("\nShutting down...")
        stop_all_wake_words()
        sys.exit(0)


if __name__ == '__main__':
    main()