# Mycroft Precise Wake Word Training Guide

## Overview

Mycroft Precise is a neural network-based wake word detector that you can train on custom wake words. This guide covers two deployment approaches for your Maix Duino voice assistant:

1. **Server-side detection** (Recommended to start) - Run Precise on Heimdall
2. **Edge detection** (Advanced) - Convert the model for the K210 on the Maix Duino

## Architecture Options

### Option A: Server-Side Wake Word Detection (Recommended)

```
   Maix Duino                          Heimdall
┌─────────────────┐                ┌──────────────────────┐
│ Continuous      │  Audio Stream  │ Mycroft Precise      │
│ Audio Capture   │───────────────>│ Wake Word Detection  │
│                 │                │                      │
│ LED Feedback    │<───────────────│ Whisper STT          │
│ Speaker Output  │    Response    │ HA Integration       │
│                 │                │ Piper TTS            │
└─────────────────┘                └──────────────────────┘
```

**Pros:**
- Easier setup and debugging
- Better accuracy (more compute available)
- Easy to retrain and update models
- Can use ensemble models

**Cons:**
- Continuous audio streaming (bandwidth)
- Slightly higher latency (~100-200ms)
- Requires stable network

### Option B: Edge Detection on Maix Duino (Advanced)

```
   Maix Duino                          Heimdall
┌─────────────────┐                ┌──────────────────────┐
│ Precise Model   │                │                      │
│ (K210 KPU)      │                │                      │
│ Wake Detection  │ Audio (on wake)│ Whisper STT          │
│                 │───────────────>│ HA Integration       │
│ Audio Capture   │                │ Piper TTS            │
│ LED Feedback    │<───────────────│                      │
└─────────────────┘    Response    └──────────────────────┘
```

**Pros:**
- Lower latency (~50ms wake detection)
- Less network traffic
- Works even if server is down
- Better privacy (no continuous streaming)

**Cons:**
- Complex model conversion (TensorFlow → ONNX → KMODEL)
- Limited by K210 compute
- Harder to update models
- Requires careful optimization

## Recommended Approach: Start with Server-Side

Begin with server-side detection on Heimdall, then optimize to edge detection once everything works.
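The server-side option implies a small streaming protocol between the two boards. The client sketch later in this guide frames each audio chunk with a 4-byte big-endian length header (`struct.pack('>I', ...)`); a minimal server-side reader for that framing might look like the sketch below. The function name and the `io.BytesIO` demo are illustrative, not part of the project:

```python
import io
import struct


def read_audio_chunks(stream):
    """Yield raw PCM chunks from a stream of 4-byte big-endian
    length headers, each followed by that many bytes of audio."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # stream closed
        (size,) = struct.unpack('>I', header)
        chunk = stream.read(size)
        if len(chunk) < size:
            return  # truncated stream - drop the partial chunk
        yield chunk


# Example: two framed chunks packed into a buffer, as the client would send them
payload = b''
for chunk in (b'\x00\x01' * 160, b'\x02\x03' * 160):
    payload += struct.pack('>I', len(chunk)) + chunk

print([len(c) for c in read_audio_chunks(io.BytesIO(payload))])  # → [320, 320]
```

Each recovered chunk can then be fed straight into the Precise runner's audio buffer on Heimdall.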
## Phase 1: Mycroft Precise Setup on Heimdall

### Install Mycroft Precise

```bash
# SSH to Heimdall
ssh alan@10.1.10.71

# Create conda environment for Precise
conda create -n precise python=3.7 -y
conda activate precise

# Install TensorFlow 1.x (Precise requires this)
pip install tensorflow==1.15.5

# Install Precise
pip install mycroft-precise

# Install audio dependencies
sudo apt-get install -y portaudio19-dev sox libatlas-base-dev

# Install precise-engine (for faster inference)
wget https://github.com/MycroftAI/mycroft-precise/releases/download/v0.3.0/precise-engine_0.3.0_x86_64.tar.gz
tar xvf precise-engine_0.3.0_x86_64.tar.gz
sudo cp precise-engine/precise-engine /usr/local/bin/
sudo chmod +x /usr/local/bin/precise-engine
```

### Verify Installation

```bash
precise-engine --version
# Should output: Precise v0.3.0

precise-listen --help
# Should show help text
```

## Phase 2: Training Your Custom Wake Word

### Step 1: Collect Wake Word Samples

You'll need ~50-100 samples of your wake word. Choose something:
- 2-3 syllables long
- Easy to pronounce
- Unlikely to occur in normal speech

Example wake words:
- "Hey Computer" (recommended - similar to commercial products)
- "Okay Jarvis"
- "Hello Assistant"
- "Activate Assistant"

```bash
# Create project directory
mkdir -p ~/precise-models/hey-computer
cd ~/precise-models/hey-computer

# Record wake word samples
precise-collect
```

When prompted:
1. Type your wake word ("hey computer")
2. Press SPACE to record
3. Say the wake word clearly
4. Press SPACE to stop
5. Repeat 50-100 times

**Tips for good samples:**
- Vary your tone and speed
- Different distances from mic
- Different background noise levels
- Different pronunciations
- Have family members record too

### Step 2: Collect "Not Wake Word" Samples

Record background audio and similar-sounding phrases:

```bash
# Create not-wake-word directory
mkdir -p not-wake-word

# Record random speech, music, TV, etc.
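# Optional: everything Precise trains on should be 16 kHz, 16-bit mono.
# A sketch using the sox dependency installed in Phase 1 - normalize any
# audio pulled in from other sources (TV rips, downloaded noise, etc.):
mkdir -p not-wake-word/normalized
for f in not-wake-word/*.wav; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    sox "$f" -r 16000 -b 16 -c 1 "not-wake-word/normalized/$(basename "$f")"
done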
# These help the model learn what NOT to trigger on
precise-collect -f not-wake-word/random.wav
```

Collect ~200-500 samples of:
- Normal conversation
- TV/music in background
- Similar-sounding phrases ("hey commuter", "they computed", etc.)
- Ambient noise
- Other household sounds

### Step 3: Generate Training Data

```bash
# Organize samples
mkdir -p hey-computer/{wake-word,not-wake-word,test/wake-word,test/not-wake-word}

# Split samples (80% train, 20% test):
#   Move 80% of wake-word samples to hey-computer/wake-word/
#   Move 20% to hey-computer/test/wake-word/
#   Move 80% of not-wake-word to hey-computer/not-wake-word/
#   Move 20% to hey-computer/test/not-wake-word/

# Generate training data
precise-train-incremental hey-computer.net hey-computer/
```

### Step 4: Train the Model

```bash
# Basic training (will take 30-60 minutes)
precise-train -e 60 hey-computer.net hey-computer/

# For better accuracy, train longer
precise-train -e 120 hey-computer.net hey-computer/

# Watch for overfitting - validation loss should decrease
# Stop if validation loss starts increasing
```

Training output will show:

```
Epoch 1/60
loss: 0.4523 - val_loss: 0.3891
Epoch 2/60
loss: 0.3102 - val_loss: 0.2845
...
```

### Step 5: Test the Model

```bash
# Test with microphone
precise-listen hey-computer.net
# Speak your wake word - should see "!" when detected
# Speak other phrases - should not trigger

# Test with audio files
precise-test hey-computer.net hey-computer/test/
# Should show accuracy metrics:
#   Wake word accuracy: 95%+
#   False positive rate: <5%
```

### Step 6: Optimize Sensitivity

```bash
# Adjust detection sensitivity
precise-listen hey-computer.net -s 0.5  # Default
precise-listen hey-computer.net -s 0.3  # More conservative (fewer false triggers)
precise-listen hey-computer.net -s 0.7  # More aggressive (fewer missed wake words)

# Find the optimal sensitivity for your use case
# Lower = fewer false positives, more false negatives
# Higher = more false positives, fewer false negatives
```

## Phase 3: Integration with Voice Server

### Update voice_server.py

Add Mycroft Precise support to the server:

```python
# Add to imports
from precise_runner import PreciseEngine, PreciseRunner
import pyaudio

# Add to configuration
PRECISE_MODEL = os.getenv("PRECISE_MODEL", "/home/alan/precise-models/hey-computer.net")
PRECISE_SENSITIVITY = float(os.getenv("PRECISE_SENSITIVITY", "0.5"))

# Global precise runner
precise_runner = None


def on_activation():
    """Called when wake word is detected"""
    print("Wake word detected!")
    # Trigger recording and processing
    # (Implementation depends on your audio streaming setup)


def start_precise_listener():
    """Start Mycroft Precise wake word detection"""
    global precise_runner

    engine = PreciseEngine(
        '/usr/local/bin/precise-engine',
        PRECISE_MODEL
    )
    precise_runner = PreciseRunner(
        engine,
        sensitivity=PRECISE_SENSITIVITY,
        on_activation=on_activation
    )
    precise_runner.start()
    print(f"Precise listening with model: {PRECISE_MODEL}")
```

### Server-Side Wake Word Detection Architecture

For server-side detection, you need continuous audio streaming from Maix Duino:

```python
# New endpoint for audio streaming
@app.route('/stream', methods=['POST'])
def stream_audio():
    """
    Receive continuous audio stream for wake word detection

    This endpoint processes incoming audio chunks and runs them
    through Mycroft Precise for wake word detection.
""" # Implementation here pass ``` ## Phase 4: Maix Duino Integration (Server-Side Detection) ### Update maix_voice_client.py For server-side detection, stream audio continuously: ```python # Add to configuration STREAM_ENDPOINT = "/stream" WAKE_WORD_CHECK_INTERVAL = 0.1 # Check every 100ms def stream_audio_continuous(): """ Stream audio to server for wake word detection Server will notify us when wake word is detected """ import socket import struct # Create socket connection sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) server_addr = (VOICE_SERVER_URL.replace('http://', '').split(':')[0], 8888) try: sock.connect(server_addr) print("Connected to wake word server") while True: # Capture audio chunk chunk = i2s_dev.record(CHUNK_SIZE) if chunk: # Send chunk size first, then chunk sock.sendall(struct.pack('>I', len(chunk))) sock.sendall(chunk) # Check for wake word detection signal # (simplified - actual implementation needs non-blocking socket) time.sleep(0.01) except Exception as e: print(f"Streaming error: {e}") finally: sock.close() ``` ## Phase 5: Edge Detection on Maix Duino (Advanced) ### Convert Precise Model to KMODEL This is complex and requires several conversion steps: ```bash # Step 1: Convert TensorFlow model to ONNX pip install tf2onnx --break-system-packages python -m tf2onnx.convert \ --saved-model hey-computer.net \ --output hey-computer.onnx # Step 2: Optimize ONNX model pip install onnx --break-system-packages python -c " import onnx from onnx import optimizer model = onnx.load('hey-computer.onnx') passes = ['eliminate_deadend', 'eliminate_identity', 'eliminate_nop_dropout', 'eliminate_nop_pad'] optimized = optimizer.optimize(model, passes) onnx.save(optimized, 'hey-computer-opt.onnx') " # Step 3: Convert ONNX to KMODEL (for K210) # Use nncase (https://github.com/kendryte/nncase) # This step is hardware-specific and complex # Install nncase pip install nncase --break-system-packages # Convert (adjust parameters based on your model) ncc 
compile hey-computer-opt.onnx \ -i onnx \ --dataset calibration_data \ -o hey-computer.kmodel \ --target k210 ``` **Note:** KMODEL conversion is non-trivial and may require model architecture adjustments. The K210 has limitations: - Max model size: ~6MB - Limited operators support - Quantization required for performance ### Testing KMODEL on Maix Duino ```python # Load model in maix_voice_client.py import KPU as kpu def load_wake_word_model_kmodel(): """Load converted KMODEL for wake word detection""" global kpu_task try: kpu_task = kpu.load("/sd/models/hey-computer.kmodel") print("Wake word model loaded on K210") return True except Exception as e: print(f"Failed to load model: {e}") return False def detect_wake_word_kmodel(): """Run wake word detection using K210 KPU""" global kpu_task # Capture audio audio_chunk = i2s_dev.record(CHUNK_SIZE) # Preprocess for model (depends on model input format) # This is model-specific - adjust based on your training # Run inference features = preprocess_audio(audio_chunk) output = kpu.run_yolo2(kpu_task, features) # Adjust based on model type # Check confidence if output[0] > WAKE_WORD_THRESHOLD: return True return False ``` ## Recommended Wake Words Based on testing and community feedback: **Best performers:** 1. "Hey Computer" - Clear, distinct, 2-syllable, hard consonants 2. "Okay Jarvis" - Pop culture reference, easy to say 3. "Hey Mycroft" - Original Mycroft wake word (lots of training data available) **Avoid:** - Single syllable words (too easy to trigger) - Common phrases ("okay", "hey there") - Names of people in your household - Words that sound like common speech patterns ## Training Tips ### For Best Accuracy 1. **Diverse training data:** - Multiple speakers - Various distances (1ft to 15ft) - Different noise conditions - Accent variations 2. **Quality over quantity:** - 50 good samples > 200 poor samples - Clear pronunciation - Consistent volume 3. 
**Hard negatives:** - Include similar-sounding phrases - Include partial wake words - Include common false triggers you notice 4. **Regular retraining:** - Add false positives to training set - Add missed detections - Retrain every few weeks initially ### Collecting Hard Negatives ```bash # Run Precise in test mode and collect false positives precise-listen hey-computer.net --save-false-positives # This will save audio clips when model triggers incorrectly # Add these to your not-wake-word training set # Retrain to reduce false positives ``` ## Performance Benchmarks ### Server-Side Detection (Heimdall) - **Latency:** 100-200ms from utterance to detection - **Accuracy:** 95%+ with good training - **False positive rate:** <1 per hour with tuning - **CPU usage:** ~5-10% (single core) - **Network:** ~128kbps continuous stream ### Edge Detection (Maix Duino) - **Latency:** 50-100ms - **Accuracy:** 80-90% (limited by K210 quantization) - **False positive rate:** Varies by model optimization - **CPU usage:** ~30% K210 (leaves room for other tasks) - **Network:** 0 until wake detected ## Monitoring and Debugging ### Log Wake Word Detections ```python # Add to voice_server.py import datetime def log_wake_word(confidence, timestamp=None): """Log wake word detections for analysis""" if timestamp is None: timestamp = datetime.datetime.now() log_file = "/home/alan/voice-assistant/logs/wake_words.log" with open(log_file, 'a') as f: f.write(f"{timestamp.isoformat()},{confidence}\n") ``` ### Analyze False Positives ```bash # Check wake word log tail -f ~/voice-assistant/logs/wake_words.log # Find patterns in false positives grep "wake_word" ~/voice-assistant/logs/wake_words.log | \ awk -F',' '{print $2}' | \ sort -n | uniq -c ``` ## Production Deployment ### Systemd Service with Precise Update the systemd service to include Precise: ```ini [Unit] Description=Voice Assistant with Wake Word Detection After=network.target [Service] Type=simple User=alan 
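# Optional, assumed value - tune for your hardware. TensorFlow 1.x can spike
# memory while loading the model; a cap keeps a runaway service from
# starving the rest of Heimdall:
MemoryMax=1G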
WorkingDirectory=/home/alan/voice-assistant
Environment="PATH=/home/alan/miniconda3/envs/precise/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/home/alan/voice-assistant/config/.env
ExecStart=/home/alan/miniconda3/envs/precise/bin/python voice_server.py --enable-precise
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```

## Troubleshooting

### Precise Won't Start

```bash
# Check TensorFlow version
python -c "import tensorflow as tf; print(tf.__version__)"
# Should be 1.15.x

# Check model file
file hey-computer.net
# Should report Hierarchical Data Format (HDF5) data - a Keras model

# Test model directly
precise-engine hey-computer.net
# Should load without errors
```

### Low Accuracy

1. **Collect more training data** - Especially hard negatives
2. **Increase training epochs** - Try 200-300 epochs
3. **Verify training/test split** - Should be 80/20
4. **Check audio quality** - Sample rate should match (16kHz)
5. **Try different wake words** - Some are easier to detect

### High False Positive Rate

1. **Reduce the sensitivity** - Step down from the 0.5 default
2. **Add false positives to training** - Retrain with false triggers
3. **Collect more negative samples** - Expand not-wake-word set
4. **Use ensemble models** - Run multiple models, require agreement

### KMODEL Conversion Fails

This is expected - K210 conversion is complex:

1. **Simplify model architecture** - Reduce layer count
2. **Use quantization-aware training** - Train with quantization in mind
3. **Check operator support** - K210 doesn't support all TF ops
4. **Consider alternatives:**
   - Use pre-trained models for K210
   - Stick with server-side detection
   - Use Porcupine instead (has K210 support)

## Alternative: Use Pre-trained Models

Mycroft provides some pre-trained models:

```bash
# Download the Hey Mycroft model
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz
tar xzf hey-mycroft.tar.gz

# Test it
precise-listen hey-mycroft.net
```

Then train your own wake word starting from this base:

```bash
# Fine-tune from pre-trained model
precise-train -e 60 my-wake-word.net my-wake-word/ \
    --from-checkpoint hey-mycroft.net
```

## Next Steps

1. **Start with server-side** - Get it working on Heimdall first
2. **Collect good training data** - Quality samples are key
3. **Test and tune sensitivity** - Find the sweet spot for your environment
4. **Monitor performance** - Track false positives and misses
5. **Iterate on training** - Add hard examples, retrain
6. **Consider edge deployment** - Once server-side is solid

## Resources

- Mycroft Precise Docs: https://github.com/MycroftAI/mycroft-precise
- Training Guide: https://mycroft-ai.gitbook.io/docs/mycroft-technologies/precise
- Community Models: https://github.com/MycroftAI/precise-data
- K210 Docs: https://canaan-creative.com/developer
- nncase: https://github.com/kendryte/nncase

## Conclusion

Mycroft Precise gives you full control over your wake word detection with complete privacy. Start with server-side detection for easier development, then optimize to edge detection once you have a well-trained model.

The key to success is good training data - invest time in collecting diverse, high-quality samples!
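As a closing convenience: the manual 80/20 split described in Phase 2, Step 3 is easy to get wrong by hand, and it can be scripted. A sketch, assuming flat folders of WAV samples (the `split_samples` helper is my own, not part of Precise):

```python
import random
import shutil
from pathlib import Path


def split_samples(src_dir, train_dir, test_dir, test_fraction=0.2, seed=42):
    """Move a random test_fraction of the WAV files in src_dir into
    test_dir and the rest into train_dir. Returns (n_train, n_test)."""
    files = sorted(Path(src_dir).glob('*.wav'))
    random.Random(seed).shuffle(files)  # deterministic shuffle
    n_test = int(len(files) * test_fraction)
    for dest, group in ((test_dir, files[:n_test]), (train_dir, files[n_test:])):
        Path(dest).mkdir(parents=True, exist_ok=True)
        for f in group:
            shutil.move(str(f), str(Path(dest) / f.name))
    return len(files) - n_test, n_test


# Example usage with the layout from Step 3:
# split_samples('samples/wake-word',
#               'hey-computer/wake-word',
#               'hey-computer/test/wake-word')
```

Run it once per class (wake-word and not-wake-word) to produce the directory layout that `precise-train` expects.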