# Wake Word Models: Pre-trained, Multiple, and Voice Adaptation

## Pre-trained Wake Word Models

### Yes! "Hey Mycroft" Already Exists

Mycroft provides several pre-trained models that you can use immediately:

#### Available Pre-trained Models

**Hey Mycroft** (Official)

```bash
# Download from Mycroft's model repository
cd ~/precise-models/pretrained
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz
tar xzf hey-mycroft.tar.gz

# Test immediately
conda activate precise
precise-listen hey-mycroft.net
# Should detect "Hey Mycroft" right away!
```

**Other Available Models:**

- **Hey Mycroft** - Best tested, most reliable
- **Christopher** - Alternative wake word
- **Hey Jarvis** - Community contributed
- **Computer** - Star Trek style

#### Using Pre-trained Models

**Option 1: Use as-is**

```bash
# Just point your server at the pre-trained model
python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/pretrained/hey-mycroft.net \
    --precise-sensitivity 0.5
```

**Option 2: Fine-tune for your voice**

```bash
# Use the pre-trained model as a starting point, then add your own samples
cd ~/precise-models/my-hey-mycroft

# Record additional samples
precise-collect

# Train from the checkpoint (much faster than training from scratch!)
precise-train -e 30 my-hey-mycroft.net . \
    --from-checkpoint ~/precise-models/pretrained/hey-mycroft.net

# This adds your voice/environment while keeping the base model
```

**Option 3: Ensemble with custom**

```python
# Use both the pre-trained and a custom model
# Require both to agree (reduces false positives)
# See implementation below
```

### Advantages of Pre-trained Models

✅ **Instant deployment** - No training required
✅ **Proven accuracy** - Tested by thousands of users
✅ **Good starting point** - Fine-tune rather than train from scratch
✅ **Multiple speakers** - Already trained on diverse voices
✅ **Save time** - Skip 1-2 hours of training

### Disadvantages

❌ **Generic** - Not optimized for your voice/environment
❌ **May need tuning** - Threshold adjustment required
❌ **Limited choice** - Only a few wake words available

### Recommendation

**Start with "Hey Mycroft"** pre-trained model:

1. Deploy immediately (zero training time)
2. Test in your environment
3. Collect false positives/negatives
4. Fine-tune with your examples
5. Best of both worlds!

## Multiple Wake Words

### Can You Have Multiple Wake Words?

**Short answer:** Yes, but with tradeoffs.

### Implementation Approaches

#### Approach 1: Server-Side Multiple Models (Recommended)

Run multiple Precise models in parallel on Heimdall:

```python
# In voice_server.py
import os
import queue
import time

from precise_runner import PreciseEngine, PreciseRunner

# Global runners for each wake word
precise_runners = {}
wake_word_queue = queue.Queue()

wake_word_configs = {
    'hey_mycroft': {
        'model': '~/precise-models/pretrained/hey-mycroft.net',
        'sensitivity': 0.5,
        'response': 'Yes?'
    },
    'hey_computer': {
        'model': '~/precise-models/hey-computer/hey-computer.net',
        'sensitivity': 0.5,
        'response': "I'm listening"
    },
    'jarvis': {
        'model': '~/precise-models/jarvis/jarvis.net',
        'sensitivity': 0.6,
        'response': 'At your service, sir'
    }
}

def on_wake_word_detected(wake_word_name):
    """Callback with wake word identifier"""
    def callback():
        print(f"Wake word detected: {wake_word_name}")
        wake_word_queue.put({
            'timestamp': time.time(),
            'wake_word': wake_word_name,
            'response': wake_word_configs[wake_word_name]['response']
        })
    return callback

def start_multiple_wake_words():
    """Start multiple Precise listeners"""
    for name, config in wake_word_configs.items():
        engine = PreciseEngine(
            '/usr/local/bin/precise-engine',
            os.path.expanduser(config['model'])
        )
        runner = PreciseRunner(
            engine,
            sensitivity=config['sensitivity'],
            on_activation=on_wake_word_detected(name)
        )
        runner.start()
        precise_runners[name] = runner
        print(f"Started wake word listener: {name}")
```

**Resource Usage:**

- CPU: ~5-10% per model (3 models = ~15-30%)
- RAM: ~100-200MB per model
- Still very manageable on Heimdall

**Pros:**
✅ Different wake words for different purposes
✅ Family members can choose their preferred wake word
✅ Context-aware responses
✅ Easy to add/remove models

**Cons:**
❌ Higher CPU usage (scales linearly)
❌ Increased false positive risk (three models, three times the chance)
❌ More complex configuration

#### Approach 2: Edge Multiple Models (K210)

**Challenge:** The K210 has limited resources.

**Option A: Sequential checking** (Feasible)

```python
# Check each model in sequence
models = ['hey-mycroft.kmodel', 'hey-computer.kmodel']
threshold = 0.7  # tune per model

def check_wake_words(audio_features):
    for model in models:
        kpu_task = kpu.load(f"/sd/models/{model}")
        result = kpu.run(kpu_task, audio_features)
        if result > threshold:
            return model  # Wake word detected
    return None
```

**Resource impact:**

- Latency: +50-100ms per additional model
- Memory: models must fit in 6MB total
- CPU: ~30% per model check

**Option B: Combined model** (Advanced)

```python
# Train a single model that
# recognizes multiple phrases
# Each phrase maps to a different output class
# More complex training, but a single inference
```

**Recommendation for edge:**

- **1-2 wake words max** on K210
- **Server-side** for 3+ wake words

#### Approach 3: Contextual Wake Words

Different wake words trigger different behaviors:

```python
wake_word_contexts = {
    'hey_mycroft': 'general',    # General commands
    'hey_assistant': 'general',  # Alternative general
    'emergency': 'priority',     # High priority
    'goodnight': 'bedtime',      # Bedtime routine
}

def handle_wake_word(wake_word, command):
    context = wake_word_contexts[wake_word]

    if context == 'priority':
        # Skip the queue, process immediately
        # Maybe call an emergency contact
        pass
    elif context == 'bedtime':
        # Trigger the bedtime automation
        # Lower volume for responses
        pass
    else:
        # Normal processing
        pass
```

### Best Practices for Multiple Wake Words

1. **Start with one** - Get it working well first
2. **Add gradually** - One at a time, testing thoroughly
3. **Different purposes** - Each wake word should have a reason to exist
4. **Monitor performance** - Track false positives per wake word
5.
   **User preference** - Let family members choose their favorite

### Recommended Configuration

**For most users:**

```python
wake_words = {
    'hey_mycroft': 'primary',      # Main wake word (pre-trained)
    'hey_computer': 'alternative'  # Custom trained for your voice
}
```

**For power users:**

```python
wake_words = {
    'hey_mycroft': 'general',
    'jarvis': 'personal_assistant',   # Custom responses
    'computer': 'technical_queries',  # Different intent parser
}
```

**For families:**

```python
wake_words = {
    'hey_mycroft': 'shared',  # Everyone can use it
    'dad': 'user_alan',       # Personalized
    'mom': 'user_sarah',      # Personalized
    'kids': 'user_children',  # Kid-safe responses
}
```

## Voice Adaptation and Multi-User Support

### Challenge: Different Voices, Same Wake Word

When multiple people use the system, the model has to handle:

- Different accents
- Different speech patterns
- Different pronunciations
- Different vocal characteristics

### Solution Approaches

#### Approach 1: Diverse Training Data (Recommended)

**During initial training:**

```bash
# Have everyone in the household record samples
cd ~/precise-models/hey-computer

# Alan records 30 samples
precise-collect  # Record as user 1

# Sarah records 30 samples
precise-collect  # Record as user 2

# Kids record 20 samples
precise-collect  # Record as user 3

# Combine everything in one training set
# and train one model that works for everyone
./3-train-model.sh
```

**Pros:**
✅ Single model for everyone
✅ No user switching needed
✅ Simple to maintain
✅ Works immediately for all users

**Cons:**
❌ May have lower per-person accuracy
❌ Requires upfront time from everyone
❌ Hard to add new users later

#### Approach 2: Incremental Training

Start with your voice, then add others over time:

```bash
# Week 1: Train with Alan's voice
cd ~/precise-models/hey-computer
# Record and train with Alan's samples
precise-train -e 60 hey-computer.net .
# Week 2: Sarah wants to use it
# Collect Sarah's samples
mkdir -p sarah-samples/wake-word
precise-collect  # Sarah records 20-30 samples

# Add them to the existing training set
cp sarah-samples/wake-word/* wake-word/

# Retrain (continuing from the checkpoint)
precise-train -e 30 hey-computer.net . \
    --from-checkpoint hey-computer.net

# Now it works for both Alan and Sarah!
```

**Pros:**
✅ Gradual improvement
✅ Don't need everyone upfront
✅ Easy to add new users
✅ Maintains accuracy for existing users

**Cons:**
❌ May not work well for new users initially
❌ Requires periodic retraining

#### Approach 3: Per-User Models with Speaker Identification

Train separate models, then identify who's speaking:

**Step 1: Train per-user wake word models**

```bash
# Alan's model
~/precise-models/hey-computer-alan/

# Sarah's model
~/precise-models/hey-computer-sarah/

# Kids' model
~/precise-models/hey-computer-kids/
```

**Step 2: Use speaker identification**

```python
# Pseudo-code for speaker identification
def identify_speaker(audio):
    """
    Identify the speaker from voice characteristics
    using speaker embeddings (x-vectors, d-vectors)
    """
    # Extract the speaker embedding
    embedding = speaker_encoder.encode(audio)

    # Compare it to known users
    similarities = {
        'alan': cosine_similarity(embedding, alan_embedding),
        'sarah': cosine_similarity(embedding, sarah_embedding),
        'kids': cosine_similarity(embedding, kids_embedding),
    }

    # Return the most similar
    return max(similarities, key=similarities.get)

def process_command(audio):
    # Detect the wake word with all models
    wake_detected = check_all_models(audio)

    if wake_detected:
        # Identify the speaker
        speaker = identify_speaker(audio)

        # Use the speaker-specific model for better accuracy
        model = f'~/precise-models/hey-computer-{speaker}/'

        # Continue with speaker context
        process_with_context(audio, speaker)
```

**Speaker identification libraries:**

- **Resemblyzer** - Simple speaker verification
- **speechbrain** - Complete toolkit
- **pyannote.audio** - You already use this for diarization!
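Whichever encoder you pick from the list above, the matching step is the same: compare the incoming embedding against each enrolled user's embedding and fall back to "unknown" below a similarity threshold. A minimal, encoder-agnostic sketch (the `match_speaker` helper and the 0.75 threshold are illustrative assumptions, embeddings shown as plain Python lists; real embeddings come from whichever library you choose):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors"""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def match_speaker(embedding, enrolled, threshold=0.75):
    """Return the best-matching enrolled user, or 'unknown'.

    enrolled: dict mapping user name -> that user's enrolled embedding
    """
    best_user, best_score = 'unknown', threshold
    for user, ref in enrolled.items():
        score = cosine(embedding, ref)
        if score > best_score:
            best_user, best_score = user, score
    return best_user

# Toy check with made-up 3-D "embeddings"
enrolled = {'alan': [1.0, 0.0, 0.0], 'sarah': [0.0, 1.0, 0.0]}
print(match_speaker([0.9, 0.1, 0.0], enrolled))  # → alan
print(match_speaker([0.5, 0.5, 0.7], enrolled))  # → unknown
```

Returning "unknown" instead of the nearest user matters in practice: a guest's voice should fall back to shared defaults rather than silently impersonating a family member.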
**Implementation:**

```bash
# You already have pyannote for diarization!
conda activate voice-assistant
pip install pyannote.audio --break-system-packages

# You can reuse its speaker embeddings for identification
```

```python
from pyannote.audio import Inference
from scipy.spatial.distance import cosine

# Load the speaker embedding model
inference = Inference(
    "pyannote/embedding",
    window="whole",  # one embedding per input, not a sliding window
    use_auth_token=hf_token
)

# Extract embeddings for known users
alan_embedding = inference("alan_voice_sample.wav")
sarah_embedding = inference("sarah_voice_sample.wav")

# Compare with incoming audio
unknown_embedding = inference(audio_buffer)

alan_similarity = 1 - cosine(unknown_embedding, alan_embedding)
sarah_similarity = 1 - cosine(unknown_embedding, sarah_embedding)

if alan_similarity > 0.8:
    user = 'alan'
elif sarah_similarity > 0.8:
    user = 'sarah'
else:
    user = 'unknown'
```

**Pros:**
✅ Personalized responses per user
✅ Better accuracy (each model optimized for one voice)
✅ User-specific preferences/permissions
✅ Can track who said what

**Cons:**
❌ More complex setup
❌ Higher resource usage
❌ Requires voice samples from each user
❌ Privacy considerations

#### Approach 4: Adaptive/Online Learning

The model improves automatically based on usage:

```python
class AdaptiveWakeWord:
    def __init__(self, base_model):
        self.base_model = base_model
        self.user_samples = []
        self.retrain_threshold = 50  # Retrain after N samples

    def on_detection(self, audio, user_confirmed=True):
        """User confirms this was a correct detection"""
        if user_confirmed:
            self.user_samples.append(audio)

        # Periodically retrain
        if len(self.user_samples) >= self.retrain_threshold:
            self.retrain_with_samples()
            self.user_samples = []

    def retrain_with_samples(self):
        """Background retraining with the collected samples"""
        # Add samples to the training set
        # Retrain the model
        # Swap in the new model
        pass
```

**Pros:**
✅ Automatic improvement
✅ Adapts to the user's voice over time
✅ No manual retraining
✅ Gets better with use

**Cons:**
❌ Complex implementation
❌ Requires user
feedback mechanism
❌ Risk of drift/degradation
❌ Background training overhead

## Recommended Strategy

### Phase 1: Single Wake Word, Single Model

```bash
# Week 1-2
# Use the pre-trained "Hey Mycroft"
# OR train a custom "Hey Computer" with all family members' voices
# Keep it simple, get it working
```

### Phase 2: Add Fine-tuning

```bash
# Week 3-4
# Collect false positives/negatives
# Retrain with household-specific data
# Optimize the threshold
```

### Phase 3: Consider Multiple Wake Words

```bash
# Month 2
# If needed, add a second wake word
# "Hey Mycroft" for general use
# "Jarvis" for personal assistant tasks
```

### Phase 4: Personalization

```bash
# Month 3+
# If desired, add speaker identification
# Personalized responses
# User-specific preferences
```

## Practical Examples

### Example 1: Family of 4, Single Model

```bash
# Training session with everyone
cd ~/precise-models/hey-mycroft-family

# Dad records 25 samples
precise-collect

# Mom records 25 samples
precise-collect

# Kid 1 records 15 samples
precise-collect

# Kid 2 records 15 samples
precise-collect

# Collect shared negative samples (200+)
# TV, music, conversation, etc.
precise-collect -f not-wake-word/household.wav

# Train a single model for everyone
precise-train -e 60 hey-mycroft-family.net .

# Deploy
python voice_server.py \
    --enable-precise \
    --precise-model hey-mycroft-family.net
```

**Result:** Everyone can use it - one model, kept simple.

### Example 2: Two Wake Words, Different Purposes

```python
# voice_server.py configuration
wake_words = {
    'hey_mycroft': {
        'model': 'hey-mycroft.net',
        'sensitivity': 0.5,
        'intent_parser': 'general',  # All commands
        'response': 'Yes?'
    },
    'emergency': {
        'model': 'emergency.net',
        'sensitivity': 0.7,  # Higher threshold
        'intent_parser': 'emergency',  # Limited commands
        'response': 'Emergency mode activated'
    }
}

# "Hey Mycroft, turn on the lights" - works
# "Emergency, call for help" - triggers the emergency protocol
```

### Example 3: Speaker Identification + Personalization

```python
# Enhanced processing with speaker ID
def process_with_speaker_id(audio, speaker):
    # Different HA entities depending on the speaker
    entity_maps = {
        'alan': {
            'bedroom_light': 'light.master_bedroom',
            'office_light': 'light.alan_office',
        },
        'sarah': {
            'bedroom_light': 'light.master_bedroom',
            'office_light': 'light.sarah_office',
        },
        'kids': {
            'bedroom_light': 'light.kids_bedroom',
            'tv': None,  # Kids can't control the TV
        }
    }

    # Transcribe the command
    text = whisper_transcribe(audio)  # "Turn on bedroom light"

    if 'bedroom light' in text:
        entity = entity_maps[speaker]['bedroom_light']
        ha_client.turn_on(entity)
        response = "Turned on your bedroom light"

    return response
```

## Resource Requirements

### Single Wake Word

- **CPU:** 5-10% (Heimdall)
- **RAM:** 100-200MB
- **Model size:** 1-3MB
- **Training time:** 30-60 min

### Multiple Wake Words (3 models)

- **CPU:** 15-30% (Heimdall)
- **RAM:** 300-600MB
- **Model size:** 3-9MB total
- **Training time:** 90-180 min

### With Speaker Identification

- **CPU:** +5-10% for speaker ID
- **RAM:** +200-300MB for the embedding model
- **Model size:** +50MB for the speaker model
- **Setup time:** +30-60 min for voice enrollment

### K210 Edge (Maix Duino)

- **Single model:** Feasible, ~30% CPU
- **2 models:** Feasible, ~60% CPU, higher latency
- **3+ models:** Not recommended
- **Speaker ID:** Not feasible (limited RAM/compute)

## Quick Decision Guide

**Just getting started?** → Use the pre-trained "Hey Mycroft"

**Want a custom wake word?** → Train one model with all family voices

**Need multiple wake words?** → Start server-side with 2-3 models

**Want personalization?** → Add speaker identification

**Deploying to edge
(K210)?** → Stick to 1-2 wake words maximum

**Family of 4+ people?** → Train a single model with everyone's voice

**Privacy is paramount?** → Skip speaker ID, use a single universal model

## Testing Multiple Wake Words

```bash
# Test all wake words quickly
conda activate precise

# Terminal 1: Hey Mycroft
precise-listen hey-mycroft.net

# Terminal 2: Hey Computer
precise-listen hey-computer.net

# Terminal 3: Emergency
precise-listen emergency.net

# Say each wake word and verify the correct one fires
```

## Conclusion

### For Your Maix Duino Project:

**Recommended approach:**

1. **Start with "Hey Mycroft"** - Use the pre-trained model
2. **Fine-tune if needed** - Add your household's voices
3. **Consider a 2nd wake word** - Only if you have a specific use case
4. **Speaker ID** - A Phase 2/3 enhancement, not critical for the MVP
5. **Keep it simple** - One wake word works great for most users

**The pre-trained "Hey Mycroft" model saves you 1-2 hours** and works immediately. You can always fine-tune or add custom wake words later!

**Multiple wake words are cool but not necessary** - Most commercial products use just one. Focus on making one wake word work really well before adding more.

**Voice adaptation** - Training with multiple voices upfront is simpler than maintaining per-user models. Save speaker ID for later if you need personalization.

## Quick Start with Pre-trained

```bash
# On Heimdall
cd ~/precise-models/pretrained
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz
tar xzf hey-mycroft.tar.gz

# Test it
conda activate precise
precise-listen hey-mycroft.net

# Deploy
cd ~/voice-assistant
python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/pretrained/hey-mycroft.net

# You're done! No training needed!
```

**That's it - you have a working wake word in 5 minutes!** 🎉
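One last sketch, for the ensemble idea from Option 3 near the top (require the pre-trained and custom models to agree): a small gate that only passes a detection when both models fire within a short window. The `EnsembleGate` class and the one-second window are illustrative assumptions, not part of `precise_runner`; each runner's `on_activation` callback would call `gate.on_activation(...)` with its model's name:

```python
import time

class EnsembleGate:
    """Accept a detection only when two models agree within `window` seconds."""

    def __init__(self, window=1.0):
        self.window = window
        self.last_seen = {}  # model name -> timestamp of its last activation

    def on_activation(self, model_name, now=None):
        """Record an activation; return True when another model fired recently."""
        now = time.time() if now is None else now
        self.last_seen[model_name] = now
        others = [t for name, t in self.last_seen.items() if name != model_name]
        return any(now - t <= self.window for t in others)

gate = EnsembleGate(window=1.0)
# Wire each runner's callback to e.g.:
#   if gate.on_activation('pretrained'): wake_word_queue.put(...)
```

The tradeoff is the mirror image of running models independently: false positives drop (both models must misfire together), but a missed detection by either model suppresses the wake word, so tune each model's sensitivity a little higher than you would standalone.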