Advanced Wake Word Topics - Pre-trained Models, Multiple Wake Words, and Voice Adaptation

Pre-trained Mycroft Models

Yes! Pre-trained Models Exist

Mycroft AI provides several pre-trained wake word models you can use immediately:

Available Models:

  • Hey Mycroft - Original Mycroft wake word (most training data)
  • Hey Jarvis - Popular alternative
  • Christopher - Alternative wake word
  • Hey Ezra - Another option

Download Pre-trained Models

# On Heimdall
conda activate precise
cd ~/precise-models

# Create directory for pre-trained models
mkdir -p pretrained
cd pretrained

# Download Hey Mycroft (recommended starting point)
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz
tar xzf hey-mycroft.tar.gz

# Download other models
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-jarvis.tar.gz
tar xzf hey-jarvis.tar.gz

# List available models
ls -lh *.net

Test Pre-trained Model

conda activate precise

# Test Hey Mycroft
precise-listen hey-mycroft.net

# Speak "Hey Mycroft" - should see "!" when detected
# Press Ctrl+C to exit

# Test with different threshold
precise-listen hey-mycroft.net -t 0.7  # More conservative

Use Pre-trained Model in Voice Server

cd ~/voice-assistant

# Start server with Hey Mycroft model
python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/pretrained/hey-mycroft.net \
    --precise-sensitivity 0.5

Fine-tune Pre-trained Models

You can use pre-trained models as a starting point and fine-tune with your voice:

cd ~/precise-models
mkdir -p hey-mycroft-custom

# Copy base model
cp pretrained/hey-mycroft.net hey-mycroft-custom/

# Collect your samples
cd hey-mycroft-custom
precise-collect  # Record 20-30 samples of YOUR voice

# Fine-tune from pre-trained model
precise-train -e 30 hey-mycroft-custom.net . \
    --from-checkpoint ../pretrained/hey-mycroft.net

# This is MUCH faster than training from scratch!

Benefits:

  • Start with proven model
  • Much less training data needed (20-30 vs 100+ samples)
  • Faster training (30 mins vs 60 mins)
  • Good baseline accuracy

Multiple Wake Words

Architecture Options

Option 1: Multiple Models in Parallel (Server-Side Only)

Run multiple Precise instances simultaneously:

# In voice_server.py - Multiple wake word detection

import time
from queue import Queue

from precise_runner import PreciseEngine, PreciseRunner

# Global runners and the queue the callbacks feed
precise_runners = {}
wake_word_queue = Queue()

def on_wake_word_detected(wake_word_name):
    """Callback factory for different wake words"""
    def callback():
        print(f"Wake word detected: {wake_word_name}")
        wake_word_queue.put({
            'wake_word': wake_word_name,
            'timestamp': time.time()
        })
    return callback

def start_multiple_wake_words(wake_word_configs):
    """
    Start multiple wake word detectors
    
    Args:
        wake_word_configs: List of dicts with 'name', 'model', 'sensitivity'
    
    Example:
        configs = [
            {'name': 'hey mycroft', 'model': 'hey-mycroft.net', 'sensitivity': 0.5},
            {'name': 'hey jarvis', 'model': 'hey-jarvis.net', 'sensitivity': 0.5}
        ]
    """
    global precise_runners
    
    for config in wake_word_configs:
        engine = PreciseEngine(
            '/usr/local/bin/precise-engine',
            config['model']
        )
        
        runner = PreciseRunner(
            engine,
            sensitivity=config['sensitivity'],
            on_activation=on_wake_word_detected(config['name'])
        )
        
        runner.start()
        precise_runners[config['name']] = runner
        
        print(f"Started wake word detector: {config['name']}")
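
The runners above only enqueue events; something still has to drain the queue and act on them. A minimal sketch of a consumer thread, assuming `wake_word_queue` is a standard `queue.Queue` as in the callbacks above (the routing logic is illustrative, not from voice_server.py):

```python
import queue
import threading

# The detector callbacks push events onto this queue.
wake_word_queue = queue.Queue()

def wake_word_consumer(stop_event):
    """Drain wake word events and dispatch them to a handler."""
    while not stop_event.is_set():
        try:
            event = wake_word_queue.get(timeout=0.5)
        except queue.Empty:
            continue
        # Route by which wake word fired, e.g. to different pipelines
        print(f"[{event['timestamp']:.0f}] handling '{event['wake_word']}'")
        wake_word_queue.task_done()

# Run the consumer alongside the PreciseRunner threads
stop = threading.Event()
worker = threading.Thread(target=wake_word_consumer, args=(stop,), daemon=True)
worker.start()
```

The `timeout` on `get()` lets the thread notice `stop_event` and shut down cleanly instead of blocking forever.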

Server-Side Multiple Wake Words:

# Start server with multiple wake words
python voice_server.py \
    --enable-precise \
    --precise-models "hey-mycroft:~/models/hey-mycroft.net:0.5,hey-jarvis:~/models/hey-jarvis.net:0.5"

Performance Impact:

  • CPU: ~5-10% per model (can run 2-3 easily)
  • Memory: ~50-100MB per model
  • Latency: Minimal (all run in parallel)

Option 2: Single Model, Multiple Phrases (Edge or Server)

Train ONE model that responds to multiple phrases:

cd ~/precise-models/multi-wake
conda activate precise

# Record samples for BOTH wake words in the SAME dataset
# Label all as "wake-word" regardless of which phrase

mkdir -p wake-word not-wake-word

# Record "Hey Mycroft" samples
precise-collect  # Save to wake-word/hey-mycroft-*.wav

# Record "Hey Computer" samples  
precise-collect  # Save to wake-word/hey-computer-*.wav

# Record negatives
precise-collect -f not-wake-word/random.wav

# Train single model on both phrases
precise-train -e 60 multi-wake.net .

Pros:

  • Single model = less compute
  • Works on edge (K210)
  • Easy to deploy

Cons:

  • Can't tell which wake word was used
  • May reduce accuracy for each individual phrase
  • Higher false positive risk
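
Because a single multi-phrase model degrades when one phrase dominates the dataset, it is worth checking sample balance before training. A small sketch, assuming samples are named with a phrase prefix as in the `precise-collect` comments above (the helper names and the 2x ratio are illustrative):

```python
from collections import Counter
from pathlib import Path

def phrase_counts(wake_dir):
    """Count wake word samples per phrase, keyed by filename prefix.

    Assumes files like hey-mycroft-001.wav / hey-computer-003.wav, where
    everything before the last '-' is the phrase name.
    """
    counts = Counter()
    for wav in Path(wake_dir).glob('*.wav'):
        phrase = wav.stem.rsplit('-', 1)[0]
        counts[phrase] += 1
    return counts

def is_balanced(counts, max_ratio=2.0):
    """Flag datasets where one phrase has over max_ratio times another."""
    if len(counts) < 2:
        return True
    values = counts.values()
    return max(values) / max(min(values), 1) <= max_ratio
```

If `is_balanced()` fails, record more samples of the under-represented phrase before running `precise-train`.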

Option 3: Sequential Detection (Edge)

Detect wake word, then identify which one:

# Pseudo-code for edge detection
if wake_word_detected():
    audio_snippet = last_2_seconds()
    
    # Run all models on the audio snippet
    scores = {
        'hey-mycroft': model1.score(audio_snippet),
        'hey-jarvis': model2.score(audio_snippet),
        'hey-computer': model3.score(audio_snippet)
    }
    
    # Use highest scoring wake word
    wake_word = max(scores, key=scores.get)
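
The pseudo-code's `max()` will always pick *something*, even when no model is confident. A small runnable sketch of the selection step that also rejects low-confidence or ambiguous results (the threshold and margin values are illustrative, not from Precise):

```python
def pick_wake_word(scores, min_score=0.6, min_margin=0.15):
    """Choose the best-scoring wake word, or None if the result is ambiguous.

    scores: dict mapping wake word name -> model confidence in [0, 1].
    A result only counts if the winner clears min_score AND beats the
    runner-up by min_margin; otherwise treat it as a false trigger.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best_score >= min_score and best_score - runner_up >= min_margin:
        return best_name
    return None

# A clear winner is returned; two near-identical scores are rejected
pick_wake_word({'hey-mycroft': 0.91, 'hey-jarvis': 0.42})  # -> 'hey-mycroft'
pick_wake_word({'hey-mycroft': 0.70, 'hey-jarvis': 0.68})  # -> None
```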

Recommendations

Server-Side (Heimdall):

  • Use Option 1 - Multiple models in parallel
  • Run 2-3 wake words easily
  • Each can have different sensitivity
  • Can identify which wake word was used
  • Example: "Hey Mycroft" for commands, "Hey Jarvis" for queries

Edge (Maix Duino K210):

  • Use Option 2 - Single multi-phrase model
  • K210 can handle 1 model efficiently
  • Train on 2-3 phrases max
  • Simpler deployment
  • Lower latency

Voice Adaptation & Multi-User Support

Approach 1: Inclusive Training (Simple)

Train ONE model on EVERYONE'S voices:

cd ~/precise-models/family-wake-word
conda activate precise

# Record samples from each family member
# Alice records 30 samples
precise-collect  # Save as wake-word/alice-*.wav

# Bob records 30 samples
precise-collect  # Save as wake-word/bob-*.wav

# Carol records 30 samples
precise-collect  # Save as wake-word/carol-*.wav

# Train on all voices
precise-train -e 60 family-wake-word.net .

Pros:

  • Everyone can use the system
  • Single model deployment
  • Works for all family members
  • Simple maintenance

Cons:

  • Can't identify who spoke
  • May need more training data
  • No personalization

Best for: Family voice assistant, shared devices

Approach 2: Speaker Identification (Advanced)

Detect wake word, then identify speaker:

# Architecture with speaker ID

# Step 1: Precise detects wake word
if wake_word_detected():
    
    # Step 2: Capture voice sample
    voice_sample = record_audio(duration=3)
    
    # Step 3: Speaker identification
    speaker = identify_speaker(voice_sample)
    # Uses voice embeddings/neural network
    
    # Step 4: Process with user context
    process_command(voice_sample, user=speaker)

Implementation Options:

Option A: Use resemblyzer (Voice Embeddings)

pip install resemblyzer --break-system-packages

# Enrollment phase
python enroll_speaker.py --name Alice
# Each user records 10-20 seconds of speech
# System creates voice profile (embedding)

# Runtime
python speaker_id.py
# Compares incoming audio to stored embeddings
# Returns most likely speaker

Example Code:

import os

from resemblyzer import VoiceEncoder, preprocess_wav
import numpy as np

# Initialize encoder
encoder = VoiceEncoder()

# Enrollment - do once per user
def enroll_user(name, audio_files):
    """Create voice profile for user"""
    embeddings = []
    
    for audio_file in audio_files:
        wav = preprocess_wav(audio_file)
        embedding = encoder.embed_utterance(wav)
        embeddings.append(embedding)
    
    # Average embeddings for robustness
    user_profile = np.mean(embeddings, axis=0)
    
    # Save profile
    np.save(f'profiles/{name}.npy', user_profile)
    return user_profile

# Identification - run each time
def identify_speaker(audio_file, profiles_dir='profiles'):
    """Identify which enrolled user is speaking"""
    wav = preprocess_wav(audio_file)
    test_embedding = encoder.embed_utterance(wav)
    
    # Load all profiles
    profiles = {}
    for profile_file in os.listdir(profiles_dir):
        name = profile_file.replace('.npy', '')
        profile = np.load(os.path.join(profiles_dir, profile_file))
        profiles[name] = profile
    
    # Calculate similarity to each profile
    similarities = {}
    for name, profile in profiles.items():
        similarity = np.dot(test_embedding, profile)
        similarities[name] = similarity
    
    # Return most similar
    best_match = max(similarities, key=similarities.get)
    confidence = similarities[best_match]
    
    if confidence > 0.7:  # Threshold
        return best_match
    else:
        return "unknown"
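
The plain dot product above works as a similarity measure because resemblyzer returns (approximately) unit-length embeddings, so the dot product coincides with cosine similarity. A pure-Python illustration of that equivalence (vectors are illustrative):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length (L2 norm = 1)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])
# For unit vectors, dot product and cosine similarity coincide
assert abs(dot(a, b) - cosine_similarity(a, b)) < 1e-12
```

This is why the 0.7 threshold in `identify_speaker()` can be read directly as a cosine similarity cutoff.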

Option B: Use pyannote.audio (Production-grade)

pip install pyannote.audio --break-system-packages

# Requires HuggingFace token (same as diarization)

Example:

from pyannote.audio import Inference

# Initialize
inference = Inference(
    "pyannote/embedding",
    use_auth_token="your_hf_token"
)

# Enroll users
alice_profile = inference("alice_sample.wav")
bob_profile = inference("bob_sample.wav")

# Identify
test_embedding = inference("test_audio.wav")

# Compare
from scipy.spatial.distance import cosine
alice_similarity = 1 - cosine(test_embedding, alice_profile)
bob_similarity = 1 - cosine(test_embedding, bob_profile)

if alice_similarity > bob_similarity and alice_similarity > 0.7:
    speaker = "Alice"
elif bob_similarity > 0.7:
    speaker = "Bob"
else:
    speaker = "Unknown"

Pros:

  • Can identify individual users
  • Personalized responses
  • User-specific commands/permissions
  • Better for privacy (know who's speaking)

Cons:

  • More complex implementation
  • Requires enrollment phase
  • Additional processing time (~100-200ms)
  • May fail with similar voices

Approach 3: Per-User Wake Word Models

Each person has their OWN wake word:

# Alice's wake word: "Hey Mycroft"
# Train on ONLY Alice's voice

# Bob's wake word: "Hey Jarvis"  
# Train on ONLY Bob's voice

# Carol's wake word: "Hey Computer"
# Train on ONLY Carol's voice

Deployment: Run all 3 models in parallel (server-side):

wake_word_configs = [
    {'name': 'Alice', 'wake_word': 'hey mycroft', 'model': 'alice-wake.net'},
    {'name': 'Bob', 'wake_word': 'hey jarvis', 'model': 'bob-wake.net'},
    {'name': 'Carol', 'wake_word': 'hey computer', 'model': 'carol-wake.net'}
]

Pros:

  • Automatic user identification
  • Highest accuracy per user
  • Clear user separation
  • No additional speaker ID needed

Cons:

  • Requires 3x models (server only)
  • Users must remember their wake word
  • 3x CPU usage (~15-30%)
  • Can't work on edge (K210)

Approach 4: Context-Based Adaptation

No speaker ID, but learn from interaction:

# Track command patterns
user_context = {
    'last_command': 'turn on living room lights',
    'frequent_entities': ['light.living_room', 'light.bedroom'],
    'time_of_day_patterns': {'morning': 'coffee maker', 'evening': 'tv'},
    'location': 'home'  # vs 'away'
}

# Use context to improve intent recognition
if "turn on the lights" and time.is_morning():
    # Probably means bedroom lights (based on history)
    entity = user_context['frequent_entities'][0]

Pros:

  • No enrollment needed
  • Improves over time
  • Simple to implement
  • Works with any number of users

Cons:

  • No true user identification
  • May make incorrect assumptions
  • Privacy concerns (tracking behavior)
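
The context dict above can be kept fresh with simple frequency counting rather than anything learned. A minimal sketch of tracking per-time-of-day entity preferences (class and entity names are illustrative):

```python
from collections import Counter

class CommandContext:
    """Track which entities the household actually uses, per time of day."""

    def __init__(self):
        self.entity_counts = {}  # time-of-day bucket -> Counter of entities

    def record(self, bucket, entity):
        """Log one successful command, e.g. record('morning', 'light.bedroom')."""
        self.entity_counts.setdefault(bucket, Counter())[entity] += 1

    def most_likely(self, bucket):
        """Best guess for an ambiguous command like 'turn on the lights'."""
        counts = self.entity_counts.get(bucket)
        if not counts:
            return None
        return counts.most_common(1)[0][0]

ctx = CommandContext()
ctx.record('morning', 'light.bedroom')
ctx.record('morning', 'light.bedroom')
ctx.record('morning', 'light.living_room')
```

With no history for a bucket, `most_likely()` returns None and the server should fall back to asking which entity was meant.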

For Your Use Case

Based on your home lab setup, I recommend:

Phase 1: Single Wake Word, Inclusive Training (Week 1-2)

# Start simple
cd ~/precise-models/hey-computer
conda activate precise

# Have all family members record samples
# Alice: 30 samples of "Hey Computer"
# Bob: 30 samples of "Hey Computer"
# You: 30 samples of "Hey Computer"

# Train single model on all voices
precise-train -e 60 hey-computer.net .

# Deploy to server
python voice_server.py \
    --enable-precise \
    --precise-model hey-computer.net

Why:

  • Simple to set up and test
  • Everyone can use it immediately
  • Single model = easier debugging
  • Works on edge if you migrate later

Phase 2: Add Speaker Identification (Week 3-4)

# Install resemblyzer
pip install resemblyzer --break-system-packages

# Enroll users
python enroll_speaker.py --name <user>
# Each person speaks for 20 seconds

# Update voice_server.py to identify speaker
# Use speaker ID for personalized responses

Why:

  • Enables personalization
  • Can track preferences per user
  • User-specific command permissions
  • Better privacy (know who's speaking)

Phase 3: Multiple Wake Words (Month 2+)

# Add alternative wake words for different contexts
# "Hey Mycroft" - General commands
# "Hey Jarvis" - Media/Plex control
# "Computer" - Quick commands (lights, temp)

# Deploy multiple models on server
python voice_server.py \
    --enable-precise \
    --precise-models "mycroft:hey-mycroft.net:0.5,jarvis:hey-jarvis.net:0.5"

Why:

  • Different wake words for different contexts
  • Reduces false positives (more specific triggers)
  • Fun factor (Jarvis for media!)
  • Server can handle 2-3 easily

Implementation Guide: Multiple Wake Words

Update voice_server.py for Multiple Wake Words

# Add to voice_server.py

def start_multiple_wake_words(configs):
    """
    Start multiple wake word detectors
    
    Args:
        configs: List of dicts with 'name', 'model_path', 'sensitivity'
    """
    global precise_runners
    precise_runners = {}
    
    for config in configs:
        try:
            engine = PreciseEngine(
                DEFAULT_PRECISE_ENGINE,
                config['model_path']
            )
            
            def make_callback(wake_word_name):
                def callback():
                    print(f"Wake word detected: {wake_word_name}")
                    wake_word_queue.put({
                        'wake_word': wake_word_name,
                        'timestamp': time.time(),
                        'source': 'precise'
                    })
                return callback
            
            runner = PreciseRunner(
                engine,
                sensitivity=config['sensitivity'],
                on_activation=make_callback(config['name'])
            )
            
            runner.start()
            precise_runners[config['name']] = runner
            
            print(f"✓ Started: {config['name']} (sensitivity: {config['sensitivity']})")
            
        except Exception as e:
            print(f"✗ Failed to start {config['name']}: {e}")
    
    return len(precise_runners) > 0

# Add to main()
parser.add_argument('--precise-models', 
                   help='Multiple models: name:path:sensitivity,name2:path2:sensitivity2')

# Parse multiple models
if args.precise_models:
    configs = []
    for model_spec in args.precise_models.split(','):
        name, path, sensitivity = model_spec.split(':')
        configs.append({
            'name': name,
            'model_path': os.path.expanduser(path),
            'sensitivity': float(sensitivity)
        })
    
    start_multiple_wake_words(configs)

Usage Example

cd ~/voice-assistant

# Start with multiple wake words
python voice_server.py \
    --enable-precise \
    --precise-models "\
hey-mycroft:~/precise-models/pretrained/hey-mycroft.net:0.5,\
hey-jarvis:~/precise-models/pretrained/hey-jarvis.net:0.5"
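
Since the `--precise-models` spec packs three fields per model, the parsing step is worth isolating so a typo fails fast at startup. A sketch of the same `name:path:sensitivity` format (the flag itself is this document's convention, not a stock Precise option):

```python
import os

def parse_model_specs(spec):
    """Parse 'name:path:sensitivity[,name:path:sensitivity...]' into configs.

    Raises ValueError on malformed entries instead of crashing mid-startup.
    """
    configs = []
    for entry in spec.split(','):
        parts = entry.strip().split(':')
        if len(parts) != 3:
            raise ValueError(f"Bad model spec: {entry!r}")
        name, path, sensitivity = parts
        configs.append({
            'name': name,
            'model_path': os.path.expanduser(path),
            'sensitivity': float(sensitivity),
        })
    return configs
```

Note that `expanduser()` is needed because a quoted `~` on the command line is not expanded by the shell.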

Implementation Guide: Speaker Identification

Add to voice_server.py

# Add resemblyzer support
try:
    from resemblyzer import VoiceEncoder, preprocess_wav
    import numpy as np
    SPEAKER_ID_AVAILABLE = True
except ImportError:
    SPEAKER_ID_AVAILABLE = False
    print("Warning: resemblyzer not available. Speaker ID disabled.")

# Initialize encoder
voice_encoder = None
speaker_profiles = {}

def load_speaker_profiles(profiles_dir='~/voice-assistant/profiles'):
    """Load enrolled speaker profiles"""
    global speaker_profiles, voice_encoder
    
    if not SPEAKER_ID_AVAILABLE:
        return False
    
    profiles_dir = os.path.expanduser(profiles_dir)
    
    if not os.path.exists(profiles_dir):
        print(f"No speaker profiles found at {profiles_dir}")
        return False
    
    # Initialize encoder
    voice_encoder = VoiceEncoder()
    
    # Load all profiles
    for profile_file in os.listdir(profiles_dir):
        if profile_file.endswith('.npy'):
            name = profile_file.replace('.npy', '')
            profile = np.load(os.path.join(profiles_dir, profile_file))
            speaker_profiles[name] = profile
            print(f"Loaded speaker profile: {name}")
    
    return len(speaker_profiles) > 0

def identify_speaker(audio_path, threshold=0.7):
    """Identify speaker from audio file"""
    if not SPEAKER_ID_AVAILABLE or not speaker_profiles:
        return None
    
    try:
        # Get embedding for test audio
        wav = preprocess_wav(audio_path)
        test_embedding = voice_encoder.embed_utterance(wav)
        
        # Compare to all profiles
        similarities = {}
        for name, profile in speaker_profiles.items():
            similarity = np.dot(test_embedding, profile)
            similarities[name] = similarity
        
        # Get best match
        best_match = max(similarities, key=similarities.get)
        confidence = similarities[best_match]
        
        print(f"Speaker ID: {best_match} (confidence: {confidence:.2f})")
        
        if confidence > threshold:
            return best_match
        else:
            return "unknown"
            
    except Exception as e:
        print(f"Error identifying speaker: {e}")
        return None

# Update process endpoint to include speaker ID
@app.route('/process', methods=['POST'])
def process():
    """Process complete voice command with speaker identification"""
    # ... existing code ...
    
    # Add speaker identification
    speaker = identify_speaker(temp_path) if speaker_profiles else None
    
    if speaker:
        print(f"Detected speaker: {speaker}")
        # Could personalize response based on speaker
    
    # ... rest of processing ...

Enrollment Script

Create enroll_speaker.py:

#!/usr/bin/env python3
"""
Enroll users for speaker identification

Usage:
    python enroll_speaker.py --name Alice --audio alice_sample.wav
    python enroll_speaker.py --name Alice --duration 20  # Record live
"""

import argparse
import os
import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav
import pyaudio
import wave

def record_audio(duration=20, sample_rate=16000):
    """Record audio from microphone"""
    print(f"Recording for {duration} seconds...")
    print("Speak naturally - read a paragraph, have a conversation, etc.")
    
    chunk = 1024
    audio_format = pyaudio.paInt16
    channels = 1
    
    p = pyaudio.PyAudio()
    
    stream = p.open(
        format=audio_format,
        channels=channels,
        rate=sample_rate,
        input=True,
        frames_per_buffer=chunk
    )
    
    frames = []
    for _ in range(int(sample_rate / chunk * duration)):
        data = stream.read(chunk)
        frames.append(data)
    
    stream.stop_stream()
    stream.close()
    
    # Save to temp file
    temp_file = f"/tmp/enrollment_{os.getpid()}.wav"
    wf = wave.open(temp_file, 'wb')
    wf.setnchannels(channels)
    wf.setsampwidth(p.get_sample_size(audio_format))
    wf.setframerate(sample_rate)
    wf.writeframes(b''.join(frames))
    wf.close()
    p.terminate()
    
    return temp_file

def enroll_speaker(name, audio_file, profiles_dir='~/voice-assistant/profiles'):
    """Create voice profile for speaker"""
    profiles_dir = os.path.expanduser(profiles_dir)
    os.makedirs(profiles_dir, exist_ok=True)
    
    # Initialize encoder
    encoder = VoiceEncoder()
    
    # Process audio
    wav = preprocess_wav(audio_file)
    embedding = encoder.embed_utterance(wav)
    
    # Save profile
    profile_path = os.path.join(profiles_dir, f'{name}.npy')
    np.save(profile_path, embedding)
    
    print(f"✓ Enrolled speaker: {name}")
    print(f"  Profile saved to: {profile_path}")
    
    return profile_path

def main():
    parser = argparse.ArgumentParser(description="Enroll speaker for voice identification")
    parser.add_argument('--name', required=True, help='Speaker name')
    parser.add_argument('--audio', help='Path to audio file (wav)')
    parser.add_argument('--duration', type=int, default=20, 
                       help='Recording duration if not using audio file')
    parser.add_argument('--profiles-dir', default='~/voice-assistant/profiles',
                       help='Directory to save profiles')
    
    args = parser.parse_args()
    
    # Get audio file
    if args.audio:
        audio_file = args.audio
        if not os.path.exists(audio_file):
            print(f"Error: Audio file not found: {audio_file}")
            return 1
    else:
        audio_file = record_audio(args.duration)
    
    # Enroll speaker
    try:
        enroll_speaker(args.name, audio_file, args.profiles_dir)
        return 0
    except Exception as e:
        print(f"Error enrolling speaker: {e}")
        return 1

if __name__ == '__main__':
    import sys
    sys.exit(main())

Performance Comparison

Single Wake Word

  • Latency: 100-200ms
  • CPU: ~5-10% (idle)
  • Memory: ~100MB
  • Accuracy: 95%+

Multiple Wake Words (3 models)

  • Latency: 100-200ms (parallel)
  • CPU: ~15-30% (idle)
  • Memory: ~300MB
  • Accuracy: 95%+ each

With Speaker Identification

  • Additional latency: +100-200ms
  • Additional CPU: +5% during ID
  • Additional memory: +50MB
  • Accuracy: 85-95% (depending on enrollment quality)

Best Practices

Wake Word Selection

  1. Different enough - "Hey Mycroft" vs "Hey Jarvis" (not "Hey Alice" vs "Hey Alex")
  2. Clear consonants - Easier to detect
  3. 2-3 syllables - Not too short, not too long
  4. Test in environment - Check for false triggers

Training

  1. Include all users - If using single model
  2. Diverse conditions - Different rooms, noise levels
  3. Regular updates - Add false positives weekly
  4. Per-user models - Higher accuracy, more compute

Speaker Identification

  1. Quality enrollment - 20+ seconds of clear speech
  2. Re-enroll periodically - Voices change (colds, etc.)
  3. Test thresholds - Balance accuracy vs false IDs
  4. Graceful fallback - Handle unknown speakers
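
Point 3 above ("test thresholds") can be done offline: collect similarity scores from known-genuine and known-impostor trials, then sweep candidate thresholds. A minimal sketch of that sweep (the score lists are illustrative, not measured data):

```python
def best_threshold(genuine, impostor, candidates=None):
    """Pick the threshold that minimizes total errors on labeled trials.

    genuine: similarity scores where the speaker WAS the enrolled user.
    impostor: scores where it was someone else (should fall below threshold).
    """
    if candidates is None:
        candidates = [i / 100 for i in range(40, 96, 5)]
    best, best_errors = None, float('inf')
    for t in candidates:
        false_rejects = sum(1 for s in genuine if s < t)
        false_accepts = sum(1 for s in impostor if s >= t)
        errors = false_rejects + false_accepts
        if errors < best_errors:
            best, best_errors = t, errors
    return best

# Illustrative scores; real ones come from your enrolled profiles
genuine = [0.82, 0.88, 0.75, 0.91, 0.69]
impostor = [0.41, 0.55, 0.62, 0.48]
```

Weighting false accepts more heavily than false rejects is a reasonable variation if unknown speakers triggering commands is the bigger risk.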

Suggested Timeline

# Week 1: Start with pre-trained "Hey Mycroft"
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz
precise-listen hey-mycroft.net  # Test it!

# Week 2: Fine-tune with your voices
precise-train -e 30 hey-mycroft-custom.net . \
    --from-checkpoint hey-mycroft.net

# Week 3: Add speaker identification
pip install resemblyzer
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name [Family Member] --duration 20

# Week 4: Add second wake word ("Hey Jarvis" for Plex?)
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-jarvis.tar.gz
tar xzf hey-jarvis.tar.gz
# Run both in parallel

# Month 2+: Optimize and expand
# - More wake words for different contexts
# - Per-user wake word models
# - Context-aware responses

This gives you a smooth progression from simple to advanced!