Wake Word Quick Reference Card

🎯 TL;DR: What Should I Do?

Recommendation for Your Setup

Week 1: Use pre-trained "Hey Mycroft"

./download_pretrained_models.sh --model hey-mycroft
precise-listen ~/precise-models/pretrained/hey-mycroft.net

Week 2-3: Fine-tune with all family members' voices

cd ~/precise-models/hey-mycroft-family
precise-train -e 30 custom.net . --from-checkpoint ../pretrained/hey-mycroft.net

Week 4+: Add speaker identification

pip install resemblyzer
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name [Family] --duration 20

Month 2+: Add second wake word (Hey Jarvis for Plex?)

./download_pretrained_models.sh --model hey-jarvis
# Run both in parallel on server

📋 Pre-trained Models

Available Models (Ready to Use!)

| Wake Word   | Download              | Best For                   |
|-------------|-----------------------|----------------------------|
| Hey Mycroft | `--model hey-mycroft` | Default choice, most data  |
| Hey Jarvis  | `--model hey-jarvis`  | Pop culture, media control |
| Christopher | `--model christopher` | Unique, less common        |
| Hey Ezra    | `--model hey-ezra`    | Alternative option         |

Quick Download

# Download one
./download_pretrained_models.sh --model hey-mycroft

# Download all
./download_pretrained_models.sh --test-all

# Test immediately
precise-listen ~/precise-models/pretrained/hey-mycroft.net

🔢 Multiple Wake Words

Option 1: Multiple Models in Parallel (Server-Side)

What: Run 2-3 different wake word models simultaneously
Where: Heimdall (server)
Performance: ~15-30% CPU for 3 models

# Start with multiple wake words
python voice_server.py \
    --enable-precise \
    --precise-models "\
hey-mycroft:~/models/hey-mycroft.net:0.5,\
hey-jarvis:~/models/hey-jarvis.net:0.5"
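The `--precise-models` value packs `name:path:sensitivity` triples into one comma-separated string. A minimal parser sketch for that format (the format is inferred from the command above; `voice_server.py`'s actual parsing may differ):

```python
from dataclasses import dataclass

@dataclass
class ModelSpec:
    name: str
    path: str
    sensitivity: float  # 0.0 (strict) .. 1.0 (permissive)

def parse_model_specs(spec: str) -> list[ModelSpec]:
    """Split 'name:path:sensitivity' entries on commas."""
    models = []
    for entry in spec.split(","):
        name, path, sensitivity = entry.strip().split(":")
        models.append(ModelSpec(name=name, path=path, sensitivity=float(sensitivity)))
    return models
```

A lower sensitivity means fewer false positives at the cost of more missed detections; 0.5 is a reasonable starting point per model.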

Pros:

  • Can identify which wake word was used
  • Different contexts (Mycroft=commands, Jarvis=media)
  • Easy to add/remove wake words
  • Each can have different sensitivity

Cons:

  • Only works server-side (not on Maix Duino)
  • Higher CPU usage (but still reasonable)

Use When:

  • You want different wake words for different purposes
  • Server has CPU to spare (yours does!)
  • Want flexibility to add wake words later

Option 2: Single Multi-Phrase Model (Edge-Compatible)

What: One model responds to multiple phrases
Where: Server OR Maix Duino
Performance: Same as single model

# Train on multiple phrases
cd ~/precise-models/multi-wake
# Record "Hey Mycroft" samples → wake-word/
# Record "Hey Computer" samples → wake-word/
# Record negatives → not-wake-word/
precise-train -e 60 multi-wake.net .
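Before kicking off `precise-train`, it helps to sanity-check the layout it reads (a `wake-word/` and a `not-wake-word/` folder of WAV samples, per the comments above). A small sketch that counts samples per class:

```python
import os

def count_samples(data_dir: str) -> dict[str, int]:
    """Count .wav samples in the wake-word/ and not-wake-word/
    folders under a precise-train training directory."""
    counts = {}
    for label in ("wake-word", "not-wake-word"):
        folder = os.path.join(data_dir, label)
        names = os.listdir(folder) if os.path.isdir(folder) else []
        counts[label] = sum(1 for f in names if f.endswith(".wav"))
    return counts
```

Aim for a healthy balance: far more negatives than positives keeps the false-positive rate down.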

Pros:

  • Single model = less compute
  • Works on edge (K210)
  • Simple deployment

Cons:

  • Can't tell which wake word was used
  • May reduce accuracy
  • Higher false positive risk

Use When:

  • Deploying to Maix Duino (edge)
  • Want backup wake words
  • Don't care which was used

👥 Multi-User Support

Option 1: Inclusive Training START HERE

What: One model, all voices
How: All family members record samples

cd ~/precise-models/family-wake
# Alice records 30 samples
# Bob records 30 samples  
# You record 30 samples
precise-train -e 60 family-wake.net .

Pros:

  • Everyone can use it
  • Simple deployment
  • Single model

Cons:

  • Can't identify who spoke
  • No personalization

Use When:

  • Just getting started
  • Don't need to know who spoke
  • Want simplicity

Option 2: Speaker Identification (Week 4+)

What: Detect wake word, then identify speaker
How: Voice embeddings (resemblyzer or pyannote)

# Install
pip install resemblyzer

# Enroll users
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name Alice --duration 20
python enroll_speaker.py --name Bob --duration 20

# Server identifies speaker automatically
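Once each user has an enrolled voice embedding (resemblyzer's `VoiceEncoder` produces fixed-length float vectors), identification reduces to a nearest-neighbor lookup with a similarity threshold. A pure-Python sketch of that comparison step, with toy vectors standing in for real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def identify_speaker(embedding, enrolled, threshold=0.75):
    """Return the enrolled name whose embedding is most similar,
    or None if nobody clears the (tunable) threshold."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

The 0.75 threshold is an illustrative default; as the Pro Tips below note, test threshold values against your own enrolled voices.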

Pros:

  • Personalized responses
  • User-specific permissions
  • Better privacy
  • Track preferences

Cons:

  • More complex
  • Requires enrollment
  • +100-200ms latency
  • May fail with similar voices

Use When:

  • Want personalization
  • Need user-specific commands
  • Ready for advanced features

Option 3: Per-User Wake Words (Advanced)

What: Each person has their own wake word
How: Multiple models, one per person

# Alice: "Hey Mycroft"
# Bob: "Hey Jarvis"
# You: "Hey Computer"

# Run all 3 models in parallel
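Because each model belongs to one person, identifying the user is just a lookup on whichever model fired. A sketch, assuming detection events carry the model name (the name-to-owner mapping below mirrors the example assignments above):

```python
# Map each wake word model to its owner.
WAKE_WORD_OWNERS = {
    "hey-mycroft": "Alice",
    "hey-jarvis": "Bob",
    "hey-computer": "You",
}

def user_for_detection(wake_word):
    """Resolve a firing model name to the user who owns it,
    or None for an unrecognized model."""
    return WAKE_WORD_OWNERS.get(wake_word)
```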

Pros:

  • Automatic user ID
  • Highest accuracy per user
  • Clear separation

Cons:

  • 3x models = 3x CPU
  • Users must remember their word
  • Server-only (not edge)

Use When:

  • Need automatic user ID
  • Have CPU to spare
  • Users want their own wake word

🎯 Decision Tree

START: Want to use voice assistant
  │
  ├─ Single user or don't care who spoke?
  │   └─ Use: Inclusive Training (Option 1)
  │       └─ Download: Hey Mycroft (pre-trained)
  │
  ├─ Multiple users AND need to know who spoke?
  │   └─ Use: Speaker Identification (Option 2)
  │       └─ Start with: Hey Mycroft + resemblyzer
  │
  ├─ Want different wake words for different purposes?
  │   └─ Use: Multiple Models (Option 1)
  │       └─ Download: Hey Mycroft + Hey Jarvis
  │
  └─ Deploying to Maix Duino (edge)?
      └─ Use: Single Multi-Phrase Model (Option 2)
          └─ Train: Custom model with 2-3 phrases

📊 Comparison Table

| Feature     | Inclusive | Speaker ID | Per-User Wake | Multiple Wake |
|-------------|-----------|------------|---------------|---------------|
| Setup Time  | 2 hours   | 4 hours    | 6 hours       | 3 hours       |
| Complexity  | Easy      | Medium     | Hard          | Easy          |
| CPU Usage   | 5-10%     | 10-15%     | 15-30%        | 15-30%        |
| Latency     | 100ms     | 300ms      | 100ms         | 100ms         |
| User ID     | No        | Yes        | Yes           | No            |
| Edge Deploy | Yes       | ⚠️ Maybe   | No            | ⚠️ Partial    |
| Personalize | No        | Yes        | Yes           | ⚠️ Partial    |

Week 1: Get It Working

# Use pre-trained Hey Mycroft
./download_pretrained_models.sh --model hey-mycroft

# Test it
precise-listen ~/precise-models/pretrained/hey-mycroft.net

# Deploy to server
python voice_server.py --enable-precise \
    --precise-model ~/precise-models/pretrained/hey-mycroft.net

Week 2-3: Make It Yours

# Fine-tune with your family's voices
cd ~/precise-models/hey-mycroft-family

# Have everyone record 20-30 samples
precise-collect  # Alice
precise-collect  # Bob
precise-collect  # You

# Train
precise-train -e 30 custom.net . \
    --from-checkpoint ../pretrained/hey-mycroft.net

Week 4+: Add Intelligence

# Speaker identification
pip install resemblyzer
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name Alice --duration 20

# Now server knows who's speaking!

Month 2+: Expand Features

# Add second wake word for media control
./download_pretrained_models.sh --model hey-jarvis

# Run both: Mycroft for commands, Jarvis for Plex
python voice_server.py --enable-precise \
    --precise-models "mycroft:hey-mycroft.net:0.5,jarvis:hey-jarvis.net:0.5"

💡 Pro Tips

Wake Word Selection

  • DO: Choose clear, distinct wake words
  • DO: Test in your environment
  • DON'T: Use similar-sounding words
  • DON'T: Use common phrases

Training

  • DO: Include all intended users
  • DO: Record in various conditions
  • DO: Add false positives to training
  • DON'T: Rush the training process

Deployment

  • DO: Start simple (one wake word)
  • DO: Test thoroughly before adding features
  • DO: Monitor false positive rate
  • DON'T: Deploy too many wake words at once

Speaker ID

  • DO: Use 20+ seconds for enrollment
  • DO: Re-enroll if accuracy drops
  • DO: Test threshold values
  • DON'T: Expect 100% accuracy

🔧 Quick Commands

# Download pre-trained model
./download_pretrained_models.sh --model hey-mycroft

# Test model
precise-listen ~/precise-models/pretrained/hey-mycroft.net

# Fine-tune from pre-trained
precise-train -e 30 custom.net . \
    --from-checkpoint ~/precise-models/pretrained/hey-mycroft.net

# Enroll speaker
python enroll_speaker.py --name Alan --duration 20

# Start with single wake word
python voice_server.py --enable-precise \
    --precise-model hey-mycroft.net

# Start with multiple wake words
python voice_server.py --enable-precise \
    --precise-models "mycroft:hey-mycroft.net:0.5,jarvis:hey-jarvis.net:0.5"

# Check status
curl http://10.1.10.71:5000/wake-word/status

# Monitor detections
curl http://10.1.10.71:5000/wake-word/detections
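The detections endpoint can also be polled from a small script. The JSON shape below (a list of objects with `name` and `timestamp` fields) is an assumption for illustration, as is the idea that the endpoint returns recent events; the de-duplication logic itself is pure and testable:

```python
import json
import urllib.request

def new_detections(events, last_seen_ts):
    """Keep only events newer than the last timestamp we reported."""
    return [e for e in events if e["timestamp"] > last_seen_ts]

def poll_once(base_url, last_seen_ts):
    """Fetch detections and return (fresh_events, newest_timestamp)."""
    with urllib.request.urlopen(f"{base_url}/wake-word/detections") as resp:
        events = json.load(resp)
    fresh = new_detections(events, last_seen_ts)
    newest = max((e["timestamp"] for e in fresh), default=last_seen_ts)
    return fresh, newest
```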

📚 See Also

  • MYCROFT_PRECISE_GUIDE.md: full walkthrough, including custom wake word training (Phase 2)

FAQ

Q: Can I use "Hey Mycroft" right away?
A: Yes! Download with ./download_pretrained_models.sh --model hey-mycroft

Q: How many wake words can I run at once?
A: 2-3 comfortably on server. Maix Duino can handle 1.

Q: Can I train my own custom wake word?
A: Yes! See MYCROFT_PRECISE_GUIDE.md Phase 2.

Q: Does speaker ID work with multiple wake words?
A: Yes! Wake word detected → Speaker identified → Personalized response.

Q: Can I use this on Maix Duino?
A: Yes. Start with server-side detection, then convert the model to KMODEL for on-device use (advanced).

Q: How accurate is speaker identification?
A: 85-95% with good enrollment. Re-enroll if accuracy drops.

Q: What if someone has a cold?
A: May reduce accuracy temporarily. System should recover when voice returns to normal.

Q: Can kids use it?
A: Yes! Include their voices in training or enroll them separately.


Quick Decision: Start with pre-trained Hey Mycroft. Add features later!

./download_pretrained_models.sh --model hey-mycroft
precise-listen ~/precise-models/pretrained/hey-mycroft.net
# It just works! ✨