# Wake Word Quick Reference Card

## 🎯 TL;DR: What Should I Do?

### Recommendation for Your Setup

**Week 1:** Use pre-trained "Hey Mycroft"

```bash
./download_pretrained_models.sh --model hey-mycroft
precise-listen ~/precise-models/pretrained/hey-mycroft.net
```

**Week 2-3:** Fine-tune with all family members' voices

```bash
cd ~/precise-models/hey-mycroft-family
precise-train -e 30 custom.net . --from-checkpoint ../pretrained/hey-mycroft.net
```

**Week 4+:** Add speaker identification

```bash
pip install resemblyzer
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name [Family] --duration 20
```

**Month 2+:** Add second wake word (Hey Jarvis for Plex?)

```bash
./download_pretrained_models.sh --model hey-jarvis
# Run both in parallel on server
```
## 📋 Pre-trained Models

### Available Models (Ready to Use!)

| Wake Word | Download | Best For |
|---|---|---|
| Hey Mycroft ⭐ | `--model hey-mycroft` | Default choice, most data |
| Hey Jarvis | `--model hey-jarvis` | Pop culture, media control |
| Christopher | `--model christopher` | Unique, less common |
| Hey Ezra | `--model hey-ezra` | Alternative option |
### Quick Download

```bash
# Download one
./download_pretrained_models.sh --model hey-mycroft

# Download all
./download_pretrained_models.sh --test-all

# Test immediately
precise-listen ~/precise-models/pretrained/hey-mycroft.net
```
## 🔢 Multiple Wake Words

### Option 1: Multiple Models (Server-Side) ⭐ RECOMMENDED

**What:** Run 2-3 different wake word models simultaneously
**Where:** Heimdall (server)
**Performance:** ~15-30% CPU for 3 models

```bash
# Start with multiple wake words
python voice_server.py \
  --enable-precise \
  --precise-models "\
hey-mycroft:~/models/hey-mycroft.net:0.5,\
hey-jarvis:~/models/hey-jarvis.net:0.5"
```
Pros:
- ✅ Can identify which wake word was used
- ✅ Different contexts (Mycroft=commands, Jarvis=media)
- ✅ Easy to add/remove wake words
- ✅ Each can have different sensitivity
Cons:
- ❌ Only works server-side (not on Maix Duino)
- ❌ Higher CPU usage (but still reasonable)
Use When:
- You want different wake words for different purposes
- Server has CPU to spare (yours does!)
- Want flexibility to add wake words later
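Because this option reports which wake word fired, the server can route each one to a different context. A minimal sketch of that dispatch step, in plain Python: the handler names and the `on_detection` callback are hypothetical illustrations, not part of voice_server.py.

```python
# Hypothetical sketch: route each detected wake word to its own handler.
# The dispatch table and handler names are illustrative, not voice_server.py API.

def handle_commands(utterance_id: str) -> str:
    # "Hey Mycroft" context: general voice commands
    return f"commands pipeline for {utterance_id}"

def handle_media(utterance_id: str) -> str:
    # "Hey Jarvis" context: media control (e.g. Plex)
    return f"media pipeline for {utterance_id}"

HANDLERS = {
    "hey-mycroft": handle_commands,
    "hey-jarvis": handle_media,
}

def on_detection(wake_word: str, utterance_id: str) -> str:
    """Called when any model fires; dispatch by which wake word matched."""
    handler = HANDLERS.get(wake_word)
    if handler is None:
        raise KeyError(f"no handler registered for {wake_word!r}")
    return handler(utterance_id)

print(on_detection("hey-jarvis", "utt-42"))  # → media pipeline for utt-42
```

Adding a third wake word later is then just one more entry in the table, matching the "easy to add/remove" point above.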
### Option 2: Single Multi-Phrase Model (Edge-Compatible)

**What:** One model responds to multiple phrases
**Where:** Server OR Maix Duino
**Performance:** Same as single model

```bash
# Train on multiple phrases
cd ~/precise-models/multi-wake
# Record "Hey Mycroft" samples → wake-word/
# Record "Hey Computer" samples → wake-word/
# Record negatives → not-wake-word/
precise-train -e 60 multi-wake.net .
```
Pros:
- ✅ Single model = less compute
- ✅ Works on edge (K210)
- ✅ Simple deployment
Cons:
- ❌ Can't tell which wake word was used
- ❌ May reduce accuracy
- ❌ Higher false positive risk
Use When:
- Deploying to Maix Duino (edge)
- Want backup wake words
- Don't care which was used
## 👥 Multi-User Support

### Option 1: Inclusive Training ⭐ START HERE

**What:** One model, all voices
**How:** All family members record samples

```bash
cd ~/precise-models/family-wake
# Alice records 30 samples
# Bob records 30 samples
# You record 30 samples
precise-train -e 60 family-wake.net .
```
Pros:
- ✅ Everyone can use it
- ✅ Simple deployment
- ✅ Single model
Cons:
- ❌ Can't identify who spoke
- ❌ No personalization
Use When:
- Just getting started
- Don't need to know who spoke
- Want simplicity
### Option 2: Speaker Identification (Week 4+)

**What:** Detect wake word, then identify speaker
**How:** Voice embeddings (resemblyzer or pyannote)

```bash
# Install
pip install resemblyzer

# Enroll users
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name Alice --duration 20
python enroll_speaker.py --name Bob --duration 20

# Server identifies speaker automatically
```
Pros:
- ✅ Personalized responses
- ✅ User-specific permissions
- ✅ Better privacy
- ✅ Track preferences
Cons:
- ❌ More complex
- ❌ Requires enrollment
- ❌ +100-200ms latency
- ❌ May fail with similar voices
Use When:
- Want personalization
- Need user-specific commands
- Ready for advanced features
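The matching step after enrollment can be sketched as follows: compare a new utterance's embedding against each enrolled speaker's embedding by cosine similarity, and accept the best match only above a threshold (which is what produces the "may fail with similar voices" failure mode). With resemblyzer the embeddings would come from `VoiceEncoder().embed_utterance()`; here they are toy vectors, and the 0.75 threshold is an assumption to tune, not a library default.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify(utterance_embedding, enrolled, threshold=0.75):
    """Return the best-matching enrolled speaker, or None if below threshold."""
    best_name, best_score = None, threshold
    for name, embedding in enrolled.items():
        score = cosine(utterance_embedding, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Toy enrolled embeddings (real ones are ~256-dim vectors from the encoder).
enrolled = {"Alan": [1.0, 0.0, 0.2], "Alice": [0.0, 1.0, 0.1]}
print(identify([0.9, 0.1, 0.2], enrolled))  # → Alan
print(identify([0.0, 0.0, 1.0], enrolled))  # → None (no confident match)
```

Returning `None` on a low score is the safe default: fall back to a non-personalized response rather than guess the wrong family member.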
### Option 3: Per-User Wake Words (Advanced)

**What:** Each person has their own wake word
**How:** Multiple models, one per person

```
# Alice: "Hey Mycroft"
# Bob: "Hey Jarvis"
# You: "Hey Computer"
# Run all 3 models in parallel
```
Pros:
- ✅ Automatic user ID
- ✅ Highest accuracy per user
- ✅ Clear separation
Cons:
- ❌ 3x models = 3x CPU
- ❌ Users must remember their word
- ❌ Server-only (not edge)
Use When:
- Need automatic user ID
- Have CPU to spare
- Users want their own wake word
## 🎯 Decision Tree

```
START: Want to use voice assistant
│
├─ Single user or don't care who spoke?
│   └─ Use: Inclusive Training (Option 1)
│       └─ Download: Hey Mycroft (pre-trained)
│
├─ Multiple users AND need to know who spoke?
│   └─ Use: Speaker Identification (Option 2)
│       └─ Start with: Hey Mycroft + resemblyzer
│
├─ Want different wake words for different purposes?
│   └─ Use: Multiple Models (Option 1)
│       └─ Download: Hey Mycroft + Hey Jarvis
│
└─ Deploying to Maix Duino (edge)?
    └─ Use: Single Multi-Phrase Model (Option 2)
        └─ Train: Custom model with 2-3 phrases
```
## 📊 Comparison Table
| Feature | Inclusive | Speaker ID | Per-User Wake | Multiple Wake |
|---|---|---|---|---|
| Setup Time | 2 hours | 4 hours | 6 hours | 3 hours |
| Complexity | ⭐ Easy | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ Hard | ⭐⭐ Easy |
| CPU Usage | 5-10% | 10-15% | 15-30% | 15-30% |
| Latency | 100ms | 300ms | 100ms | 100ms |
| User ID | ❌ No | ✅ Yes | ✅ Yes | ❌ No |
| Edge Deploy | ✅ Yes | ⚠️ Maybe | ❌ No | ⚠️ Partial |
| Personalize | ❌ No | ✅ Yes | ✅ Yes | ⚠️ Partial |
## 🚀 Recommended Timeline

### Week 1: Get It Working

```bash
# Use pre-trained Hey Mycroft
./download_pretrained_models.sh --model hey-mycroft

# Test it
precise-listen ~/precise-models/pretrained/hey-mycroft.net

# Deploy to server
python voice_server.py --enable-precise \
  --precise-model ~/precise-models/pretrained/hey-mycroft.net
```

### Week 2-3: Make It Yours

```bash
# Fine-tune with your family's voices
cd ~/precise-models/hey-mycroft-family

# Have everyone record 20-30 samples
precise-collect  # Alice
precise-collect  # Bob
precise-collect  # You

# Train
precise-train -e 30 custom.net . \
  --from-checkpoint ../pretrained/hey-mycroft.net
```

### Week 4+: Add Intelligence

```bash
# Speaker identification
pip install resemblyzer
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name Alice --duration 20

# Now server knows who's speaking!
```

### Month 2+: Expand Features

```bash
# Add second wake word for media control
./download_pretrained_models.sh --model hey-jarvis

# Run both: Mycroft for commands, Jarvis for Plex
python voice_server.py --enable-precise \
  --precise-models "mycroft:hey-mycroft.net:0.5,jarvis:hey-jarvis.net:0.5"
```
## 💡 Pro Tips
### Wake Word Selection
- ✅ DO: Choose clear, distinct wake words
- ✅ DO: Test in your environment
- ❌ DON'T: Use similar-sounding words
- ❌ DON'T: Use common phrases
### Training
- ✅ DO: Include all intended users
- ✅ DO: Record in various conditions
- ✅ DO: Add false positives to training
- ❌ DON'T: Rush the training process
### Deployment
- ✅ DO: Start simple (one wake word)
- ✅ DO: Test thoroughly before adding features
- ✅ DO: Monitor false positive rate
- ❌ DON'T: Deploy too many wake words at once
### Speaker ID
- ✅ DO: Use 20+ seconds for enrollment
- ✅ DO: Re-enroll if accuracy drops
- ✅ DO: Test threshold values
- ❌ DON'T: Expect 100% accuracy
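"Test threshold values" in practice means sweeping candidate thresholds over labeled trials and counting false accepts (impostor let through) versus false rejects (family member refused). A minimal sketch, with made-up similarity scores for illustration:

```python
# Hypothetical sketch: sweep speaker-ID thresholds over labeled trials.
# Each trial is (similarity_score, was_same_speaker); the scores are invented.
trials = [
    (0.92, True), (0.88, True), (0.71, True),    # genuine attempts
    (0.81, False), (0.60, False), (0.45, False), # impostor attempts
]

def error_counts(threshold, trials):
    # Accept when score >= threshold; count both kinds of mistakes.
    false_accepts = sum(1 for s, same in trials if not same and s >= threshold)
    false_rejects = sum(1 for s, same in trials if same and s < threshold)
    return false_accepts, false_rejects

for t in (0.65, 0.75, 0.85):
    fa, fr = error_counts(t, trials)
    print(f"threshold {t}: {fa} false accepts, {fr} false rejects")
```

Raising the threshold trades false accepts for false rejects; pick the value whose balance suits your household, and re-run the sweep after re-enrollment.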
## 🔧 Quick Commands

```bash
# Download pre-trained model
./download_pretrained_models.sh --model hey-mycroft

# Test model
precise-listen ~/precise-models/pretrained/hey-mycroft.net

# Fine-tune from pre-trained
precise-train -e 30 custom.net . \
  --from-checkpoint ~/precise-models/pretrained/hey-mycroft.net

# Enroll speaker
python enroll_speaker.py --name Alan --duration 20

# Start with single wake word
python voice_server.py --enable-precise \
  --precise-model hey-mycroft.net

# Start with multiple wake words
python voice_server.py --enable-precise \
  --precise-models "mycroft:hey-mycroft.net:0.5,jarvis:hey-jarvis.net:0.5"

# Check status
curl http://10.1.10.71:5000/wake-word/status

# Monitor detections
curl http://10.1.10.71:5000/wake-word/detections
```
## 📚 See Also
- Full guide: ADVANCED_WAKE_WORD_TOPICS.md
- Training: MYCROFT_PRECISE_GUIDE.md
- Deployment: PRECISE_DEPLOYMENT.md
- Getting started: QUICKSTART.md
## ❓ FAQ

**Q: Can I use "Hey Mycroft" right away?**
A: Yes! Download it with `./download_pretrained_models.sh --model hey-mycroft`

**Q: How many wake words can I run at once?**
A: 2-3 comfortably on the server. The Maix Duino can handle 1.

**Q: Can I train my own custom wake word?**
A: Yes! See MYCROFT_PRECISE_GUIDE.md Phase 2.

**Q: Does speaker ID work with multiple wake words?**
A: Yes! Wake word detected → speaker identified → personalized response.

**Q: Can I use this on Maix Duino?**
A: Run it server-side first (start here), then convert to KMODEL (advanced).

**Q: How accurate is speaker identification?**
A: 85-95% with good enrollment. Re-enroll if accuracy drops.

**Q: What if someone has a cold?**
A: Accuracy may drop temporarily; the system should recover when their voice returns to normal.

**Q: Can kids use it?**
A: Yes! Include their voices in training or enroll them separately.
**Quick Decision:** Start with pre-trained Hey Mycroft. Add features later!

```bash
./download_pretrained_models.sh --model hey-mycroft
precise-listen ~/precise-models/pretrained/hey-mycroft.net
# It just works! ✨
```