# Wake Word Quick Reference Card
## 🎯 TL;DR: What Should I Do?
### Recommendation for Your Setup
**Week 1:** Use pre-trained "Hey Mycroft"
```bash
./download_pretrained_models.sh --model hey-mycroft
precise-listen ~/precise-models/pretrained/hey-mycroft.net
```
**Week 2-3:** Fine-tune with all family members' voices
```bash
cd ~/precise-models/hey-mycroft-family
precise-train -e 30 custom.net . --from-checkpoint ../pretrained/hey-mycroft.net
```
**Week 4+:** Add speaker identification
```bash
pip install resemblyzer
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name [Family] --duration 20
```
**Month 2+:** Add second wake word (Hey Jarvis for Plex?)
```bash
./download_pretrained_models.sh --model hey-jarvis
# Run both in parallel on server
```
---
## 📋 Pre-trained Models
### Available Models (Ready to Use!)
| Wake Word | Download | Best For |
|-----------|----------|----------|
| **Hey Mycroft** ⭐ | `--model hey-mycroft` | Default choice, most training data |
| **Hey Jarvis** | `--model hey-jarvis` | Pop culture, media control |
| **Christopher** | `--model christopher` | Unique, less common |
| **Hey Ezra** | `--model hey-ezra` | Alternative option |
### Quick Download
```bash
# Download one
./download_pretrained_models.sh --model hey-mycroft
# Download all
./download_pretrained_models.sh --test-all
# Test immediately
precise-listen ~/precise-models/pretrained/hey-mycroft.net
```
---
## 🔢 Multiple Wake Words
### Option 1: Multiple Models (Server-Side) ⭐ RECOMMENDED
**What:** Run 2-3 different wake word models simultaneously
**Where:** Heimdall (server)
**Performance:** ~15-30% CPU for 3 models
```bash
# Start with multiple wake words
# Start with multiple wake words.
# Use $HOME, not ~, inside the quoted spec: tilde doesn't expand inside double quotes.
python voice_server.py \
  --enable-precise \
  --precise-models "hey-mycroft:$HOME/models/hey-mycroft.net:0.5,hey-jarvis:$HOME/models/hey-jarvis.net:0.5"
```
**Pros:**
- ✅ Can identify which wake word was used
- ✅ Different contexts (Mycroft=commands, Jarvis=media)
- ✅ Easy to add/remove wake words
- ✅ Each can have different sensitivity
**Cons:**
- ❌ Only works server-side (not on Maix Duino)
- ❌ Higher CPU usage (but still reasonable)
**Use When:**
- You want different wake words for different purposes
- Server has CPU to spare (yours does!)
- Want flexibility to add wake words later
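The `--precise-models` value packs each model as `name:path:threshold`, comma-separated. A minimal Python sketch of how a server could parse that spec (the `parse_model_spec` helper is hypothetical, not part of voice_server.py):

```python
import os


def parse_model_spec(spec: str) -> dict:
    """Parse a comma-separated "name:path:threshold" wake-word spec.

    Returns {name: (expanded_path, threshold)}. Hypothetical helper;
    the real voice_server.py may parse its flag differently.
    """
    models = {}
    for entry in spec.split(","):
        name, path, threshold = entry.strip().split(":")
        # Expand ~ here so paths still work when the spec was quoted in the shell.
        models[name] = (os.path.expanduser(path), float(threshold))
    return models


spec = "hey-mycroft:~/models/hey-mycroft.net:0.5,hey-jarvis:~/models/hey-jarvis.net:0.5"
print(parse_model_spec(spec))
```

Expanding `~` on the Python side is the safer design: it makes the flag forgiving of shell quoting mistakes.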
### Option 2: Single Multi-Phrase Model (Edge-Compatible)
**What:** One model responds to multiple phrases
**Where:** Server OR Maix Duino
**Performance:** Same as single model
```bash
# Train on multiple phrases
cd ~/precise-models/multi-wake
# Record "Hey Mycroft" samples → wake-word/
# Record "Hey Computer" samples → wake-word/
# Record negatives → not-wake-word/
precise-train -e 60 multi-wake.net .
```
**Pros:**
- ✅ Single model = less compute
- ✅ Works on edge (K210)
- ✅ Simple deployment
**Cons:**
- ❌ Can't tell which wake word was used
- ❌ May reduce accuracy
- ❌ Higher false positive risk
**Use When:**
- Deploying to Maix Duino (edge)
- Want backup wake words
- Don't care which was used
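Precise expects the training directory to contain `wake-word/` and `not-wake-word/` subfolders of WAV samples. A small pre-flight sanity check can save a wasted training run; a sketch (the `check_dataset` helper and the minimum-sample count are illustrative, not part of Precise):

```python
from pathlib import Path


def check_dataset(root: str, min_samples: int = 12) -> list:
    """Return a list of problems with a Precise training directory.

    Flags a missing wake-word/ or not-wake-word/ folder, or one with
    too few .wav samples, before you spend time on precise-train.
    """
    problems = []
    for sub in ("wake-word", "not-wake-word"):
        folder = Path(root) / sub
        if not folder.is_dir():
            problems.append(f"missing folder: {sub}/")
            continue
        n = len(list(folder.glob("*.wav")))
        if n < min_samples:
            problems.append(f"{sub}/ has only {n} wav files (want >= {min_samples})")
    return problems
```

Run it against `~/precise-models/multi-wake` before `precise-train`; an empty return list means the layout looks sane.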
---
## 👥 Multi-User Support
### Option 1: Inclusive Training ⭐ START HERE
**What:** One model, all voices
**How:** All family members record samples
```bash
cd ~/precise-models/family-wake
# Alice records 30 samples
# Bob records 30 samples
# You record 30 samples
precise-train -e 60 family-wake.net .
```
**Pros:**
- ✅ Everyone can use it
- ✅ Simple deployment
- ✅ Single model
**Cons:**
- ❌ Can't identify who spoke
- ❌ No personalization
**Use When:**
- Just getting started
- Don't need to know who spoke
- Want simplicity
### Option 2: Speaker Identification (Week 4+)
**What:** Detect wake word, then identify speaker
**How:** Voice embeddings (resemblyzer or pyannote)
```bash
# Install
pip install resemblyzer
# Enroll users
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name Alice --duration 20
python enroll_speaker.py --name Bob --duration 20
# Server identifies speaker automatically
```
**Pros:**
- ✅ Personalized responses
- ✅ User-specific permissions
- ✅ Better privacy
- ✅ Track preferences
**Cons:**
- ❌ More complex
- ❌ Requires enrollment
- ❌ +100-200ms latency
- ❌ May fail with similar voices
**Use When:**
- Want personalization
- Need user-specific commands
- Ready for advanced features
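Under the hood, speaker ID reduces to comparing a fresh voice embedding against each enrolled user's embedding, usually by cosine similarity against an acceptance threshold. A dependency-free sketch of that matching step (the names and the 0.75 threshold are illustrative; in practice resemblyzer's `VoiceEncoder` produces the real 256-dim embeddings):

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def identify_speaker(embedding, enrolled, threshold=0.75):
    """Return the best-matching enrolled name, or None if below threshold.

    `enrolled` maps name -> enrollment embedding. Returning None on a
    weak match is what makes "may fail with similar voices" a soft
    failure instead of a wrong answer.
    """
    best_name, best_score = None, -1.0
    for name, ref in enrolled.items():
        score = cosine(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

Tuning `threshold` trades misidentification against rejections, which is why the tips below say to test threshold values.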
### Option 3: Per-User Wake Words (Advanced)
**What:** Each person has their own wake word
**How:** Multiple models, one per person
```bash
# Alice: "Hey Mycroft"
# Bob: "Hey Jarvis"
# You: "Hey Computer"
# Run all 3 models in parallel
```
**Pros:**
- ✅ Automatic user ID
- ✅ Highest accuracy per user
- ✅ Clear separation
**Cons:**
- ❌ 3x models = 3x CPU
- ❌ Users must remember their word
- ❌ Server-only (not edge)
**Use When:**
- Need automatic user ID
- Have CPU to spare
- Users want their own wake word
---
## 🎯 Decision Tree
```
START: Want to use voice assistant
├─ Single user or don't care who spoke?
│  └─ Use: Inclusive Training (Option 1)
│     └─ Download: Hey Mycroft (pre-trained)
├─ Multiple users AND need to know who spoke?
│  └─ Use: Speaker Identification (Option 2)
│     └─ Start with: Hey Mycroft + resemblyzer
├─ Want different wake words for different purposes?
│  └─ Use: Multiple Models (Option 1)
│     └─ Download: Hey Mycroft + Hey Jarvis
└─ Deploying to Maix Duino (edge)?
   └─ Use: Single Multi-Phrase Model (Option 2)
      └─ Train: Custom model with 2-3 phrases
```
---
## 📊 Comparison Table
| Feature | Inclusive | Speaker ID | Per-User Wake | Multiple Wake |
|---------|-----------|------------|---------------|---------------|
| **Setup Time** | 2 hours | 4 hours | 6 hours | 3 hours |
| **Complexity** | ⭐ Easy | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ Hard | ⭐⭐ Easy |
| **CPU Usage** | 5-10% | 10-15% | 15-30% | 15-30% |
| **Latency** | 100ms | 300ms | 100ms | 100ms |
| **User ID** | ❌ No | ✅ Yes | ✅ Yes | ❌ No |
| **Edge Deploy** | ✅ Yes | ⚠️ Maybe | ❌ No | ⚠️ Partial |
| **Personalize** | ❌ No | ✅ Yes | ✅ Yes | ⚠️ Partial |
---
## 🚀 Recommended Timeline
### Week 1: Get It Working
```bash
# Use pre-trained Hey Mycroft
./download_pretrained_models.sh --model hey-mycroft
# Test it
precise-listen ~/precise-models/pretrained/hey-mycroft.net
# Deploy to server
python voice_server.py --enable-precise \
  --precise-model ~/precise-models/pretrained/hey-mycroft.net
```
### Week 2-3: Make It Yours
```bash
# Fine-tune with your family's voices
cd ~/precise-models/hey-mycroft-family
# Have everyone record 20-30 samples
precise-collect # Alice
precise-collect # Bob
precise-collect # You
# Train
precise-train -e 30 custom.net . \
  --from-checkpoint ../pretrained/hey-mycroft.net
```
### Week 4+: Add Intelligence
```bash
# Speaker identification
pip install resemblyzer
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name Alice --duration 20
# Now server knows who's speaking!
```
### Month 2+: Expand Features
```bash
# Add second wake word for media control
./download_pretrained_models.sh --model hey-jarvis
# Run both: Mycroft for commands, Jarvis for Plex
python voice_server.py --enable-precise \
  --precise-models "mycroft:hey-mycroft.net:0.5,jarvis:hey-jarvis.net:0.5"
```
---
## 💡 Pro Tips
### Wake Word Selection
- **DO:** Choose clear, distinct wake words
- **DO:** Test in your environment
- **DON'T:** Use similar-sounding words
- **DON'T:** Use common phrases
### Training
- **DO:** Include all intended users
- **DO:** Record in various conditions
- **DO:** Add false positives to training
- **DON'T:** Rush the training process
### Deployment
- **DO:** Start simple (one wake word)
- **DO:** Test thoroughly before adding features
- **DO:** Monitor false positive rate
- **DON'T:** Deploy too many wake words at once
### Speaker ID
- **DO:** Use 20+ seconds for enrollment
- **DO:** Re-enroll if accuracy drops
- **DO:** Test threshold values
- **DON'T:** Expect 100% accuracy
---
## 🔧 Quick Commands
```bash
# Download pre-trained model
./download_pretrained_models.sh --model hey-mycroft
# Test model
precise-listen ~/precise-models/pretrained/hey-mycroft.net
# Fine-tune from pre-trained
precise-train -e 30 custom.net . \
  --from-checkpoint ~/precise-models/pretrained/hey-mycroft.net
# Enroll speaker
python enroll_speaker.py --name Alan --duration 20
# Start with single wake word
python voice_server.py --enable-precise \
  --precise-model hey-mycroft.net
# Start with multiple wake words
python voice_server.py --enable-precise \
  --precise-models "mycroft:hey-mycroft.net:0.5,jarvis:hey-jarvis.net:0.5"
# Check status
curl http://10.1.10.71:5000/wake-word/status
# Monitor detections
curl http://10.1.10.71:5000/wake-word/detections
```
---
## 📚 See Also
- **Full guide:** [ADVANCED_WAKE_WORD_TOPICS.md](ADVANCED_WAKE_WORD_TOPICS.md)
- **Training:** [MYCROFT_PRECISE_GUIDE.md](MYCROFT_PRECISE_GUIDE.md)
- **Deployment:** [PRECISE_DEPLOYMENT.md](PRECISE_DEPLOYMENT.md)
- **Getting started:** [QUICKSTART.md](QUICKSTART.md)
---
## ❓ FAQ
**Q: Can I use "Hey Mycroft" right away?**
A: Yes! Download with `./download_pretrained_models.sh --model hey-mycroft`
**Q: How many wake words can I run at once?**
A: 2-3 comfortably on server. Maix Duino can handle 1.
**Q: Can I train my own custom wake word?**
A: Yes! See MYCROFT_PRECISE_GUIDE.md Phase 2.
**Q: Does speaker ID work with multiple wake words?**
A: Yes! Wake word detected → Speaker identified → Personalized response.
**Q: Can I use this on Maix Duino?**
A: Yes: start with server-side detection, then convert the model to KMODEL for on-device use (advanced).
**Q: How accurate is speaker identification?**
A: 85-95% with good enrollment. Re-enroll if accuracy drops.
**Q: What if someone has a cold?**
A: May reduce accuracy temporarily. System should recover when voice returns to normal.
**Q: Can kids use it?**
A: Yes! Include their voices in training or enroll them separately.
---
**Quick Decision:** Start with pre-trained Hey Mycroft. Add features later!
```bash
./download_pretrained_models.sh --model hey-mycroft
precise-listen ~/precise-models/pretrained/hey-mycroft.net
# It just works! ✨
```