# Wake Word Quick Reference Card
## 🎯 TL;DR: What Should I Do?

### Recommendation for Your Setup

**Week 1:** Use pre-trained "Hey Mycroft"

```bash
./download_pretrained_models.sh --model hey-mycroft
precise-listen ~/precise-models/pretrained/hey-mycroft.net
```

**Week 2-3:** Fine-tune with all family members' voices

```bash
cd ~/precise-models/hey-mycroft-family
precise-train -e 30 custom.net . --from-checkpoint ../pretrained/hey-mycroft.net
```

**Week 4+:** Add speaker identification

```bash
pip install resemblyzer
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name [Family] --duration 20
```

**Month 2+:** Add second wake word (Hey Jarvis for Plex?)

```bash
./download_pretrained_models.sh --model hey-jarvis
# Run both in parallel on server
```

---
## 📋 Pre-trained Models

### Available Models (Ready to Use!)

| Wake Word | Download | Best For |
|-----------|----------|----------|
| **Hey Mycroft** ⭐ | `--model hey-mycroft` | Default choice, most data |
| **Hey Jarvis** | `--model hey-jarvis` | Pop culture, media control |
| **Christopher** | `--model christopher` | Unique, less common |
| **Hey Ezra** | `--model hey-ezra` | Alternative option |

### Quick Download

```bash
# Download one
./download_pretrained_models.sh --model hey-mycroft

# Download and test all models
./download_pretrained_models.sh --test-all

# Test immediately
precise-listen ~/precise-models/pretrained/hey-mycroft.net
```

---
## 🔢 Multiple Wake Words

### Option 1: Multiple Models (Server-Side) ⭐ RECOMMENDED

**What:** Run 2-3 different wake word models simultaneously
**Where:** Heimdall (server)
**Performance:** ~15-30% CPU for 3 models

```bash
# Start with multiple wake words
python voice_server.py \
--enable-precise \
--precise-models "\
hey-mycroft:~/models/hey-mycroft.net:0.5,\
hey-jarvis:~/models/hey-jarvis.net:0.5"
```

**Pros:**
- ✅ Can identify which wake word was used
- ✅ Different contexts (Mycroft=commands, Jarvis=media)
- ✅ Easy to add/remove wake words
- ✅ Each can have different sensitivity

**Cons:**
- ❌ Only works server-side (not on Maix Duino)
- ❌ Higher CPU usage (but still reasonable)

**Use When:**
- You want different wake words for different purposes
- Server has CPU to spare (yours does!)
- Want flexibility to add wake words later
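The `name:path:sensitivity` entries passed to `--precise-models` are easy to split apart. A minimal sketch of parsing that format; the actual parser inside `voice_server.py` may differ, so treat this as an illustration of the spec string only:

```python
from pathlib import Path

def parse_model_specs(spec: str) -> dict:
    """Parse 'name:path:sensitivity,...' into {name: (path, sensitivity)}."""
    models = {}
    for entry in spec.split(","):
        name, path, sensitivity = entry.strip().split(":")
        models[name] = (Path(path).expanduser(), float(sensitivity))
    return models

specs = parse_model_specs(
    "hey-mycroft:~/models/hey-mycroft.net:0.5,"
    "hey-jarvis:~/models/hey-jarvis.net:0.5"
)
print(sorted(specs))  # ['hey-jarvis', 'hey-mycroft']
```

Each wake word keeps its own sensitivity, which is what lets you tune Mycroft and Jarvis independently.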
### Option 2: Single Multi-Phrase Model (Edge-Compatible)

**What:** One model responds to multiple phrases
**Where:** Server OR Maix Duino
**Performance:** Same as single model

```bash
# Train on multiple phrases
cd ~/precise-models/multi-wake
# Record "Hey Mycroft" samples → wake-word/
# Record "Hey Computer" samples → wake-word/
# Record negatives → not-wake-word/
precise-train -e 60 multi-wake.net .
```
**Pros:**
- ✅ Single model = less compute
- ✅ Works on edge (K210)
- ✅ Simple deployment

**Cons:**
- ❌ Can't tell which wake word was used
- ❌ May reduce accuracy
- ❌ Higher false positive risk

**Use When:**
- Deploying to Maix Duino (edge)
- Want backup wake words
- Don't care which was used

---
## 👥 Multi-User Support

### Option 1: Inclusive Training ⭐ START HERE

**What:** One model, all voices
**How:** All family members record samples

```bash
cd ~/precise-models/family-wake
# Alice records 30 samples
# Bob records 30 samples
# You record 30 samples
precise-train -e 60 family-wake.net .
```

**Pros:**
- ✅ Everyone can use it
- ✅ Simple deployment
- ✅ Single model

**Cons:**
- ❌ Can't identify who spoke
- ❌ No personalization

**Use When:**
- Just getting started
- Don't need to know who spoke
- Want simplicity
### Option 2: Speaker Identification (Week 4+)

**What:** Detect wake word, then identify speaker
**How:** Voice embeddings (resemblyzer or pyannote)

```bash
# Install
pip install resemblyzer

# Enroll users
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name Alice --duration 20
python enroll_speaker.py --name Bob --duration 20

# Server identifies speaker automatically
```

**Pros:**
- ✅ Personalized responses
- ✅ User-specific permissions
- ✅ Better privacy
- ✅ Track preferences

**Cons:**
- ❌ More complex
- ❌ Requires enrollment
- ❌ +100-200ms latency
- ❌ May fail with similar voices

**Use When:**
- Want personalization
- Need user-specific commands
- Ready for advanced features
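Behind the scenes, speaker ID usually boils down to comparing a fresh voice embedding against each enrolled speaker's stored embedding by cosine similarity. A minimal pure-Python sketch, assuming embeddings are already extracted (resemblyzer produces fixed-length vectors); the 0.75 threshold and toy vectors are illustrative:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def identify_speaker(embedding, enrolled, threshold=0.75):
    """Return the best-matching enrolled name, or None if nobody clears the threshold."""
    best_name, best_score = None, threshold
    for name, ref in enrolled.items():
        score = cosine(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Toy 3-D "embeddings" for illustration (real ones are much longer)
enrolled = {"Alan": [1.0, 0.1, 0.0], "Alice": [0.0, 1.0, 0.2]}
print(identify_speaker([0.9, 0.2, 0.0], enrolled))  # Alan
```

Returning `None` below the threshold is what makes "may fail with similar voices" a soft failure: the server can fall back to an unpersonalized response instead of guessing.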
### Option 3: Per-User Wake Words (Advanced)

**What:** Each person has their own wake word
**How:** Multiple models, one per person

```bash
# Alice: "Hey Mycroft"
# Bob: "Hey Jarvis"
# You: "Hey Computer"

# Run all 3 models in parallel
```

**Pros:**
- ✅ Automatic user ID
- ✅ Highest accuracy per user
- ✅ Clear separation

**Cons:**
- ❌ 3x models = 3x CPU
- ❌ Users must remember their word
- ❌ Server-only (not edge)

**Use When:**
- Need automatic user ID
- Have CPU to spare
- Users want their own wake word

---
## 🎯 Decision Tree

```
START: Want to use voice assistant
│
├─ Single user or don't care who spoke?
│  └─ Use: Inclusive Training (Option 1)
│     └─ Download: Hey Mycroft (pre-trained)
│
├─ Multiple users AND need to know who spoke?
│  └─ Use: Speaker Identification (Option 2)
│     └─ Start with: Hey Mycroft + resemblyzer
│
├─ Want different wake words for different purposes?
│  └─ Use: Multiple Models (Option 1)
│     └─ Download: Hey Mycroft + Hey Jarvis
│
└─ Deploying to Maix Duino (edge)?
   └─ Use: Single Multi-Phrase Model (Option 2)
      └─ Train: Custom model with 2-3 phrases
```

---
## 📊 Comparison Table

| Feature | Inclusive | Speaker ID | Per-User Wake | Multiple Wake |
|---------|-----------|------------|---------------|---------------|
| **Setup Time** | 2 hours | 4 hours | 6 hours | 3 hours |
| **Complexity** | ⭐ Easy | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ Hard | ⭐⭐ Easy |
| **CPU Usage** | 5-10% | 10-15% | 15-30% | 15-30% |
| **Latency** | 100ms | 300ms | 100ms | 100ms |
| **User ID** | ❌ No | ✅ Yes | ✅ Yes | ❌ No |
| **Edge Deploy** | ✅ Yes | ⚠️ Maybe | ❌ No | ⚠️ Partial |
| **Personalize** | ❌ No | ✅ Yes | ✅ Yes | ⚠️ Partial |

---
## 🚀 Recommended Timeline

### Week 1: Get It Working
```bash
# Use pre-trained Hey Mycroft
./download_pretrained_models.sh --model hey-mycroft

# Test it
precise-listen ~/precise-models/pretrained/hey-mycroft.net

# Deploy to server
python voice_server.py --enable-precise \
--precise-model ~/precise-models/pretrained/hey-mycroft.net
```

### Week 2-3: Make It Yours
```bash
# Fine-tune with your family's voices
cd ~/precise-models/hey-mycroft-family

# Have everyone record 20-30 samples
precise-collect  # Alice
precise-collect  # Bob
precise-collect  # You

# Train
precise-train -e 30 custom.net . \
--from-checkpoint ../pretrained/hey-mycroft.net
```

### Week 4+: Add Intelligence
```bash
# Speaker identification
pip install resemblyzer
python enroll_speaker.py --name Alan --duration 20
python enroll_speaker.py --name Alice --duration 20

# Now server knows who's speaking!
```

### Month 2+: Expand Features
```bash
# Add second wake word for media control
./download_pretrained_models.sh --model hey-jarvis

# Run both: Mycroft for commands, Jarvis for Plex
python voice_server.py --enable-precise \
--precise-models "mycroft:hey-mycroft.net:0.5,jarvis:hey-jarvis.net:0.5"
```
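Running Mycroft for commands and Jarvis for Plex implies a dispatch step on the server: once a detection fires, the wake word name picks the handler. A hedged sketch of that routing; the handler functions are made up for illustration, not part of `voice_server.py`:

```python
def handle_command(text):
    """Illustrative handler for general commands."""
    return f"command: {text}"

def handle_media(text):
    """Illustrative handler for Plex/media control."""
    return f"plex: {text}"

# Map each wake word name (as passed to --precise-models) to a handler.
HANDLERS = {
    "mycroft": handle_command,
    "jarvis": handle_media,
}

def dispatch(wake_word, utterance):
    """Route an utterance to the handler for the wake word that fired."""
    handler = HANDLERS.get(wake_word)
    if handler is None:
        return None  # unknown wake word: ignore
    return handler(utterance)

print(dispatch("jarvis", "pause the movie"))  # plex: pause the movie
```

Because Option 1 reports *which* model fired, this table is all the context-switching logic you need.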
---
## 💡 Pro Tips

### Wake Word Selection
- ✅ **DO:** Choose clear, distinct wake words
- ✅ **DO:** Test in your environment
- ❌ **DON'T:** Use similar-sounding words
- ❌ **DON'T:** Use common phrases

### Training
- ✅ **DO:** Include all intended users
- ✅ **DO:** Record in various conditions
- ✅ **DO:** Add false positives to training
- ❌ **DON'T:** Rush the training process

### Deployment
- ✅ **DO:** Start simple (one wake word)
- ✅ **DO:** Test thoroughly before adding features
- ✅ **DO:** Monitor false positive rate
- ❌ **DON'T:** Deploy too many wake words at once

### Speaker ID
- ✅ **DO:** Use 20+ seconds for enrollment
- ✅ **DO:** Re-enroll if accuracy drops
- ✅ **DO:** Test threshold values
- ❌ **DON'T:** Expect 100% accuracy

---
## 🔧 Quick Commands

```bash
# Download pre-trained model
./download_pretrained_models.sh --model hey-mycroft

# Test model
precise-listen ~/precise-models/pretrained/hey-mycroft.net

# Fine-tune from pre-trained
precise-train -e 30 custom.net . \
--from-checkpoint ~/precise-models/pretrained/hey-mycroft.net

# Enroll speaker
python enroll_speaker.py --name Alan --duration 20

# Start with single wake word
python voice_server.py --enable-precise \
--precise-model hey-mycroft.net

# Start with multiple wake words
python voice_server.py --enable-precise \
--precise-models "mycroft:hey-mycroft.net:0.5,jarvis:hey-jarvis.net:0.5"

# Check status
curl http://10.1.10.71:5000/wake-word/status

# Monitor detections
curl http://10.1.10.71:5000/wake-word/detections
```

---
## 📚 See Also

- **Full guide:** [ADVANCED_WAKE_WORD_TOPICS.md](ADVANCED_WAKE_WORD_TOPICS.md)
- **Training:** [MYCROFT_PRECISE_GUIDE.md](MYCROFT_PRECISE_GUIDE.md)
- **Deployment:** [PRECISE_DEPLOYMENT.md](PRECISE_DEPLOYMENT.md)
- **Getting started:** [QUICKSTART.md](QUICKSTART.md)

---
## ❓ FAQ

**Q: Can I use "Hey Mycroft" right away?**
A: Yes! Download with `./download_pretrained_models.sh --model hey-mycroft`

**Q: How many wake words can I run at once?**
A: 2-3 comfortably on the server; the Maix Duino can handle one.

**Q: Can I train my own custom wake word?**
A: Yes! See MYCROFT_PRECISE_GUIDE.md Phase 2.

**Q: Does speaker ID work with multiple wake words?**
A: Yes! Wake word detected → speaker identified → personalized response.

**Q: Can I use this on Maix Duino?**
A: Yes: run detection server-side first (start here), then convert the model to KMODEL for on-device use (advanced).

**Q: How accurate is speaker identification?**
A: 85-95% with good enrollment. Re-enroll if accuracy drops.

**Q: What if someone has a cold?**
A: Accuracy may drop temporarily; the system should recover once their voice returns to normal.

**Q: Can kids use it?**
A: Yes! Include their voices in training or enroll them separately.

---
**Quick Decision:** Start with pre-trained Hey Mycroft. Add features later!

```bash
./download_pretrained_models.sh --model hey-mycroft
precise-listen ~/precise-models/pretrained/hey-mycroft.net
# It just works! ✨
```