minerva/docs/QUESTIONS_ANSWERED.md
pyr0ball 173f7f37d4 feat: import mycroft-precise work as Minerva foundation
Ports prior voice assistant research and prototypes from devl/Devops
into the Minerva repo. Includes:

- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed
  (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap
2026-04-06 22:21:12 -07:00


# Your Questions Answered - Quick Reference
## TL;DR: Yes, Yes, and Multiple Options!
### Q1: Pre-trained "Hey Mycroft" Model?
**Answer: YES! ✅**
Download and use immediately:
```bash
./quick_start_hey_mycroft.sh
# Done in 5 minutes - no training!
```
The pre-trained model works great and saves you 1-2 hours of training time.
### Q2: Multiple Wake Words?
**Answer: YES! ✅ (with considerations)**
**Server-side (Heimdall):** Easy, run 3-5 wake words
```bash
python voice_server_enhanced.py \
    --enable-precise \
    --multi-wake-word
```
**Edge (K210):** Feasible for 1-2, challenging for 3+
### Q3: Adopting New Users' Voices?
**Answer: Multiple approaches ✅**
- **Best option:** Train one model with everyone's voices upfront
- **Alternative:** Incremental retraining as new users join
- **Advanced:** Speaker identification with personalization
---
## Detailed Answers
### 1. Pre-trained "Hey Mycroft" Model
#### Where to Get It
```bash
# Quick start script does this for you
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz
tar xzf hey-mycroft.tar.gz
```
#### How to Use
**Instant deployment:**
```bash
python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/pretrained/hey-mycroft.net
```
**Fine-tune with your voice:**
```bash
# Record 20-30 samples of your voice saying "Hey Mycroft"
precise-collect
# Fine-tune from pre-trained
precise-train -e 30 my-hey-mycroft.net . \
    --from-checkpoint ~/precise-models/pretrained/hey-mycroft.net
```
#### Advantages
- **Zero training time** - Works immediately
- **Proven accuracy** - Tested by thousands
- **Good baseline** - Already includes diverse voices
- **Easy fine-tuning** - Add your voice in 30 mins vs 60+ mins from scratch
#### When to Use Pre-trained vs Custom
**Use Pre-trained "Hey Mycroft" when:**
- You want to test quickly
- "Hey Mycroft" is an acceptable wake word
- You want proven accuracy out-of-box
**Train Custom when:**
- You want a different wake word ("Hey Computer", "Jarvis", etc.)
- Maximum accuracy for your specific environment
- Family-specific wake word
**Hybrid (Recommended):**
- Start with pre-trained "Hey Mycroft"
- Test and learn the system
- Fine-tune with your samples
- Or add custom wake word later
---
### 2. Multiple Wake Words
#### Can You Have Multiple?
**Yes!** Options:
#### Option A: Server-Side (Recommended)
**Easy implementation:**
```bash
# Use the enhanced server
python voice_server_enhanced.py \
    --enable-precise \
    --multi-wake-word
```
**Configured wake words:**
- "Hey Mycroft" (pre-trained)
- "Hey Computer" (custom)
- "Jarvis" (custom)
**Resource impact:**
- 3 models = ~15-30% CPU (Heimdall handles easily)
- ~300-600MB RAM
- Each model runs independently
**Example use cases:**
```python
"Hey Mycroft, what's the time?"   # General assistant
"Jarvis, run diagnostics"         # Personal assistant mode
"Emergency, call help"            # Priority/emergency mode
```
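One way to picture "each model runs independently" is to score the same audio chunk with every model and act on the most confident one. This is only a rough sketch, not the actual logic in `voice_server_enhanced.py`: the `Scorer` type, the `detect_any` function, and the 0.5 threshold are all made up for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical scorer: takes one audio chunk, returns a confidence in [0, 1].
Scorer = Callable[[bytes], float]


def detect_any(chunk: bytes, scorers: Dict[str, Scorer],
               threshold: float = 0.5) -> Optional[str]:
    """Return the name of the most confident wake word, or None if none fire."""
    # Every model sees the same chunk; none depends on another's result.
    scores = {name: score(chunk) for name, score in scorers.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Only report a detection when the best score clears the threshold.
    return best if scores[best] >= threshold else None
```

In a real server each scorer would wrap one loaded model; picking the highest score avoids two wake words firing on the same chunk.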
#### Option B: Edge (K210)
**Feasible for 1-2 wake words:**
```python
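# Sequential checking: try each model in turn, stop at the first match.
# detect_wake_word() stands in for the device's inference call.
def check_wake_words(models=('hey-mycroft.kmodel', 'emergency.kmodel')):
    for model in models:
        if detect_wake_word(model):
            return model
    return None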
```
**Limitations:**
- +50-100ms latency per additional model
- Memory constraints (6MB total for all models)
- More models = more power consumption
**Recommendation:**
- K210: 1 wake word (optimal)
- K210: 2 wake words (acceptable)
- K210: 3+ wake words (not recommended)
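
To see why 3+ models gets uncomfortable, here is the arithmetic behind the "+50-100ms per additional model" figure as a small sketch. The function name is hypothetical, and it assumes the models are checked one after another so the extra latency simply adds up.

```python
def added_latency_ms(num_models, per_model_ms=(50, 100)):
    """Extra latency range (low, high) compared with running a single model."""
    if num_models < 1:
        raise ValueError("need at least one model")
    extra_models = num_models - 1  # the first model is the baseline
    low, high = per_model_ms
    return (extra_models * low, extra_models * high)


for n in (1, 2, 3):
    low, high = added_latency_ms(n)
    print(f"{n} model(s): +{low}-{high} ms")
```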
#### Option C: Contextual Wake Words
Different wake words for different purposes:
```python
wake_word_contexts = {
    'hey_mycroft': 'general_assistant',
    'emergency': 'priority_emergency',
    'goodnight': 'bedtime_routine',
}
```
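A mapping like this could drive a simple dispatcher. The sketch below is an assumption about how you might wire it up, not something taken from the server code: the three handler functions and `dispatch` are hypothetical placeholders.

```python
WAKE_WORD_CONTEXTS = {
    "hey_mycroft": "general_assistant",
    "emergency": "priority_emergency",
    "goodnight": "bedtime_routine",
}


# Hypothetical handlers, one per context; replace with real behavior.
def general_assistant(command):
    return f"assistant: {command}"


def priority_emergency(command):
    return f"PRIORITY: {command}"


def bedtime_routine(command):
    return f"bedtime: {command}"


HANDLERS = {
    "general_assistant": general_assistant,
    "priority_emergency": priority_emergency,
    "bedtime_routine": bedtime_routine,
}


def dispatch(wake_word, command):
    """Route a command to the handler for the wake word's context."""
    # Unknown wake words fall back to the general assistant.
    context = WAKE_WORD_CONTEXTS.get(wake_word, "general_assistant")
    return HANDLERS[context](command)
```

For example, `dispatch("emergency", "call help")` would go to the priority handler, while an unrecognized wake word falls through to the general assistant.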
#### Should You Use Multiple?
**One wake word is usually enough!**
Commercial products (Alexa, Google) use one wake word and they work fine.
**Use multiple when:**
- Different family members want different wake words
- You want context-specific behaviors (emergency vs. general)
- You enjoy the flexibility
**Start with one, add more later if needed.**
---
### 3. Adopting New Users' Voices
#### Challenge
Same wake word, different voices:
- Mom says "Hey Mycroft" (soprano)
- Dad says "Hey Mycroft" (bass)
- Kids say "Hey Mycroft" (high-pitched)
All need to work!
#### Solution 1: Diverse Training (Recommended)
**During initial training, have everyone record samples:**
```bash
cd ~/precise-models/family-hey-mycroft
# Session 1: Mom records 30 samples
precise-collect # Mom speaks "Hey Mycroft" 30 times
# Session 2: Dad records 30 samples
precise-collect # Dad speaks "Hey Mycroft" 30 times
# Session 3: Kids record 20 samples each
precise-collect # Kids speak "Hey Mycroft" 40 times total
# Train one model with all voices
precise-train -e 60 family-hey-mycroft.net .
# Deploy
python voice_server.py \
    --enable-precise \
    --precise-model family-hey-mycroft.net
```
**Pros:**
- ✅ One model works for everyone
- ✅ Simple deployment
- ✅ No switching needed
- ✅ Works from day one

**Cons:**
- ❌ Need everyone's time upfront
- ❌ Slightly lower per-person accuracy than individual models
#### Solution 2: Incremental Training
**Start with one person, add others over time:**
```bash
# Week 1: Train with Dad's voice
precise-train -e 60 hey-mycroft.net .
# Week 2: Mom wants to use it
# Collect Mom's samples
precise-collect # Mom records 20-30 samples
# Add to training set
cp mom-samples/* wake-word/
# Retrain from checkpoint (faster!)
precise-train -e 30 hey-mycroft.net . \
    --from-checkpoint hey-mycroft.net
# Now works for both Dad and Mom!
# Week 3: Kids want in
# Repeat process...
```
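The `cp mom-samples/* wake-word/` step can silently overwrite files if two people's recordings share names. A small sketch of a safer merge follows, assuming samples are `.wav` files; the function name and the per-user prefix are hypothetical conveniences, not part of Precise.

```python
import shutil
from pathlib import Path


def add_user_samples(new_dir, wake_word_dir, prefix):
    """Copy .wav files into the training folder, prefixing names per user.

    Returns the number of files actually copied.
    """
    src, dst = Path(new_dir), Path(wake_word_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = 0
    for wav in sorted(src.glob("*.wav")):
        target = dst / f"{prefix}-{wav.name}"
        if target.exists():
            continue  # already merged on an earlier run
        shutil.copy2(wav, target)
        copied += 1
    return copied
```

Calling `add_user_samples("mom-samples", "wake-word", "mom")` before retraining keeps Dad's and Mom's recordings side by side, and running it twice does not duplicate anything.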
**Pros:**
- ✅ Don't need everyone upfront
- ✅ Easy to add new users
- ✅ Model improves gradually

**Cons:**
- ❌ New users may have issues initially
- ❌ Requires periodic retraining
#### Solution 3: Speaker Identification (Advanced)
**Identify who's speaking, use personalized model/settings:**
```bash
# Install speaker ID
pip install pyannote.audio scipy --break-system-packages
# Use enhanced server
python voice_server_enhanced.py \
    --enable-precise \
    --enable-speaker-id \
    --hf-token YOUR_HF_TOKEN
```
**Enroll users:**
```bash
# Record 30-second voice sample from each person
# POST to /speakers/enroll with audio + name
curl -F "name=alan" \
     -F "audio=@alan_voice.wav" \
     http://localhost:5000/speakers/enroll
curl -F "name=sarah" \
     -F "audio=@sarah_voice.wav" \
     http://localhost:5000/speakers/enroll
```
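Conceptually, enrollment stores a reference voice embedding per name, and identification compares a new embedding against those references. The sketch below illustrates that idea with plain cosine similarity. It is not how `voice_server_enhanced.py` or pyannote.audio are confirmed to work internally; the function names, the toy 2-D vectors, and the 0.7 threshold are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(embedding, enrolled, threshold=0.7):
    """Return the enrolled name closest to `embedding`, or None if no match."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy embeddings; real ones would come from a speaker-embedding model.
enrolled = {"alan": [1.0, 0.0], "sarah": [0.0, 1.0]}
print(identify_speaker([0.9, 0.1], enrolled))
```

Returning `None` for low-similarity voices gives you a natural "unknown speaker" case, which is handy for guests.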
**Benefits:**
```python
# Illustrative handler; turn_on() stands in for your home-automation call
def handle_command(speaker, command):
    # Different responses per user
    if speaker == 'alan':
        turn_on('light.alan_office')
    elif speaker == 'sarah':
        turn_on('light.sarah_office')
    # Different permissions
    if speaker == 'kids' and command.startswith('buy'):
        return "Sorry, kids can't make purchases"
```
**Pros:**
- ✅ Personalized responses
- ✅ User-specific settings
- ✅ Better accuracy (optimized per voice)
- ✅ Can track who said what

**Cons:**
- ❌ More complex
- ❌ Privacy considerations
- ❌ Additional CPU/RAM (~10% + 200MB)
- ❌ Requires voice enrollment
#### Solution 4: Pre-trained Model (Easiest)
**"Hey Mycroft" already includes diverse voices!**
```bash
# Just use it - already trained on many voices
./quick_start_hey_mycroft.sh
```
The community model was trained with:
- Male and female voices
- Different accents
- Different ages
- Various environments
**It should work for most family members out-of-box!**
Then fine-tune if needed.
---
## Recommended Path for Your Situation
### Scenario: Family of 3-4 People
**Week 1: Quick Start**
```bash
# Use pre-trained "Hey Mycroft"
./quick_start_hey_mycroft.sh
# Test with all family members
# Likely works for everyone already!
```
**Week 2: Fine-tune if Needed**
```bash
# If someone has issues:
# Have them record 20 samples
# Fine-tune the model
precise-train -e 30 family-hey-mycroft.net . \
    --from-checkpoint ~/precise-models/pretrained/hey-mycroft.net
```
**Week 3: Add Features**
```bash
# If you want personalization:
python voice_server_enhanced.py \
    --enable-speaker-id
# Enroll each family member
```
### Scenario: Just You (or 1-2 People)
**Option 1: Pre-trained**
```bash
./quick_start_hey_mycroft.sh
# Done!
```
**Option 2: Custom Wake Word**
```bash
# Train custom "Hey Computer"
cd ~/precise-models/hey-computer
./1-record-wake-word.sh # 50 samples
./2-record-not-wake-word.sh # 200 samples
./3-train-model.sh
```
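Before kicking off training, it can help to check that you actually hit the sample counts from the comments above (50 wake-word, 200 not-wake-word). A minimal sketch, assuming recordings are `.wav` files stored in `wake-word/` and `not-wake-word/` subfolders of the model directory; the folder layout and the function name are assumptions here.

```python
from pathlib import Path

# Minimums taken from the script comments above; folder names are assumed.
MINIMUMS = {"wake-word": 50, "not-wake-word": 200}


def missing_samples(model_dir):
    """Return how many more .wav files each folder needs (0 when satisfied)."""
    root = Path(model_dir)
    shortfall = {}
    for folder, minimum in MINIMUMS.items():
        path = root / folder
        # A missing folder counts as zero recordings.
        count = len(list(path.glob("*.wav"))) if path.is_dir() else 0
        shortfall[folder] = max(0, minimum - count)
    return shortfall
```

If `missing_samples(".")` returns all zeros from inside `~/precise-models/hey-computer`, you have enough recordings to run the training script.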
### Scenario: Multiple People + Multiple Wake Words
**Full setup:**
```bash
# Pre-trained for family
./quick_start_hey_mycroft.sh
# Personal wake word for Dad
cd ~/precise-models/jarvis
# Train custom wake word
# Emergency wake word
cd ~/precise-models/emergency
# Train emergency wake word
# Run multi-wake-word server
python voice_server_enhanced.py \
    --enable-precise \
    --multi-wake-word \
    --enable-speaker-id
```
---
## Quick Decision Matrix
| Your Situation | Recommendation |
|----------------|----------------|
| **Just getting started** | Pre-trained "Hey Mycroft" |
| **Want different wake word** | Train custom model |
| **Family of 3-4** | Pre-trained + fine-tune if needed |
| **Want personalization** | Add speaker ID |
| **Multiple purposes** | Multiple wake words (server-side) |
| **Deploying to K210** | 1 wake word, no speaker ID |
---
## Files to Use
**Quick start with pre-trained:**
- `quick_start_hey_mycroft.sh` - Zero training, 5 minutes!
**Multiple wake words:**
- `voice_server_enhanced.py` - Multi-wake-word + speaker ID support
**Training custom:**
- `setup_precise.sh` - Setup training environment
- Scripts in `~/precise-models/your-wake-word/`
**Documentation:**
- `WAKE_WORD_ADVANCED.md` - Detailed guide (this is comprehensive!)
- `PRECISE_DEPLOYMENT.md` - Production deployment
---
## Summary
- **Yes**, pre-trained "Hey Mycroft" exists and works great
- **Yes**, you can have multiple wake words (server-side is easy)
- **Yes**, multiple approaches for multi-user support

**Recommended approach:**
1. Start with `./quick_start_hey_mycroft.sh` (5 mins)
2. Test with all family members
3. Fine-tune if anyone has issues
4. Add speaker ID later if you want personalization
5. Consider multiple wake words only if you have specific use cases
**Keep it simple!** One pre-trained wake word works for most people.
---
## Next Actions
**Ready to start?**
```bash
# 5-minute quick start
./quick_start_hey_mycroft.sh
# Or read more first
cat WAKE_WORD_ADVANCED.md
```
**Questions?**
- Pre-trained models: See WAKE_WORD_ADVANCED.md § Pre-trained
- Multiple wake words: See WAKE_WORD_ADVANCED.md § Multiple Wake Words
- Voice adaptation: See WAKE_WORD_ADVANCED.md § Voice Adaptation
**Happy voice assisting! 🎙️**