minerva/docs/QUESTIONS_ANSWERED.md
pyr0ball 173f7f37d4 feat: import mycroft-precise work as Minerva foundation
Ports prior voice assistant research and prototypes from devl/Devops
into the Minerva repo. Includes:

- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed
  (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap
2026-04-06 22:21:12 -07:00


Your Questions Answered - Quick Reference

TL;DR: Yes, Yes, and Multiple Options!

Q1: Pre-trained "Hey Mycroft" Model?

Answer: YES!

Download and use immediately:

./quick_start_hey_mycroft.sh
# Done in 5 minutes - no training!

The pre-trained model works great and saves you 1-2 hours of training time.

Q2: Multiple Wake Words?

Answer: YES! (with considerations)

Server-side (Heimdall): Easy, run 3-5 wake words

python voice_server_enhanced.py \
    --enable-precise \
    --multi-wake-word

Edge (K210): Feasible for 1-2, challenging for 3+

Q3: Adapting to New Users' Voices?

Answer: Multiple approaches

Best option: Train one model with everyone's voices upfront
Alternative: Incremental retraining as new users join
Advanced: Speaker identification with personalization


Detailed Answers

1. Pre-trained "Hey Mycroft" Model

Where to Get It

# Quick start script does this for you
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz
tar xzf hey-mycroft.tar.gz
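If you would rather script the extraction step than run `tar` by hand, the following is a minimal Python sketch of the same unpacking. It assumes the archive contains Precise model files with `.net`, `.pb` or `.pb.params` extensions (the exact contents of `hey-mycroft.tar.gz` are not listed here), and it refuses any archive member that would be written outside the destination folder.

```python
import tarfile
from pathlib import Path


def extract_model(archive: str, dest: str) -> list:
    """Extract a Precise model tarball into dest and return the model files found.

    Raises ValueError if any member would land outside dest (path traversal).
    """
    dest_path = Path(dest).resolve()
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Validate every member path before extracting anything
        for member in tar.getmembers():
            target = (dest_path / member.name).resolve()
            if target != dest_path and dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest_path)
    # Assumed model extensions: .net (Keras) and .pb / .pb.params (TensorFlow)
    return sorted(
        p.relative_to(dest_path).as_posix()
        for p in dest_path.rglob("*")
        if p.is_file() and (p.suffix in {".net", ".pb"} or p.name.endswith(".pb.params"))
    )
```

For example, `extract_model("hey-mycroft.tar.gz", str(Path.home() / "precise-models/pretrained"))` would unpack into the folder used by the commands below and list the model files it found.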

How to Use

Instant deployment:

python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/pretrained/hey-mycroft.net

Fine-tune with your voice:

# Record 20-30 samples of your voice saying "Hey Mycroft"
precise-collect

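# Fine-tune from pre-trained: precise-train loads existing weights from the
# model file when it already exists, so start from a copy of the pre-trained model
cp ~/precise-models/pretrained/hey-mycroft.net my-hey-mycroft.net
precise-train -e 30 my-hey-mycroft.net .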

Advantages

Zero training time - Works immediately
Proven accuracy - Mycroft's default wake word, used by a large community
Good baseline - Already includes diverse voices
Easy fine-tuning - Add your voice in ~30 mins vs. 60+ mins training from scratch

When to Use Pre-trained vs Custom

Use Pre-trained "Hey Mycroft" when:

  • You want to test quickly
  • "Hey Mycroft" is an acceptable wake word
  • You want proven accuracy out-of-box

Train Custom when:

  • You want a different wake word ("Hey Computer", "Jarvis", etc.)
  • You want maximum accuracy for your specific environment
  • You want a family-specific wake word

Hybrid (Recommended):

  • Start with pre-trained "Hey Mycroft"
  • Test and learn the system
  • Fine-tune with your samples
  • Or add custom wake word later

2. Multiple Wake Words

Can You Have Multiple?

Yes! Options:

Option A: Server-side (Heimdall)

Easy implementation:

# Use the enhanced server
python voice_server_enhanced.py \
    --enable-precise \
    --multi-wake-word

Configured wake words:

  • "Hey Mycroft" (pre-trained)
  • "Hey Computer" (custom)
  • "Jarvis" (custom)

Resource impact:

  • 3 models = ~15-30% CPU (Heimdall handles easily)
  • ~300-600MB RAM
  • Each model runs independently
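"Each model runs independently" means every model scores each audio chunk and keeps its own trigger state. The sketch below illustrates that idea with a simplified consecutive-chunk trigger; it is not the code in `voice_server_enhanced.py`, and the `threshold`, `trigger_level` and `lockout_chunks` values are illustrative assumptions. Mycroft's precise-runner uses a comparable activation counter, but this is not its implementation.

```python
from dataclasses import dataclass, field


@dataclass
class WakeWordTrigger:
    """Trigger state for ONE wake-word model."""
    name: str
    threshold: float = 0.5    # per-chunk probability that counts as a hit
    trigger_level: int = 3    # consecutive hits required to fire
    lockout_chunks: int = 10  # chunks ignored after firing (avoids double triggers)
    _streak: int = field(default=0, init=False, repr=False)
    _lockout: int = field(default=0, init=False, repr=False)

    def update(self, prob: float) -> bool:
        """Feed one chunk's probability; return True when this model fires."""
        if self._lockout > 0:
            self._lockout -= 1
            return False
        self._streak = self._streak + 1 if prob > self.threshold else 0
        if self._streak >= self.trigger_level:
            self._streak = 0
            self._lockout = self.lockout_chunks
            return True
        return False


def process_chunk(triggers, probs):
    """probs maps model name -> probability for the current audio chunk.
    Each trigger keeps its own state, so models fire independently."""
    return [t.name for t in triggers if t.update(probs.get(t.name, 0.0))]
```

Because state is per model, a strong "Hey Mycroft" never resets or blocks the "Jarvis" detector, and CPU cost grows roughly linearly with the number of models.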

Example use cases:

"Hey Mycroft, what's the time?"  → General assistant
"Jarvis, run diagnostics"         → Personal assistant mode
"Emergency, call help"            → Priority/emergency mode

Option B: Edge (K210)

Feasible for 1-2 wake words:

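# Sequential checking; detect_wake_word() is a placeholder for your
# K210 inference call that returns True on a detection
def check_wake_words(models=('hey-mycroft.kmodel', 'emergency.kmodel')):
    for model in models:
        if detect_wake_word(model):
            return model  # first model that fires wins
    return None  # no wake word in this audio window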

Limitations:

  • +50-100ms latency per additional model
  • Memory constraints (6MB total for all models)
  • More models = more power consumption
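The limitations above can be turned into a quick budget check. This sketch uses the rough figures from this section: each model after the first adds 50-100 ms, and all models share about 6 MB. The model sizes you pass in are your own measurements; none are given here.

```python
def k210_budget(model_sizes_mb, per_model_latency_ms=(50, 100), budget_mb=6.0):
    """Estimate the cost of checking several wake-word models sequentially on a K210.

    Returns the added latency range (ms), total model size (MB), and whether
    the models fit the shared memory budget.
    """
    extra_models = max(len(model_sizes_mb) - 1, 0)  # the first model is the baseline
    lo, hi = per_model_latency_ms
    total_mb = sum(model_sizes_mb)
    return {
        "added_latency_ms": (extra_models * lo, extra_models * hi),
        "total_mb": total_mb,
        "fits": total_mb <= budget_mb,
    }
```

With two hypothetical 1.5 MB models this reports 50-100 ms of added latency and 3 MB total; three 2.5 MB models would add 100-200 ms and exceed the 6 MB budget.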

Recommendation:

  • K210: 1 wake word (optimal)
  • K210: 2 wake words (acceptable)
  • K210: 3+ wake words (not recommended)

Option C: Contextual Wake Words

Different wake words for different purposes:

wake_word_contexts = {
    'hey_mycroft': 'general_assistant',
    'emergency': 'priority_emergency',
    'goodnight': 'bedtime_routine',
}
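One way to act on these contexts is to map each wake word directly to a handler function and fall back to the general assistant for anything unmapped. The handler names and their return strings below are hypothetical placeholders, not functions from the Minerva scripts.

```python
def general_assistant(utterance):
    return f"assistant: {utterance}"


def priority_emergency(utterance):
    # Hypothetical: a real handler might page someone or skip confirmation prompts
    return f"EMERGENCY: {utterance}"


def bedtime_routine(utterance):
    return "starting bedtime routine"


# Wake word -> handler, mirroring the wake_word_contexts mapping above
wake_word_contexts = {
    'hey_mycroft': general_assistant,
    'emergency': priority_emergency,
    'goodnight': bedtime_routine,
}


def dispatch(wake_word, utterance):
    """Route an utterance to the handler for the wake word that triggered it."""
    handler = wake_word_contexts.get(wake_word, general_assistant)
    return handler(utterance)
```

For example, `dispatch('emergency', 'call help')` returns `'EMERGENCY: call help'`, while an unknown wake word is handled by the general assistant.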

Should You Use Multiple?

One wake word is usually enough!

Commercial assistants (Alexa, Google Assistant) listen for a single wake phrase, or a small fixed set, and work well.

Use multiple when:

  • Different family members want different wake words
  • You want context-specific behaviors (emergency vs. general)
  • You enjoy the flexibility

Start with one, add more later if needed.


3. Adapting to New Users' Voices

Challenge

Same wake word, different voices:

  • Mom says "Hey Mycroft" (soprano)
  • Dad says "Hey Mycroft" (bass)
  • Kids say "Hey Mycroft" (high-pitched)

All need to work!

Solution 1: Train One Model with Everyone's Voices (Recommended)

During initial training, have everyone record samples:

cd ~/precise-models/family-hey-mycroft

# Session 1: Mom records 30 samples
precise-collect  # Mom speaks "Hey Mycroft" 30 times

# Session 2: Dad records 30 samples  
precise-collect  # Dad speaks "Hey Mycroft" 30 times

# Session 3: Kids record 20 samples each
precise-collect  # Kids speak "Hey Mycroft" 40 times total

# Train one model with all voices
precise-train -e 60 family-hey-mycroft.net .

# Deploy
python voice_server.py \
    --enable-precise \
    --precise-model family-hey-mycroft.net
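precise-train reads a data folder laid out as `wake-word/`, `not-wake-word/`, `test/wake-word/` and `test/not-wake-word/`. When several people record, it helps to hold back some clips from every speaker for testing so that a weak voice shows up in the test results. This sketch assumes clips are named `<speaker>-<n>.wav` (for example `mom-01.wav`), which is a naming convention you would choose when recording, not something precise-collect enforces; it only handles the wake-word clips.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def split_per_speaker(data_dir, test_fraction=0.2):
    """Move about test_fraction of each speaker's clips from wake-word/ to
    test/wake-word/, so every voice is represented in both sets.

    Assumes clips are named '<speaker>-<n>.wav' (e.g. 'mom-01.wav').
    """
    root = Path(data_dir)
    src = root / "wake-word"
    dst = root / "test" / "wake-word"
    dst.mkdir(parents=True, exist_ok=True)

    # Group clips by the speaker prefix before the first '-'
    by_speaker = defaultdict(list)
    for wav in sorted(src.glob("*.wav")):
        by_speaker[wav.stem.split("-")[0]].append(wav)

    moved = {}
    for speaker, clips in by_speaker.items():
        # At least one test clip per speaker, but always keep one for training
        n_test = min(len(clips) - 1, max(1, round(len(clips) * test_fraction)))
        for wav in clips[len(clips) - n_test:]:
            shutil.move(str(wav), str(dst / wav.name))
        moved[speaker] = n_test
    return moved
```

With 30 clips from Mom and 20 from a child, the default fraction moves 6 and 4 clips respectively into `test/wake-word/`.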

Pros:

  • One model works for everyone
  • Simple deployment
  • No switching needed
  • Works from day one

Cons:

  • Needs everyone's time upfront
  • Slightly lower per-person accuracy than individual models

Solution 2: Incremental Training

Start with one person, add others over time:

# Week 1: Train with Dad's voice
precise-train -e 60 hey-mycroft.net .

# Week 2: Mom wants to use it
# Collect Mom's samples
precise-collect  # Mom records 20-30 samples

# Add to training set
cp mom-samples/* wake-word/

# Retrain from the existing model (faster!): precise-train continues
# from hey-mycroft.net because the file already exists
precise-train -e 30 hey-mycroft.net .

# Now works for both Dad and Mom!

# Week 3: Kids want in
# Repeat process...
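A plain `cp mom-samples/* wake-word/` silently overwrites files if two people's recordings share names (for example, if both were saved as `recording-01.wav`). This sketch copies a new user's clips with the user's name as a prefix and stops rather than overwrite anything. The folder names and the `mom` user name are illustrative.

```python
import shutil
from pathlib import Path


def add_user_samples(samples_dir, wake_word_dir, user):
    """Copy a new user's .wav clips into wake-word/, prefixing each filename
    with the user's name so clips from different people never collide."""
    dst = Path(wake_word_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for wav in sorted(Path(samples_dir).glob("*.wav")):
        target = dst / f"{user}-{wav.name}"
        if target.exists():
            # Refuse to clobber an existing clip; rename or remove it first
            raise FileExistsError(target)
        shutil.copy2(wav, target)
        copied.append(target.name)
    return copied
```

Calling `add_user_samples("mom-samples", "wake-word", "mom")` before the retraining step turns `recording-01.wav` into `mom-recording-01.wav`.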

Pros:

  • Don't need everyone upfront
  • Easy to add new users
  • Model improves gradually

Cons:

  • New users may have issues initially
  • Requires periodic retraining

Solution 3: Speaker Identification (Advanced)

Identify who's speaking, use personalized model/settings:

# Install speaker ID
pip install pyannote.audio scipy --break-system-packages

# Use enhanced server
python voice_server_enhanced.py \
    --enable-precise \
    --enable-speaker-id \
    --hf-token YOUR_HF_TOKEN

Enroll users:

# Record 30-second voice sample from each person
# POST to /speakers/enroll with audio + name

curl -F "name=alan" \
     -F "audio=@alan_voice.wav" \
     http://localhost:5000/speakers/enroll

curl -F "name=sarah" \
     -F "audio=@sarah_voice.wav" \
     http://localhost:5000/speakers/enroll
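Speaker identification commonly works by turning each enrollment recording into a fixed-length embedding (pyannote.audio provides embedding models for this) and comparing new utterances against the stored embeddings. The sketch below shows only the comparison step, using cosine similarity. It is not necessarily how `voice_server_enhanced.py` does it; the 0.7 threshold is an assumption and the vectors in the example are toy values.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(embedding, enrolled, threshold=0.7):
    """Return the enrolled name whose voiceprint is most similar to
    `embedding`, or None if no score reaches `threshold`."""
    best_name, best_score = None, threshold
    for name, voiceprint in enrolled.items():
        score = cosine(embedding, voiceprint)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

With `enrolled = {'alan': [1, 0, 0], 'sarah': [0, 1, 0]}`, an utterance embedding of `[0.9, 0.1, 0]` is attributed to `alan`, while `[0, 0, 1]` matches nobody and returns `None`. Returning `None` for unknown voices lets the server fall back to default, non-personalized behavior.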

Benefits:

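# Sketch: turn_on() is a placeholder for your home-automation call, and
# the child speaker names are hypothetical enrollment names
KIDS = {'kid1', 'kid2'}

def turn_on(entity_id):
    print(f"turn_on {entity_id}")  # placeholder

def handle(speaker, command):
    # Different permissions per user
    if speaker in KIDS and command.startswith('buy'):
        return "Sorry, kids can't make purchases"
    # Different responses per user
    if speaker == 'alan':
        turn_on('light.alan_office')
    elif speaker == 'sarah':
        turn_on('light.sarah_office')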

Pros:

  • Personalized responses
  • User-specific settings
  • Better accuracy (optimized per voice)
  • Can track who said what

Cons:

  • More complex
  • Privacy considerations
  • Additional resources (~10% CPU and ~200MB RAM)
  • Requires voice enrollment

Solution 4: Pre-trained Model (Easiest)

"Hey Mycroft" already includes diverse voices!

# Just use it - already trained on many voices
./quick_start_hey_mycroft.sh

The community model was trained with:

  • Male and female voices
  • Different accents
  • Different ages
  • Various environments

It should work for most family members out-of-box!

Then fine-tune if needed.


Scenario: Family of 3-4 People

Week 1: Quick Start

# Use pre-trained "Hey Mycroft"
./quick_start_hey_mycroft.sh

# Test with all family members
# Likely works for everyone already!
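To decide who needs the Week 2 fine-tuning, have each person say the wake word a fixed number of times and count the detections. This sketch ranks people whose detection rate falls below a cutoff; the 90% cutoff and the example counts are assumptions, not measured results.

```python
def who_needs_fine_tuning(results, min_rate=0.9):
    """results maps person -> (detections, attempts).

    Returns the people whose detection rate is below min_rate, worst first.
    """
    rates = {person: hits / tries for person, (hits, tries) in results.items() if tries > 0}
    return sorted((p for p, r in rates.items() if r < min_rate), key=rates.get)
```

For example, with 20 attempts each and detections of 19 (Mom), 20 (Dad), 16 (Grandma) and 12 (a child), the function returns `['kid', 'grandma']`, so those two would record extra samples.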

Week 2: Fine-tune if Needed

# If someone has issues:
# Have them record 20 samples
# Fine-tune the model

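# Start from a copy of the pre-trained model; precise-train loads
# existing weights from the model file when it already exists
cp ~/precise-models/pretrained/hey-mycroft.net family-hey-mycroft.net
precise-train -e 30 family-hey-mycroft.net .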

Week 3: Add Features

# If you want personalization:
python voice_server_enhanced.py \
    --enable-precise \
    --enable-speaker-id \
    --hf-token YOUR_HF_TOKEN

# Enroll each family member

Scenario: Just You (or 1-2 People)

Option 1: Pre-trained

./quick_start_hey_mycroft.sh
# Done!

Option 2: Custom Wake Word

# Train custom "Hey Computer"
cd ~/precise-models/hey-computer
./1-record-wake-word.sh  # 50 samples
./2-record-not-wake-word.sh  # 200 samples
./3-train-model.sh

Scenario: Multiple People + Multiple Wake Words

Full setup:

# Pre-trained for family
./quick_start_hey_mycroft.sh

# Personal wake word for Dad
cd ~/precise-models/jarvis
# Train custom wake word

# Emergency wake word
cd ~/precise-models/emergency
# Train emergency wake word

# Run multi-wake-word server
python voice_server_enhanced.py \
    --enable-precise \
    --multi-wake-word \
    --enable-speaker-id \
    --hf-token YOUR_HF_TOKEN

Quick Decision Matrix

| Your Situation | Recommendation |
|---|---|
| Just getting started | Pre-trained "Hey Mycroft" |
| Want a different wake word | Train custom model |
| Family of 3-4 | Pre-trained + fine-tune if needed |
| Want personalization | Add speaker ID |
| Multiple purposes | Multiple wake words (server-side) |
| Deploying to K210 | 1 wake word, no speaker ID |

Files to Use

Quick start with pre-trained:

  • quick_start_hey_mycroft.sh - Zero training, 5 minutes!

Multiple wake words:

  • voice_server_enhanced.py - Multi-wake-word + speaker ID support

Training custom:

  • setup_precise.sh - Setup training environment
  • Scripts in ~/precise-models/your-wake-word/

Documentation:

  • WAKE_WORD_ADVANCED.md - Detailed guide covering pre-trained models, multiple wake words, and voice adaptation
  • PRECISE_DEPLOYMENT.md - Production deployment

Summary

Yes, pre-trained "Hey Mycroft" exists and works great
Yes, you can have multiple wake words (server-side is easy)
Yes, multiple approaches for multi-user support

Recommended approach:

  1. Start with ./quick_start_hey_mycroft.sh (5 mins)
  2. Test with all family members
  3. Fine-tune if anyone has issues
  4. Add speaker ID later if you want personalization
  5. Consider multiple wake words only if you have specific use cases

Keep it simple! One pre-trained wake word works for most people.


Next Actions

Ready to start?

# 5-minute quick start
./quick_start_hey_mycroft.sh

# Or read more first
cat WAKE_WORD_ADVANCED.md

Questions?

  • Pre-trained models: See WAKE_WORD_ADVANCED.md § Pre-trained
  • Multiple wake words: See WAKE_WORD_ADVANCED.md § Multiple Wake Words
  • Voice adaptation: See WAKE_WORD_ADVANCED.md § Voice Adaptation

Happy voice assisting! 🎙️