pyr0ball 173f7f37d4 feat: import mycroft-precise work as Minerva foundation

Ports prior voice assistant research and prototypes from devl/Devops
into the Minerva repo. Includes:

- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed
  (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap

2026-04-06 22:21:12 -07:00


Your Questions Answered - Quick Reference

TL;DR: Yes, Yes, and Multiple Options!

Q1: Pre-trained "Hey Mycroft" Model?

Answer: YES! ✅

Download and use immediately:

./quick_start_hey_mycroft.sh
# Done in 5 minutes - no training!

The pre-trained model works great and saves you 1-2 hours of training time.

Q2: Multiple Wake Words?

Answer: YES! ✅ (with considerations)

Server-side (Heimdall): Easy, run 3-5 wake words

python voice_server_enhanced.py \
    --enable-precise \
    --multi-wake-word

Edge (K210): Feasible for 1-2, challenging for 3+

Q3: Adapting to New Users' Voices?

Answer: Multiple approaches ✅

Best option: Train one model with everyone's voices upfront
Alternative: Incremental retraining as new users join
Advanced: Speaker identification with personalization

Detailed Answers

1. Pre-trained "Hey Mycroft" Model

Where to Get It

# Quick start script does this for you
wget https://github.com/MycroftAI/precise-data/raw/models-dev/hey-mycroft.tar.gz
tar xzf hey-mycroft.tar.gz

How to Use

Instant deployment:

python voice_server.py \
    --enable-precise \
    --precise-model ~/precise-models/pretrained/hey-mycroft.net

Fine-tune with your voice:

# Record 20-30 samples of your voice saying "Hey Mycroft"
precise-collect

# Fine-tune from pre-trained
precise-train -e 30 my-hey-mycroft.net . \
    --from-checkpoint ~/precise-models/pretrained/hey-mycroft.net

Advantages

✅ Zero training time - Works immediately
✅ Proven accuracy - Tested by thousands
✅ Good baseline - Already includes diverse voices
✅ Easy fine-tuning - Add your voice in 30 mins vs 60+ mins from scratch

When to Use Pre-trained vs Custom

Use Pre-trained "Hey Mycroft" when:

You want to test quickly
"Hey Mycroft" is an acceptable wake word
You want proven accuracy out-of-box

Train Custom when:

You want a different wake word ("Hey Computer", "Jarvis", etc.)
You want maximum accuracy for your specific environment
You want a family-specific wake word

Hybrid (Recommended):

Start with pre-trained "Hey Mycroft"
Test and learn the system
Fine-tune with your samples
Or add custom wake word later

2. Multiple Wake Words

Can You Have Multiple?

Yes! Options:

Option A: Server-Side (Recommended)

Easy implementation:

# Use the enhanced server
python voice_server_enhanced.py \
    --enable-precise \
    --multi-wake-word

Configured wake words:

"Hey Mycroft" (pre-trained)
"Hey Computer" (custom)
"Jarvis" (custom)

Resource impact:

3 models = ~15-30% CPU (Heimdall handles easily)
~300-600MB RAM
Each model runs independently
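
Because each model runs independently, the server can score the same audio chunk against every model and act on whichever ones fire. The following is a minimal sketch of that idea, not the actual voice_server_enhanced.py implementation: the `detect_all` function, the detector callables, and the 0.5 threshold are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical: each detector maps an audio chunk (raw bytes) to a confidence in [0, 1].
Detector = Callable[[bytes], float]


def detect_all(
    chunk: bytes,
    detectors: Dict[str, Detector],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Run every detector on the same chunk; return hits sorted by confidence, highest first."""
    hits = []
    for name, detector in detectors.items():
        score = detector(chunk)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Stand-in detectors that return fixed scores, for illustration only.
demo_detectors = {
    "hey_mycroft": lambda chunk: 0.91,
    "hey_computer": lambda chunk: 0.12,
    "jarvis": lambda chunk: 0.64,
}
demo_hits = detect_all(b"\x00" * 2048, demo_detectors)
```

Sorting by confidence gives the caller a simple way to pick one winner if two wake words fire on the same chunk.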

Example use cases:

"Hey Mycroft, what's the time?" → General assistant
"Jarvis, run diagnostics"        → Personal assistant mode
"Emergency, call help"           → Priority/emergency mode

Option B: Edge (K210)

Feasible for 1-2 wake words:

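# Sequential checking: return the first model that detects its wake word
def check_wake_words(models=('hey-mycroft.kmodel', 'emergency.kmodel')):
    for model in models:
        if detect_wake_word(model):  # detect_wake_word() is your per-model detector
            return model
    return None  # nothing detected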

Limitations:

+50-100ms latency per additional model
Memory constraints (6MB total for all models)
More models = more power consumption

Recommendation:

K210: 1 wake word (optimal)
K210: 2 wake words (acceptable)
K210: 3+ wake words (not recommended)
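
The limits above can be turned into a quick back-of-the-envelope check before converting models for the K210. This is a rough sketch under stated assumptions: the 6MB budget and the 50-100ms per additional model come from the figures above, while the `k210_budget` function and the example model sizes are hypothetical and should be replaced with the real sizes of your .kmodel files.

```python
def k210_budget(model_sizes_kb, per_model_latency_ms=(50.0, 100.0), memory_budget_kb=6 * 1024):
    """Estimate memory use and added latency for sequentially checked models."""
    total_kb = sum(model_sizes_kb)
    # Only models beyond the first add latency on top of the single-model baseline.
    extra_models = max(len(model_sizes_kb) - 1, 0)
    low, high = per_model_latency_ms
    return {
        "total_kb": total_kb,
        "fits_memory": total_kb <= memory_budget_kb,
        "added_latency_ms": (extra_models * low, extra_models * high),
    }


# Hypothetical sizes: two models of about 1.8MB each.
demo_budget = k210_budget([1800, 1800])
```

The estimate ignores memory needed for audio buffers and the runtime itself, so treat a result close to the budget as a failure.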

Option C: Contextual Wake Words

Different wake words for different purposes:

wake_word_contexts = {
    'hey_mycroft': 'general_assistant',
    'emergency': 'priority_emergency',
    'goodnight': 'bedtime_routine',
}
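
One way to use such a mapping is to normalize whatever label the detector reports and look up the context, falling back to the general assistant. This is a small sketch: the `context_for` helper and its normalization rules are assumptions for illustration, not part of the existing scripts.

```python
wake_word_contexts = {
    'hey_mycroft': 'general_assistant',
    'emergency': 'priority_emergency',
    'goodnight': 'bedtime_routine',
}


def context_for(wake_word: str, default: str = 'general_assistant') -> str:
    """Map a detected wake word label to its context, tolerating case, spaces, and hyphens."""
    key = wake_word.strip().lower().replace(' ', '_').replace('-', '_')
    return wake_word_contexts.get(key, default)
```

For example, `context_for("Hey Mycroft")` and `context_for("hey-mycroft")` both resolve to the same entry, so the lookup does not depend on how a model file happens to be named.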

Should You Use Multiple?

One wake word is usually enough!

Commercial products (Alexa, Google) use one wake word and they work fine.

Use multiple when:

Different family members want different wake words
You want context-specific behaviors (emergency vs. general)
You enjoy the flexibility

Start with one, add more later if needed.

3. Adapting to New Users' Voices

Challenge

Same wake word, different voices:

Mom says "Hey Mycroft" (soprano)
Dad says "Hey Mycroft" (bass)
Kids say "Hey Mycroft" (high-pitched)

All need to work!

Solution 1: Diverse Training (Recommended)

During initial training, have everyone record samples:

cd ~/precise-models/family-hey-mycroft

# Session 1: Mom records 30 samples
precise-collect  # Mom speaks "Hey Mycroft" 30 times

# Session 2: Dad records 30 samples  
precise-collect  # Dad speaks "Hey Mycroft" 30 times

# Session 3: Kids record 20 samples each
precise-collect  # Kids speak "Hey Mycroft" 40 times total

# Train one model with all voices
precise-train -e 60 family-hey-mycroft.net .

# Deploy
python voice_server.py \
    --enable-precise \
    --precise-model family-hey-mycroft.net

Pros:
✅ One model works for everyone
✅ Simple deployment
✅ No switching needed
✅ Works from day one

Cons:
❌ Need everyone's time upfront
❌ Slightly lower per-person accuracy than individual models
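
Before training one model on everyone's voices, it can help to confirm that no speaker is underrepresented in the sample folder. The sketch below assumes a hypothetical naming convention in which each recording is saved as `<speaker>-<anything>.wav`; the helper names and the minimum of 20 samples (the low end of the sample counts suggested above) are illustrative choices.

```python
from collections import Counter
from pathlib import Path


def samples_per_speaker(wake_word_dir):
    """Count .wav files per speaker, taking the text before the first '-' as the speaker."""
    counts = Counter()
    for wav in Path(wake_word_dir).glob("*.wav"):
        speaker = wav.stem.split("-", 1)[0]
        counts[speaker] += 1
    return dict(counts)


def underrepresented(counts, minimum=20):
    """Return the speakers with fewer than `minimum` samples, sorted by name."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

Running `underrepresented(samples_per_speaker("wake-word"))` before `precise-train` shows who should record a few more samples.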

Solution 2: Incremental Training

Start with one person, add others over time:

# Week 1: Train with Dad's voice
precise-train -e 60 hey-mycroft.net .

# Week 2: Mom wants to use it
# Collect Mom's samples
precise-collect  # Mom records 20-30 samples

# Add to training set
cp mom-samples/* wake-word/

# Retrain from checkpoint (faster!)
precise-train -e 30 hey-mycroft.net . \
    --from-checkpoint hey-mycroft.net

# Now works for both Dad and Mom!

# Week 3: Kids want in
# Repeat process...

Pros:
✅ Don't need everyone upfront
✅ Easy to add new users
✅ Model improves gradually

Cons:
❌ New users may have issues initially
❌ Requires periodic retraining
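
When new samples are copied into the training folder, files with the same name as existing ones would be overwritten by a plain `cp`. The sketch below shows one way to avoid that by prefixing each file with the speaker's name. The `merge_samples` helper and the folder names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def merge_samples(src_dir, dest_dir, speaker):
    """Copy .wav files from src_dir into dest_dir, prefixing each with the speaker name."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for wav in sorted(src.glob("*.wav")):
        target = dest / f"{speaker}-{wav.name}"
        if target.exists():
            continue  # already merged in an earlier run
        shutil.copy2(wav, target)
        copied.append(target.name)
    return copied
```

Calling `merge_samples("mom-samples", "wake-word", "mom")` more than once is harmless, because files that were already merged are skipped.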

Solution 3: Speaker Identification (Advanced)

Identify who's speaking, use personalized model/settings:

# Install speaker ID
pip install pyannote.audio scipy --break-system-packages

# Use enhanced server
python voice_server_enhanced.py \
    --enable-precise \
    --enable-speaker-id \
    --hf-token YOUR_HF_TOKEN

Enroll users:

# Record 30-second voice sample from each person
# POST to /speakers/enroll with audio + name

curl -F "name=alan" \
     -F "audio=@alan_voice.wav" \
     http://localhost:5000/speakers/enroll

curl -F "name=sarah" \
     -F "audio=@sarah_voice.wav" \
     http://localhost:5000/speakers/enroll
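
Once users are enrolled, speaker identification commonly works by comparing an embedding of the incoming audio against each enrolled embedding and accepting the closest match above a threshold. The sketch below illustrates that comparison with plain Python lists. It is not how voice_server_enhanced.py is known to work: the function names, the 0.7 threshold, and the three-dimensional vectors are assumptions, and real embeddings would come from a speaker model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(embedding, enrolled, threshold=0.7):
    """Return the enrolled name with the highest similarity at or above threshold, else None."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy enrolled embeddings, for illustration only.
enrolled = {"alan": [0.9, 0.1, 0.0], "sarah": [0.1, 0.9, 0.2]}
```

Returning `None` for low-confidence matches lets the server fall back to default, non-personalized behavior for guests or unclear audio.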

Benefits:

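# Different responses per user
# (turn_on() stands for your home-automation helper)
def turn_on_office_light(speaker):
    if speaker == 'alan':
        turn_on('light.alan_office')
    elif speaker == 'sarah':
        turn_on('light.sarah_office')

# Different permissions
def check_permission(speaker, command):
    if speaker == 'kids' and command.startswith('buy'):
        return "Sorry, kids can't make purchases"
    return None  # command is allowed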

Pros:
✅ Personalized responses
✅ User-specific settings
✅ Better accuracy (optimized per voice)
✅ Can track who said what

Cons:
❌ More complex
❌ Privacy considerations
❌ Additional CPU/RAM (~10% + 200MB)
❌ Requires voice enrollment

Solution 4: Pre-trained Model (Easiest)

"Hey Mycroft" already includes diverse voices!

# Just use it - already trained on many voices
./quick_start_hey_mycroft.sh

The community model was trained with:

Male and female voices
Different accents
Different ages
Various environments

It should work for most family members out-of-box!

Then fine-tune if needed.

Recommended Path for Your Situation

Scenario: Family of 3-4 People

Week 1: Quick Start

# Use pre-trained "Hey Mycroft"
./quick_start_hey_mycroft.sh

# Test with all family members
# Likely works for everyone already!

Week 2: Fine-tune if Needed

# If someone has issues:
# Have them record 20 samples
# Fine-tune the model

precise-train -e 30 family-hey-mycroft.net . \
    --from-checkpoint ~/precise-models/pretrained/hey-mycroft.net

Week 3: Add Features

# If you want personalization:
python voice_server_enhanced.py \
    --enable-speaker-id

# Enroll each family member

Scenario: Just You (or 1-2 People)

Option 1: Pre-trained

./quick_start_hey_mycroft.sh
# Done!

Option 2: Custom Wake Word

# Train custom "Hey Computer"
cd ~/precise-models/hey-computer
./1-record-wake-word.sh  # 50 samples
./2-record-not-wake-word.sh  # 200 samples
./3-train-model.sh

Scenario: Multiple People + Multiple Wake Words

Full setup:

# Pre-trained for family
./quick_start_hey_mycroft.sh

# Personal wake word for Dad
cd ~/precise-models/jarvis
# Train custom wake word

# Emergency wake word
cd ~/precise-models/emergency
# Train emergency wake word

# Run multi-wake-word server
python voice_server_enhanced.py \
    --enable-precise \
    --multi-wake-word \
    --enable-speaker-id

Quick Decision Matrix

Your Situation	Recommendation
Just getting started	Pre-trained "Hey Mycroft"
Want different wake word	Train custom model
Family of 3-4	Pre-trained + fine-tune if needed
Want personalization	Add speaker ID
Multiple purposes	Multiple wake words (server-side)
Deploying to K210	1 wake word, no speaker ID

Files to Use

Quick start with pre-trained:

quick_start_hey_mycroft.sh - Zero training, 5 minutes!

Multiple wake words:

voice_server_enhanced.py - Multi-wake-word + speaker ID support

Training custom:

setup_precise.sh - Setup training environment
Scripts in ~/precise-models/your-wake-word/

Documentation:

WAKE_WORD_ADVANCED.md - Comprehensive, detailed guide
PRECISE_DEPLOYMENT.md - Production deployment

Summary

✅ Yes, pre-trained "Hey Mycroft" exists and works great
✅ Yes, you can have multiple wake words (server-side is easy)
✅ Yes, multiple approaches for multi-user support

Recommended approach:

Start with ./quick_start_hey_mycroft.sh (5 mins)
Test with all family members
Fine-tune if anyone has issues
Add speaker ID later if you want personalization
Consider multiple wake words only if you have specific use cases

Keep it simple! One pre-trained wake word works for most people.

Next Actions

Ready to start?

# 5-minute quick start
./quick_start_hey_mycroft.sh

# Or read more first
cat WAKE_WORD_ADVANCED.md

Questions?

Pre-trained models: See WAKE_WORD_ADVANCED.md § Pre-trained
Multiple wake words: See WAKE_WORD_ADVANCED.md § Multiple Wake Words
Voice adaptation: See WAKE_WORD_ADVANCED.md § Voice Adaptation

Happy voice assisting! 🎙️
