pyr0ball 173f7f37d4 feat: import mycroft-precise work as Minerva foundation

Ports prior voice assistant research and prototypes from devl/Devops
into the Minerva repo. Includes:

- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed
  (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap

2026-04-06 22:21:12 -07:00


Maixduino Voice Assistant - Session Progress

Date: 2025-12-03
Session Duration: ~4 hours
Goal: Get audio recording and transcription working on Maixduino → Heimdall server

🎉 Major Achievements

✅ Full Audio Pipeline Working!

We successfully built and tested the complete audio capture → compression → transmission → transcription pipeline:

WiFi Connection - Maixduino connects to network (10.1.10.98)
Audio Recording - I2S microphone captures audio (MSM261S4030H0 MEMS mic)
Format Conversion - Converts 32-bit stereo to 16-bit mono (4x size reduction)
μ-law Compression - Compresses PCM audio by 50%
HTTP Transmission - Sends compressed WAV to Heimdall server
Whisper Transcription - Server transcribes and returns text
LCD Display - Shows transcription on Maixduino screen
Button Loop - Press BOOT button for repeated recordings

Total size reduction: 128KB → 32KB (mono) → 16KB (compressed) = 87.5% reduction!
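The size figures follow directly from the sample format at each stage. A quick check of the arithmetic, assuming exactly one second of audio at 16 kHz and reading KB as 1000 bytes (the function name is illustrative, not taken from the project script):

```python
def stage_sizes(seconds, sample_rate):
    """Return the byte count at each pipeline stage."""
    raw = seconds * sample_rate * 2 * 4    # 2 channels x 4 bytes (32-bit stereo)
    mono = seconds * sample_rate * 1 * 2   # 1 channel x 2 bytes (16-bit mono)
    ulaw = seconds * sample_rate * 1 * 1   # 1 channel x 1 byte (8-bit mu-law)
    return raw, mono, ulaw

raw, mono, ulaw = stage_sizes(1, 16000)
reduction_pct = 100.0 * (raw - ulaw) / raw
print(raw, mono, ulaw, reduction_pct)  # 128000 32000 16000 87.5
```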

🔧 Technical Accomplishments

Audio Recording Pipeline

Initial Problem: i2s_dev.record() returned immediately (1ms instead of 1000ms)
Root Cause: Recording API is asynchronous/non-blocking
Solution: Use chunked recording with wait_record() blocking calls

Pattern:

chunks = []
for i in range(frame_cnt):
    audio_chunk = i2s_dev.record(chunk_size)  # non-blocking: returns immediately
    i2s_dev.wait_record()  # CRITICAL: blocks until complete
    chunks.append(audio_chunk.to_bytes())

Memory Management

K210 has very limited RAM (~6MB total, much less available)
Successfully handled 128KB → 16KB data transformation without OOM errors
Techniques used:
- Record in small chunks (2048 samples)
- Stream HTTP transmission (512-byte chunks with delays)
- In-place data conversion where possible
- Explicit garbage collection hints (audio_data = None)

Network Communication

Raw socket HTTP (no urequests library available)
Chunked streaming with flow control (10ms delays)
Simple WAV format with μ-law compression (format code 7)
Robust error handling with serial output debugging
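The raw-socket approach can be sketched roughly as follows. This is a minimal illustration, not the project's send_to_server(): the helper names, the Content-Type value, and the handling of the send() return value are assumptions. It uses time.sleep() so it also runs on desktop Python; on MaixPy, time.sleep_ms(10) is the usual equivalent.

```python
import time

def build_post_header(host, port, path, body_len):
    """Build a minimal HTTP/1.1 POST header (Content-Type is an assumption)."""
    lines = [
        "POST " + path + " HTTP/1.1",
        "Host: " + host + ":" + str(port),
        "Content-Type: audio/wav",
        "Content-Length: " + str(body_len),
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode()

def send_in_chunks(sock, payload, chunk_size=512, delay_s=0.01):
    """Send payload in small slices, pausing between them as crude flow control."""
    sent = 0
    total = len(payload)
    while sent < total:
        chunk = payload[sent:sent + chunk_size]
        n = sock.send(chunk)
        if n is None:  # some socket implementations return nothing
            n = len(chunk)
        sent += n
        if delay_s > 0:
            time.sleep(delay_s)
    return sent
```

With any connected socket-like object, the header goes out first via send_in_chunks(sock, build_post_header("10.1.10.71", 3006, "/transcribe", len(wav))), followed by send_in_chunks(sock, wav). Slicing creates a short-lived 512-byte copy per iteration, which keeps peak memory small compared with building one large request buffer.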

🐛 MicroPython/MaixPy Quirks Discovered

String Operations

❌ F-strings NOT supported - Must use "text " + str(var) concatenation
❌ Ternary operators fail - Use explicit if/else blocks instead
❌ split() needs explicit delimiter - text.split(" ") not text.split()
❌ Escape sequences problematic - Avoid \n in strings, causes syntax errors
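As an illustration of writing within these limits, here is a small word-wrap helper of the kind the LCD output needs. It is a sketch only (wrap_words is a hypothetical name, and measuring width in characters is an assumption); it uses plain concatenation, explicit if/else blocks, and split(" ") with an explicit delimiter.

```python
def wrap_words(text, width):
    """Split text into lines of at most `width` characters, breaking on spaces."""
    lines = []
    current = ""
    for word in text.split(" "):  # explicit delimiter, per the quirk above
        if word == "":
            continue  # skip empty pieces produced by repeated spaces
        if current == "":
            current = word
        elif len(current) + 1 + len(word) <= width:
            current = current + " " + word
        else:
            lines.append(current)
            current = word
    if current != "":
        lines.append(current)
    return lines

print(wrap_words("the quick brown fox", 10))  # ['the quick', 'brown fox']
```

A word longer than the width is left on its own line rather than being split.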

Data Types & Methods

❌ decode() doesn't accept kwargs - Use decode('utf-8') not decode('utf-8', errors='ignore')
❌ RGB tuples not accepted - Must convert to packed integers: (r << 16) | (g << 8) | b
❌ Bytearray item deletion unsupported - del arr[n:] fails, use slicing instead
❌ Arithmetic in string concat - Separate calculations: next = count + 1; "text" + str(next)
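The workarounds in this list are small enough to show together. The helper names below are hypothetical and exist only to make each pattern concrete:

```python
def pack_rgb(r, g, b):
    """Pack three 0-255 components into one integer instead of passing a tuple."""
    return (r << 16) | (g << 8) | b

def truncate(buf, n):
    """Keep the first n bytes by slicing, since `del buf[n:]` is unsupported."""
    return buf[:n]

def count_label(count):
    """Do the arithmetic first, then concatenate the result."""
    next_count = count + 1
    return "Recording " + str(next_count)

def decode_text(raw):
    """Positional encoding argument only; no errors= keyword."""
    return raw.decode('utf-8')
```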

I2S Audio Specific

❌ record() is non-blocking - Returns immediately, must use wait_record()
❌ Audio object not directly iterable - Must call .to_bytes() first
⚠️ Data format mismatch - Hardware returns 32-bit stereo even when configured for 16-bit mono (4x expected size)
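One way to handle the mismatch is to keep a single channel and the top 16 bits of each 32-bit sample. The sketch below assumes little-endian frames made of two consecutive 32-bit samples, with the useful audio in the first one. The source does not state the byte layout or which channel carries the microphone, so both are assumptions, and stereo32_to_mono16 is an illustrative name rather than the project's convert_to_mono_16bit().

```python
def stereo32_to_mono16(raw):
    """Convert 32-bit stereo frames to 16-bit mono (4x smaller).

    Assumes little-endian data: bytes 0-3 of each 8-byte frame are the
    first channel, and bytes 2-3 hold that sample's upper 16 bits.
    """
    frames = len(raw) // 8
    out = bytearray(frames * 2)
    for i in range(frames):
        src = i * 8
        out[2 * i] = raw[src + 2]      # low byte of the upper half
        out[2 * i + 1] = raw[src + 3]  # high byte (carries the sign)
    return out
```

Preallocating the output and copying bytes by index avoids building a list of intermediate integers, which matters on a device with little free RAM.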

Network/WiFi

❌ network.WLAN not available - Must use network.ESP32_SPI with full pin config
❌ active() method doesn't exist - Just call connect() directly
⚠️ Requires ALL 6 pins configured - CS, RST, RDY, MOSI, MISO, SCLK

General Syntax

⚠️ if __name__ == "__main__" sometimes causes syntax errors - Safer to just call main() directly
⚠️ Import statements mid-function can cause syntax errors - Keep imports at top of file
⚠️ Some valid Python causes "invalid syntax" for unknown reasons - Simplify complex expressions

📊 Current Status

✅ Working

WiFi connectivity (ESP32 SPI)
I2S audio initialization
Chunked audio recording with wait_record()
Audio format detection and conversion (32-bit stereo → 16-bit mono)
μ-law compression (50% size reduction)
HTTP transmission to server (chunked streaming)
Whisper transcription (server-side)
JSON response parsing
LCD display (with word wrapping)
Button-triggered recording loop
Countdown timer before recording

⚠️ Partially Working

Recording duration - Currently getting ~0.9 seconds instead of full 1 second
- Formula: frame_cnt = seconds * sample_rate // chunk_size
- Current: 7 frames × (2048/16000) = 0.896s
- May need to increase frame_cnt or adjust chunk size
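The shortfall comes from floor division in the formula. A small sketch comparing floor and ceiling frame counts (the function names are illustrative):

```python
def frames_floor(seconds, sample_rate, chunk_size):
    """Frame count as currently computed: rounds down."""
    return seconds * sample_rate // chunk_size

def frames_ceil(seconds, sample_rate, chunk_size):
    """Round up so the recording is never shorter than requested."""
    return (seconds * sample_rate + chunk_size - 1) // chunk_size

def duration_s(frame_cnt, sample_rate, chunk_size):
    """Actual recorded duration in seconds for a given frame count."""
    return frame_cnt * chunk_size / sample_rate

print(frames_floor(1, 16000, 2048), duration_s(7, 16000, 2048))  # 7 0.896
print(frames_ceil(1, 16000, 2048), duration_s(8, 16000, 2048))   # 8 1.024
```

Rounding up overshoots slightly (1.024 s). Hitting exactly 1 second would need a chunk size that divides 16000 evenly, such as 2000; whether the I2S driver accepts that size has not been tested here.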

❌ Not Yet Implemented

Mycroft Precise wake word detection
Full voice assistant loop
Command processing
Home Assistant integration
Multi-second recording support
Real-time audio streaming

🔬 Technical Details

Hardware Configuration

Maixduino Board:

Processor: K210 dual-core RISC-V @ 400MHz
RAM: ~6MB total (limited available memory)
WiFi: ESP32 module via SPI
Microphone: MSM261S4030H0 MEMS (onboard)
IP Address: 10.1.10.98

I2S Pins:

Pin 20: I2S0_IN_D0 (data)
Pin 19: I2S0_WS (word select)
Pin 18: I2S0_SCLK (clock)

ESP32 SPI Pins:

Pin 25: CS (chip select)
Pin 8: RST (reset)
Pin 9: RDY (ready)
Pin 28: MOSI (master out)
Pin 26: MISO (master in)
Pin 27: SCLK (clock)

GPIO:

Pin 16: BOOT button (active low, pull-up)

Server Configuration

Heimdall Server:

IP: 10.1.10.71
Port: 3006
Framework: Flask
Model: Whisper base
Environment: Conda whisper_cli

Endpoints:

/health - Health check
/transcribe - POST audio for transcription

Audio Format

Recording:

Sample Rate: 16kHz
Hardware Output: 32-bit stereo (128KB for 1 second)
After Conversion: 16-bit mono (32KB for 1 second)
After Compression: 8-bit μ-law (16KB for 1 second)
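For reference, the 16-bit to 8-bit step can be sketched with the standard G.711 μ-law segment encoding. This is a generic implementation under the assumption of little-endian 16-bit mono input; it is not necessarily what the project's compress_ulaw() does, and the names are illustrative. It avoids ternaries and f-strings so it stays within the MaixPy limits noted earlier.

```python
ULAW_BIAS = 0x84
ULAW_CLIP = 32635

def linear_to_ulaw(sample):
    """Encode one signed 16-bit sample as an 8-bit mu-law byte."""
    if sample < 0:
        sign = 0x80
        sample = -sample
    else:
        sign = 0
    if sample > ULAW_CLIP:
        sample = ULAW_CLIP
    sample += ULAW_BIAS
    # Find the segment (exponent): position of the highest set bit above bit 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (sample & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (sample >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF

def pcm16_to_ulaw(pcm):
    """Compress little-endian 16-bit mono PCM bytes to mu-law (half the size)."""
    n = len(pcm) // 2
    out = bytearray(n)
    for i in range(n):
        s = pcm[2 * i] | (pcm[2 * i + 1] << 8)
        if s >= 0x8000:
            s -= 0x10000  # reinterpret as signed
        out[i] = linear_to_ulaw(s)
    return out
```

Silence (sample value 0) encodes to 0xFF, which is a convenient sanity check when inspecting the transmitted bytes.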

WAV Header:

Format Code: 7 (μ-law)
Channels: 1 (mono)
Sample Rate: 16000 Hz
Bits per Sample: 8
Includes fact chunk (required for μ-law)
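A header matching these fields can be sketched as below. It follows the usual RIFF layout for non-PCM formats (an 18-byte fmt chunk with a zero extension size, followed by a fact chunk holding the sample count). The function name is hypothetical, and the availability of struct under that name on MaixPy is an assumption (MicroPython also ships it as ustruct).

```python
import struct

def ulaw_wav_header(num_samples, sample_rate=16000):
    """Build a 58-byte WAV header for 8-bit mono mu-law audio."""
    data_len = num_samples  # one byte per sample, one channel
    fmt = struct.pack(
        "<HHIIHHH",
        7,            # format code: mu-law
        1,            # channels
        sample_rate,  # samples per second
        sample_rate,  # bytes per second (1 byte per sample)
        1,            # block align
        8,            # bits per sample
        0,            # extension size (none)
    )
    header = b"RIFF" + struct.pack("<I", 50 + data_len) + b"WAVE"
    header += b"fmt " + struct.pack("<I", len(fmt)) + fmt
    header += b"fact" + struct.pack("<II", 4, num_samples)
    header += b"data" + struct.pack("<I", data_len)
    return header
```

The RIFF size field is the total file length minus 8, which works out to 50 plus the audio byte count for this layout.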

📝 Code Files

Main Script

File: /Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/maix_simple_record_test.py

Key Functions:

init_wifi() - ESP32 SPI WiFi connection
init_audio() - I2S microphone setup
record_audio() - Chunked recording with wait_record()
convert_to_mono_16bit() - Format conversion (32-bit stereo → 16-bit mono)
compress_ulaw() - μ-law compression
create_wav_header() - WAV file header generation
send_to_server() - HTTP POST with chunked streaming
display_transcription() - LCD output with word wrapping
main() - Button loop for repeated recordings

Server Script

File: /devl/voice-assistant/simple_transcribe_server.py

Features:

Accepts raw WAV or multipart uploads
Whisper base model transcription
JSON response with transcription text
Handles μ-law compressed audio

Documentation

File: /Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/MICROPYTHON_QUIRKS.md

Complete reference of all MicroPython compatibility issues discovered during development.

🎯 Next Steps

Immediate (Tonight)

✅ Switch to Linux laptop with direct serial access
⏭️ Tune recording duration to get full 1 second
- Try frame_cnt = 8 instead of 7
- Or adjust chunk size to get exact timing
⏭️ Test transcription quality with proper-length recordings

Short Term (This Week)

Increase recording duration to 2-3 seconds for better transcription
Test memory limits with longer recordings
Optimize compression/transmission for speed
Add visual feedback during transmission

Medium Term (Next Week)

Install Mycroft Precise in whisper_cli environment
Test "hey mycroft" wake word detection on server
Integrate wake word into recording loop
Add command processing and Home Assistant integration

Long Term (Future)

Explore edge wake word detection (Precise on K210)
Multi-device deployment
Continuous listening mode
Voice profiles and speaker identification

🐛 Known Issues

Recording Duration

Issue: Recording is ~0.9 seconds instead of 1.0 seconds
Cause: 16000 / 2048 = 7.8125, which integer division (//) truncates to 7 frames
Impact: Minor - transcription still works
Fix: Increase frame_cnt to 8 or adjust chunk size

Data Format Mismatch

Issue: Hardware returns 4x expected data (128KB vs 32KB)
Cause: I2S outputting 32-bit stereo despite 16-bit mono config
Impact: None - conversion function handles it
Status: Working as intended

Syntax Error Sensitivity

Issue: Some valid Python causes "invalid syntax" in MicroPython
Patterns: Import statements mid-function, certain arithmetic expressions
Workaround: Simplify code, avoid complex expressions
Status: Documented in MICROPYTHON_QUIRKS.md

💡 Key Learnings

I2S Recording Pattern

The correct pattern for MaixPy I2S recording:

chunk_size = 2048  # samples per chunk
frame_cnt = seconds * sample_rate // chunk_size  # floor division: 7 for 1 s at 16 kHz

data = []
for i in range(frame_cnt):
    audio_chunk = i2s_dev.record(chunk_size)  # non-blocking: returns immediately
    i2s_dev.wait_record()  # BLOCKS until recording complete
    data.append(audio_chunk.to_bytes())

Critical: wait_record() is REQUIRED. Without it, record() returns immediately (about 1 ms instead of 1000 ms).

Memory Management

K210 has very limited RAM. Successful strategies:

Work in small chunks (2048-sample recording chunks, 512-byte network chunks)
Stream data instead of buffering
Free variables explicitly when done
Avoid creating large intermediate buffers

MicroPython Compatibility

MicroPython is NOT Python. Many standard features were missing or unreliable on this MaixPy build:

F-strings, ternary operators, keyword arguments on some methods (e.g. decode())
Some string methods, complex expressions
Standard libraries (urequests, json parsing)

Rule: Test incrementally, simplify everything, check quirks doc.

📚 Resources Used

Documentation

Code Examples

Tools

MaixPy IDE (copy/paste to board)
Serial monitor (debugging)
Heimdall server (Whisper transcription)

🔄 Ready for Next Session

Current State

✅ Code is working and stable
✅ Can record, compress, transmit, transcribe, display
✅ Button loop allows repeated testing
⚠️ Recording duration slightly short (~0.9s)

Files Ready

/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/maix_simple_record_test.py
/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/MICROPYTHON_QUIRKS.md
/devl/voice-assistant/simple_transcribe_server.py

For Serial Access Session

Connect Maixduino via USB to Linux laptop
Install pyserial: pip install pyserial
Find device: ls /dev/ttyUSB* or /dev/ttyACM*
Connect: screen /dev/ttyUSB0 115200 or use MaixPy IDE
Can directly modify code, test immediately, see serial output

Quick Test Commands

# Test WiFi
from network import ESP32_SPI
# ... (full init code in maix_test_simple.py)

# Test I2S
from Maix import I2S
rx = I2S(I2S.DEVICE_0)
# ...

# Test recording
audio = rx.record(2048)
rx.wait_record()
print(len(audio.to_bytes()))

🎊 Success Metrics

Today we achieved:

✅ WiFi connection working
✅ Audio recording working (with proper blocking)
✅ Format conversion working (4x reduction)
✅ Compression working (2x reduction)
✅ Network transmission working (chunked streaming)
✅ Server transcription working
✅ Display output working
✅ Button loop working
✅ End-to-end pipeline complete!

Total: 9/9 core features working! 🚀

Minor tuning needed, but the foundation is solid and ready for wake word integration.

Session Summary: Massive progress! From zero to working audio transcription pipeline in one session. Overcame significant MicroPython compatibility challenges and memory limitations. Ready for next phase: wake word detection.

Status: ✅ Ready for Linux serial access and fine-tuning
Next Session: Tune recording duration, then integrate Mycroft Precise wake word detection

End of Session Report - 2025-12-03
