minerva/hardware/maixduino/SESSION_PROGRESS_2025-12-03.md
pyr0ball 173f7f37d4 feat: import mycroft-precise work as Minerva foundation
Ports prior voice assistant research and prototypes from devl/Devops
into the Minerva repo. Includes:

- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed
  (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap
2026-04-06 22:21:12 -07:00


# Maixduino Voice Assistant - Session Progress
**Date:** 2025-12-03
**Session Duration:** ~4 hours
**Goal:** Get audio recording and transcription working on Maixduino → Heimdall server
---
## 🎉 Major Achievements
### ✅ Full Audio Pipeline Working!
We successfully built and tested the complete audio capture → compression → transmission → transcription pipeline:
1. **WiFi Connection** - Maixduino connects to network (10.1.10.98)
2. **Audio Recording** - I2S microphone captures audio (MSM261S4030H0 MEMS mic)
3. **Format Conversion** - Converts 32-bit stereo to 16-bit mono (4x size reduction)
4. **μ-law Compression** - Compresses PCM audio by 50%
5. **HTTP Transmission** - Sends compressed WAV to Heimdall server
6. **Whisper Transcription** - Server transcribes and returns text
7. **LCD Display** - Shows transcription on Maixduino screen
8. **Button Loop** - Press BOOT button for repeated recordings
**Total size reduction:** 128KB → 32KB (mono) → 16KB (compressed) = **87.5% reduction!**
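The size figures follow directly from the sample formats. A quick check of the arithmetic, assuming exactly 1 second at 16 kHz and treating 1KB as 1000 bytes (variable names are illustrative, not taken from the script):

```python
SAMPLE_RATE = 16000  # Hz, from the recording config
SECONDS = 1

# What the I2S hardware actually delivers: 32-bit samples, 2 channels
raw_bytes = SAMPLE_RATE * SECONDS * 4 * 2

# After format conversion: 16-bit samples, 1 channel
mono16_bytes = SAMPLE_RATE * SECONDS * 2 * 1

# After mu-law compression: 8-bit samples, 1 channel
ulaw_bytes = SAMPLE_RATE * SECONDS * 1 * 1

# Overall reduction relative to the raw capture
reduction_pct = 100.0 * (raw_bytes - ulaw_bytes) / raw_bytes

print(raw_bytes, mono16_bytes, ulaw_bytes, reduction_pct)
```

With the current 7-frame recording (7 × 2048 = 14336 samples) the real payloads are roughly 10% smaller than these nominal figures, but the ratios are the same.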
---
## 🔧 Technical Accomplishments
### Audio Recording Pipeline
- **Initial Problem:** `i2s_dev.record()` returned immediately (1ms instead of 1000ms)
- **Root Cause:** Recording API is asynchronous/non-blocking
- **Solution:** Use chunked recording with `wait_record()` blocking calls
- **Pattern:**
```python
chunks = []
for i in range(frame_cnt):
    audio_chunk = i2s_dev.record(chunk_size)
    i2s_dev.wait_record()  # CRITICAL: blocks until this chunk is complete
    chunks.append(audio_chunk.to_bytes())
```
### Memory Management
- **K210 has very limited RAM** (~6MB total, much less available)
- Successfully handled 128KB → 16KB data transformation without OOM errors
- Techniques used:
- Record in small chunks (2048 samples)
- Stream HTTP transmission (512-byte chunks with delays)
- In-place data conversion where possible
- Explicit garbage collection hints (`audio_data = None`)
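A minimal sketch of the last two techniques, written against plain Python so it can be checked off-device. `compact_in_place` is a hypothetical stand-in for an in-place conversion, not the script's actual conversion function, and the explicit `gc.collect()` call is an assumption about what helps on the K210 rather than something measured in this session.

```python
import gc

def compact_in_place(buf, stride):
    """Keep every `stride`-th byte, writing results into the front of the
    same bytearray so no second full-size buffer is allocated."""
    n_out = len(buf) // stride
    for i in range(n_out):
        # Safe to overwrite: the read index (i * stride) is never behind
        # the write index (i)
        buf[i] = buf[i * stride]
    return n_out  # number of valid bytes now at the start of buf

audio_data = bytearray(range(16))
used = compact_in_place(audio_data, 4)
result = bytes(audio_data[:used])  # small copy of just the useful part

audio_data = None  # drop the reference to the large buffer
gc.collect()       # ask the collector to reclaim it now
```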
### Network Communication
- **Raw socket HTTP** (no urequests library available)
- **Chunked streaming** with flow control (10ms delays)
- **Simple WAV format** with μ-law compression (format code 7)
- **Robust error handling** with serial output debugging
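The raw-socket approach can be sketched as below. This is an illustration under stated assumptions, not the code from `send_to_server()`: the helper names, the `Content-Type` and `Connection` headers, and the partial-send loop are all hypothetical. Only the 512-byte chunk size, the 10ms delay, and the `/transcribe` endpoint come from this session. It takes any object with a `send()` method, so it can be exercised without a network.

```python
import time

CRLF = b"\r\n"

def build_post_header(host, port, path, body_len):
    """Build a minimal HTTP/1.1 POST request header as bytes."""
    lines = [
        b"POST " + path.encode() + b" HTTP/1.1",
        b"Host: " + host.encode() + b":" + str(port).encode(),
        b"Content-Type: audio/wav",  # assumed; the server may not check it
        b"Content-Length: " + str(body_len).encode(),
        b"Connection: close",
    ]
    # Blank line terminates the header section
    return CRLF.join(lines) + CRLF + CRLF

def send_in_chunks(sock, payload, chunk_size=512, delay_s=0.01):
    """Write payload in small pieces, pausing between them as crude flow control."""
    total = 0
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        while len(chunk) > 0:
            sent = sock.send(chunk)  # may accept fewer bytes than offered
            chunk = chunk[sent:]
            total += sent
        time.sleep(delay_s)
    return total

# Intended use (not executed here; wav_bytes and sock are placeholders):
# header = build_post_header("10.1.10.71", 3006, "/transcribe", len(wav_bytes))
# send_in_chunks(sock, header + wav_bytes)
```

Sending `Content-Length` up front keeps the request simple: the body is still written in pieces, but the server sees an ordinary fixed-length POST rather than HTTP chunked transfer encoding.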
---
## 🐛 MicroPython/MaixPy Quirks Discovered
### String Operations
- **F-strings NOT supported** - Must use `"text " + str(var)` concatenation
- **Ternary operators fail** - Use explicit `if/else` blocks instead
- **`split()` needs explicit delimiter** - `text.split(" ")` not `text.split()`
- **Escape sequences problematic** - Avoid `\n` in strings, causes syntax errors
### Data Types & Methods
- **`decode()` doesn't accept kwargs** - Use `decode('utf-8')` not `decode('utf-8', errors='ignore')`
- **RGB tuples not accepted** - Must convert to packed integers: `(r << 16) | (g << 8) | b`
- **Bytearray item deletion unsupported** - `del arr[n:]` fails, use slicing instead
- **Arithmetic in string concat** - Separate calculations: `next = count + 1; "text" + str(next)`
### I2S Audio Specific
- **`record()` is non-blocking** - Returns immediately, must use `wait_record()`
- **Audio object not directly iterable** - Must call `.to_bytes()` first
- ⚠️ **Data format mismatch** - Hardware returns 32-bit stereo even when configured for 16-bit mono (4x expected size)
### Network/WiFi
- **`network.WLAN` not available** - Must use `network.ESP32_SPI` with full pin config
- **`active()` method doesn't exist** - Just call `connect()` directly
- ⚠️ **Requires ALL 6 pins configured** - CS, RST, RDY, MOSI, MISO, SCLK
### General Syntax
- ⚠️ **`if __name__ == "__main__"` sometimes causes syntax errors** - Safer to just call `main()` directly
- ⚠️ **Import statements mid-function can cause syntax errors** - Keep imports at top of file
- ⚠️ **Some valid Python causes "invalid syntax" for unknown reasons** - Simplify complex expressions
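Several of the workarounds above, collected into one snippet. This is a sketch in the conservative style the quirks call for (no f-strings, no ternaries, no keyword arguments). It runs under regular Python and has not been re-verified on the board. `pack_rgb` and the variable names are illustrative.

```python
def pack_rgb(r, g, b):
    # Pack 8-bit channels into one 0xRRGGBB integer instead of passing a tuple
    return (r << 16) | (g << 8) | b

count = 3

# String building: do the arithmetic first, then concatenate
next_count = count + 1
label = "Recording " + str(next_count)

# Explicit if/else instead of a ternary expression
if count > 0:
    status = "busy"
else:
    status = "idle"

# split() with an explicit delimiter
words = "hello maix world".split(" ")

# decode() with a positional encoding only, no errors= keyword
text = b"ok".decode("utf-8")

# Truncate a bytearray by slicing instead of `del buf[4:]`
buf = bytearray(b"abcdef")
buf = buf[:4]

orange = pack_rgb(255, 128, 0)
```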
---
## 📊 Current Status
### ✅ Working
- WiFi connectivity (ESP32 SPI)
- I2S audio initialization
- Chunked audio recording with `wait_record()`
- Audio format detection and conversion (32-bit stereo → 16-bit mono)
- μ-law compression (50% size reduction)
- HTTP transmission to server (chunked streaming)
- Whisper transcription (server-side)
- JSON response parsing
- LCD display (with word wrapping)
- Button-triggered recording loop
- Countdown timer before recording
### ⚠️ Partially Working
- **Recording duration** - Currently getting ~0.9 seconds instead of the full 1 second
  - Formula: `frame_cnt = seconds * sample_rate // chunk_size`
  - Current: `7 frames × (2048/16000) = 0.896s`
  - May need to increase `frame_cnt` or adjust chunk size
### ❌ Not Yet Implemented
- Mycroft Precise wake word detection
- Full voice assistant loop
- Command processing
- Home Assistant integration
- Multi-second recording support
- Real-time audio streaming
---
## 🔬 Technical Details
### Hardware Configuration
**Maixduino Board:**
- Processor: K210 dual-core RISC-V @ 400MHz
- RAM: ~6MB total (limited available memory)
- WiFi: ESP32 module via SPI
- Microphone: MSM261S4030H0 MEMS (onboard)
- IP Address: 10.1.10.98
**I2S Pins:**
- Pin 20: I2S0_IN_D0 (data)
- Pin 19: I2S0_WS (word select)
- Pin 18: I2S0_SCLK (clock)
**ESP32 SPI Pins:**
- Pin 25: CS (chip select)
- Pin 8: RST (reset)
- Pin 9: RDY (ready)
- Pin 28: MOSI (master out)
- Pin 26: MISO (master in)
- Pin 27: SCLK (clock)
**GPIO:**
- Pin 16: BOOT button (active low, pull-up)
### Server Configuration
**Heimdall Server:**
- IP: 10.1.10.71
- Port: 3006
- Framework: Flask
- Model: Whisper base
- Environment: Conda `whisper_cli`
**Endpoints:**
- `/health` - Health check
- `/transcribe` - POST audio for transcription
### Audio Format
**Recording:**
- Sample Rate: 16kHz
- Hardware Output: 32-bit stereo (128KB for 1 second)
- After Conversion: 16-bit mono (32KB for 1 second)
- After Compression: 8-bit μ-law (16KB for 1 second)
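For reference, the 16-bit → 8-bit step can be sketched with the standard G.711 μ-law encoding shown below. This is the textbook algorithm (bias 0x84, clip 32635), not necessarily what `compress_ulaw()` in the script does. The function names are hypothetical and the input is assumed to be little-endian signed 16-bit PCM.

```python
import struct

ULAW_BIAS = 0x84    # 132, added before finding the segment
ULAW_CLIP = 32635   # largest magnitude that survives the bias

def linear_to_ulaw(sample):
    """Encode one signed 16-bit sample as one mu-law byte."""
    sign = 0x00
    if sample < 0:
        sign = 0x80
        sample = -sample
    if sample > ULAW_CLIP:
        sample = ULAW_CLIP
    sample += ULAW_BIAS

    # Segment number = position of the highest set bit (bit 14 -> 7 ... bit 7 -> 0)
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (sample & mask):
        exponent -= 1
        mask >>= 1

    # Four bits just below the leading bit
    mantissa = (sample >> (exponent + 3)) & 0x0F
    # mu-law bytes are stored bit-inverted
    return ~(sign | (exponent << 4) | mantissa) & 0xFF

def pcm16_to_ulaw(pcm16):
    """Compress little-endian 16-bit mono PCM bytes to mu-law bytes (half the size)."""
    n = len(pcm16) // 2
    out = bytearray(n)
    for i in range(n):
        out[i] = linear_to_ulaw(struct.unpack_from("<h", pcm16, i * 2)[0])
    return bytes(out)
```

Silence (sample value 0) encodes to `0xFF`, which is a quick sanity check when inspecting a payload on the serial console.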
**WAV Header:**
- Format Code: 7 (μ-law)
- Channels: 1 (mono)
- Sample Rate: 16000 Hz
- Bits per Sample: 8
- Includes `fact` chunk (required for μ-law)
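A header matching the fields above could be built as in this sketch. It assumes the common non-PCM layout (an 18-byte `fmt ` chunk with a zero `cbSize`, then `fact`, then `data`), which may differ in detail from what `create_wav_header()` actually emits. The function name is hypothetical.

```python
import struct

def ulaw_wav_header(num_samples, sample_rate=16000):
    """Return a 58-byte RIFF/WAVE header for 8-bit mono mu-law audio."""
    data_len = num_samples  # 1 byte per sample, 1 channel

    # fmt chunk: size 18, format 7 (mu-law), 1 channel, sample rate,
    # byte rate (= sample rate), block align 1, 8 bits/sample, cbSize 0
    fmt_chunk = struct.pack("<4sIHHIIHHH", b"fmt ", 18, 7, 1,
                            sample_rate, sample_rate, 1, 8, 0)

    # fact chunk: number of samples per channel
    fact_chunk = struct.pack("<4sII", b"fact", 4, num_samples)

    # data chunk header; the audio bytes follow it
    data_hdr = struct.pack("<4sI", b"data", data_len)

    # RIFF size counts everything after the first 8 bytes of the file
    riff_size = 4 + len(fmt_chunk) + len(fact_chunk) + len(data_hdr) + data_len
    riff_hdr = struct.pack("<4sI4s", b"RIFF", riff_size, b"WAVE")

    return riff_hdr + fmt_chunk + fact_chunk + data_hdr

header = ulaw_wav_header(16000)
```

For odd sample counts the RIFF format expects a pad byte after the data, which this sketch leaves out.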
---
## 📝 Code Files
### Main Script
**File:** `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/maix_simple_record_test.py`
**Key Functions:**
- `init_wifi()` - ESP32 SPI WiFi connection
- `init_audio()` - I2S microphone setup
- `record_audio()` - Chunked recording with `wait_record()`
- `convert_to_mono_16bit()` - Format conversion (32-bit stereo → 16-bit mono)
- `compress_ulaw()` - μ-law compression
- `create_wav_header()` - WAV file header generation
- `send_to_server()` - HTTP POST with chunked streaming
- `display_transcription()` - LCD output with word wrapping
- `main()` - Button loop for repeated recordings
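As an illustration of the word wrapping that `display_transcription()` performs, here is a greedy wrapper written in the quirk-safe style (explicit `split(" ")`, no ternaries). It is a sketch, not the script's code: the function name and the `max_chars` parameter are hypothetical, and the real line width depends on the LCD font.

```python
def wrap_words(text, max_chars):
    """Greedy word wrap: pack as many words per line as fit in max_chars."""
    lines = []
    current = ""
    for word in text.split(" "):
        if word == "":
            continue  # skip empty pieces produced by repeated spaces
        if current == "":
            candidate = word
        else:
            candidate = current + " " + word
        if len(candidate) <= max_chars or current == "":
            # Fits, or the line is empty and an overlong word must go somewhere
            current = candidate
        else:
            lines.append(current)
            current = word
    if current != "":
        lines.append(current)
    return lines

wrapped = wrap_words("the quick brown fox jumps", 10)
```

Each returned line can then be drawn at an increasing y offset on the display.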
### Server Script
**File:** `/devl/voice-assistant/simple_transcribe_server.py`
**Features:**
- Accepts raw WAV or multipart uploads
- Whisper base model transcription
- JSON response with transcription text
- Handles μ-law compressed audio
### Documentation
**File:** `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/MICROPYTHON_QUIRKS.md`
Complete reference of all MicroPython compatibility issues discovered during development.
---
## 🎯 Next Steps
### Immediate (Tonight)
1. ✅ Switch to Linux laptop with direct serial access
2. ⏭️ Tune recording duration to get full 1 second
   - Try `frame_cnt = 8` instead of 7
   - Or adjust chunk size to get exact timing
3. ⏭️ Test transcription quality with proper-length recordings
### Short Term (This Week)
1. Increase recording duration to 2-3 seconds for better transcription
2. Test memory limits with longer recordings
3. Optimize compression/transmission for speed
4. Add visual feedback during transmission
### Medium Term (Next Week)
1. Install Mycroft Precise in `whisper_cli` environment
2. Test "hey mycroft" wake word detection on server
3. Integrate wake word into recording loop
4. Add command processing and Home Assistant integration
### Long Term (Future)
1. Explore edge wake word detection (Precise on K210)
2. Multi-device deployment
3. Continuous listening mode
4. Voice profiles and speaker identification
---
## 🐛 Known Issues
### Recording Duration
- **Issue:** Recording is ~0.9 seconds instead of 1.0 seconds
- **Cause:** `16000 / 2048 ≈ 7.8`, and integer division (`16000 // 2048`) rounds that down to 7 frames
- **Impact:** Minor - transcription still works
- **Fix:** Increase `frame_cnt` to 8 or adjust chunk size
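The two options can be compared numerically. This sketch uses integer ceiling division to round the frame count up instead of down. `frames_needed` is a hypothetical helper, not something in the script.

```python
def frames_needed(seconds, sample_rate, chunk_size):
    # Integer-only ceiling division: rounds up instead of down
    return (seconds * sample_rate + chunk_size - 1) // chunk_size

sample_rate = 16000
chunk_size = 2048

floor_frames = 1 * sample_rate // chunk_size             # current behaviour
ceil_frames = frames_needed(1, sample_rate, chunk_size)  # rounded up

floor_seconds = floor_frames * chunk_size / sample_rate
ceil_seconds = ceil_frames * chunk_size / sample_rate

print(floor_frames, floor_seconds)  # 7 frames, about 0.896 s
print(ceil_frames, ceil_seconds)    # 8 frames, about 1.024 s
```

Rounding up overshoots by 24ms. A chunk size that divides the sample rate evenly (for example 2000 samples × 8 frames) would give exactly 1 second, but whether `record()` accepts a non-power-of-two chunk size was not tested in this session.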
### Data Format Mismatch
- **Issue:** Hardware returns 4x expected data (128KB vs 32KB)
- **Cause:** I2S outputting 32-bit stereo despite 16-bit mono config
- **Impact:** None - conversion function handles it
- **Status:** Working as intended
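One way such a conversion can work is sketched below. The byte layout is an assumption, not something confirmed in this session: interleaved little-endian 32-bit samples, with the useful audio in the upper 16 bits of the chosen channel. The function name is hypothetical and this is not the body of `convert_to_mono_16bit()`.

```python
import struct

def stereo32_to_mono16(raw, channel=0):
    """Convert interleaved 32-bit stereo bytes to 16-bit mono bytes.

    Assumes little-endian signed samples and keeps the upper 16 bits
    of one channel (0 = first sample in each frame, 1 = second).
    """
    frame_bytes = 8  # two 32-bit samples per stereo frame
    n_frames = len(raw) // frame_bytes
    out = bytearray(n_frames * 2)
    for i in range(n_frames):
        offset = i * frame_bytes + channel * 4
        sample32 = struct.unpack_from("<i", raw, offset)[0]
        sample16 = sample32 >> 16  # arithmetic shift keeps the sign
        struct.pack_into("<h", out, i * 2, sample16)
    return bytes(out)

# Two stereo frames: (0x12340000, 0) and (-65536, 0)
demo_raw = struct.pack("<iiii", 0x12340000, 0, -65536, 0)
demo_mono = stereo32_to_mono16(demo_raw)
```

The output is a quarter of the input size (8 bytes per frame in, 2 bytes out), which matches the 128KB → 32KB figure. If the recorded audio sounds like noise or silence, the channel index or the choice of which 16 bits to keep is the first thing to revisit.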
### Syntax Error Sensitivity
- **Issue:** Some valid Python causes "invalid syntax" in MicroPython
- **Patterns:** Import statements mid-function, certain arithmetic expressions
- **Workaround:** Simplify code, avoid complex expressions
- **Status:** Documented in MICROPYTHON_QUIRKS.md
---
## 💡 Key Learnings
### I2S Recording Pattern
The correct pattern for MaixPy I2S recording:
```python
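# i2s_dev: an already-initialized I2S receiver
seconds = 1
sample_rate = 16000
chunk_size = 2048  # samples per record() call
frame_cnt = seconds * sample_rate // chunk_size  # floors to 7 (~0.9 s)
data = []
for i in range(frame_cnt):
    audio_chunk = i2s_dev.record(chunk_size)
    i2s_dev.wait_record()  # BLOCKS until recording complete
    data.append(audio_chunk.to_bytes())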
```
**Critical:** `wait_record()` is REQUIRED or recording returns immediately!
### Memory Management
K210 has very limited RAM. Successful strategies:
- Work in small chunks (2048-sample recording chunks, 512-byte network chunks)
- Stream data instead of buffering
- Free variables explicitly when done
- Avoid creating large intermediate buffers
### MicroPython Compatibility
MicroPython is NOT Python. On this MaixPy build, many standard features were missing or unreliable:
- F-strings, ternary operators, keyword arguments
- Some string methods, complex expressions
- Standard libraries (urequests, json parsing)
**Rule:** Test incrementally, simplify everything, check quirks doc.
---
## 📚 Resources Used
### Documentation
- [MaixPy I2S API Reference](https://wiki.sipeed.com/soft/maixpy/en/api_reference/Maix/i2s.html)
- [MaixPy I2S Usage Guide](https://wiki.sipeed.com/soft/maixpy/en/modules/on_chip/i2s.html)
- [Maixduino Hardware Wiki](https://wiki.sipeed.com/hardware/en/maix/maixpy_develop_kit_board/maix_duino.html)
### Code Examples
- [Official record_wav.py](https://github.com/sipeed/MaixPy-v1_scripts/blob/master/multimedia/audio/record_wav.py)
- [MaixPy Scripts Repository](https://github.com/sipeed/MaixPy-v1_scripts)
### Tools
- MaixPy IDE (copy/paste to board)
- Serial monitor (debugging)
- Heimdall server (Whisper transcription)
---
## 🔄 Ready for Next Session
### Current State
- ✅ Code is working and stable
- ✅ Can record, compress, transmit, transcribe, display
- ✅ Button loop allows repeated testing
- ⚠️ Recording duration slightly short (~0.9s)
### Files Ready
- `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/maix_simple_record_test.py`
- `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/MICROPYTHON_QUIRKS.md`
- `/devl/voice-assistant/simple_transcribe_server.py`
### For Serial Access Session
1. Connect Maixduino via USB to Linux laptop
2. Install pyserial: `pip install pyserial`
3. Find device: `ls /dev/ttyUSB*` or `ls /dev/ttyACM*`
4. Connect: `screen /dev/ttyUSB0 115200` or use MaixPy IDE
5. Can directly modify code, test immediately, see serial output
### Quick Test Commands
```python
# Test WiFi
from network import ESP32_SPI
# ... (full init code in maix_test_simple.py)
# Test I2S
from Maix import I2S
rx = I2S(I2S.DEVICE_0)
# ...
# Test recording
audio = rx.record(2048)
rx.wait_record()
print(len(audio.to_bytes()))
```
---
## 🎊 Success Metrics
Today we achieved:
- ✅ WiFi connection working
- ✅ Audio recording working (with proper blocking)
- ✅ Format conversion working (4x reduction)
- ✅ Compression working (2x reduction)
- ✅ Network transmission working (chunked streaming)
- ✅ Server transcription working
- ✅ Display output working
- ✅ Button loop working
- ✅ End-to-end pipeline complete!
**Total:** 9/9 core features working! 🚀
Minor tuning needed, but the foundation is solid and ready for wake word integration.
---
**Session Summary:** Massive progress! Went from zero to a working audio transcription pipeline in one session, overcoming significant MicroPython compatibility challenges and memory limitations. Ready for the next phase: wake word detection.
**Status:** ✅ Ready for Linux serial access and fine-tuning
**Next Session:** Tune recording duration, then integrate Mycroft Precise wake word detection
---
*End of Session Report - 2025-12-03*