# Maixduino Voice Assistant - Session Progress

**Date:** 2025-12-03
**Session Duration:** ~4 hours
**Goal:** Get audio recording and transcription working on Maixduino → Heimdall server

---

## 🎉 Major Achievements

### ✅ Full Audio Pipeline Working!

We successfully built and tested the complete audio capture → compression → transmission → transcription pipeline:

1. **WiFi Connection** - Maixduino connects to network (10.1.10.98)
2. **Audio Recording** - I2S microphone captures audio (MSM261S4030H0 MEMS mic)
3. **Format Conversion** - Converts 32-bit stereo to 16-bit mono (4x size reduction)
4. **μ-law Compression** - Compresses 16-bit PCM to 8-bit μ-law (50% smaller)
5. **HTTP Transmission** - Sends compressed WAV to Heimdall server
6. **Whisper Transcription** - Server transcribes and returns text
7. **LCD Display** - Shows transcription on Maixduino screen
8. **Button Loop** - Press BOOT button for repeated recordings

**Total size reduction:** 128KB → 32KB (mono) → 16KB (compressed) = **87.5% reduction!**

---

## 🔧 Technical Accomplishments

### Audio Recording Pipeline

- **Initial Problem:** `i2s_dev.record()` returned immediately (1ms instead of 1000ms)
- **Root Cause:** Recording API is asynchronous/non-blocking
- **Solution:** Use chunked recording with `wait_record()` blocking calls
- **Pattern:**

```python
chunks = []
for i in range(frame_cnt):
    audio_chunk = i2s_dev.record(chunk_size)
    i2s_dev.wait_record()  # CRITICAL: blocks until complete
    chunks.append(audio_chunk.to_bytes())
```

### Memory Management

- **K210 has very limited RAM** (~6MB total, much less available)
- Successfully handled 128KB → 16KB data transformation without OOM errors
- Techniques used:
  - Record in small chunks (2048 samples)
  - Stream HTTP transmission (512-byte chunks with delays)
  - In-place data conversion where possible
  - Drop references to large buffers once they are no longer needed (`audio_data = None`) so the garbage collector can reclaim them

### Network Communication

- **Raw socket HTTP** (no urequests library available)
- **Chunked streaming** with flow control (10ms delays)
- **Simple WAV format** with μ-law compression (format code 7)
- **Robust error handling** with serial output debugging

---

## 🐛 MicroPython/MaixPy Quirks Discovered

### String Operations

- ❌ **F-strings NOT supported** - Must use `"text " + str(var)` concatenation
- ❌ **Ternary operators fail** - Use explicit `if/else` blocks instead
- ❌ **`split()` needs explicit delimiter** - `text.split(" ")` not `text.split()`
- ❌ **Escape sequences problematic** - Avoid `\n` in strings, causes syntax errors

### Data Types & Methods

- ❌ **`decode()` doesn't accept kwargs** - Use `decode('utf-8')` not `decode('utf-8', errors='ignore')`
- ❌ **RGB tuples not accepted** - Must convert to packed integers: `(r << 16) | (g << 8) | b`
- ❌ **Bytearray item deletion unsupported** - `del arr[n:]` fails, use slicing instead
- ❌ **Arithmetic in string concat** - Separate calculations: `next_count = count + 1; "text" + str(next_count)`

### I2S Audio Specific

- ❌ **`record()` is non-blocking** - Returns immediately, must use `wait_record()`
- ❌ **Audio object not directly iterable** - Must call `.to_bytes()` first
- ⚠️ **Data format mismatch** - Hardware returns 32-bit stereo even when configured for 16-bit mono (4x expected size)

### Network/WiFi

- ❌ **`network.WLAN` not available** - Must use `network.ESP32_SPI` with full pin config
- ❌ **`active()` method doesn't exist** - Just call `connect()` directly
- ⚠️ **Requires ALL 6 pins configured** - CS, RST, RDY, MOSI, MISO, SCLK

### General Syntax

- ⚠️ **`if __name__ == "__main__"` sometimes causes syntax errors** - Safer to just call `main()` directly
- ⚠️ **Import statements mid-function can cause syntax errors** - Keep imports at top of file
- ⚠️ **Some valid Python causes "invalid syntax" for unknown reasons** - Simplify complex expressions

---

## 📊 Current Status

### ✅ Working

- WiFi connectivity (ESP32 SPI)
- I2S audio initialization
- Chunked audio recording with `wait_record()`
- Audio format detection and conversion (32-bit stereo → 16-bit mono)
- μ-law compression (50% size reduction)
- HTTP transmission to server (chunked streaming)
- Whisper transcription (server-side)
- JSON response parsing
- LCD display (with word wrapping)
- Button-triggered recording loop
- Countdown timer before recording

### ⚠️ Partially Working

- **Recording duration** - Currently getting ~0.9 seconds instead of full 1 second
  - Formula: `frame_cnt = seconds * sample_rate // chunk_size`
  - Current: `7 frames × (2048/16000) = 0.896s`
  - May need to increase `frame_cnt` or adjust chunk size

### ❌ Not Yet Implemented

- Mycroft Precise wake word detection
- Full voice assistant loop
- Command processing
- Home Assistant integration
- Multi-second recording support
- Real-time audio streaming

---

## 🔬 Technical Details

### Hardware Configuration

**Maixduino Board:**

- Processor: K210 dual-core RISC-V @ 400MHz
- RAM: ~6MB total (limited available memory)
- WiFi: ESP32 module via SPI
- Microphone: MSM261S4030H0 MEMS (onboard)
- IP Address: 10.1.10.98

**I2S Pins:**

- Pin 20: I2S0_IN_D0 (data)
- Pin 19: I2S0_WS (word select)
- Pin 18: I2S0_SCLK (clock)

**ESP32 SPI Pins:**

- Pin 25: CS (chip select)
- Pin 8: RST (reset)
- Pin 9: RDY (ready)
- Pin 28: MOSI (master out)
- Pin 26: MISO (master in)
- Pin 27: SCLK (clock)

**GPIO:**

- Pin 16: BOOT button (active low, pull-up)

### Server Configuration

**Heimdall Server:**

- IP: 10.1.10.71
- Port: 3006
- Framework: Flask
- Model: Whisper base
- Environment: Conda `whisper_cli`

**Endpoints:**

- `/health` - Health check
- `/transcribe` - POST audio for transcription

### Audio Format

**Recording:**

- Sample Rate: 16kHz
- Hardware Output: 32-bit stereo (128KB for 1 second)
- After Conversion: 16-bit mono (32KB for 1 second)
- After Compression: 8-bit μ-law (16KB for 1 second)

**WAV Header:**

- Format Code: 7 (μ-law)
- Channels: 1 (mono)
- Sample Rate: 16000 Hz
- Bits per Sample: 8
- Includes `fact` chunk (required for μ-law)

---

## 📝 Code Files

### Main Script

**File:** `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/maix_simple_record_test.py`

**Key Functions:**

- `init_wifi()` - ESP32 SPI WiFi connection
- `init_audio()` - I2S microphone setup
- `record_audio()` - Chunked recording with `wait_record()`
- `convert_to_mono_16bit()` - Format conversion (32-bit stereo → 16-bit mono)
- `compress_ulaw()` - μ-law compression
- `create_wav_header()` - WAV file header generation
- `send_to_server()` - HTTP POST with chunked streaming
- `display_transcription()` - LCD output with word wrapping
- `main()` - Button loop for repeated recordings

### Server Script

**File:** `/devl/voice-assistant/simple_transcribe_server.py`

**Features:**

- Accepts raw WAV or multipart uploads
- Whisper base model transcription
- JSON response with transcription text
- Handles μ-law compressed audio

### Documentation

**File:** `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/MICROPYTHON_QUIRKS.md`

Complete reference of all MicroPython compatibility issues discovered during development.

---

## 🎯 Next Steps

### Immediate (Tonight)

1. ✅ Switch to Linux laptop with direct serial access
2. ⏭️ Tune recording duration to get full 1 second
   - Try `frame_cnt = 8` instead of 7
   - Or adjust chunk size to get exact timing
3. ⏭️ Test transcription quality with proper-length recordings

### Short Term (This Week)

1. Increase recording duration to 2-3 seconds for better transcription
2. Test memory limits with longer recordings
3. Optimize compression/transmission for speed
4. Add visual feedback during transmission

### Medium Term (Next Week)

1. Install Mycroft Precise in `whisper_cli` environment
2. Test "hey mycroft" wake word detection on server
3. Integrate wake word into recording loop
4. Add command processing and Home Assistant integration

### Long Term (Future)

1. Explore edge wake word detection (Precise on K210)
2. Multi-device deployment
3. Continuous listening mode
4. Voice profiles and speaker identification

---

## 🐛 Known Issues

### Recording Duration

- **Issue:** Recording is ~0.9 seconds instead of 1.0 seconds
- **Cause:** `16000 / 2048 ≈ 7.8`, and integer division (`//`) truncates this to 7 frames
- **Impact:** Minor - transcription still works
- **Fix:** Increase `frame_cnt` to 8 (8 × 2048 / 16000 = 1.024s) or adjust chunk size

### Data Format Mismatch

- **Issue:** Hardware returns 4x expected data (128KB vs 32KB)
- **Cause:** I2S outputting 32-bit stereo despite 16-bit mono config
- **Impact:** None - conversion function handles it
- **Status:** Handled by the conversion function; no further action needed

### Syntax Error Sensitivity

- **Issue:** Some valid Python causes "invalid syntax" in MicroPython
- **Patterns:** Import statements mid-function, certain arithmetic expressions
- **Workaround:** Simplify code, avoid complex expressions
- **Status:** Documented in MICROPYTHON_QUIRKS.md

---

## 💡 Key Learnings

### I2S Recording Pattern

The correct pattern for MaixPy I2S recording:

```python
chunk_size = 2048
frame_cnt = seconds * sample_rate // chunk_size
data = []
for i in range(frame_cnt):
    audio_chunk = i2s_dev.record(chunk_size)
    i2s_dev.wait_record()  # BLOCKS until recording complete
    data.append(audio_chunk.to_bytes())
```

**Critical:** `wait_record()` is REQUIRED or recording returns immediately!

### Memory Management

K210 has very limited RAM. Successful strategies:

- Work in small chunks (512-2048 bytes)
- Stream data instead of buffering
- Free variables explicitly when done
- Avoid creating large intermediate buffers

### MicroPython Compatibility

MicroPython is NOT Python. Many standard features were missing or failed on this MaixPy build:

- F-strings, ternary operators, keyword arguments
- Some string methods, complex expressions
- Standard libraries (urequests, json parsing)

**Rule:** Test incrementally, simplify everything, check quirks doc.

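
The recording-duration shortfall listed under Known Issues comes from floor division in the `frame_cnt` formula. One possible fix is to round the frame count up instead of down. The sketch below is a minimal illustration and has not been tested on the board: `frames_for` and `recorded_seconds` are hypothetical helper names (not functions from the session's script), and the code sticks to string concatenation and plain integer arithmetic in line with the quirks listed above.

```python
def frames_for(seconds, sample_rate, chunk_size):
    # Ceiling division with integers only: round the frame count up so the
    # recording is never shorter than the requested duration.
    total_samples = seconds * sample_rate
    return (total_samples + chunk_size - 1) // chunk_size


def recorded_seconds(frame_cnt, sample_rate, chunk_size):
    # Duration actually captured by frame_cnt chunks of chunk_size samples.
    return frame_cnt * chunk_size / sample_rate


# Values from this report: 1 second at 16 kHz in 2048-sample chunks.
floor_frames = 1 * 16000 // 2048          # 7 frames (current formula)
ceil_frames = frames_for(1, 16000, 2048)  # 8 frames (rounded up)

floor_secs = recorded_seconds(floor_frames, 16000, 2048)
ceil_secs = recorded_seconds(ceil_frames, 16000, 2048)

print("floor: " + str(floor_frames) + " frames, " + str(floor_secs) + " s")
print("ceil:  " + str(ceil_frames) + " frames, " + str(ceil_secs) + " s")
```

Rounding up records slightly more than requested (8 frames is 1.024 s here). If an exact length matters, the surplus samples could be trimmed after conversion, or a chunk size that divides `seconds * sample_rate` evenly could be chosen instead.
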

---

## 📚 Resources Used

### Documentation

- [MaixPy I2S API Reference](https://wiki.sipeed.com/soft/maixpy/en/api_reference/Maix/i2s.html)
- [MaixPy I2S Usage Guide](https://wiki.sipeed.com/soft/maixpy/en/modules/on_chip/i2s.html)
- [Maixduino Hardware Wiki](https://wiki.sipeed.com/hardware/en/maix/maixpy_develop_kit_board/maix_duino.html)

### Code Examples

- [Official record_wav.py](https://github.com/sipeed/MaixPy-v1_scripts/blob/master/multimedia/audio/record_wav.py)
- [MaixPy Scripts Repository](https://github.com/sipeed/MaixPy-v1_scripts)

### Tools

- MaixPy IDE (copy/paste to board)
- Serial monitor (debugging)
- Heimdall server (Whisper transcription)

---

## 🔄 Ready for Next Session

### Current State

- ✅ Code is working and stable
- ✅ Can record, compress, transmit, transcribe, display
- ✅ Button loop allows repeated testing
- ⚠️ Recording duration slightly short (~0.9s)

### Files Ready

- `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/maix_simple_record_test.py`
- `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/MICROPYTHON_QUIRKS.md`
- `/devl/voice-assistant/simple_transcribe_server.py`

### For Serial Access Session

1. Connect Maixduino via USB to Linux laptop
2. Install pyserial: `pip install pyserial`
3. Find device: `ls /dev/ttyUSB*` or `/dev/ttyACM*`
4. Connect: `screen /dev/ttyUSB0 115200` or use MaixPy IDE
5. Can directly modify code, test immediately, see serial output

### Quick Test Commands

```python
# Test WiFi
from network import ESP32_SPI
# ... (full init code in maix_test_simple.py)

# Test I2S
from Maix import I2S
rx = I2S(I2S.DEVICE_0)
# ...

# Test recording
audio = rx.record(2048)
rx.wait_record()
print(len(audio.to_bytes()))
```

---

## 🎊 Success Metrics

Today we achieved:

- ✅ WiFi connection working
- ✅ Audio recording working (with proper blocking)
- ✅ Format conversion working (4x reduction)
- ✅ Compression working (2x reduction)
- ✅ Network transmission working (chunked streaming)
- ✅ Server transcription working
- ✅ Display output working
- ✅ Button loop working
- ✅ End-to-end pipeline complete!

**Total:** 9/9 core features working! 🚀

Minor tuning needed, but the foundation is solid and ready for wake word integration.

---

**Session Summary:** Massive progress! From zero to working audio transcription pipeline in one session. Overcame significant MicroPython compatibility challenges and memory limitations. Ready for next phase: wake word detection.

**Status:** ✅ Ready for Linux serial access and fine-tuning

**Next Session:** Tune recording duration, then integrate Mycroft Precise wake word detection

---

*End of Session Report - 2025-12-03*