minerva/hardware/maixduino/SESSION_PROGRESS_2025-12-03.md

# Maixduino Voice Assistant - Session Progress

**Date:** 2025-12-03
**Session Duration:** ~4 hours
**Goal:** Get audio recording and transcription working on Maixduino → Heimdall server

---

## 🎉 Major Achievements

### ✅ Full Audio Pipeline Working!
We successfully built and tested the complete audio capture → compression → transmission → transcription pipeline:

1. **WiFi Connection** - Maixduino connects to network (10.1.10.98)
2. **Audio Recording** - I2S microphone captures audio (MSM261S4030H0 MEMS mic)
3. **Format Conversion** - Converts 32-bit stereo to 16-bit mono (4x size reduction)
4. **μ-law Compression** - Compresses PCM audio by 50%
5. **HTTP Transmission** - Sends compressed WAV to Heimdall server
6. **Whisper Transcription** - Server transcribes and returns text
7. **LCD Display** - Shows transcription on Maixduino screen
8. **Button Loop** - Press BOOT button for repeated recordings

**Total size reduction:** 128KB → 32KB (mono) → 16KB (compressed) = **87.5% reduction!**
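
The size figures above follow directly from the sample rate and sample width. A quick check of the arithmetic, assuming exactly one second of audio at 16 kHz and counting 1 KB as 1000 bytes (as the figures in this report do):

```python
SAMPLE_RATE = 16000   # Hz, from the recording settings in this report
SECONDS = 1

raw_bytes = SAMPLE_RATE * SECONDS * 2 * 4      # stereo x 4 bytes (32-bit) per sample
mono16_bytes = SAMPLE_RATE * SECONDS * 1 * 2   # mono x 2 bytes (16-bit) per sample
ulaw_bytes = SAMPLE_RATE * SECONDS * 1 * 1     # mono x 1 byte (8-bit mu-law) per sample

reduction_pct = 100.0 * (raw_bytes - ulaw_bytes) / raw_bytes

print(raw_bytes, mono16_bytes, ulaw_bytes)  # 128000 32000 16000
print(reduction_pct)                        # 87.5
```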

---

## 🔧 Technical Accomplishments

### Audio Recording Pipeline
- **Initial Problem:** `i2s_dev.record()` returned immediately (1ms instead of 1000ms)
- **Root Cause:** Recording API is asynchronous/non-blocking
- **Solution:** Use chunked recording with `wait_record()` blocking calls
- **Pattern:**
  ```python
  chunks = []
  for i in range(frame_cnt):
      audio_chunk = i2s_dev.record(chunk_size)  # starts capture, returns immediately
      i2s_dev.wait_record()  # CRITICAL: blocks until complete
      chunks.append(audio_chunk.to_bytes())
  ```

### Memory Management
- **K210 has very limited RAM** (~6MB of general-purpose SRAM, with much less available to MicroPython code)
- Successfully handled 128KB → 16KB data transformation without OOM errors
- Techniques used:
  - Record in small chunks (2048 samples)
  - Stream HTTP transmission (512-byte chunks with delays)
  - In-place data conversion where possible
  - Dropping references to large buffers as soon as they are no longer needed (`audio_data = None`) so the garbage collector can reclaim them

### Network Communication
- **Raw socket HTTP** (no urequests library available)
- **Chunked streaming** with flow control (10ms delays)
- **Simple WAV format** with μ-law compression (format code 7)
- **Robust error handling** with serial output debugging
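
The raw-socket approach can be sketched roughly as follows. This is not the code from `send_to_server()`, just a minimal illustration of the idea: the helper names, the HTTP/1.0 request line, and the `audio/wav` content type are assumptions, and `send()` is assumed to return the number of bytes written, as the standard socket API does. The 512-byte chunk size and 10ms delay are the values noted above.

```python
import time

CRLF = bytes([13, 10])  # built from byte values to avoid escape sequences in literals


def build_post_header(host, port, path, body_len):
    """Return the request line and headers for a raw HTTP/1.0 POST (hypothetical helper)."""
    lines = [
        "POST " + path + " HTTP/1.0",
        "Host: " + host + ":" + str(port),
        "Content-Type: audio/wav",
        "Content-Length: " + str(body_len),
    ]
    header = b""
    for line in lines:
        header = header + line.encode("utf-8") + CRLF
    return header + CRLF  # blank line ends the header block


def send_in_chunks(sock, payload, chunk_size=512, delay_s=0.01):
    """Write payload to sock in small slices, pausing between writes."""
    sent = 0
    while sent < len(payload):
        chunk = payload[sent:sent + chunk_size]
        written = sock.send(chunk)   # assumed to return the byte count
        sent = sent + written
        time.sleep(delay_s)          # crude flow control for the WiFi link
    return sent
```

With an already-connected socket, usage would look like `sock.send(build_post_header("10.1.10.71", 3006, "/transcribe", len(wav)))` followed by `send_in_chunks(sock, wav)`. Slicing the payload keeps each write small, so no second full-size copy of the audio is needed.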

---

## 🐛 MicroPython/MaixPy Quirks Discovered

### String Operations
- ❌ **F-strings NOT supported** - Must use `"text " + str(var)` concatenation
- ❌ **Ternary operators fail** - Use explicit `if/else` blocks instead
- ❌ **`split()` needs explicit delimiter** - `text.split(" ")` not `text.split()`
- ❌ **Escape sequences problematic** - Avoid `\n` in strings, causes syntax errors
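
For illustration, here is how the first three workarounds look in practice. The function names are made up for this example; the code itself is plain Python that also runs on a desktop interpreter.

```python
def describe_recording(count, ok):
    # Explicit if/else instead of a conditional (ternary) expression
    if ok:
        status = "ok"
    else:
        status = "failed"
    # Concatenation with str() instead of an f-string
    return "Recording " + str(count) + ": " + status


def first_word(text):
    # Pass the delimiter explicitly instead of calling split() with no arguments
    words = text.split(" ")
    return words[0]


print(describe_recording(3, True))   # Recording 3: ok
print(first_word("hello world"))     # hello
```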

### Data Types & Methods
- ❌ **`decode()` doesn't accept kwargs** - Use `decode('utf-8')` not `decode('utf-8', errors='ignore')`
- ❌ **RGB tuples not accepted** - Must convert to packed integers: `(r << 16) | (g << 8) | b`
- ❌ **Bytearray slice deletion unsupported** - `del arr[n:]` fails, reassign a slice instead: `arr = arr[:n]`
- ❌ **Arithmetic inside string concatenation fails** - Compute the value first, then concatenate: `next_count = count + 1; "text" + str(next_count)`
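
Two of these workarounds are easy to wrap in small helpers. A minimal sketch (the helper names are hypothetical, not taken from the script):

```python
def pack_rgb(r, g, b):
    """Pack three 0-255 channel values into one 24-bit integer."""
    return (r << 16) | (g << 8) | b


def truncate(buf, n):
    """Return the first n bytes of buf as a new bytearray (no slice deletion)."""
    return buf[:n]


print(hex(pack_rgb(255, 128, 0)))          # 0xff8000
print(truncate(bytearray(b"abcdef"), 3))   # bytearray(b'abc')
```

Note that `truncate()` allocates a new buffer, so on the K210 it is best applied to small arrays or right before the original is released.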

### I2S Audio Specific
- ❌ **`record()` is non-blocking** - Returns immediately, must use `wait_record()`
- ❌ **Audio object not directly iterable** - Must call `.to_bytes()` first
- ⚠️ **Data format mismatch** - Hardware returns 32-bit stereo even when configured for 16-bit mono (4x expected size)
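
The data format mismatch is handled by a conversion step. The sketch below is not the script's `convert_to_mono_16bit()`; it shows one way such a conversion could work under assumptions this report does not confirm: samples are little-endian signed 32-bit words, the two channels are interleaved, the first word of each pair is the wanted channel, and the useful signal sits in the upper 16 bits.

```python
import struct


def stereo32_to_mono16(raw):
    """Keep one channel and 16 bits per sample from interleaved 32-bit stereo PCM."""
    frame_count = len(raw) // 8          # 8 bytes per frame: 2 channels x 4 bytes
    out = bytearray(frame_count * 2)     # 2 bytes per output sample
    for i in range(frame_count):
        # Assumption: the first word of each frame is the channel we want
        left = struct.unpack_from("<i", raw, i * 8)[0]
        # Assumption: the useful signal is in the upper 16 bits
        struct.pack_into("<h", out, i * 2, left >> 16)
    return out
```

If the real data turns out to use the lower 16 bits or the second channel, only the shift and the offset need to change. Writing into one preallocated `bytearray` avoids building a list of intermediate samples.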

### Network/WiFi
- ❌ **`network.WLAN` not available** - Must use `network.ESP32_SPI` with full pin config
- ❌ **`active()` method doesn't exist** - Just call `connect()` directly
- ⚠️ **Requires ALL 6 pins configured** - CS, RST, RDY, MOSI, MISO, SCLK

### General Syntax
- ⚠️ **`if __name__ == "__main__"` sometimes causes syntax errors** - Safer to just call `main()` directly
- ⚠️ **Import statements mid-function can cause syntax errors** - Keep imports at top of file
- ⚠️ **Some valid Python causes "invalid syntax" for unknown reasons** - Simplify complex expressions

---

## 📊 Current Status

### ✅ Working
- WiFi connectivity (ESP32 SPI)
- I2S audio initialization
- Chunked audio recording with `wait_record()`
- Audio format detection and conversion (32-bit stereo → 16-bit mono)
- μ-law compression (50% size reduction)
- HTTP transmission to server (chunked streaming)
- Whisper transcription (server-side)
- JSON response parsing
- LCD display (with word wrapping)
- Button-triggered recording loop
- Countdown timer before recording

### ⚠️ Partially Working
- **Recording duration** - Currently getting ~0.9 seconds instead of full 1 second
  - Formula: `frame_cnt = seconds * sample_rate // chunk_size`
  - Current: `7 frames × (2048/16000) = 0.896s`
  - May need to increase `frame_cnt` or adjust chunk size

### ❌ Not Yet Implemented
- Mycroft Precise wake word detection
- Full voice assistant loop
- Command processing
- Home Assistant integration
- Multi-second recording support
- Real-time audio streaming

---

## 🔬 Technical Details

### Hardware Configuration

**Maixduino Board:**
- Processor: K210 dual-core RISC-V @ 400MHz
- RAM: ~6MB total (limited available memory)
- WiFi: ESP32 module via SPI
- Microphone: MSM261S4030H0 MEMS (onboard)
- IP Address: 10.1.10.98

**I2S Pins:**
- Pin 20: I2S0_IN_D0 (data)
- Pin 19: I2S0_WS (word select)
- Pin 18: I2S0_SCLK (clock)

**ESP32 SPI Pins:**
- Pin 25: CS (chip select)
- Pin 8: RST (reset)
- Pin 9: RDY (ready)
- Pin 28: MOSI (master out)
- Pin 26: MISO (master in)
- Pin 27: SCLK (clock)

**GPIO:**
- Pin 16: BOOT button (active low, pull-up)

### Server Configuration

**Heimdall Server:**
- IP: 10.1.10.71
- Port: 3006
- Framework: Flask
- Model: Whisper base
- Environment: Conda `whisper_cli`

**Endpoints:**
- `/health` - Health check
- `/transcribe` - POST audio for transcription

### Audio Format

**Recording:**
- Sample Rate: 16kHz
- Hardware Output: 32-bit stereo (128KB for 1 second)
- After Conversion: 16-bit mono (32KB for 1 second)
- After Compression: 8-bit μ-law (16KB for 1 second)
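
For reference, the 16-bit to 8-bit step can be done with the standard G.711 μ-law encoding. The sketch below is a textbook version of that algorithm, not necessarily what `compress_ulaw()` does; the function names are illustrative and the input is assumed to be little-endian signed 16-bit mono PCM.

```python
import struct

ULAW_BIAS = 0x84    # 132, added before finding the segment
ULAW_CLIP = 32635   # largest magnitude that still fits after adding the bias


def linear_to_ulaw(sample):
    """Encode one signed 16-bit sample as a G.711 mu-law byte."""
    sign = 0
    if sample < 0:
        sign = 0x80
        sample = -sample
    if sample > ULAW_CLIP:
        sample = ULAW_CLIP
    sample = sample + ULAW_BIAS
    # Find the segment: position of the highest set bit between bit 14 and bit 7
    exponent = 7
    mask = 0x4000
    while exponent > 0 and (sample & mask) == 0:
        exponent = exponent - 1
        mask = mask >> 1
    mantissa = (sample >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF   # mu-law bytes are stored inverted


def encode_ulaw_buffer(pcm16):
    """Encode a buffer of little-endian 16-bit samples, one output byte per sample."""
    count = len(pcm16) // 2
    out = bytearray(count)
    for i in range(count):
        out[i] = linear_to_ulaw(struct.unpack_from("<h", pcm16, i * 2)[0])
    return out
```

Silence (sample value 0) encodes to `0xFF`, which is a handy sanity check when inspecting the payload on the serial console.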

**WAV Header:**
- Format Code: 7 (μ-law)
- Channels: 1 (mono)
- Sample Rate: 16000 Hz
- Bits per Sample: 8
- Includes `fact` chunk (required for μ-law)
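
A header with these fields can be assembled with `struct`. This is a sketch of the layout rather than the actual `create_wav_header()`; the function name and the 18-byte `fmt ` body with a zero extension size are assumptions based on the usual layout for non-PCM WAV files.

```python
import struct


def ulaw_wav_header(num_samples, sample_rate=16000):
    """Build a RIFF/WAVE header for mono 8-bit mu-law audio (format code 7)."""
    channels = 1
    bits_per_sample = 8
    block_align = channels * bits_per_sample // 8   # 1 byte per sample frame
    byte_rate = sample_rate * block_align
    data_size = num_samples * block_align

    # "fmt " chunk: 18-byte body, ending with a zero extension size field
    fmt_chunk = b"fmt " + struct.pack(
        "<IHHIIHHH", 18, 7, channels, sample_rate, byte_rate,
        block_align, bits_per_sample, 0)
    # "fact" chunk: number of samples per channel
    fact_chunk = b"fact" + struct.pack("<II", 4, num_samples)
    data_header = b"data" + struct.pack("<I", data_size)

    # RIFF size counts everything after the first 8 bytes of the file
    riff_size = 4 + len(fmt_chunk) + len(fact_chunk) + len(data_header) + data_size
    return (b"RIFF" + struct.pack("<I", riff_size) + b"WAVE"
            + fmt_chunk + fact_chunk + data_header)


header = ulaw_wav_header(16000)
print(len(header))   # 58
```

The header is a fixed 58 bytes, so for one second of audio the full upload is 58 + 16000 bytes.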

---

## 📝 Code Files

### Main Script
**File:** `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/maix_simple_record_test.py`

**Key Functions:**
- `init_wifi()` - ESP32 SPI WiFi connection
- `init_audio()` - I2S microphone setup
- `record_audio()` - Chunked recording with `wait_record()`
- `convert_to_mono_16bit()` - Format conversion (32-bit stereo → 16-bit mono)
- `compress_ulaw()` - μ-law compression
- `create_wav_header()` - WAV file header generation
- `send_to_server()` - HTTP POST with chunked streaming
- `display_transcription()` - LCD output with word wrapping
- `main()` - Button loop for repeated recordings
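
As an illustration of the word wrapping done in `display_transcription()`, here is a minimal greedy wrapper. It is a sketch, not the script's code: the function name and the `max_chars` parameter (characters that fit on one LCD row) are assumptions.

```python
def wrap_words(text, max_chars):
    """Greedily pack words into lines of up to max_chars characters.

    A single word longer than max_chars is left unbroken on its own line.
    """
    lines = []
    current = ""
    for word in text.split(" "):     # explicit delimiter, per the quirks list
        if word == "":
            continue                 # skip empty pieces from repeated spaces
        if current == "":
            current = word
        elif len(current) + 1 + len(word) <= max_chars:
            current = current + " " + word
        else:
            lines.append(current)
            current = word
    if current != "":
        lines.append(current)
    return lines


print(wrap_words("the quick brown fox jumps", 10))
# ['the quick', 'brown fox', 'jumps']
```

Each returned line can then be drawn on its own row of the display.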

### Server Script
**File:** `/devl/voice-assistant/simple_transcribe_server.py`

**Features:**
- Accepts raw WAV or multipart uploads
- Whisper base model transcription
- JSON response with transcription text
- Handles μ-law compressed audio

### Documentation
**File:** `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/MICROPYTHON_QUIRKS.md`

Complete reference of all MicroPython compatibility issues discovered during development.

---

## 🎯 Next Steps

### Immediate (Tonight)
1. ✅ Switch to Linux laptop with direct serial access
2. ⏭️ Tune recording duration to get full 1 second
   - Try `frame_cnt = 8` instead of 7
   - Or adjust chunk size to get exact timing
3. ⏭️ Test transcription quality with proper-length recordings

### Short Term (This Week)
1. Increase recording duration to 2-3 seconds for better transcription
2. Test memory limits with longer recordings
3. Optimize compression/transmission for speed
4. Add visual feedback during transmission

### Medium Term (Next Week)
1. Install Mycroft Precise in `whisper_cli` environment
2. Test "hey mycroft" wake word detection on server
3. Integrate wake word into recording loop
4. Add command processing and Home Assistant integration

### Long Term (Future)
1. Explore edge wake word detection (Precise on K210)
2. Multi-device deployment
3. Continuous listening mode
4. Voice profiles and speaker identification

---

## 🐛 Known Issues

### Recording Duration
- **Issue:** Recording is ~0.9 seconds instead of 1.0 seconds
- **Cause:** `16000 / 2048 = 7.8125`, which integer division (`//`) floors to 7 frames
- **Impact:** Minor - transcription still works
- **Fix:** Increase `frame_cnt` to 8 or adjust chunk size
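
One way to get the frame count right for any duration is to round up instead of down. A small sketch (the helper name is made up for this example):

```python
def frames_needed(seconds, sample_rate, chunk_size):
    """Smallest number of chunks that covers the requested duration."""
    total_samples = seconds * sample_rate
    return (total_samples + chunk_size - 1) // chunk_size   # ceiling division


frame_cnt = frames_needed(1, 16000, 2048)
actual_seconds = frame_cnt * 2048 / 16000
print(frame_cnt, actual_seconds)   # 8 1.024
```

Rounding up records slightly more than requested (1.024s here). If an exact length matters, the extra samples can be trimmed after conversion.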

### Data Format Mismatch
- **Issue:** Hardware returns 4x expected data (128KB vs 32KB)
- **Cause:** I2S outputting 32-bit stereo despite 16-bit mono config
- **Impact:** None - conversion function handles it
- **Status:** Working as intended

### Syntax Error Sensitivity
- **Issue:** Some valid Python causes "invalid syntax" in MicroPython
- **Patterns:** Import statements mid-function, certain arithmetic expressions
- **Workaround:** Simplify code, avoid complex expressions
- **Status:** Documented in MICROPYTHON_QUIRKS.md

---

## 💡 Key Learnings

### I2S Recording Pattern
The correct pattern for MaixPy I2S recording:
```python
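sample_rate = 16000
seconds = 1
chunk_size = 2048  # samples per record() call
frame_cnt = seconds * sample_rate // chunk_size  # floors to 7 here (~0.9 s)

data = []
for i in range(frame_cnt):  # i2s_dev is the already-configured I2S receiver
    audio_chunk = i2s_dev.record(chunk_size)
    i2s_dev.wait_record()  # BLOCKS until recording complete
    data.append(audio_chunk.to_bytes())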
```

**Critical:** `wait_record()` is REQUIRED! Without it, `record()` returns immediately, before the chunk has been captured.

### Memory Management
K210 has very limited RAM. Successful strategies:
- Work in small chunks (2048-sample recording chunks, 512-byte network writes)
- Stream data instead of buffering
- Free variables explicitly when done
- Avoid creating large intermediate buffers

### MicroPython Compatibility
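MicroPython (as shipped in MaixPy) is NOT full Python. On this firmware, many standard features were missing or unreliable:
- F-strings, ternary operators, keyword arguments to some built-in methods (e.g. `decode()`)
- Some string methods (e.g. `split()` without a delimiter), some complex expressions
- Standard libraries (urequests, json parsing)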

**Rule:** Test incrementally, simplify everything, check quirks doc.

---

## 📚 Resources Used

### Documentation
- [MaixPy I2S API Reference](https://wiki.sipeed.com/soft/maixpy/en/api_reference/Maix/i2s.html)
- [MaixPy I2S Usage Guide](https://wiki.sipeed.com/soft/maixpy/en/modules/on_chip/i2s.html)
- [Maixduino Hardware Wiki](https://wiki.sipeed.com/hardware/en/maix/maixpy_develop_kit_board/maix_duino.html)

### Code Examples
- [Official record_wav.py](https://github.com/sipeed/MaixPy-v1_scripts/blob/master/multimedia/audio/record_wav.py)
- [MaixPy Scripts Repository](https://github.com/sipeed/MaixPy-v1_scripts)

### Tools
- MaixPy IDE (copy/paste to board)
- Serial monitor (debugging)
- Heimdall server (Whisper transcription)

---

## 🔄 Ready for Next Session

### Current State
- ✅ Code is working and stable
- ✅ Can record, compress, transmit, transcribe, display
- ✅ Button loop allows repeated testing
- ⚠️ Recording duration slightly short (~0.9s)

### Files Ready
- `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/maix_simple_record_test.py`
- `/Library/Development/devl/Devops/projects/mycroft-precise/maixduino-scripts/MICROPYTHON_QUIRKS.md`
- `/devl/voice-assistant/simple_transcribe_server.py`

### For Serial Access Session
1. Connect Maixduino via USB to Linux laptop
2. Install pyserial: `pip install pyserial`
3. Find device: `ls /dev/ttyUSB* /dev/ttyACM*`
4. Connect: `screen /dev/ttyUSB0 115200` or use MaixPy IDE
5. Can directly modify code, test immediately, see serial output

### Quick Test Commands
```python
# Test WiFi
from network import ESP32_SPI
# ... (full init code in maix_test_simple.py)

# Test I2S
from Maix import I2S
rx = I2S(I2S.DEVICE_0)
# ...

# Test recording
audio = rx.record(2048)
rx.wait_record()
print(len(audio.to_bytes()))
```

---

## 🎊 Success Metrics

Today we achieved:
- ✅ WiFi connection working
- ✅ Audio recording working (with proper blocking)
- ✅ Format conversion working (4x reduction)
- ✅ Compression working (2x reduction)
- ✅ Network transmission working (chunked streaming)
- ✅ Server transcription working
- ✅ Display output working
- ✅ Button loop working
- ✅ End-to-end pipeline complete!

**Total:** 9/9 core features working! 🚀

Minor tuning needed, but the foundation is solid and ready for wake word integration.

---

**Session Summary:** Massive progress! From zero to working audio transcription pipeline in one session. Overcame significant MicroPython compatibility challenges and memory limitations. Ready for next phase: wake word detection.

**Status:** ✅ Ready for Linux serial access and fine-tuning
**Next Session:** Tune recording duration, then integrate Mycroft Precise wake word detection

---

*End of Session Report - 2025-12-03*