# Maix Duino LCD & Camera Feature Analysis

**Date:** 2025-11-29
**Hardware:** Sipeed Maix Duino (K210)
**Question:** What is the overhead of using the LCD display and camera?

---

## Hardware Capabilities

### LCD Display
- **Resolution:** Typically 320x240 or 240x135 (depending on model)
- **Interface:** SPI
- **Color:** RGB565 (16-bit color)
- **Frame Rate:** Up to 60 FPS (limited by SPI bandwidth)
- **Status:** ✅ Included with most Maix Duino kits
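
The resolution and color depth above determine both the framebuffer size and the bus throughput a given frame rate needs. The following is a rough back-of-the-envelope sketch in plain Python; the helper names are illustrative, and the figures ignore command and protocol overhead on the bus.

```python
def framebuffer_bytes(width, height, bits_per_pixel=16):
    """Bytes needed for one full frame (RGB565 uses 16 bits per pixel)."""
    return width * height * bits_per_pixel // 8


def bus_mbit_per_s(width, height, fps, bits_per_pixel=16):
    """Raw pixel throughput in Mbit/s, ignoring command/protocol overhead."""
    return width * height * bits_per_pixel * fps / 1000000


print("Frame size:", framebuffer_bytes(320, 240) // 1024, "KiB")  # 150 KiB
for fps in (15, 30, 60):
    # 15 FPS -> 18.4, 30 FPS -> 36.9, 60 FPS -> 73.7 Mbit/s
    print(fps, "FPS:", round(bus_mbit_per_s(320, 240, fps), 1), "Mbit/s")
```

A single 320x240 RGB565 frame is 150 KiB, which accounts for most of the ~200KB RAM estimate used for the status display later in this document.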

### Camera
- **Resolution:** Various (OV2640 common: 2MP, up to 1600x1200)
- **Interface:** DVP (Digital Video Port)
- **Frame Rate:** Up to 60 FPS (lower at high resolution)
- **Status:** ✅ Often included with Maix Duino kits

### K210 Resources
- **CPU:** Dual-core RISC-V @ 400MHz
- **KPU:** Neural network accelerator
- **SRAM:** 8MB total (6MB general-purpose SRAM available for apps, 2MB reserved for the KPU when it is in use)
- **Flash:** 16MB

---

## LCD Usage for Voice Assistant

### Use Case 1: Status Display (Minimal Overhead)
**What to Show:**
- Current state (idle/listening/processing/responding)
- Wake word detected indicator
- WiFi status and signal strength
- Server connection status
- Volume level
- Time/date

**Overhead:**
- **CPU:** ~2-5% (simple text/icons)
- **RAM:** ~200KB (framebuffer + assets)
- **Power:** ~50mW additional
- **Complexity:** Low (MaixPy has built-in LCD support)

**Code Example:**
```python
import lcd
import image

lcd.init()
lcd.rotation(2)  # Rotate if needed

# Simple status display
img = image.Image(size=(320, 240))
img.draw_string(10, 10, "Listening...", color=(0, 255, 0), scale=3)
img.draw_circle(300, 20, 10, color=(0, 255, 0), fill=True)  # Status LED
lcd.display(img)
```

**Verdict:** ✅ **Very Low Overhead - Highly Recommended**

---

### Use Case 2: Audio Waveform Visualizer (Moderate Overhead)

#### Input Waveform (Microphone)
**What to Show:**
- Real-time audio level meter
- Waveform display (oscilloscope style)
- VU meter
- Frequency spectrum (simple bars)

**Overhead:**
- **CPU:** ~10-15% (real-time drawing)
- **RAM:** ~300KB (framebuffer + audio buffer)
- **Frame Rate:** 15-30 FPS (sufficient for audio visualization)
- **Complexity:** Moderate (drawing primitives + FFT)

**Implementation:**
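```python
import lcd
import image

lcd.init()
# NOTE: microphone setup is omitted here. audio_buffer is assumed to be a
# sequence of signed 16-bit samples captured elsewhere (e.g. over I2S).

WIDTH = 320
HEIGHT = 240
CENTER = HEIGHT // 2

def clamp_y(y):
    """Keep a y coordinate inside the framebuffer."""
    return max(0, min(HEIGHT - 1, y))

def draw_waveform(audio_buffer):
    if not audio_buffer:
        return  # nothing captured yet

    img = image.Image(size=(WIDTH, HEIGHT))

    # Sample every Nth point to fit on screen (step is at least 1)
    step = max(1, len(audio_buffer) // WIDTH)
    columns = min(WIDTH, len(audio_buffer) // step)

    for x in range(columns - 1):
        # Scale 16-bit samples to roughly +/-128 px, then clamp to the screen
        y1 = clamp_y(CENTER + (audio_buffer[x * step] // 256))
        y2 = clamp_y(CENTER + (audio_buffer[(x + 1) * step] // 256))
        img.draw_line(x, y1, x + 1, y2, color=(0, 255, 0))

    # Add level meter: peak amplitude scaled to the screen height
    level = max(abs(min(audio_buffer)), abs(max(audio_buffer)))
    bar_height = min(HEIGHT, (level * HEIGHT) // 32768)
    img.draw_rectangle(0, HEIGHT - bar_height, 20, bar_height,
                       color=(0, 255, 0), fill=True)

    lcd.display(img)
```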

**Verdict:** ✅ **Moderate Overhead - Feasible and Cool!**
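
Picking every Nth sample can miss short peaks between the picked points. One common alternative, sketched below in plain Python, is to record the minimum and maximum sample per screen column and draw a vertical line between them. The function name and parameters are illustrative, and the input is assumed to be a list of signed 16-bit samples.

```python
def waveform_columns(samples, width=320, height=240, full_scale=32768):
    """Return one (y_top, y_bottom) pixel pair per screen column.

    Each column covers a slice of the buffer and records its min and max
    sample, so short peaks are not lost as they can be with plain
    every-Nth-sample decimation.
    """
    if not samples:
        return []
    center = height // 2
    columns = min(width, len(samples))
    result = []
    for x in range(columns):
        start = x * len(samples) // columns
        end = max(start + 1, (x + 1) * len(samples) // columns)
        chunk = samples[start:end]
        # Positive samples go up the screen, so they map to smaller y.
        y_top = center - max(chunk) * center // full_scale
        y_bottom = center - min(chunk) * center // full_scale
        # Clamp so full-scale samples stay inside the framebuffer.
        y_top = max(0, min(height - 1, y_top))
        y_bottom = max(0, min(height - 1, y_bottom))
        result.append((y_top, y_bottom))
    return result
```

Each returned pair can then be drawn with one `img.draw_line(x, y_top, x, y_bottom, ...)` call per column, which keeps the number of draw calls bounded by the screen width regardless of buffer length.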

---

#### Output Waveform (TTS Response)
**What to Show:**
- TTS audio being played back
- Speaking animation (mouth/sound waves)
- Response text scrolling

**Overhead:**
- **CPU:** ~10-15% (similar to input)
- **RAM:** ~300KB
- **Complexity:** Moderate

**Note:** Can reuse same visualization code as input waveform.

**Verdict:** ✅ **Same as Input - Totally Doable**

---

### Use Case 3: Spectrum Analyzer (Higher Overhead)
**What to Show:**
- Frequency bars (FFT visualization)
- 8-16 frequency bands
- Classic "equalizer" look

**Overhead:**
- **CPU:** ~20-30% (FFT computation + drawing)
- **RAM:** ~500KB (FFT buffers + framebuffer)
- **Complexity:** Moderate-High (FFT required)

**Implementation Note:**
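- The K210 has a dedicated hardware FFT accelerator (separate from the KPU, which handles neural network inference) that can offload FFT computation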
- Can do simple 8-band analysis with minimal CPU
- More bands = more CPU

**Verdict:** ⚠️ **Higher Overhead - Use Sparingly**
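
Once FFT magnitudes are available, turning them into 8-16 bars is cheap. The sketch below assumes a list of non-negative magnitude bins from whatever FFT is used and groups them into equal-width linear bands; the function name is illustrative, and log-spaced bands (more typical for audio) would need a different grouping step.

```python
def bins_to_bars(magnitudes, n_bands=8, max_height=100):
    """Group FFT magnitude bins into n_bands bars scaled to max_height pixels."""
    if n_bands <= 0 or len(magnitudes) < n_bands:
        raise ValueError("need at least one magnitude bin per band")
    bands = []
    for b in range(n_bands):
        start = b * len(magnitudes) // n_bands
        end = (b + 1) * len(magnitudes) // n_bands
        chunk = magnitudes[start:end]
        bands.append(sum(chunk) / len(chunk))  # mean magnitude of the band
    peak = max(bands)
    if peak <= 0:
        return [0] * n_bands  # silence: all bars empty
    # Normalize so the loudest band fills the full bar height.
    return [int(v * max_height / peak) for v in bands]
```

Normalizing to the loudest band keeps the display lively at any volume, at the cost of hiding absolute level; a fixed reference level would be the alternative if the bars should also reflect loudness.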

---

### Use Case 4: Interactive UI (High Overhead)
**What to Show:**
- Touchscreen controls (if touchscreen available)
- Settings menu
- Volume slider
- Wake word selection
- Network configuration

**Overhead:**
- **CPU:** ~20-40% (touch detection + UI rendering)
- **RAM:** ~1MB (UI framework + assets)
- **Complexity:** High (need UI framework)

**Verdict:** ⚠️ **High Overhead - Nice-to-Have Later**

---

## Camera Usage for Voice Assistant

### Use Case 1: Person Detection (Wake on Face)
**What to Do:**
- Detect person in frame
- Only listen when someone present
- Privacy mode: disable when no one around

**Overhead:**
- **CPU:** ~30-40% (KPU handles inference)
- **RAM:** ~1.5MB (model + frame buffers)
- **Power:** ~200mW additional
- **Complexity:** Moderate (pre-trained models available)

**Pros:**
- ✅ Privacy enhancement (only listen when occupied)
- ✅ Power saving (sleep when empty room)
- ✅ Pre-trained models available for K210

**Cons:**
- ❌ Adds latency (check camera before listening)
- ❌ Privacy concerns (camera always on)
- ❌ Moderate resource usage

**Verdict:** 🤔 **Interesting but Complex - Phase 4 (after the LCD work)**
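
The "only listen when someone is present" idea can be sketched as a small gate with a hold time, so the microphone stays enabled for a while after the last detection instead of toggling on every frame. This is a minimal plain-Python sketch: the class name, the `hold_ms` default, and the idea of passing in timestamps are all assumptions for illustration, and the detection result itself would come from whatever model is used.

```python
class PresenceGate:
    """Allow listening only while a person was seen within hold_ms."""

    def __init__(self, hold_ms=10000):
        self.hold_ms = hold_ms
        self.last_seen_ms = None  # no detection yet

    def update(self, person_detected, now_ms):
        """Record the latest detection result and return whether to listen."""
        if person_detected:
            self.last_seen_ms = now_ms
        return self.is_open(now_ms)

    def is_open(self, now_ms):
        """True if a person was seen no more than hold_ms ago."""
        if self.last_seen_ms is None:
            return False
        return now_ms - self.last_seen_ms <= self.hold_ms


gate = PresenceGate(hold_ms=5000)
print(gate.update(True, 1000))    # person seen: gate opens
print(gate.update(False, 6001))   # more than 5 s since last detection: gate closes
```

Because the gate stays open between detections, the camera can be polled at a low rate rather than before every utterance, which may reduce the latency listed under Cons.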

---

### Use Case 2: Visual Context (Future AI Integration)
**What to Do:**
- "What am I holding?" queries
- Visual scene understanding
- QR code scanning
- Gesture control

**Overhead:**
- **CPU:** 40-60% (vision processing)
- **RAM:** 2-3MB (models + buffers)
- **Complexity:** High (requires vision models)

**Verdict:** ❌ **Too Complex for Initial Release - Future Feature**

---

### Use Case 3: Visual Wake Word (Gesture Detection)
**What to Do:**
- Wave hand to activate
- Thumbs up/down for feedback
- Alternative to voice wake word

**Overhead:**
- **CPU:** ~30-40% (gesture detection)
- **RAM:** ~1.5MB
- **Complexity:** Moderate-High

**Verdict:** 🤔 **Novel Idea - Phase 4+**

---

## Recommended LCD Implementation

### Phase 1: Basic Status Display (Recommended NOW)
```
┌─────────────────────────┐
│  Voice Assistant        │
│                         │
│  Status: Listening  ●   │
│  WiFi: ████░░  75%      │
│  Server: Connected      │
│                         │
│  Volume: [██████░░░]    │
│                         │
│  Time: 14:23            │
└─────────────────────────┘
```

**Features:**
- Current state indicator
- WiFi signal strength
- Server connection status
- Volume level bar
- Clock
- Wake word indicator (pulsing circle)

**Overhead:** ~2-5% CPU, 200KB RAM
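
The WiFi and volume bars in the mockup reduce to one small calculation: how many cells of a fixed-width bar to fill. The helper below is a plain-Python sketch with an illustrative name; it returns a text bar like the mockup, and on the actual LCD the same filled-cell count could drive `draw_rectangle` calls instead, since the display font may not include these block glyphs.

```python
def text_bar(percent, width=10, full="█", empty="░"):
    """Render a percentage as a fixed-width bar, e.g. for WiFi or volume."""
    percent = max(0, min(100, percent))      # clamp out-of-range readings
    filled = (percent * width + 50) // 100   # round to the nearest cell
    return full * filled + empty * (width - filled)


print("WiFi:   " + text_bar(75, width=6) + "  75%")
print("Volume: [" + text_bar(60, width=10) + "]")
```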

---

### Phase 2: Waveform Visualization (Cool Addition)
```
┌─────────────────────────┐
│ Listening...       [●]  │
├─────────────────────────┤
│  ╱╲  ╱╲    ╱╲  ╱╲      │
│ ╱  ╲╱  ╲  ╱  ╲╱  ╲     │
│                         │
│ Level: [████░░░░░░]     │
└─────────────────────────┘
```

**Features:**
- Real-time waveform (15-30 FPS)
- Audio level meter
- State indicator
- Simple and clean

**Overhead:** ~10-15% CPU, 300KB RAM

---

### Phase 3: Enhanced Visualizer (Polish)
```
┌─────────────────────────┐
│ Hey Computer!      [●]  │
├─────────────────────────┤
│ ▁▂▃▄▅▆▇█ ▁▂▃▄▅▆▇█      │
│ ▁▂▃▄▅▆▇█ ▁▂▃▄▅▆▇█      │
│                         │
│ "Turn off the lights"   │
└─────────────────────────┘
```

**Features:**
- Spectrum analyzer (8-16 bands)
- Transcription display
- Animated response
- More polished UI

**Overhead:** ~20-30% CPU, 500KB RAM
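
For the transcription line, text longer than one screen row has to be wrapped before drawing. Below is a minimal greedy word-wrap sketch in plain Python. The function name is illustrative, and `max_chars` is assumed to be computed by the caller as screen width divided by the character width of the font and scale in use.

```python
def wrap_text(text, max_chars):
    """Greedy word wrap: split text into lines of at most max_chars characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    lines = []
    current = ""
    for word in text.split():
        # Hard-split words that are longer than a whole line.
        while len(word) > max_chars:
            if current:
                lines.append(current)
                current = ""
            lines.append(word[:max_chars])
            word = word[max_chars:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


print(wrap_text("Turn off the lights", 10))  # ['Turn off', 'the lights']
```

Each returned line can then be drawn with its own `draw_string` call at increasing y offsets; scrolling would amount to showing a sliding window over the list of lines.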

---

## Resource Budget Analysis

### Total K210 Resources
- **CPU:** 2 cores @ 400MHz (assume ~100% available)
- **RAM:** 6MB available for app
- **Bandwidth:** SPI (LCD), I2S (audio), WiFi

### Current Voice Assistant Usage (Server-Side Wake Word)

| Component | CPU % | RAM (KB) |
|-----------|-------|----------|
| Audio Capture (I2S) | 5% | 128 |
| Audio Playback | 5% | 128 |
| WiFi Streaming | 10% | 256 |
| Network Stack | 5% | 512 |
| MaixPy Runtime | 10% | 1024 |
| **Base Total** | **35%** | **~2MB** |

### With LCD Features

| Display Mode | CPU % | RAM (KB) | Total CPU | Total RAM |
|--------------|-------|----------|-----------|-----------|
| **None** | 0% | 0 | 35% | 2MB |
| **Status Only** | 2-5% | 200 | 37-40% | 2.2MB |
| **Waveform** | 10-15% | 300 | 45-50% | 2.3MB |
| **Spectrum** | 20-30% | 500 | 55-65% | 2.5MB |

### With Camera Features

| Feature | CPU % | RAM (KB) | Feasible? |
|---------|-------|----------|-----------|
| Person Detection | 30-40% | 1500 | ⚠️ Tight |
| Gesture Control | 30-40% | 1500 | ⚠️ Tight |
| Visual Context | 40-60% | 2500 | ❌ Too much |
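
The tables above can be combined into a quick budget check when deciding which features to enable together. This is a plain-Python sketch using the estimates from this document; simply adding overheads is a simplification (for example, display modes share one framebuffer), and the names and the 100% CPU ceiling are assumptions for illustration.

```python
BASE_CPU_PCT = 35        # base total from the table above
BASE_RAM_KB = 2048       # ~2 MB
RAM_LIMIT_KB = 6 * 1024  # 6 MB assumed available to the app

# (cpu_low, cpu_high, ram_kb) per feature, taken from the tables above
FEATURES = {
    "status": (2, 5, 200),
    "waveform": (10, 15, 300),
    "spectrum": (20, 30, 500),
    "person_detection": (30, 40, 1500),
    "gesture_control": (30, 40, 1500),
    "visual_context": (40, 60, 2500),
}


def budget(*names):
    """Return (cpu_low, cpu_high, ram_kb) for the base load plus named features."""
    cpu_low = BASE_CPU_PCT + sum(FEATURES[n][0] for n in names)
    cpu_high = BASE_CPU_PCT + sum(FEATURES[n][1] for n in names)
    ram_kb = BASE_RAM_KB + sum(FEATURES[n][2] for n in names)
    return cpu_low, cpu_high, ram_kb


def fits(*names):
    """True if worst-case CPU is at most 100% and RAM is within the limit."""
    _, cpu_high, ram_kb = budget(*names)
    return cpu_high <= 100 and ram_kb <= RAM_LIMIT_KB


print(budget("waveform", "person_detection"))  # (75, 90, 3848)
print(fits("waveform", "visual_context"))      # False: worst-case CPU is 110%
```

With these estimates, waveform plus person detection lands at 75-90% CPU, which is why the camera rows are marked as tight even though RAM still fits.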

---

## Recommendations

### ✅ IMPLEMENT NOW: Basic Status Display
- **Why:** Very low overhead, huge UX improvement
- **Overhead:** 2-5% CPU, 200KB RAM
- **Benefit:** Users know what's happening at a glance
- **Difficulty:** Easy (MaixPy has good LCD support)

### ✅ IMPLEMENT SOON: Waveform Visualizer
- **Why:** Cool factor, moderate overhead
- **Overhead:** 10-15% CPU, 300KB RAM
- **Benefit:** Engaging, confirms mic is working, looks professional
- **Difficulty:** Moderate (simple drawing code)

### 🤔 CONSIDER LATER: Spectrum Analyzer
- **Why:** Higher overhead, diminishing returns
- **Overhead:** 20-30% CPU, 500KB RAM
- **Benefit:** Looks cool but not essential
- **Difficulty:** Moderate-High (FFT required)

### ❌ SKIP FOR NOW: Camera Features
- **Why:** High overhead, complex, privacy concerns
- **Overhead:** 30-60% CPU, 1.5-2.5MB RAM
- **Benefit:** Novel but not core functionality
- **Difficulty:** High (model integration, privacy handling)

---

## Implementation Priority

### Phase 1 (Week 1): Core Functionality
- [x] Audio capture and streaming
- [x] Server integration
- [ ] Basic LCD status display
  - Idle/Listening/Processing states
  - WiFi status
  - Connection indicator

### Phase 2 (Week 2-3): Visual Enhancement
- [ ] Audio waveform visualizer
  - Input (microphone) waveform
  - Output (TTS) waveform
  - Level meters
  - Clean, minimal design

### Phase 3 (Month 2): Polish
- [ ] Spectrum analyzer option
- [ ] Animated transitions
- [ ] Settings display
- [ ] Network configuration UI (optional)

### Phase 4 (Month 3+): Advanced Features
- [ ] Camera person detection (privacy mode)
- [ ] Gesture control experiments
- [ ] Visual wake word alternative

---

## Code Structure Recommendation

```python
# main.py structure with modular display
# display_manager, audio_processor and voice_client are project modules
# (not part of MaixPy); the constructors and methods shown are illustrative.

import lcd
from display_manager import DisplayManager
from audio_processor import AudioProcessor
from voice_client import VoiceClient

# Initialize
lcd.init()
display = DisplayManager(mode='waveform')  # or 'status' or 'spectrum'
audio_processor = AudioProcessor()
voice_client = VoiceClient()

# Main loop
while True:
    # Audio processing
    audio_buffer = audio_processor.capture()

    # Update display (non-blocking)
    if display.mode == 'status':
        display.show_status(state='listening', wifi_level=75)
    elif display.mode == 'waveform':
        display.show_waveform(audio_buffer)
    elif display.mode == 'spectrum':
        display.show_spectrum(audio_buffer)

    # Network communication
    voice_client.stream_audio(audio_buffer)
```
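
To keep display updates from starving audio streaming, the main loop can skip redraws that arrive faster than the target frame rate. The following is a minimal sketch of such a limiter in plain Python; the class name and the 20 FPS default are assumptions chosen to sit inside the 15-30 FPS range discussed earlier.

```python
class FrameLimiter:
    """Decide whether enough time has passed to draw the next frame."""

    def __init__(self, max_fps=20):
        self.interval_ms = 1000 // max_fps
        self.next_due_ms = 0

    def ready(self, now_ms):
        """Return True at most max_fps times per second."""
        if now_ms < self.next_due_ms:
            return False
        self.next_due_ms = now_ms + self.interval_ms
        return True


limiter = FrameLimiter(max_fps=20)
drawn = [t for t in range(0, 200, 10) if limiter.ready(t)]
print(drawn)  # [0, 50, 100, 150]
```

In the main loop this would wrap the display branch, for example `if limiter.ready(time.ticks_ms()): display.show_waveform(audio_buffer)`. On MicroPython, `time.ticks_ms()` wraps around, so a production version would compare timestamps with `time.ticks_diff()` rather than `<`.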

---

## Estimated Overhead (Not Yet Measured)

### Status Display Only
- **CPU:** 38% total (3% for display)
- **RAM:** 2.2MB total (200KB for display)
- **Battery Life:** -2% (minimal impact)
- **WiFi Latency:** No impact
- **Verdict:** ✅ Negligible impact, worth it!

### Waveform Visualizer
- **CPU:** 48% total (13% for display)
- **RAM:** 2.3MB total (300KB for display)
- **Battery Life:** -5% (minor impact)
- **WiFi Latency:** No impact (still <200ms)
- **Verdict:** ✅ Acceptable, looks great!

### Spectrum Analyzer
- **CPU:** 60% total (25% for display)
- **RAM:** 2.5MB total (500KB for display)
- **Battery Life:** -8% (noticeable)
- **WiFi Latency:** Possible minor impact
- **Verdict:** ⚠️ Usable but pushing limits

---

## Camera: Should You Use It?

### Pros
- ✅ Already have the hardware (free!)
- ✅ Novel features (person detection, gestures)
- ✅ Privacy enhancement potential
- ✅ Future-proofing

### Cons
- ❌ High resource usage (30-60% CPU, 1.5-2.5MB RAM)
- ❌ Complex implementation
- ❌ Privacy concerns (camera always on)
- ❌ Not core to voice assistant
- ❌ Competes with audio processing resources

### Recommendation
**Skip the camera for the initial implementation.** Focus on core voice assistant functionality. Revisit in Phase 4 (see Implementation Priority) when:
1. Core features are stable
2. You want to experiment
3. You have time for optimization
4. You want to differentiate from commercial assistants

---

## Final Recommendations

### Start With (NOW):
```python
# Simple status display
# - State indicator
# - WiFi status
# - Connection status
# - Time/date
# Overhead: ~3% CPU, 200KB RAM
```

### Add Next (Week 2):
```python
# Waveform visualizer
# - Real-time audio waveform
# - Level meter
# - Clean design
# Overhead: +10% CPU, +100KB RAM (on top of the status display)
```

### Maybe Later (Month 2+):
```python
# Spectrum analyzer
# - 8-16 frequency bands
# - FFT visualization
# - Optional mode
# Overhead: +15% CPU, +200KB RAM (on top of the waveform visualizer)
```

### Skip (For Now):
```python
# Camera features
# - Person detection
# - Gestures
# - Visual context
# Too complex, revisit later
```

---

## Example: Combined Status + Waveform Display

```
┌───────────────────────────────┐
│ Voice Assistant    [LISTENING]│
├───────────────────────────────┤
│                               │
│  ╱╲    ╱╲  ╱╲    ╱╲  ╱╲      │
│ ╱  ╲  ╱  ╲╱  ╲  ╱  ╲╱  ╲     │
│      ╲╱          ╲╱           │
│                               │
│ Vol: [████████░░] WiFi: ▂▃▅█ │
│                               │
│ Server: 10.1.10.71 ● 14:23   │
└───────────────────────────────┘
```

**Total Overhead:** ~15% CPU, 300KB RAM
**Impact:** Minimal, excellent UX improvement
**Coolness Factor:** 9/10

---

## Conclusion

### LCD: YES! Definitely Use It! ✅
- **Status display:** Low overhead, huge benefit
- **Waveform:** Moderate overhead, looks amazing
- **Spectrum:** Higher overhead, nice-to-have

**Recommendation:** Start with status, add waveform, consider spectrum later.

### Camera: Skip For Now ❌
- High overhead
- Complex implementation
- Not core functionality
- Revisit in Phase 4 (Month 3+)

**Focus on nailing the voice assistant first, then add visual features incrementally!**

---

**TL;DR:** Use the LCD for status + waveform visualization (~15% overhead total). Skip the camera for now. Your K210 can easily handle this! 🎉