minerva/docs/ESP32_S3_VOICE_ASSISTANT_SPEC.md

# ESP32-S3-Touch-LCD Voice Assistant - Technical Specification

**Date:** 2026-01-01
**Hardware:** Waveshare ESP32-S3-Touch-LCD-1.69
**Display:** 240×280 ST7789V2 with Capacitive Touch
**Framework:** ESP-IDF v5.3.1+ with LVGL 8.4.0+
**Purpose:** Voice assistant endpoint with real-time audio waveform visualization

---

## Overview

Voice assistant client for ESP32-S3 with integrated LVGL-based visual feedback showing:
- Real-time audio waveform during listening
- Wake word detection animation
- Processing/thinking state
- Response state with audio output visualization
- Touch controls for volume, sensitivity, settings

**Architecture:**
```
┌─────────────────────────────────┐
│  ESP32-S3-Touch-LCD-1.69        │
│                                 │
│  ┌──────────────────────────┐  │
│  │   LVGL UI (240×280)      │  │
│  │   - Waveform Canvas      │  │
│  │   - State Indicators     │  │──┐
│  │   - Touch Controls       │  │  │
│  └──────────────────────────┘  │  │
│                                 │  │
│  ┌──────────────────────────┐  │  │ WiFi
│  │   Audio Pipeline         │  │  │ Audio Stream
│  │   - I2S Mic Input        │  │  │
│  │   - I2S Speaker Output   │  │──┤
│  │   - Buffer Management    │  │  │
│  └──────────────────────────┘  │  │
│                                 │  │
│  ┌──────────────────────────┐  │  │
│  │   State Machine          │  │  │
│  │   - Idle → Listening     │  │  │
│  │   - Processing → Speaking│  │──┘
│  └──────────────────────────┘  │
└─────────────────────────────────┘
         │
         │ TCP/HTTP
         ↓
┌─────────────────────────────────┐
│  Heimdall Voice Server          │
│  (10.1.10.71:3006)              │
│                                 │
│  - Mycroft Precise Wake Word    │
│  - Whisper STT                  │
│  - Home Assistant Integration   │
│  - Piper TTS                    │
└─────────────────────────────────┘
```

---

## Visual States & UI Design

### State Machine

```
        ┌─────────┐
        │  IDLE   │ ◄──────────────┐
        └────┬────┘                │
             │                     │
    Wake Word Detected             │
             │                     │
             ↓                     │
      ┌──────────┐                │
      │LISTENING │                │
      └────┬─────┘                │
           │                      │
   End of Speech                  │
           │                      │
           ↓                      │
    ┌───────────┐                │
    │PROCESSING │                │
    └─────┬─────┘                │
          │                      │
    Response Ready               │
          │                      │
          ↓                      │
    ┌──────────┐                │
    │ SPEAKING │ ───────────────┘
    └──────────┘
```
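The diagram has only the main loop. The touch-control sections below also let a tap cancel or skip from the active states. A minimal sketch of both as a pure transition function follows. The event names (`EVT_*`) and the `va_` prefix are hypothetical. The state names match the `STATE_*` values used in the task code later in this document.

```c
// Sketch of the voice-assistant state diagram as a pure transition function.
// Event names are assumptions; the cancel path comes from the touch-control
// sections (tap cancels listening/processing, skips speaking).
typedef enum {
    STATE_IDLE,
    STATE_LISTENING,
    STATE_PROCESSING,
    STATE_SPEAKING
} va_state_t;

typedef enum {
    EVT_WAKE_WORD,       // Wake word detected
    EVT_END_OF_SPEECH,   // User stopped talking
    EVT_RESPONSE_READY,  // TTS audio available
    EVT_PLAYBACK_DONE,   // Response finished playing
    EVT_CANCEL           // Tap to cancel / skip
} va_event_t;

// Returns the next state; events that do not apply leave the state unchanged.
va_state_t va_next_state(va_state_t s, va_event_t e) {
    if (e == EVT_CANCEL) {
        return STATE_IDLE;   // Processing cancel is "if possible" on the server side
    }
    switch (s) {
        case STATE_IDLE:
            return (e == EVT_WAKE_WORD) ? STATE_LISTENING : s;
        case STATE_LISTENING:
            return (e == EVT_END_OF_SPEECH) ? STATE_PROCESSING : s;
        case STATE_PROCESSING:
            return (e == EVT_RESPONSE_READY) ? STATE_SPEAKING : s;
        case STATE_SPEAKING:
            return (e == EVT_PLAYBACK_DONE) ? STATE_IDLE : s;
    }
    return s;
}
```

A pure function like this can be unit-tested on a host. The state task then only feeds it events and reacts to the returned state.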

### Visual Feedback Per State

#### 1. IDLE State
**Display:**
- Subtle pulsing ring animation (like Google Home)
- Time display from RTC
- Status icons (WiFi strength, battery level)
- Dim backlight (30-50%)

**Colors:**
- Background: Dark blue (#001F3F)
- Pulse ring: Cyan (#00BFFF)
- Text: White (#FFFFFF)

**LVGL Widgets:**
```c
lv_obj_t *idle_screen;
lv_obj_t *pulse_ring;      // Arc widget, animated rotation
lv_obj_t *time_label;      // Label with RTC time
lv_obj_t *status_bar;      // Container for icons
```

**Animation:**
- Slow pulse: 2-second breathing cycle
- Rotation: 360° over 10 seconds
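The two timings above reduce to simple functions of elapsed time. The following sketch is illustrative only: LVGL's own `lv_anim_t` with playback could drive the same effect. The helper names and the opacity range are assumptions.

```c
#include <stdint.h>

// Triangle-wave "breathing" level: ramps from min_opa to max_opa over the first
// half of period_ms and back down over the second half (e.g. period_ms = 2000).
uint8_t breathing_opa(uint32_t t_ms, uint32_t period_ms, uint8_t min_opa, uint8_t max_opa) {
    uint32_t half = period_ms / 2;
    if (half == 0 || max_opa <= min_opa) return max_opa;

    uint32_t phase = t_ms % period_ms;
    uint32_t pos = (phase < half) ? phase : (period_ms - phase);  // Distance into current ramp
    if (pos > half) pos = half;                                   // Odd periods

    uint32_t range = (uint32_t)(max_opa - min_opa);
    return (uint8_t)(min_opa + (range * pos) / half);
}

// Ring rotation in whole degrees for one full turn every rotation_ms (e.g. 10000).
uint16_t ring_angle_deg(uint32_t t_ms, uint32_t rotation_ms) {
    if (rotation_ms == 0) return 0;
    return (uint16_t)(((t_ms % rotation_ms) * 360u) / rotation_ms);
}
```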

---

#### 2. LISTENING State
**Display:**
- Real-time audio waveform visualization
- Bright backlight (100%)
- "Listening..." text
- Cancel button (touch)

**Waveform Visualization:**

**Option A: Canvas-Based Waveform (Recommended)**
- Use LVGL `lv_canvas` for custom drawing
- Draw waveform from audio buffer samples
- Scrolling waveform (left-to-right)
- Update rate: 30-60 FPS

**Option B: Bar Chart Spectrum**
- Use `lv_chart` with bar type
- FFT-based spectrum analyzer
- 8-16 bars for frequency bins
- Update rate: 15-30 FPS
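For Option B, an FFT gives many evenly spaced bins, but an 8-16 bar display reads better when low frequencies get more bars. The sketch below groups bins into octave-style bands (widths 1, 1, 2, 4, 8, ...). It assumes the magnitudes already come from an FFT, for example Espressif's esp-dsp component. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

// Group FFT magnitude bins into display bars with widths 1, 1, 2, 4, 8, ...
// Bin 0 (DC) is skipped. Each bar is the average magnitude of its band.
// Returns the number of bars actually filled (may be less than n_bars).
size_t fft_to_bars(const uint32_t *mags, size_t n_bins, uint32_t *bars, size_t n_bars) {
    size_t start = 1;   // Skip DC
    size_t width = 1;
    size_t b = 0;

    for (; b < n_bars && start < n_bins; b++) {
        size_t end = start + width;
        if (end > n_bins) end = n_bins;

        uint64_t sum = 0;
        for (size_t i = start; i < end; i++) sum += mags[i];
        bars[b] = (uint32_t)(sum / (end - start));

        start = end;
        if (b >= 1) width *= 2;   // Second bar keeps width 1, then doubling
    }
    return b;
}
```

With a 256-point FFT (128 usable bins) this yields exactly 8 bars. The last bar covers the top octave.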

**Colors:**
- Background: Dark gray (#1A1A1A)
- Waveform: Green (#00FF00)
- Peak indicators: Yellow (#FFFF00)
- Clipping: Red (#FF0000)
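Choosing between the green, yellow and red colors needs a per-block level check. The following is a minimal sketch. The thresholds are assumptions: "peak" at half scale (about -6 dBFS) and "clipping" within roughly 0.1 % of full scale.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { LEVEL_NORMAL, LEVEL_PEAK, LEVEL_CLIP } level_class_t;

#define PEAK_THRESHOLD 16384   // 0.5 x full scale (~ -6 dBFS), assumed
#define CLIP_THRESHOLD 32736   // ~99.9 % of full scale, assumed

// Classify one block of 16-bit samples by its largest absolute value.
level_class_t classify_block(const int16_t *samples, size_t count) {
    int32_t max_abs = 0;
    for (size_t i = 0; i < count; i++) {
        int32_t v = samples[i];
        if (v < 0) v = -v;   // int32_t avoids overflow for -32768
        if (v > max_abs) max_abs = v;
    }
    if (max_abs >= CLIP_THRESHOLD) return LEVEL_CLIP;   // Red
    if (max_abs >= PEAK_THRESHOLD) return LEVEL_PEAK;   // Yellow
    return LEVEL_NORMAL;                                // Green
}
```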

**LVGL Implementation:**
```c
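// Canvas-based waveform
lv_obj_t *listening_screen;
lv_obj_t *waveform_canvas;    // 240×180 canvas (needs a 240*180*2 = 86,400-byte buffer at 16-bit color)
lv_obj_t *listening_label;    // "Listening..."
lv_obj_t *cancel_btn;         // Touch to cancel

// Waveform buffer (circular buffer of decimated samples)
#define WAVEFORM_WIDTH 240
#define WAVEFORM_HEIGHT 180
#define WAVEFORM_CENTER (WAVEFORM_HEIGHT / 2)
int16_t waveform_buffer[WAVEFORM_WIDTH];
uint16_t waveform_index = 0;  // Next slot to write (= oldest sample once full)

// Drawing function: call from the LVGL/render task with the LVGL lock held,
// never from the audio callback. `audio_samples` must be ordered oldest-to-newest.
void draw_waveform(lv_obj_t *canvas, const int16_t *audio_samples, size_t count) {
    static lv_point_t points[WAVEFORM_WIDTH];

    if (count > WAVEFORM_WIDTH) count = WAVEFORM_WIDTH;
    if (count < 2) return;

    lv_canvas_fill_bg(canvas, lv_color_hex(0x1A1A1A), LV_OPA_COVER);

    lv_draw_line_dsc_t line_dsc;
    lv_draw_line_dsc_init(&line_dsc);
    line_dsc.color = lv_color_hex(0x00FF00);
    line_dsc.width = 2;

    // Scale ±32768 to ±(WAVEFORM_CENTER - 1) so the trace stays on the canvas
    // (dividing by 256 gave ±128 px on a 180 px canvas). Screen y grows
    // downward, so positive samples are drawn above the center line.
    for (size_t x = 0; x < count; x++) {
        int32_t offset = ((int32_t)audio_samples[x] * (WAVEFORM_CENTER - 1)) / 32768;
        points[x].x = (lv_coord_t)x;
        points[x].y = (lv_coord_t)(WAVEFORM_CENTER - offset);
    }

    // One polyline call instead of one call per segment
    lv_canvas_draw_line(canvas, points, (uint32_t)count, &line_dsc);
}

// Audio callback (I2S task): decimates into waveform_buffer only, no LVGL calls
void audio_i2s_callback(const int16_t *samples, size_t count) {
    // A step of at least 1 avoids an infinite loop when count < WAVEFORM_WIDTH
    size_t step = count / WAVEFORM_WIDTH;
    if (step == 0) step = 1;

    for (size_t i = 0; i < count; i += step) {
        waveform_buffer[waveform_index] = samples[i];
        waveform_index = (waveform_index + 1) % WAVEFORM_WIDTH;
    }

    // Wake the render task (ui_event_group and WAVEFORM_UPDATE_BIT defined elsewhere)
    xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
}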
```
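Before drawing, the circular `waveform_buffer` must be unrolled so that the oldest sample (at `waveform_index`) is on the left, and each sample must be mapped to a canvas row. The following sketch does that without LVGL, so it can be tested on a host. The function name is hypothetical. The scaling keeps ±full scale inside the canvas.

```c
#include <stddef.h>
#include <stdint.h>

// Convert a circular buffer of samples into one canvas row per column.
// next_idx is the slot the writer fills next (the oldest sample in a full ring).
// ±32768 maps to ±(height/2 - 1) around the center; positive samples go up.
void waveform_to_rows(const int16_t *ring, size_t width, size_t next_idx,
                      int height, int16_t *rows) {
    int center = height / 2;
    for (size_t x = 0; x < width; x++) {
        int32_t s = ring[(next_idx + x) % width];
        int32_t offset = (s * (center - 1)) / 32768;
        rows[x] = (int16_t)(center - offset);
    }
}
```

For example, `waveform_to_rows(waveform_buffer, WAVEFORM_WIDTH, waveform_index, WAVEFORM_HEIGHT, rows)` would give y values from 1 to 179 for a 180-pixel canvas.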

**Touch Controls:**
- Tap anywhere: Cancel listening
- Swipe down: Lower sensitivity
- Swipe up: Increase sensitivity

---

#### 3. PROCESSING State
**Display:**
- Animated spinner/thinking indicator
- "Processing..." text
- Waveform fades out smoothly

**Animation:**
- Circular spinner with gradient
- Rotation: 360° per 1 second
- Pulsing opacity

**Colors:**
- Background: Dark gray (#1A1A1A)
- Spinner: Blue (#0080FF)
- Text: Light gray (#CCCCCC)

**LVGL Implementation:**
```c
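lv_obj_t *processing_screen;
lv_obj_t *spinner;           // LVGL 8: lv_spinner_create(parent, 1000, 60) = 1 s per turn, 60° arc
lv_obj_t *processing_label;  // "Processing..."

// lv_obj_set_style_opa() takes a third (style selector) argument, so it does
// not match the animation exec-callback type; wrap it.
static void anim_set_opa_cb(void *obj, int32_t value) {
    lv_obj_set_style_opa((lv_obj_t *)obj, (lv_opa_t)value, 0);
}

// Shows the spinner once the fade ends (defined elsewhere)
void show_spinner_callback(lv_anim_t *a);

// Transition from listening to processing
void transition_to_processing(void) {
    // Fade out waveform over 300 ms
    lv_anim_t fade_out;
    lv_anim_init(&fade_out);
    lv_anim_set_var(&fade_out, waveform_canvas);
    lv_anim_set_values(&fade_out, LV_OPA_COVER, LV_OPA_TRANSP);
    lv_anim_set_time(&fade_out, 300);
    lv_anim_set_exec_cb(&fade_out, anim_set_opa_cb);

    // Show spinner after fade: the ready callback fires when the animation
    // ends, so no separate one-shot timer is needed
    lv_anim_set_ready_cb(&fade_out, show_spinner_callback);
    lv_anim_start(&fade_out);
}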
```

---

#### 4. SPEAKING State
**Display:**
- Audio output waveform (TTS playback visualization)
- "Speaking..." or response text snippet
- Volume indicator

**Waveform:**
- Same canvas as LISTENING but different color
- Shows output audio being played
- Synchronized with speaker output

**Colors:**
- Background: Dark gray (#1A1A1A)
- Waveform: Blue (#0080FF)
- Text: White (#FFFFFF)

**LVGL Implementation:**
```c
lv_obj_t *speaking_screen;
lv_obj_t *output_waveform_canvas;  // Same size as input waveform
lv_obj_t *response_label;          // Show part of response text
lv_obj_t *volume_bar;              // lv_bar widget for volume level

// Same drawing as the listening state, but fed from the speaker buffer.
// The line descriptor must be declared locally (it is not a global).
void draw_output_waveform(lv_obj_t *canvas, const int16_t *speaker_samples, size_t count) {
    lv_draw_line_dsc_t line_dsc;
    lv_draw_line_dsc_init(&line_dsc);
    line_dsc.color = lv_color_hex(0x0080FF);   // Blue for output audio
    line_dsc.width = 2;
    // ... same scaling and polyline logic as draw_waveform()
}
```

**Touch Controls:**
- Tap: Skip response (go back to idle)
- Volume slider: Adjust speaker volume

---

### Additional UI Elements

#### Status Bar (All States)
**Location:** Top 20 pixels
**Contents:**
- WiFi icon + signal strength
- Battery icon + percentage
- Time (from RTC)
- Mute icon (if muted)

**LVGL Implementation:**
```c
lv_obj_t *status_bar;
lv_obj_t *wifi_icon;
lv_obj_t *battery_icon;
lv_obj_t *time_label;
lv_obj_t *mute_icon;

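// Timer callback, run by LVGL every second. wifi_get_rssi(), get_wifi_icon(),
// battery_get_percentage(), get_battery_icon(), rtc_time_t and
// pcf85063_get_time() are project helpers, not LVGL APIs.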
void update_status_bar(lv_timer_t *timer) {
    // Update WiFi strength
    int8_t rssi = wifi_get_rssi();
    lv_img_set_src(wifi_icon, get_wifi_icon(rssi));

    // Update battery
    uint8_t battery_pct = battery_get_percentage();
    lv_img_set_src(battery_icon, get_battery_icon(battery_pct));

    // Update time from RTC
    rtc_time_t time;
    pcf85063_get_time(&time);
    lv_label_set_text_fmt(time_label, "%02d:%02d", time.hour, time.min);
}

// Create the timer once during UI setup (a function call is not valid at file scope)
void status_bar_start(void) {
    lv_timer_create(update_status_bar, 1000, NULL);
}
```
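`get_wifi_icon()` needs to turn an RSSI in dBm into a small number of icon levels. On ESP-IDF the RSSI is available from `esp_wifi_sta_get_ap_info()` (`wifi_ap_record_t.rssi`). The cut-offs below are common rules of thumb, not values defined by ESP-IDF, and the helper name is hypothetical.

```c
#include <stdint.h>

// Map Wi-Fi RSSI (dBm) to 0-4 signal bars for the status-bar icon.
uint8_t wifi_rssi_to_bars(int8_t rssi) {
    if (rssi >= -55) return 4;   // Excellent
    if (rssi >= -65) return 3;   // Good
    if (rssi >= -75) return 2;   // Fair
    if (rssi >= -85) return 1;   // Weak
    return 0;                    // Unusable or disconnected
}
```

`get_wifi_icon()` could then index a five-entry image array by the returned bar count.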

#### Settings Screen (Touch Access)
**Trigger:** Long-press on idle screen
**Contents:**
- Volume slider
- Brightness slider
- Wake word sensitivity slider
- WiFi settings button
- About/Info button

**LVGL Implementation:**
```c
lv_obj_t *settings_screen;
lv_obj_t *volume_slider;
lv_obj_t *brightness_slider;
lv_obj_t *sensitivity_slider;
lv_obj_t *wifi_btn;
lv_obj_t *about_btn;
lv_obj_t *back_btn;

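// Slider event handler; register on each slider with
// lv_obj_add_event_cb(slider, slider_event_cb, LV_EVENT_VALUE_CHANGED, NULL)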
static void slider_event_cb(lv_event_t *e) {
    lv_obj_t *slider = lv_event_get_target(e);
    int32_t value = lv_slider_get_value(slider);

    if (slider == volume_slider) {
        set_speaker_volume(value);
    } else if (slider == brightness_slider) {
        set_backlight_brightness(value);
    } else if (slider == sensitivity_slider) {
        set_wake_word_sensitivity(value);
    }
}
```
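The MAX98357A has no digital volume register: its gain is fixed by the GAIN pin. So `set_speaker_volume()` most likely has to scale the PCM samples before `i2s_write()`. The following is a minimal sketch using a linear Q15 gain; the helper name is hypothetical. A logarithmic curve would feel more even across the slider, but it is left out to keep the math integer-only.

```c
#include <stddef.h>
#include <stdint.h>

// Scale 16-bit PCM in place by a 0-100 volume value (linear, Q15 gain).
void apply_volume(int16_t *samples, size_t count, uint8_t volume_pct) {
    if (volume_pct > 100) volume_pct = 100;
    int32_t gain_q15 = ((int32_t)volume_pct * 32768) / 100;   // 32768 = unity

    for (size_t i = 0; i < count; i++) {
        // Gain <= 1.0, so the result always fits in int16_t
        samples[i] = (int16_t)(((int32_t)samples[i] * gain_q15) / 32768);
    }
}
```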

---

## Audio Pipeline Integration

### I2S Configuration

**Microphone (INMP441):**
```c
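#include "driver/i2s.h"  // Legacy I2S driver: builds on ESP-IDF v5.x but is deprecated (new API: driver/i2s_std.h)

#define I2S_MIC_NUM         I2S_NUM_0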
#define I2S_MIC_BCLK_PIN    GPIO_NUM_4   // Verify with board schematic
#define I2S_MIC_WS_PIN      GPIO_NUM_5
#define I2S_MIC_DIN_PIN     GPIO_NUM_6
#define I2S_MIC_SAMPLE_RATE 16000
#define I2S_MIC_BITS        16
#define I2S_MIC_CHANNELS    1

i2s_config_t i2s_mic_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_RX,
    .sample_rate = I2S_MIC_SAMPLE_RATE,
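    // INMP441 sends 24-bit data in 32-bit slots and expects 64 BCLK cycles per
    // frame; I2S_BITS_PER_SAMPLE_32BIT plus a conversion to 16-bit is likely needed
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,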
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 256,
    .use_apll = false,
    .tx_desc_auto_clear = false,
    .fixed_mclk = 0
};

i2s_pin_config_t i2s_mic_pins = {
    .mck_io_num = I2S_PIN_NO_CHANGE,   // INMP441 needs no MCLK; a zeroed field would select GPIO0
    .bck_io_num = I2S_MIC_BCLK_PIN,
    .ws_io_num = I2S_MIC_WS_PIN,
    .data_out_num = I2S_PIN_NO_CHANGE,
    .data_in_num = I2S_MIC_DIN_PIN
};

void audio_init_microphone(void) {
    i2s_driver_install(I2S_MIC_NUM, &i2s_mic_config, 0, NULL);
    i2s_set_pin(I2S_MIC_NUM, &i2s_mic_pins);
    i2s_zero_dma_buffer(I2S_MIC_NUM);
}
```
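If the mic bus is switched to 32-bit slots (as the INMP441 timing suggests), each read returns `int32_t` words with the 24-bit sample left-aligned. The sketch below keeps the top 16 bits. It assumes GCC's arithmetic right shift for negative values, which is implementation-defined in standard C but is what ESP-IDF's toolchain does. Some projects shift by less (for example 14) to add digital gain; that amount is something to tune. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

// Convert INMP441 32-bit I2S slots (24-bit data, left-aligned) to 16-bit PCM
// by keeping the upper 16 bits of each word.
void inmp441_to_pcm16(const int32_t *in, int16_t *out, size_t count) {
    for (size_t i = 0; i < count; i++) {
        out[i] = (int16_t)(in[i] >> 16);
    }
}
```

In the mic task this would sit between `i2s_read()` into an `int32_t[256]` buffer and the calls that expect `int16_t` samples.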

**Speaker (MAX98357A I2S Amp):**
```c
#define I2S_SPK_NUM         I2S_NUM_1
#define I2S_SPK_BCLK_PIN    GPIO_NUM_7   // Verify with board schematic
#define I2S_SPK_WS_PIN      GPIO_NUM_8
#define I2S_SPK_DOUT_PIN    GPIO_NUM_9
#define I2S_SPK_SAMPLE_RATE 16000
#define I2S_SPK_BITS        16
#define I2S_SPK_CHANNELS    1

i2s_config_t i2s_spk_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_TX,
    .sample_rate = I2S_SPK_SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 256,
    .use_apll = false,
    .tx_desc_auto_clear = true,
    .fixed_mclk = 0
};

i2s_pin_config_t i2s_spk_pins = {
    .mck_io_num = I2S_PIN_NO_CHANGE,   // MAX98357A derives its clock from BCLK; no MCLK
    .bck_io_num = I2S_SPK_BCLK_PIN,
    .ws_io_num = I2S_SPK_WS_PIN,
    .data_out_num = I2S_SPK_DOUT_PIN,
    .data_in_num = I2S_PIN_NO_CHANGE
};

void audio_init_speaker(void) {
    i2s_driver_install(I2S_SPK_NUM, &i2s_spk_config, 0, NULL);
    i2s_set_pin(I2S_SPK_NUM, &i2s_spk_pins);
    i2s_zero_dma_buffer(I2S_SPK_NUM);
}
```

### Audio Buffer Management

**Circular Buffer for Waveform:**
```c
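#include <string.h>              // memset
#include "freertos/FreeRTOS.h"
#include "freertos/semphr.h"

#define AUDIO_BUFFER_SIZE 2048    // 128 ms of audio at 16 kHz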
#define WAVEFORM_DECIMATION 8  // Downsample for display

typedef struct {
    int16_t samples[AUDIO_BUFFER_SIZE];
    uint16_t write_idx;
    uint16_t read_idx;
    SemaphoreHandle_t mutex;
} audio_buffer_t;

audio_buffer_t mic_buffer;
audio_buffer_t spk_buffer;

void audio_buffer_init(audio_buffer_t *buf) {
    memset(buf->samples, 0, sizeof(buf->samples));
    buf->write_idx = 0;
    buf->read_idx = 0;
    buf->mutex = xSemaphoreCreateMutex();
}

void audio_buffer_write(audio_buffer_t *buf, int16_t *samples, size_t count) {
    xSemaphoreTake(buf->mutex, portMAX_DELAY);
    for (size_t i = 0; i < count; i++) {
        buf->samples[buf->write_idx] = samples[i];
        buf->write_idx = (buf->write_idx + 1) % AUDIO_BUFFER_SIZE;
    }
    xSemaphoreGive(buf->mutex);
}

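// Get the most recent samples, decimated by WAVEFORM_DECIMATION and ordered
// oldest-to-newest (240 columns × 8 = 1920 samples = 120 ms at 16 kHz).
// Returns the number of entries written to `out`.
size_t audio_buffer_get_waveform(audio_buffer_t *buf, int16_t *out, size_t out_count) {
    // Never look back further than the buffer holds
    if (out_count * WAVEFORM_DECIMATION > AUDIO_BUFFER_SIZE) {
        out_count = AUDIO_BUFFER_SIZE / WAVEFORM_DECIMATION;
    }
    size_t span = out_count * WAVEFORM_DECIMATION;

    xSemaphoreTake(buf->mutex, portMAX_DELAY);
    // write_idx is the next slot to fill; starting there would read the oldest
    // samples and skip the newest, so start `span` samples behind it instead
    size_t start = (buf->write_idx + AUDIO_BUFFER_SIZE - span) % AUDIO_BUFFER_SIZE;
    for (size_t i = 0; i < out_count; i++) {
        out[i] = buf->samples[(start + i * WAVEFORM_DECIMATION) % AUDIO_BUFFER_SIZE];
    }
    xSemaphoreGive(buf->mutex);
    return out_count;
}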
```
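Keeping every 8th sample can skip short transients, so the waveform may look quieter than the audio really is. A common alternative is peak (min/max) decimation: keep the minimum and maximum of each group, then draw a vertical line per column. The following is a sketch of that technique, swapped in for illustration rather than taken from the design above. It works on a linear, oldest-first snapshot copied out under the mutex. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

// Peak decimation: for each group of `factor` samples, record its min and max.
// Returns the number of columns written (count / factor).
size_t decimate_minmax(const int16_t *in, size_t count, size_t factor,
                       int16_t *out_min, int16_t *out_max) {
    if (factor == 0) return 0;
    size_t cols = count / factor;

    for (size_t c = 0; c < cols; c++) {
        const int16_t *g = in + c * factor;
        int16_t lo = g[0], hi = g[0];
        for (size_t i = 1; i < factor; i++) {
            if (g[i] < lo) lo = g[i];
            if (g[i] > hi) hi = g[i];
        }
        out_min[c] = lo;
        out_max[c] = hi;
    }
    return cols;
}
```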

### Audio Streaming Task

**Microphone Input Task:**
```c
void audio_mic_task(void *pvParameters) {
    int16_t i2s_buffer[256];
    size_t bytes_read;

    while (1) {
        // Read from I2S microphone
        i2s_read(I2S_MIC_NUM, i2s_buffer, sizeof(i2s_buffer), &bytes_read, portMAX_DELAY);
        size_t samples_read = bytes_read / sizeof(int16_t);

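        // If wake-word detection runs on the server (Mycroft Precise on
        // Heimdall), audio must also be streamed in STATE_IDLE, or the wake
        // word can never be detected. current_state should be volatile.
        if (current_state == STATE_LISTENING) {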
            // Write to circular buffer for waveform display
            audio_buffer_write(&mic_buffer, i2s_buffer, samples_read);

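            // Send to Heimdall server via WiFi. This can block on a slow network;
            // a queue or stream buffer feeding a separate network task keeps the
            // I2S DMA buffers from overflowing.
            audio_send_to_server(i2s_buffer, samples_read);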

            // Trigger waveform update
            xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
        }
    }
}
```

**Speaker Output Task:**
```c
void audio_speaker_task(void *pvParameters) {
    int16_t i2s_buffer[256];
    size_t bytes_written;

    while (1) {
        // Receive audio from Heimdall server
        size_t samples_received = audio_receive_from_server(i2s_buffer, 256);

        if (samples_received > 0 && current_state == STATE_SPEAKING) {
            // Write to circular buffer for waveform display
            audio_buffer_write(&spk_buffer, i2s_buffer, samples_received);

            // Play through I2S speaker
            i2s_write(I2S_SPK_NUM, i2s_buffer, samples_received * sizeof(int16_t),
                     &bytes_written, portMAX_DELAY);

            // Trigger waveform update
            xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
        } else {
            vTaskDelay(pdMS_TO_TICKS(10));
        }
    }
}
```

### LVGL Update Task

**Waveform Rendering Task:**
```c
void lvgl_waveform_task(void *pvParameters) {
    int16_t waveform_samples[WAVEFORM_WIDTH];

    while (1) {
        // Wait for waveform update event
        EventBits_t bits = xEventGroupWaitBits(ui_event_group, WAVEFORM_UPDATE_BIT,
                                               pdTRUE, pdFALSE, pdMS_TO_TICKS(50));

        if (bits & WAVEFORM_UPDATE_BIT) {
            if (current_state == STATE_LISTENING) {
                // Get downsampled mic data
                audio_buffer_get_waveform(&mic_buffer, waveform_samples, WAVEFORM_WIDTH);

                // Draw on LVGL canvas (must lock LVGL)
                lvgl_lock();
                draw_waveform(waveform_canvas, waveform_samples, WAVEFORM_WIDTH);
                lvgl_unlock();

            } else if (current_state == STATE_SPEAKING) {
                // Get downsampled speaker data
                audio_buffer_get_waveform(&spk_buffer, waveform_samples, WAVEFORM_WIDTH);

                lvgl_lock();
                draw_output_waveform(output_waveform_canvas, waveform_samples, WAVEFORM_WIDTH);
                lvgl_unlock();
            }
        }
    }
}
```

---

## Touch Gesture Integration

### Touch Controller (CST816D)

**Gestures Supported:**
- Single tap
- Long press
- Swipe up/down/left/right

**Implementation:**
```c
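#include "driver/i2c.h"    // Legacy I2C driver (superseded by driver/i2c_master.h in ESP-IDF v5.2+)
#include "driver/gpio.h"

#define TOUCH_I2C_NUM       I2C_NUM_0
#define CST816D_ADDR        0x15           // 7-bit I2C address of the CST816 family
// Placeholder pins: GPIO6, GPIO7 and GPIO9 are also assigned to the I2S mic DIN,
// speaker BCLK and speaker DOUT above. Take the real touch pins from the board
// schematic before wiring the audio parts.
#define TOUCH_SDA_PIN       GPIO_NUM_6
#define TOUCH_SCL_PIN       GPIO_NUM_7
#define TOUCH_INT_PIN       GPIO_NUM_9
#define TOUCH_RST_PIN       GPIO_NUM_10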

typedef enum {
    GESTURE_NONE = 0,
    GESTURE_TAP,
    GESTURE_LONG_PRESS,
    GESTURE_SWIPE_UP,
    GESTURE_SWIPE_DOWN,
    GESTURE_SWIPE_LEFT,
    GESTURE_SWIPE_RIGHT
} touch_gesture_t;

void touch_init(void) {
    // I2C init for CST816D
    i2c_config_t conf = {
        .mode = I2C_MODE_MASTER,
        .sda_io_num = TOUCH_SDA_PIN,
        .scl_io_num = TOUCH_SCL_PIN,
        .sda_pullup_en = GPIO_PULLUP_ENABLE,
        .scl_pullup_en = GPIO_PULLUP_ENABLE,
        .master.clk_speed = 100000,
    };
    i2c_param_config(TOUCH_I2C_NUM, &conf);
    i2c_driver_install(TOUCH_I2C_NUM, conf.mode, 0, 0, 0);

    // Reset touch controller
    gpio_set_direction(TOUCH_RST_PIN, GPIO_MODE_OUTPUT);
    gpio_set_level(TOUCH_RST_PIN, 0);
    vTaskDelay(pdMS_TO_TICKS(10));
    gpio_set_level(TOUCH_RST_PIN, 1);
    vTaskDelay(pdMS_TO_TICKS(50));

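    // Configure interrupt pin (falling edge when a touch event is ready).
    // gpio_install_isr_service() is called once in app_main; calling it again
    // here returns ESP_ERR_INVALID_STATE.
    gpio_set_direction(TOUCH_INT_PIN, GPIO_MODE_INPUT);
    gpio_set_intr_type(TOUCH_INT_PIN, GPIO_INTR_NEGEDGE);
    gpio_isr_handler_add(TOUCH_INT_PIN, touch_isr_handler, NULL);  // touch_isr_handler defined elsewhere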
}

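touch_gesture_t touch_read_gesture(void) {
    uint8_t reg = 0x01;        // Gesture ID register
    uint8_t gesture_id = 0;

    // Write the register address, then read one byte back (repeated start).
    // i2c_master_read_from_device() has no register argument.
    if (i2c_master_write_read_device(TOUCH_I2C_NUM, CST816D_ADDR, &reg, 1,
                                     &gesture_id, 1, pdMS_TO_TICKS(100)) != ESP_OK) {
        return GESTURE_NONE;
    }

    // CST816-family gesture codes do not match this enum's order, so translate
    // instead of casting (verify codes and swipe orientation on the device)
    switch (gesture_id) {
        case 0x01: return GESTURE_SWIPE_UP;
        case 0x02: return GESTURE_SWIPE_DOWN;
        case 0x03: return GESTURE_SWIPE_LEFT;
        case 0x04: return GESTURE_SWIPE_RIGHT;
        case 0x05: return GESTURE_TAP;
        case 0x0C: return GESTURE_LONG_PRESS;
        default:   return GESTURE_NONE;
    }
}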
```
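If the controller's built-in gesture engine is not used (for example when raw coordinates are fed to LVGL), gestures can be classified in software from the press point, the release point and the hold time. The following is a sketch of that fallback. The enum reproduces `touch_gesture_t` above. The thresholds are assumptions to tune on the 240×280 panel, and `classify_touch()` is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>

typedef enum {
    GESTURE_NONE = 0, GESTURE_TAP, GESTURE_LONG_PRESS,
    GESTURE_SWIPE_UP, GESTURE_SWIPE_DOWN, GESTURE_SWIPE_LEFT, GESTURE_SWIPE_RIGHT
} touch_gesture_t;

#define SWIPE_MIN_PX   40    // Minimum travel for a swipe (assumed)
#define TAP_MAX_PX     10    // Maximum travel for a tap or long press (assumed)
#define LONG_PRESS_MS  600   // Hold time for a long press (assumed)

// Classify one touch from press (x0, y0) to release (x1, y1).
touch_gesture_t classify_touch(int x0, int y0, int x1, int y1, uint32_t duration_ms) {
    int dx = x1 - x0;
    int dy = y1 - y0;          // Screen y grows downward
    int adx = abs(dx), ady = abs(dy);

    if (adx <= TAP_MAX_PX && ady <= TAP_MAX_PX) {
        return (duration_ms >= LONG_PRESS_MS) ? GESTURE_LONG_PRESS : GESTURE_TAP;
    }
    if (adx >= ady && adx >= SWIPE_MIN_PX) {
        return (dx > 0) ? GESTURE_SWIPE_RIGHT : GESTURE_SWIPE_LEFT;
    }
    if (ady > adx && ady >= SWIPE_MIN_PX) {
        return (dy > 0) ? GESTURE_SWIPE_DOWN : GESTURE_SWIPE_UP;
    }
    return GESTURE_NONE;       // Too far for a tap, too short for a swipe
}
```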

### Gesture Actions by State

**IDLE State:**
- **Tap:** Wake up display (if dimmed)
- **Long Press:** Open settings screen
- **Swipe Up:** Show more info (weather, calendar)

**LISTENING State:**
- **Tap:** Cancel listening, return to idle
- **Swipe Down:** Lower wake word sensitivity
- **Swipe Up:** Raise wake word sensitivity

**SPEAKING State:**
- **Tap:** Skip response, return to idle
- **Swipe Left/Right:** Volume down/up

**PROCESSING State:**
- **Tap:** Cancel processing (if possible)
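The per-state gesture table above maps naturally onto a two-dimensional lookup array, which keeps the dispatch logic out of the touch driver. The sketch below uses hypothetical enum and action names that mirror the states and gestures listed in this spec. Any combination not listed maps to `ACT_NONE`.

```c
// Gesture-to-action lookup indexed by [state][gesture]. Names are hypothetical.
typedef enum { ST_IDLE, ST_LISTENING, ST_PROCESSING, ST_SPEAKING, ST_COUNT } ui_state_t;
typedef enum { G_TAP, G_LONG_PRESS, G_SWIPE_UP, G_SWIPE_DOWN,
               G_SWIPE_LEFT, G_SWIPE_RIGHT, G_COUNT } gesture_t;
typedef enum {
    ACT_NONE, ACT_WAKE_DISPLAY, ACT_OPEN_SETTINGS, ACT_SHOW_INFO,
    ACT_CANCEL, ACT_SENSITIVITY_DOWN, ACT_SENSITIVITY_UP,
    ACT_VOLUME_DOWN, ACT_VOLUME_UP
} ui_action_t;

// Unlisted entries are zero-initialized, i.e. ACT_NONE
static const ui_action_t gesture_actions[ST_COUNT][G_COUNT] = {
    [ST_IDLE] = {
        [G_TAP] = ACT_WAKE_DISPLAY,
        [G_LONG_PRESS] = ACT_OPEN_SETTINGS,
        [G_SWIPE_UP] = ACT_SHOW_INFO,
    },
    [ST_LISTENING] = {
        [G_TAP] = ACT_CANCEL,
        [G_SWIPE_DOWN] = ACT_SENSITIVITY_DOWN,
        [G_SWIPE_UP] = ACT_SENSITIVITY_UP,
    },
    [ST_PROCESSING] = {
        [G_TAP] = ACT_CANCEL,          // "If possible": the server may ignore it
    },
    [ST_SPEAKING] = {
        [G_TAP] = ACT_CANCEL,          // Skip response, return to idle
        [G_SWIPE_LEFT] = ACT_VOLUME_DOWN,
        [G_SWIPE_RIGHT] = ACT_VOLUME_UP,
    },
};

ui_action_t action_for(ui_state_t s, gesture_t g) {
    if (s >= ST_COUNT || g >= G_COUNT) return ACT_NONE;
    return gesture_actions[s][g];
}
```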

---

## Network Communication

### WiFi Configuration

**Connection:**
```c
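#include "esp_event.h"
#include "esp_netif.h"
#include "esp_wifi.h"

// Placeholders: in practice keep credentials in Kconfig or NVS, not in source
#define WIFI_SSID           "YourNetworkName"
#define WIFI_PASSWORD       "YourPassword"
#define SERVER_URL          "http://10.1.10.71:3006"

// Connect once the station starts, and retry after every disconnect (no backoff)
static void wifi_event_handler(void *arg, esp_event_base_t base,
                               int32_t event_id, void *event_data) {
    if (base == WIFI_EVENT &&
        (event_id == WIFI_EVENT_STA_START || event_id == WIFI_EVENT_STA_DISCONNECTED)) {
        esp_wifi_connect();
    }
}

void wifi_init(void) {   // Requires nvs_flash_init() to have run first
    ESP_ERROR_CHECK(esp_netif_init());
    ESP_ERROR_CHECK(esp_event_loop_create_default());
    esp_netif_create_default_wifi_sta();

    wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
    ESP_ERROR_CHECK(esp_wifi_init(&cfg));
    ESP_ERROR_CHECK(esp_event_handler_register(WIFI_EVENT, ESP_EVENT_ANY_ID,
                                               wifi_event_handler, NULL));

    wifi_config_t wifi_config = {
        .sta = {
            .ssid = WIFI_SSID,
            .password = WIFI_PASSWORD,
        },
    };

    ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_STA));
    ESP_ERROR_CHECK(esp_wifi_set_config(WIFI_IF_STA, &wifi_config));
    ESP_ERROR_CHECK(esp_wifi_start());   // STA_START event triggers esp_wifi_connect()
}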
```

### Server Communication Protocol

**Endpoints:**
- `GET /health` - Server health check
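- `POST /audio/stream` - Stream raw 16-bit PCM to the server (chunked transfer encoding)
- `GET /ws/audio` - WebSocket upgrade for bidirectional audio streaming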
- `GET /audio/tts` - Receive TTS audio response
- `GET /wake-word/status` - Check wake word detection status
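Whichever transport is used, the raw stream size follows directly from the I2S settings in this spec (16 kHz, 16-bit, mono). The constants below work it out, so each 256-sample read becomes a 512-byte send every 16 ms. The macro and helper names are hypothetical.

```c
#include <stdint.h>

#define PCM_SAMPLE_RATE_HZ   16000u
#define PCM_BYTES_PER_SAMPLE 2u
#define PCM_CHANNELS         1u
#define PCM_FRAME_SAMPLES    256u   // One i2s_read() into int16_t[256]

// 16000 × 2 × 1 = 32,000 bytes per second (256 kbit/s) before protocol overhead
#define PCM_BYTES_PER_SEC (PCM_SAMPLE_RATE_HZ * PCM_BYTES_PER_SAMPLE * PCM_CHANNELS)

// Bytes needed to hold `ms` milliseconds of audio
static inline uint32_t pcm_bytes_for_ms(uint32_t ms) {
    return (PCM_BYTES_PER_SEC * ms) / 1000u;
}

// Duration of one frame in microseconds: 256 / 16000 s = 16,000 µs
static inline uint32_t pcm_frame_us(void) {
    return (PCM_FRAME_SAMPLES * 1000000u) / PCM_SAMPLE_RATE_HZ;
}
```

The same arithmetic sizes the 10-second PSRAM history buffer later in this spec: `pcm_bytes_for_ms(10000)` is 320,000 bytes.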

**Audio Streaming (WebSockets Recommended):**
```c
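#include "esp_websocket_client.h"
#include "freertos/FreeRTOS.h"
#include "freertos/stream_buffer.h"

#define WS_RX_STREAM_BYTES  (16 * 1024)   // ~0.5 s of 16 kHz 16-bit audio

esp_websocket_client_handle_t ws_client;
static StreamBufferHandle_t ws_rx_stream;   // TTS audio received from the server

// esp_websocket_client has no blocking receive call: incoming frames arrive as
// WEBSOCKET_EVENT_DATA events, so the handler queues binary payloads.
static void websocket_event_handler(void *handler_args, esp_event_base_t base,
                                    int32_t event_id, void *event_data) {
    esp_websocket_event_data_t *data = (esp_websocket_event_data_t *)event_data;
    if (event_id == WEBSOCKET_EVENT_DATA && data->op_code == 0x02 && data->data_len > 0 &&
        xStreamBufferSpacesAvailable(ws_rx_stream) >= (size_t)data->data_len) {
        // Whole binary frames only (keeps samples 2-byte aligned); drop when full
        xStreamBufferSend(ws_rx_stream, data->data_ptr, data->data_len, 0);
    }
}

void websocket_init(void) {
    ws_rx_stream = xStreamBufferCreate(WS_RX_STREAM_BYTES, 1);

    esp_websocket_client_config_t ws_cfg = {
        .uri = "ws://10.1.10.71:3006/ws/audio",
        .buffer_size = 2048,
    };

    ws_client = esp_websocket_client_init(&ws_cfg);
    esp_websocket_register_events(ws_client, WEBSOCKET_EVENT_ANY,
                                  websocket_event_handler, NULL);
    esp_websocket_client_start(ws_client);
}

void audio_send_to_server(int16_t *samples, size_t count) {
    if (esp_websocket_client_is_connected(ws_client)) {
        esp_websocket_client_send_bin(ws_client, (const char *)samples,
                                      count * sizeof(int16_t), portMAX_DELAY);
    }
}

size_t audio_receive_from_server(int16_t *out_buffer, size_t max_samples) {
    // Wait up to 100 ms for audio queued by the event handler
    size_t len = xStreamBufferReceive(ws_rx_stream, out_buffer,
                                      max_samples * sizeof(int16_t), pdMS_TO_TICKS(100));
    return len / sizeof(int16_t);
}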
```

**Alternative: HTTP Chunked Transfer (Simpler):**
```c
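#include <stdio.h>
#include "esp_http_client.h"

// Alternative to the WebSocket path. It reads the mic itself, so do not run it
// alongside audio_mic_task.
void audio_stream_http(void) {
    esp_http_client_config_t config = {
        .url = "http://10.1.10.71:3006/audio/stream",
        .method = HTTP_METHOD_POST,
    };
    esp_http_client_handle_t client = esp_http_client_init(&config);
    esp_http_client_set_header(client, "Content-Type", "audio/pcm");

    // write_len = -1 makes the client send "Transfer-Encoding: chunked", but
    // esp_http_client_write() sends raw bytes, so the chunk framing is added here
    if (esp_http_client_open(client, -1) != ESP_OK) {
        esp_http_client_cleanup(client);
        return;
    }

    int16_t buffer[256];
    char chunk_hdr[12];
    while (current_state == STATE_LISTENING) {
        size_t bytes_read = 0;
        i2s_read(I2S_MIC_NUM, buffer, sizeof(buffer), &bytes_read, portMAX_DELAY);
        if (bytes_read == 0) continue;

        // Each chunk: "<size in hex>\r\n<data>\r\n"
        int hdr_len = snprintf(chunk_hdr, sizeof(chunk_hdr), "%x\r\n", (unsigned)bytes_read);
        esp_http_client_write(client, chunk_hdr, hdr_len);
        esp_http_client_write(client, (const char *)buffer, (int)bytes_read);
        esp_http_client_write(client, "\r\n", 2);
    }

    esp_http_client_write(client, "0\r\n\r\n", 5);   // Terminating zero-length chunk
    esp_http_client_fetch_headers(client);           // Read the server's response
    esp_http_client_close(client);
    esp_http_client_cleanup(client);
}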
```

---

## Power Management

### Battery Monitoring

**ETA6098 Charging Chip:**
```c
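#include "driver/adc.h"      // Legacy ADC driver (deprecated in ESP-IDF v5.x; new API: esp_adc/adc_oneshot.h)
#include "esp_adc_cal.h"     // Legacy calibration (new API: esp_adc/adc_cali.h)
#include "driver/gpio.h"

#define BATTERY_ADC_CHANNEL ADC1_CHANNEL_0  // GPIO1 (example)
#define BATTERY_DIVIDER     2               // Assumed divider ratio; check the schematic
#define BATTERY_FULL_MV     4200
#define BATTERY_EMPTY_MV    3300
#define CHARGE_STATUS_PIN   GPIO_NUM_36     // Example only, see battery_is_charging()

static esp_adc_cal_characteristics_t adc_chars;

void battery_init(void) {
    adc1_config_width(ADC_WIDTH_BIT_12);
    // 11 dB attenuation (ADC_ATTEN_DB_12 in newer IDF releases) reads up to ~3.1 V,
    // so a 4.2 V cell has to be measured through a divider
    adc1_config_channel_atten(BATTERY_ADC_CHANNEL, ADC_ATTEN_DB_11);
    esp_adc_cal_characterize(ADC_UNIT_1, ADC_ATTEN_DB_11, ADC_WIDTH_BIT_12, 0, &adc_chars);

    gpio_set_direction(CHARGE_STATUS_PIN, GPIO_MODE_INPUT);   // Configure once
}

uint8_t battery_get_percentage(void) {
    int adc_reading = adc1_get_raw(BATTERY_ADC_CHANNEL);
    int voltage_mv = (int)esp_adc_cal_raw_to_voltage(adc_reading, &adc_chars) * BATTERY_DIVIDER;

    if (voltage_mv >= BATTERY_FULL_MV) return 100;
    if (voltage_mv <= BATTERY_EMPTY_MV) return 0;

    // Linear approximation; real LiPo discharge curves are not linear
    return ((voltage_mv - BATTERY_EMPTY_MV) * 100) / (BATTERY_FULL_MV - BATTERY_EMPTY_MV);
}

bool battery_is_charging(void) {
    // SYS_OUT/charge-status pin, high when charging. GPIO33-37 are used by the
    // octal PSRAM on 8 MB-PSRAM ESP32-S3 parts, so confirm the pin on the schematic.
    return gpio_get_level(CHARGE_STATUS_PIN);
}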
```
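The linear 3.3-4.2 V mapping reads low through most of the discharge, because a LiPo spends much of its capacity near 3.7-3.9 V. A piecewise-linear table gives a closer estimate. The following sketch uses an approximate single-cell resting-voltage curve. The table values are an assumption, not ETA6098 or cell-vendor data, and readings taken under load or while charging will be skewed. It expects the cell voltage after the divider correction.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t mv; uint8_t pct; } soc_point_t;

// Approximate resting-voltage curve for one LiPo cell (assumed values)
static const soc_point_t LIPO_CURVE[] = {
    {3300, 0}, {3600, 5}, {3700, 20}, {3750, 35}, {3800, 50},
    {3900, 65}, {4000, 80}, {4100, 92}, {4200, 100},
};
#define LIPO_POINTS (sizeof(LIPO_CURVE) / sizeof(LIPO_CURVE[0]))

// Interpolate state of charge (0-100 %) from cell voltage in millivolts.
uint8_t lipo_percent(uint32_t cell_mv) {
    if (cell_mv <= LIPO_CURVE[0].mv) return 0;
    if (cell_mv >= LIPO_CURVE[LIPO_POINTS - 1].mv) return 100;

    for (size_t i = 1; i < LIPO_POINTS; i++) {
        if (cell_mv <= LIPO_CURVE[i].mv) {
            const soc_point_t *a = &LIPO_CURVE[i - 1];
            const soc_point_t *b = &LIPO_CURVE[i];
            // Linear interpolation between neighboring table points
            return (uint8_t)(a->pct + ((cell_mv - a->mv) * (b->pct - a->pct)) / (b->mv - a->mv));
        }
    }
    return 100;
}
```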

### Low Power Modes

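**Deep Sleep When Idle (Optional):**
In deep sleep the microphone, Wi-Fi and display are off, so the wake word cannot be heard. Only the touch interrupt or the hourly timer wakes the device, which then restarts from `app_main()`.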
```c
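#include "esp_attr.h"
#include "esp_sleep.h"
#include "driver/gpio.h"

#define IDLE_TIMEOUT_MS 300000  // 5 minutes

// Kept in RTC memory: survives deep sleep, but not a power cycle
RTC_DATA_ATTR static uint32_t boot_count = 0;

void enter_deep_sleep(void) {
    boot_count++;

    // Configure wake sources. ext0 needs an RTC-capable pin (GPIO0-21 on ESP32-S3)
    esp_sleep_enable_ext0_wakeup(TOUCH_INT_PIN, 0);       // Wake on touch (INT active low)
    esp_sleep_enable_timer_wakeup(3600ULL * 1000000ULL);  // Wake every hour (microseconds)

    // Turn off display backlight (LCD_BL_PIN comes from the display driver)
    gpio_set_level(LCD_BL_PIN, 0);

    // Enter deep sleep; this call does not return
    esp_deep_sleep_start();
}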
```
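`IDLE_TIMEOUT_MS` still needs a check that decides when to call `enter_deep_sleep()`. The following is a minimal sketch. It assumes the caller tracks a 32-bit millisecond timestamp of the last touch or voice activity. Unsigned subtraction keeps the comparison correct when the counter wraps. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define IDLE_TIMEOUT_MS 300000u  // 5 minutes

// True once the device has been in IDLE for at least IDLE_TIMEOUT_MS.
bool should_enter_sleep(uint32_t now_ms, uint32_t last_activity_ms, bool in_idle_state) {
    if (!in_idle_state) return false;   // Never sleep mid-conversation
    return (uint32_t)(now_ms - last_activity_ms) >= IDLE_TIMEOUT_MS;
}
```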

---

## Performance Optimization

### LVGL Performance

**Buffer Configuration:**
```c
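#include "lvgl.h"

#define LVGL_SCREEN_PIXELS (240 * 280)               // Full screen, in pixels
#define LVGL_BUF_PIXELS    (LVGL_SCREEN_PIXELS / 10)  // 1/10 screen = 6720 px = 13,440 bytes at 16-bit color

static lv_color_t buf_1[LVGL_BUF_PIXELS];
static lv_color_t buf_2[LVGL_BUF_PIXELS];   // Double buffering: render one while the other is flushed
static lv_disp_draw_buf_t draw_buf;

void lvgl_buffers_init(void) {
    // The size argument is a pixel count, not a byte count
    lv_disp_draw_buf_init(&draw_buf, buf_1, buf_2, LVGL_BUF_PIXELS);
}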
```

**Task Priority:**
```c
#define LVGL_TASK_PRIORITY      5
#define AUDIO_MIC_TASK_PRIORITY 10  // Higher priority for audio
#define AUDIO_SPK_TASK_PRIORITY 10
#define WIFI_TASK_PRIORITY      8   // For an application network task; the ESP-IDF Wi-Fi driver task has its own priority
#define WAVEFORM_TASK_PRIORITY  4   // Lower priority for visuals

void app_main(void) {
    // Create tasks with priorities
    xTaskCreatePinnedToCore(lvgl_task, "LVGL", 8192, NULL, LVGL_TASK_PRIORITY, NULL, 1);
    xTaskCreatePinnedToCore(audio_mic_task, "MIC", 4096, NULL, AUDIO_MIC_TASK_PRIORITY, NULL, 0);
    xTaskCreatePinnedToCore(audio_speaker_task, "SPK", 4096, NULL, AUDIO_SPK_TASK_PRIORITY, NULL, 0);
    xTaskCreatePinnedToCore(lvgl_waveform_task, "WAVE", 4096, NULL, WAVEFORM_TASK_PRIORITY, NULL, 1);
}
```

**Reduce Waveform Update Rate:**
```c
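// Mic frames (256 samples) arrive every 16 ms, so redrawing on every event
// would run at ~62 FPS. This timed loop, an alternative to the event-driven
// task above, caps redraws at ~30 FPS.
#define WAVEFORM_UPDATE_MS 33  // ~30 FPS (rounds to 3 ticks = 30 ms at the default 100 Hz tick)

void lvgl_waveform_task(void *pvParameters) {
    TickType_t last_wake = xTaskGetTickCount();

    while (1) {
        // Block until the next frame slot: fixed cadence without busy polling
        vTaskDelayUntil(&last_wake, pdMS_TO_TICKS(WAVEFORM_UPDATE_MS));

        // Update waveform: copy samples, then draw under lvgl_lock()/lvgl_unlock()
    }
}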
```

### Memory Management

**PSRAM Usage:**
```c
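#include "esp_heap_caps.h"
#include "esp_log.h"

// Allocate large buffers in PSRAM (8MB available; PSRAM must be enabled in menuconfig)
#define AUDIO_LARGE_BUFFER_SIZE (16000 * 10)  // 10 seconds at 16kHz = 320,000 bytes

int16_t *audio_history = NULL;

// Allocation must happen at runtime, not in a file-scope initializer
void audio_history_init(void) {
    audio_history = heap_caps_malloc(AUDIO_LARGE_BUFFER_SIZE * sizeof(int16_t),
                                     MALLOC_CAP_SPIRAM);
    if (audio_history == NULL) {
        ESP_LOGE(TAG, "Failed to allocate PSRAM buffer");
    }
}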
```

**Heap Monitoring:**
```c
void log_memory_stats(void) {
    // These return uint32_t/size_t, so cast to match the %lu format specifier
    ESP_LOGI(TAG, "Free heap: %lu bytes", (unsigned long)esp_get_free_heap_size());
    ESP_LOGI(TAG, "Free PSRAM: %lu bytes", (unsigned long)heap_caps_get_free_size(MALLOC_CAP_SPIRAM));
    ESP_LOGI(TAG, "Min free heap: %lu bytes", (unsigned long)esp_get_minimum_free_heap_size());
}
```

---

## Example Code Structure

### File Organization

```
esp32_voice_assistant/
├── main/
│   ├── main.c                  # Entry point, task creation
│   ├── audio/
│   │   ├── audio_input.c       # I2S microphone handling
│   │   ├── audio_output.c      # I2S speaker handling
│   │   ├── audio_buffer.c      # Circular buffer management
│   │   └── audio_network.c     # WebSocket/HTTP streaming
│   ├── ui/
│   │   ├── ui_init.c           # LVGL setup, screen creation
│   │   ├── ui_idle.c           # Idle screen UI
│   │   ├── ui_listening.c      # Listening screen + waveform
│   │   ├── ui_processing.c     # Processing screen + spinner
│   │   ├── ui_speaking.c       # Speaking screen + output waveform
│   │   ├── ui_settings.c       # Settings screen
│   │   └── ui_waveform.c       # Waveform drawing functions
│   ├── touch/
│   │   ├── touch_cst816d.c     # Touch controller driver
│   │   └── touch_gestures.c    # Gesture recognition
│   ├── network/
│   │   └── wifi_manager.c      # WiFi connection management
│   ├── power/
│   │   ├── battery.c           # Battery monitoring
│   │   └── power_mgmt.c        # Sleep modes
│   └── state_machine.c         # Voice assistant state machine
├── components/
│   └── lvgl/                   # LVGL library (ESP-IDF component)
├── CMakeLists.txt
└── sdkconfig                   # ESP-IDF configuration
```

### Main Entry Point

```c
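// main/main.c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"
#include "nvs_flash.h"
#include "driver/gpio.h"

static const char *TAG = "VOICE_ASSISTANT";

void app_main(void) {
    ESP_LOGI(TAG, "Voice Assistant Starting...");

    // Non-volatile storage (required by Wi-Fi); erase and retry if the
    // partition is full or was written by a newer NVS version
    esp_err_t err = nvs_flash_init();
    if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
        ESP_ERROR_CHECK(nvs_flash_erase());
        err = nvs_flash_init();
    }
    ESP_ERROR_CHECK(err);

    // GPIO interrupt service, installed once for all drivers (touch, etc.)
    ESP_ERROR_CHECK(gpio_install_isr_service(0));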

    // Power management
    battery_init();

    // Display and touch
    lcd_init();
    touch_init();
    ui_init();

    // Audio pipeline
    audio_init_microphone();
    audio_init_speaker();
    audio_buffer_init(&mic_buffer);
    audio_buffer_init(&spk_buffer);

    // Network
    wifi_init();
    websocket_init();

    // State machine
    state_machine_init();

    // Create FreeRTOS tasks
    xTaskCreatePinnedToCore(lvgl_task, "LVGL", 8192, NULL, 5, NULL, 1);
    xTaskCreatePinnedToCore(audio_mic_task, "MIC", 4096, NULL, 10, NULL, 0);
    xTaskCreatePinnedToCore(audio_speaker_task, "SPK", 4096, NULL, 10, NULL, 0);
    xTaskCreatePinnedToCore(lvgl_waveform_task, "WAVE", 4096, NULL, 4, NULL, 1);
    xTaskCreatePinnedToCore(state_machine_task, "STATE", 4096, NULL, 7, NULL, 0);

    ESP_LOGI(TAG, "Voice Assistant Running!");
}
```

---

## Testing Plan

### Phase 1: Hardware Validation
- [ ] LCD display working (show test pattern)
- [ ] Touch controller responding (log touch coordinates)
- [ ] Buzzer working (play test tone)
- [ ] WiFi connecting (check IP address)
- [ ] Battery reading (log voltage)
- [ ] RTC working (log time)
- [ ] IMU working (log accelerometer values)

### Phase 2: Audio Pipeline
- [ ] I2S microphone reading audio (log levels)
- [ ] Audio streaming to Heimdall server
- [ ] I2S speaker playing audio (test tone)
- [ ] TTS audio playback from server
- [ ] Audio buffer management (no overflows)
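For the speaker test-tone check, a square wave is enough to confirm the I2S to MAX98357A path without pulling in a math library. It sounds harsher than a sine. The following is a minimal sketch; the function name and the 1 kHz example are assumptions. The filled buffer can be passed to `i2s_write()` on `I2S_SPK_NUM`. The `phase` argument carries the position across calls, so consecutive buffers join without clicks.

```c
#include <stddef.h>
#include <stdint.h>

// Fill `out` with a square wave of roughly freq_hz at the given amplitude.
// The period is rounded down to a whole number of samples.
void square_tone(int16_t *out, size_t count, uint32_t sample_rate,
                 uint32_t freq_hz, int16_t amplitude, uint32_t *phase) {
    uint32_t period = (freq_hz > 0) ? sample_rate / freq_hz : sample_rate;
    if (period < 2) period = 2;

    for (size_t i = 0; i < count; i++) {
        out[i] = (*phase < period / 2) ? amplitude : (int16_t)-amplitude;
        *phase = (*phase + 1) % period;
    }
}
```

At 16 kHz and 1 kHz this produces 8 high samples followed by 8 low samples per cycle.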

### Phase 3: LVGL UI
- [ ] Idle screen displays correctly
- [ ] State transitions smooth
- [ ] Waveform renders at 30 FPS
- [ ] Touch gestures recognized
- [ ] Settings screen functional
- [ ] Status bar updates correctly

### Phase 4: Integration
- [ ] Wake word detection triggers listening state
- [ ] Waveform shows mic input in real-time
- [ ] Processing state shows after speech ends
- [ ] TTS response plays with output waveform
- [ ] Touch cancel works in all states
- [ ] Battery indicator accurate

### Phase 5: Optimization
- [ ] Memory usage stable (no leaks)
- [ ] CPU usage acceptable (<80% average)
- [ ] WiFi latency <100ms
- [ ] Audio latency <200ms end-to-end
- [ ] Display framerate stable (30 FPS)
- [ ] Battery life >4 hours continuous

---

## Bill of Materials (BOM)

| Component | Part Number | Quantity | Unit Price | Total |
|-----------|-------------|----------|------------|-------|
| ESP32-S3-Touch-LCD-1.69 | Waveshare | 1 | $12.00 | $12.00 |
| I2S MEMS Microphone | INMP441 | 1 | $3.50 | $3.50 |
| I2S Amplifier | MAX98357A | 1 | $3.50 | $3.50 |
| Speaker (3W 8Ω) | Generic | 1 | $5.00 | $5.00 |
| LiPo Battery (1000mAh) | 503040 JST 1.25 | 1 | $7.00 | $7.00 |
| MicroSD Card (8GB) | SanDisk | 1 | $5.00 | $5.00 |
| Breadboard + Wires | Generic | 1 | $5.00 | $5.00 |
| **Total** | | | | **$41.00** |

**Optional:**
- Enclosure/Case (3D printed or project box): $5-10
- Backup battery: $7
- USB-C cable: $3

**Grand Total with Options:** ~$56-61

---

## References & Resources

### LVGL Audio Visualization Examples
- **Music Player with FFT Spectrum** - [Instructables Guide](https://www.instructables.com/Design-Music-Player-UI-With-LVGL/)
  - Source: https://github.com/moononournation/LVGL_Music_Player.git
  - Shows FFT-based audio visualization on LVGL canvas

- **LVGL Audio FFT Spectrum (Xiao S3)** - [GitHub: genvex/LVGL_Audio_FFT_Spectrum_xiaoS3_oled](https://github.com/genvex/LVGL_Audio_FFT_Spectrum_xiaoS3_oled)
  - Real-time FFT visualization using low-level LVGL drawing

- **LVGL Audio FFT Spectrum** - [GitHub: imliubo/LVGL_Audio_FFT_Spectrum](https://github.com/imliubo/LVGL_Audio_FFT_Spectrum)
  - Alternative FFT spectrum implementation

- **Moving Waveform Discussion** - [LVGL Forum Thread](https://forum.lvgl.io/t/best-method-to-display-a-moving-waveform/17361)
  - Tips on efficiently displaying moving waveforms

### ESP32-S3 Resources
- **Waveshare Wiki** - https://www.waveshare.com/wiki/ESP32-S3-LCD-1.69
- **LVGL ESP32 Port** - [GitHub: lvgl/lv_port_esp32](https://github.com/lvgl/lv_port_esp32)
- **ESP-IDF Documentation** - https://docs.espressif.com/projects/esp-idf/en/latest/

### Voice Assistant Project
- **Mycroft Precise Documentation** - https://github.com/MycroftAI/mycroft-precise
- **Whisper OpenAI** - https://github.com/openai/whisper
- **Piper TTS** - https://github.com/rhasspy/piper

---

## Next Steps

1. **Order Hardware** - ESP32-S3-Touch-LCD + audio components (~$41)
2. **Setup ESP-IDF** - Install ESP-IDF v5.3.1+ on development machine
3. **Clone Examples** - Get LVGL audio visualization examples for reference
4. **Start Simple** - Begin with LCD + LVGL test (no audio)
5. **Add Audio** - Wire I2S mic, test audio streaming
6. **Waveform MVP** - Get basic waveform rendering working
7. **Full Integration** - Connect to Heimdall voice server
8. **Polish** - Add touch controls, settings, battery support

---

**Version:** 1.0
**Created:** 2026-01-01
**Status:** Specification Complete, Ready for Implementation