# ESP32-S3-Touch-LCD Voice Assistant - Technical Specification

**Date:** 2026-01-01
**Hardware:** Waveshare ESP32-S3-Touch-LCD-1.69
**Display:** 240×280 ST7789V2 with Capacitive Touch
**Framework:** ESP-IDF v5.3.1+ with LVGL 8.4.0+
**Purpose:** Voice assistant endpoint with real-time audio waveform visualization

---

## Overview

Voice assistant client for ESP32-S3 with integrated LVGL-based visual feedback showing:
- Real-time audio waveform during listening
- Wake word detection animation
- Processing/thinking state
- Response state with audio output visualization
- Touch controls for volume, sensitivity, settings

**Architecture:**
```
┌─────────────────────────────────┐
│   ESP32-S3-Touch-LCD-1.69       │
│                                 │
│  ┌──────────────────────────┐   │
│  │   LVGL UI (240×280)      │   │
│  │  - Waveform Canvas       │   │
│  │  - State Indicators      │   │──┐
│  │  - Touch Controls        │   │  │
│  └──────────────────────────┘   │  │
│                                 │  │
│  ┌──────────────────────────┐   │  │ WiFi
│  │   Audio Pipeline         │   │  │ Audio Stream
│  │  - I2S Mic Input         │   │  │
│  │  - I2S Speaker Output    │   │──┤
│  │  - Buffer Management     │   │  │
│  └──────────────────────────┘   │  │
│                                 │  │
│  ┌──────────────────────────┐   │  │
│  │   State Machine          │   │  │
│  │  - Idle → Listening      │   │  │
│  │  - Processing → Speaking │   │──┘
│  └──────────────────────────┘   │
└─────────────────────────────────┘
              │
              │ TCP/HTTP
              ↓
┌─────────────────────────────────┐
│   Heimdall Voice Server         │
│   (10.1.10.71:3006)             │
│                                 │
│  - Mycroft Precise Wake Word    │
│  - Whisper STT                  │
│  - Home Assistant Integration   │
│  - Piper TTS                    │
└─────────────────────────────────┘
```

---

## Visual States & UI Design

### State Machine

```
     ┌─────────┐
     │  IDLE   │ ◄──────────────┐
     └────┬────┘                │
          │                     │
   Wake Word Detected           │
          │                     │
          ↓                     │
     ┌──────────┐               │
     │LISTENING │               │
     └────┬─────┘               │
          │                     │
    End of Speech               │
          │                     │
          ↓                     │
     ┌───────────┐              │
     │PROCESSING │              │
     └─────┬─────┘              │
           │                    │
    Response Ready              │
           │                    │
           ↓                    │
     ┌──────────┐               │
     │ SPEAKING │ ──────────────┘
     └──────────┘
```

### Visual Feedback Per State

#### 1. IDLE State

**Display:**
- Subtle pulsing ring animation (like Google Home)
- Time display from RTC
- Status icons (WiFi strength, battery level)
- Dim backlight (30-50%)

**Colors:**
- Background: Dark blue (#001F3F)
- Pulse ring: Cyan (#00BFFF)
- Text: White (#FFFFFF)

**LVGL Widgets:**
```c
lv_obj_t *idle_screen;
lv_obj_t *pulse_ring;       // Arc widget, animated rotation
lv_obj_t *time_label;       // Label with RTC time
lv_obj_t *status_bar;       // Container for icons
```

**Animation:**
- Slow pulse: 2-second breathing cycle
- Rotation: 360° over 10 seconds

---

#### 2. LISTENING State

**Display:**
- Real-time audio waveform visualization
- Bright backlight (100%)
- "Listening..." text
- Cancel button (touch)

**Waveform Visualization:**

**Option A: Canvas-Based Waveform (Recommended)**
- Use LVGL `lv_canvas` for custom drawing
- Draw waveform from audio buffer samples
- Scrolling waveform (left-to-right)
- Update rate: 30-60 FPS

**Option B: Bar Chart Spectrum**
- Use `lv_chart` with bar type
- FFT-based spectrum analyzer
- 8-16 bars for frequency bins
- Update rate: 15-30 FPS

**Colors:**
- Background: Dark gray (#1A1A1A)
- Waveform: Green (#00FF00)
- Peak indicators: Yellow (#FFFF00)
- Clipping: Red (#FF0000)

**LVGL Implementation:**
```c
// Canvas-based waveform
lv_obj_t *listening_screen;
lv_obj_t *waveform_canvas;      // 240×180 canvas
lv_obj_t *listening_label;      // "Listening..."
lv_obj_t *cancel_btn;           // Touch to cancel

// Waveform buffer (circular buffer)
#define WAVEFORM_WIDTH  240
#define WAVEFORM_HEIGHT 180
#define WAVEFORM_CENTER (WAVEFORM_HEIGHT / 2)

int16_t waveform_buffer[WAVEFORM_WIDTH];
uint16_t waveform_index = 0;

// Drawing function. Call it from the LVGL task with the LVGL lock held,
// not directly from the audio callback. `audio_samples` holds `count`
// already-downsampled samples, one per canvas column.
void draw_waveform(lv_obj_t *canvas, int16_t *audio_samples, size_t count) {
    lv_canvas_fill_bg(canvas, lv_color_hex(0x1A1A1A), LV_OPA_COVER);

    lv_draw_line_dsc_t line_dsc;
    lv_draw_line_dsc_init(&line_dsc);
    line_dsc.color = lv_color_hex(0x00FF00);
    line_dsc.width = 2;

    if (count > WAVEFORM_WIDTH) count = WAVEFORM_WIDTH;

    // Draw waveform line. Full-scale int16 (±32768) maps to ±WAVEFORM_CENTER,
    // so every point stays inside the canvas (dividing by 256 would overshoot).
    for (size_t x = 0; x + 1 < count; x++) {
        lv_coord_t y1 = WAVEFORM_CENTER + (audio_samples[x] * WAVEFORM_CENTER) / 32768;
        lv_coord_t y2 = WAVEFORM_CENTER + (audio_samples[x + 1] * WAVEFORM_CENTER) / 32768;

        lv_point_t points[] = {{(lv_coord_t)x, y1}, {(lv_coord_t)(x + 1), y2}};
        lv_canvas_draw_line(canvas, points, 2, &line_dsc);
    }
}

// Audio callback (I2S task)
void audio_i2s_callback(int16_t *samples, size_t count) {
    // Downsample audio for waveform display. The step must be at least 1,
    // otherwise the loop never advances when count < WAVEFORM_WIDTH.
    size_t step = count / WAVEFORM_WIDTH;
    if (step == 0) step = 1;

    for (size_t i = 0; i < count; i += step) {
        waveform_buffer[waveform_index] = samples[i];
        waveform_index = (waveform_index + 1) % WAVEFORM_WIDTH;
    }

    // Trigger LVGL update (use event or flag)
    xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
}
```

**Touch Controls:**
- Tap anywhere: Cancel listening
- Swipe down: Lower sensitivity
- Swipe up: Increase sensitivity

---

#### 3. PROCESSING State

**Display:**
- Animated spinner/thinking indicator
- "Processing..." text
- Waveform fades out smoothly

**Animation:**
- Circular spinner with gradient
- Rotation: 360° per 1 second
- Pulsing opacity

**Colors:**
- Background: Dark gray (#1A1A1A)
- Spinner: Blue (#0080FF)
- Text: Light gray (#CCCCCC)

**LVGL Implementation:**
```c
lv_obj_t *processing_screen;
lv_obj_t *spinner;              // lv_spinner widget
lv_obj_t *processing_label;     // "Processing..."

// lv_anim exec callbacks take (void *, int32_t), so lv_obj_set_style_opa()
// (which also needs a style selector) is wrapped in a small adapter
static void fade_opa_cb(void *obj, int32_t opa) {
    lv_obj_set_style_opa((lv_obj_t *)obj, (lv_opa_t)opa, 0);
}

// Transition from listening to processing
void transition_to_processing(void) {
    // Fade out waveform
    lv_anim_t fade_out;
    lv_anim_init(&fade_out);
    lv_anim_set_var(&fade_out, waveform_canvas);
    lv_anim_set_values(&fade_out, LV_OPA_COVER, LV_OPA_TRANSP);
    lv_anim_set_time(&fade_out, 300);
    lv_anim_set_exec_cb(&fade_out, fade_opa_cb);
    lv_anim_start(&fade_out);

    // Show spinner after fade (show_spinner_callback is defined elsewhere)
    lv_timer_t *timer = lv_timer_create(show_spinner_callback, 300, NULL);
    lv_timer_set_repeat_count(timer, 1);
}
```

---

#### 4. SPEAKING State

**Display:**
- Audio output waveform (TTS playback visualization)
- "Speaking..." or response text snippet
- Volume indicator

**Waveform:**
- Same canvas as LISTENING but different color
- Shows output audio being played
- Synchronized with speaker output

**Colors:**
- Background: Dark gray (#1A1A1A)
- Waveform: Blue (#0080FF)
- Text: White (#FFFFFF)

**LVGL Implementation:**
```c
lv_obj_t *speaking_screen;
lv_obj_t *output_waveform_canvas;   // Same size as input waveform
lv_obj_t *response_label;           // Show part of response text
lv_obj_t *volume_bar;               // lv_bar widget for volume level

// Similar drawing to listening state, but fed from speaker buffer
void draw_output_waveform(lv_obj_t *canvas, int16_t *speaker_samples, size_t count) {
    // Same logic as input waveform, different color
    lv_draw_line_dsc_t line_dsc;
    lv_draw_line_dsc_init(&line_dsc);
    line_dsc.color = lv_color_hex(0x0080FF);
    // ... draw logic
}
```

**Touch Controls:**
- Tap: Skip response (go back to idle)
- Volume slider: Adjust speaker volume

---

### Additional UI Elements

#### Status Bar (All States)

**Location:** Top 20 pixels

**Contents:**
- WiFi icon + signal strength
- Battery icon + percentage
- Time (from RTC)
- Mute icon (if muted)

**LVGL Implementation:**
```c
lv_obj_t *status_bar;
lv_obj_t *wifi_icon;
lv_obj_t *battery_icon;
lv_obj_t *time_label;
lv_obj_t *mute_icon;

// Update every second
void update_status_bar(lv_timer_t *timer) {
    // Update WiFi strength
    int8_t rssi = wifi_get_rssi();
    lv_img_set_src(wifi_icon, get_wifi_icon(rssi));

    // Update battery
    uint8_t battery_pct = battery_get_percentage();
    lv_img_set_src(battery_icon, get_battery_icon(battery_pct));

    // Update time from RTC
    rtc_time_t time;
    pcf85063_get_time(&time);
    lv_label_set_text_fmt(time_label, "%02d:%02d", time.hour, time.min);
}

// Create timer for status bar updates (call once during UI init)
lv_timer_create(update_status_bar, 1000, NULL);
```

#### Settings Screen (Touch Access)

**Trigger:** Long-press on idle screen

**Contents:**
- Volume slider
- Brightness slider
- Wake word sensitivity slider
- WiFi settings button
- About/Info button

**LVGL Implementation:**
```c
lv_obj_t *settings_screen;
lv_obj_t *volume_slider;
lv_obj_t *brightness_slider;
lv_obj_t *sensitivity_slider;
lv_obj_t *wifi_btn;
lv_obj_t *about_btn;
lv_obj_t *back_btn;

// Slider event handler
static void slider_event_cb(lv_event_t *e) {
    lv_obj_t *slider = lv_event_get_target(e);
    int32_t value = lv_slider_get_value(slider);

    if (slider == volume_slider) {
        set_speaker_volume(value);
    } else if (slider == brightness_slider) {
        set_backlight_brightness(value);
    } else if (slider == sensitivity_slider) {
        set_wake_word_sensitivity(value);
    }
}
```

---

## Audio Pipeline Integration

### I2S Configuration

**Microphone (INMP441):**
```c
// Uses the legacy I2S driver API (driver/i2s.h), which ESP-IDF v5.x marks as
// deprecated in favor of driver/i2s_std.h.
// NOTE: GPIO 6, 7 and 9 are also assigned to the touch controller later in
// this document; resolve the conflict against the board schematic.
#define I2S_MIC_NUM             I2S_NUM_0
#define I2S_MIC_BCLK_PIN        GPIO_NUM_4   // Verify with board schematic
#define I2S_MIC_WS_PIN          GPIO_NUM_5
#define I2S_MIC_DIN_PIN         GPIO_NUM_6
#define I2S_MIC_SAMPLE_RATE     16000
#define I2S_MIC_BITS            16
#define I2S_MIC_CHANNELS        1

i2s_config_t i2s_mic_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_RX,
    .sample_rate = I2S_MIC_SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 256,
    .use_apll = false,
    .tx_desc_auto_clear = false,
    .fixed_mclk = 0
};

i2s_pin_config_t i2s_mic_pins = {
    .bck_io_num = I2S_MIC_BCLK_PIN,
    .ws_io_num = I2S_MIC_WS_PIN,
    .data_out_num = I2S_PIN_NO_CHANGE,
    .data_in_num = I2S_MIC_DIN_PIN
};

void audio_init_microphone(void) {
    i2s_driver_install(I2S_MIC_NUM, &i2s_mic_config, 0, NULL);
    i2s_set_pin(I2S_MIC_NUM, &i2s_mic_pins);
    i2s_zero_dma_buffer(I2S_MIC_NUM);
}
```

**Speaker (MAX98357A I2S Amp):**
```c
#define I2S_SPK_NUM             I2S_NUM_1
#define I2S_SPK_BCLK_PIN        GPIO_NUM_7   // Verify with board schematic
#define I2S_SPK_WS_PIN          GPIO_NUM_8
#define I2S_SPK_DOUT_PIN        GPIO_NUM_9
#define I2S_SPK_SAMPLE_RATE     16000
#define I2S_SPK_BITS            16
#define I2S_SPK_CHANNELS        1

i2s_config_t i2s_spk_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_TX,
    .sample_rate = I2S_SPK_SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 256,
    .use_apll = false,
    .tx_desc_auto_clear = true,
    .fixed_mclk = 0
};

i2s_pin_config_t i2s_spk_pins = {
    .bck_io_num = I2S_SPK_BCLK_PIN,
    .ws_io_num = I2S_SPK_WS_PIN,
    .data_out_num = I2S_SPK_DOUT_PIN,
    .data_in_num = I2S_PIN_NO_CHANGE
};

void audio_init_speaker(void) {
    i2s_driver_install(I2S_SPK_NUM, &i2s_spk_config, 0, NULL);
    i2s_set_pin(I2S_SPK_NUM, &i2s_spk_pins);
    i2s_zero_dma_buffer(I2S_SPK_NUM);
}
```

### Audio Buffer Management

**Circular Buffer for Waveform:**
```c
#define AUDIO_BUFFER_SIZE       2048
#define WAVEFORM_DECIMATION     8       // Downsample for display

typedef struct {
    int16_t samples[AUDIO_BUFFER_SIZE];
    uint16_t write_idx;
    uint16_t read_idx;
    SemaphoreHandle_t mutex;
} audio_buffer_t;

audio_buffer_t mic_buffer;
audio_buffer_t spk_buffer;

void audio_buffer_init(audio_buffer_t *buf) {
    memset(buf->samples, 0, sizeof(buf->samples));
    buf->write_idx = 0;
    buf->read_idx = 0;
    buf->mutex = xSemaphoreCreateMutex();
}

void audio_buffer_write(audio_buffer_t *buf, int16_t *samples, size_t count) {
    xSemaphoreTake(buf->mutex, portMAX_DELAY);

    for (size_t i = 0; i < count; i++) {
        buf->samples[buf->write_idx] = samples[i];
        buf->write_idx = (buf->write_idx + 1) % AUDIO_BUFFER_SIZE;
    }

    xSemaphoreGive(buf->mutex);
}

// Get downsampled samples for waveform display.
// write_idx points at the oldest sample, so the output runs oldest to newest.
void audio_buffer_get_waveform(audio_buffer_t *buf, int16_t *out, size_t out_count) {
    xSemaphoreTake(buf->mutex, portMAX_DELAY);

    for (size_t i = 0; i < out_count; i++) {
        size_t src_idx = (buf->write_idx + (i * WAVEFORM_DECIMATION)) % AUDIO_BUFFER_SIZE;
        out[i] = buf->samples[src_idx];
    }

    xSemaphoreGive(buf->mutex);
}
```

### Audio Streaming Task

**Microphone Input Task:**
```c
void audio_mic_task(void *pvParameters) {
    int16_t i2s_buffer[256];
    size_t bytes_read;

    while (1) {
        // Read from I2S microphone
        i2s_read(I2S_MIC_NUM, i2s_buffer, sizeof(i2s_buffer), &bytes_read, portMAX_DELAY);

        size_t samples_read = bytes_read / sizeof(int16_t);

        if (current_state == STATE_LISTENING) {
            // Write to circular buffer for waveform display
            audio_buffer_write(&mic_buffer, i2s_buffer, samples_read);

            // Send to Heimdall server via WiFi
            audio_send_to_server(i2s_buffer, samples_read);

            // Trigger waveform update
            xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
        }
    }
}
```

**Speaker Output Task:**
```c
void audio_speaker_task(void *pvParameters) {
    int16_t i2s_buffer[256];
    size_t bytes_written;

    while (1) {
        // Receive audio from Heimdall server
        size_t samples_received = audio_receive_from_server(i2s_buffer, 256);

        if (samples_received > 0 && current_state == STATE_SPEAKING) {
            // Write to circular buffer for waveform display
            audio_buffer_write(&spk_buffer, i2s_buffer, samples_received);

            // Play through I2S speaker
            i2s_write(I2S_SPK_NUM, i2s_buffer, samples_received * sizeof(int16_t),
                      &bytes_written, portMAX_DELAY);

            // Trigger waveform update
            xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
        } else {
            vTaskDelay(pdMS_TO_TICKS(10));
        }
    }
}
```

### LVGL Update Task

**Waveform Rendering Task:**
```c
void lvgl_waveform_task(void *pvParameters) {
    int16_t waveform_samples[WAVEFORM_WIDTH];

    while (1) {
        // Wait for waveform update event
        EventBits_t bits = xEventGroupWaitBits(ui_event_group, WAVEFORM_UPDATE_BIT,
                                               pdTRUE, pdFALSE, pdMS_TO_TICKS(50));

        if (bits & WAVEFORM_UPDATE_BIT) {
            if (current_state == STATE_LISTENING) {
                // Get downsampled mic data
                audio_buffer_get_waveform(&mic_buffer, waveform_samples, WAVEFORM_WIDTH);

                // Draw on LVGL canvas (must lock LVGL)
                lvgl_lock();
                draw_waveform(waveform_canvas, waveform_samples, WAVEFORM_WIDTH);
                lvgl_unlock();
            } else if (current_state == STATE_SPEAKING) {
                // Get downsampled speaker data
                audio_buffer_get_waveform(&spk_buffer, waveform_samples, WAVEFORM_WIDTH);

                lvgl_lock();
                draw_output_waveform(output_waveform_canvas, waveform_samples, WAVEFORM_WIDTH);
                lvgl_unlock();
            }
        }
    }
}
```

---

## Touch Gesture Integration

### Touch Controller (CST816D)

**Gestures Supported:**
- Single tap
- Long press
- Swipe up/down/left/right

**Implementation:**
```c
// NOTE: GPIO 6, 7 and 9 are also used by the I2S pin definitions above;
// verify the real touch pins against the board schematic.
#define TOUCH_I2C_NUM       I2C_NUM_0
#define TOUCH_SDA_PIN       GPIO_NUM_6
#define TOUCH_SCL_PIN       GPIO_NUM_7
#define TOUCH_INT_PIN       GPIO_NUM_9
#define TOUCH_RST_PIN       GPIO_NUM_10
#define CST816D_ADDR        0x15    // Usual CST816-family I2C address; verify in datasheet

typedef enum {
    GESTURE_NONE = 0,
    GESTURE_TAP,
    GESTURE_LONG_PRESS,
    GESTURE_SWIPE_UP,
    GESTURE_SWIPE_DOWN,
    GESTURE_SWIPE_LEFT,
    GESTURE_SWIPE_RIGHT
} touch_gesture_t;

void touch_init(void) {
    // I2C init for CST816D
    i2c_config_t conf = {
        .mode = I2C_MODE_MASTER,
        .sda_io_num = TOUCH_SDA_PIN,
        .scl_io_num = TOUCH_SCL_PIN,
        .sda_pullup_en = GPIO_PULLUP_ENABLE,
        .scl_pullup_en = GPIO_PULLUP_ENABLE,
        .master.clk_speed = 100000,
    };
    i2c_param_config(TOUCH_I2C_NUM, &conf);
    i2c_driver_install(TOUCH_I2C_NUM, conf.mode, 0, 0, 0);

    // Reset touch controller
    gpio_set_direction(TOUCH_RST_PIN, GPIO_MODE_OUTPUT);
    gpio_set_level(TOUCH_RST_PIN, 0);
    vTaskDelay(pdMS_TO_TICKS(10));
    gpio_set_level(TOUCH_RST_PIN, 1);
    vTaskDelay(pdMS_TO_TICKS(50));

    // Configure interrupt pin. The GPIO ISR service is installed once in
    // app_main(); installing it a second time here would return an error.
    gpio_set_direction(TOUCH_INT_PIN, GPIO_MODE_INPUT);
    gpio_set_intr_type(TOUCH_INT_PIN, GPIO_INTR_NEGEDGE);
    gpio_isr_handler_add(TOUCH_INT_PIN, touch_isr_handler, NULL);
}

touch_gesture_t touch_read_gesture(void) {
    uint8_t reg = 0x01;     // Gesture ID register
    uint8_t data[8];

    // Write the register address, then read back from it
    // (i2c_master_read_from_device() takes no register argument)
    esp_err_t err = i2c_master_write_read_device(TOUCH_I2C_NUM, CST816D_ADDR,
                                                 &reg, 1, data, sizeof(data),
                                                 pdMS_TO_TICKS(100));
    if (err != ESP_OK) return GESTURE_NONE;

    // Raw gesture IDs do not follow the enum order, so map them explicitly.
    // Codes are the commonly documented CST816-family values; verify them.
    switch (data[0]) {
        case 0x01: return GESTURE_SWIPE_UP;
        case 0x02: return GESTURE_SWIPE_DOWN;
        case 0x03: return GESTURE_SWIPE_LEFT;
        case 0x04: return GESTURE_SWIPE_RIGHT;
        case 0x05: return GESTURE_TAP;
        case 0x0C: return GESTURE_LONG_PRESS;
        default:   return GESTURE_NONE;
    }
}
```

### Gesture Actions by State

**IDLE State:**
- **Tap:** Wake up display (if dimmed)
- **Long Press:** Open settings screen
- **Swipe Up:** Show more info (weather, calendar)

**LISTENING State:**
- **Tap:** Cancel listening, return to idle
- **Swipe Down:** Lower wake word sensitivity
- **Swipe Up:** Raise wake word sensitivity

**SPEAKING State:**
- **Tap:** Skip response, return to idle
- **Swipe Left/Right:** Volume down/up

**PROCESSING State:**
- **Tap:** Cancel processing (if possible)

---

## Network Communication

### WiFi Configuration

**Connection:**
```c
#define WIFI_SSID       "YourNetworkName"
#define WIFI_PASSWORD   "YourPassword"
#define SERVER_URL      "http://10.1.10.71:3006"

void wifi_init(void) {
    esp_netif_init();
    esp_event_loop_create_default();
    esp_netif_create_default_wifi_sta();

    wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
    esp_wifi_init(&cfg);

    wifi_config_t wifi_config = {
        .sta = {
            .ssid = WIFI_SSID,
            .password = WIFI_PASSWORD,
        },
    };

    esp_wifi_set_mode(WIFI_MODE_STA);
    esp_wifi_set_config(WIFI_IF_STA, &wifi_config);
    esp_wifi_start();
    esp_wifi_connect();
}
```

### Server Communication Protocol

**Endpoints:**
- `GET /health` - Server health check
- `POST /audio/stream` - Stream audio to server
(multipart)
- `GET /audio/tts` - Receive TTS audio response
- `GET /wake-word/status` - Check wake word detection status

**Audio Streaming (WebSockets Recommended):**
```c
// In ESP-IDF v5.x this header comes from the managed component
// espressif/esp_websocket_client
#include "esp_websocket_client.h"
#include "freertos/FreeRTOS.h"
#include "freertos/stream_buffer.h"

esp_websocket_client_handle_t ws_client;
static StreamBufferHandle_t ws_rx_stream;   // TTS audio received from the server

// The client has no blocking receive call; incoming frames arrive as
// WEBSOCKET_EVENT_DATA events, so binary payloads are queued for the speaker task
static void websocket_event_handler(void *arg, esp_event_base_t base,
                                    int32_t event_id, void *event_data) {
    esp_websocket_event_data_t *data = (esp_websocket_event_data_t *)event_data;
    if (event_id == WEBSOCKET_EVENT_DATA && data->op_code == 0x02 && data->data_len > 0) {
        xStreamBufferSend(ws_rx_stream, data->data_ptr, data->data_len, 0);
    }
}

void websocket_init(void) {
    ws_rx_stream = xStreamBufferCreate(8192, sizeof(int16_t));

    esp_websocket_client_config_t ws_cfg = {
        .uri = "ws://10.1.10.71:3006/ws/audio",
        .buffer_size = 2048,
    };

    ws_client = esp_websocket_client_init(&ws_cfg);
    esp_websocket_register_events(ws_client, WEBSOCKET_EVENT_ANY, websocket_event_handler, NULL);
    esp_websocket_client_start(ws_client);
}

void audio_send_to_server(int16_t *samples, size_t count) {
    if (esp_websocket_client_is_connected(ws_client)) {
        esp_websocket_client_send_bin(ws_client, (char*)samples, count * sizeof(int16_t), portMAX_DELAY);
    }
}

size_t audio_receive_from_server(int16_t *out_buffer, size_t max_samples) {
    // Receive audio from server (blocking with timeout)
    size_t len = xStreamBufferReceive(ws_rx_stream, out_buffer,
                                      max_samples * sizeof(int16_t), pdMS_TO_TICKS(100));
    return len / sizeof(int16_t);
}
```

**Alternative: HTTP Chunked Transfer (Simpler):**
```c
void audio_stream_http(void) {
    esp_http_client_config_t config = {
        .url = "http://10.1.10.71:3006/audio/stream",
        .method = HTTP_METHOD_POST,
    };

    esp_http_client_handle_t client = esp_http_client_init(&config);

    // Set headers
    esp_http_client_set_header(client, "Content-Type", "audio/pcm");
    esp_http_client_set_header(client, "Transfer-Encoding", "chunked");

    esp_http_client_open(client, -1);  // -1 = chunked mode

    // Stream audio chunks
    int16_t buffer[256];
    while (current_state == STATE_LISTENING) {
        // Read from mic
        size_t bytes_read;
        i2s_read(I2S_MIC_NUM, buffer, sizeof(buffer), &bytes_read, portMAX_DELAY);

        // Send to server.
        // NOTE: esp_http_client_write() sends raw bytes; in chunked mode the
        // chunk-size framing likely has to be written by the caller. Verify
        // against the ESP-IDF HTTP client documentation.
        esp_http_client_write(client, (char*)buffer, bytes_read);
    }

    esp_http_client_close(client);
    esp_http_client_cleanup(client);
}
```

---

## Power Management

### Battery Monitoring

**ETA6098 Charging Chip:**
```c
// Uses the legacy ADC API (driver/adc.h, esp_adc_cal.h), deprecated in ESP-IDF v5.x.
// adc_chars must be filled in by esp_adc_cal_characterize() during init.
#define BATTERY_ADC_CHANNEL     ADC1_CHANNEL_0  // GPIO1 (example)
#define BATTERY_FULL_MV         4200
#define BATTERY_EMPTY_MV        3300

void battery_init(void) {
    adc1_config_width(ADC_WIDTH_BIT_12);
    adc1_config_channel_atten(BATTERY_ADC_CHANNEL, ADC_ATTEN_DB_11);
}

uint8_t battery_get_percentage(void) {
    int adc_reading = adc1_get_raw(BATTERY_ADC_CHANNEL);
    // If the board feeds the ADC through a voltage divider, scale this value
    // by the divider ratio before comparing it with the battery thresholds
    int voltage_mv = esp_adc_cal_raw_to_voltage(adc_reading, &adc_chars);

    if (voltage_mv >= BATTERY_FULL_MV) return 100;
    if (voltage_mv <= BATTERY_EMPTY_MV) return 0;

    return ((voltage_mv - BATTERY_EMPTY_MV) * 100) / (BATTERY_FULL_MV - BATTERY_EMPTY_MV);
}

bool battery_is_charging(void) {
    // Check SYS_OUT pin (GPIO36) - high when charging
    gpio_set_direction(GPIO_NUM_36, GPIO_MODE_INPUT);
    return gpio_get_level(GPIO_NUM_36);
}
```

### Low Power Modes

**Deep Sleep When Idle (Optional):**
```c
#define IDLE_TIMEOUT_MS     300000  // 5 minutes

void enter_deep_sleep(void) {
    // Save state to RTC memory
    RTC_DATA_ATTR static uint32_t boot_count = 0;
    boot_count++;

    // Configure wake sources
    esp_sleep_enable_ext0_wakeup(TOUCH_INT_PIN, 0);     // Wake on touch
    esp_sleep_enable_timer_wakeup(3600 * 1000000ULL);   // Wake every hour

    // Turn off display
    gpio_set_level(LCD_BL_PIN, 0);

    // Enter deep sleep
    esp_deep_sleep_start();
}
```

---

## Performance Optimization

### LVGL Performance

**Buffer Configuration:**
```c
#define LVGL_BUFFER_PIXELS  ((240 * 280) / 10)  // 1/10 of the screen, counted in pixels

static lv_color_t buf_1[LVGL_BUFFER_PIXELS];    // 1/10 screen buffer
static lv_color_t buf_2[LVGL_BUFFER_PIXELS];    // Double buffering

lv_disp_draw_buf_t draw_buf;
// The last argument is a pixel count, not a byte count
lv_disp_draw_buf_init(&draw_buf, buf_1, buf_2, LVGL_BUFFER_PIXELS);
```

**Task Priority:**
```c
#define LVGL_TASK_PRIORITY          5
#define AUDIO_MIC_TASK_PRIORITY     10  // Higher priority for audio
#define AUDIO_SPK_TASK_PRIORITY     10
#define WIFI_TASK_PRIORITY          8
#define WAVEFORM_TASK_PRIORITY      4   // Lower priority for visuals

void app_main(void) {
    // Create tasks with priorities
    xTaskCreatePinnedToCore(lvgl_task, "LVGL", 8192, NULL, LVGL_TASK_PRIORITY, NULL, 1);
    xTaskCreatePinnedToCore(audio_mic_task, "MIC", 4096, NULL, AUDIO_MIC_TASK_PRIORITY, NULL, 0);
    xTaskCreatePinnedToCore(audio_speaker_task, "SPK", 4096, NULL, AUDIO_SPK_TASK_PRIORITY, NULL, 0);
    xTaskCreatePinnedToCore(lvgl_waveform_task, "WAVE", 4096, NULL, WAVEFORM_TASK_PRIORITY, NULL, 1);
}
```

**Reduce Waveform Update Rate:**
```c
// Only update waveform at 30 FPS, not every audio sample
#define WAVEFORM_UPDATE_MS  33  // ~30 FPS

void lvgl_waveform_task(void *pvParameters) {
    TickType_t last_update = xTaskGetTickCount();

    while (1) {
        TickType_t now = xTaskGetTickCount();

        if ((now - last_update) >= pdMS_TO_TICKS(WAVEFORM_UPDATE_MS)) {
            // Update waveform
            last_update = now;
        }

        vTaskDelay(pdMS_TO_TICKS(10));
    }
}
```

### Memory Management

**PSRAM Usage:**
```c
// Allocate large buffers in PSRAM (8MB available)
#define AUDIO_LARGE_BUFFER_SIZE (16000 * 10)  // 10 seconds at 16kHz

int16_t *audio_history = heap_caps_malloc(AUDIO_LARGE_BUFFER_SIZE * sizeof(int16_t),
                                          MALLOC_CAP_SPIRAM);

// Check if allocation succeeded
if (audio_history == NULL) {
    ESP_LOGE(TAG, "Failed to allocate PSRAM buffer");
}
```

**Heap Monitoring:**
```c
void log_memory_stats(void) {
    // Casts keep the format specifiers valid for uint32_t and size_t
    ESP_LOGI(TAG, "Free heap: %u bytes", (unsigned)esp_get_free_heap_size());
    ESP_LOGI(TAG, "Free PSRAM: %u bytes", (unsigned)heap_caps_get_free_size(MALLOC_CAP_SPIRAM));
    ESP_LOGI(TAG, "Min free heap: %u bytes", (unsigned)esp_get_minimum_free_heap_size());
}
```

---

## Example Code Structure

### File Organization

```
esp32_voice_assistant/
├── main/
│   ├── main.c                  # Entry point, task creation
│   ├── audio/
│   │   ├── audio_input.c       # I2S microphone handling
│   │   ├── audio_output.c      # I2S speaker handling
│   │   ├── audio_buffer.c      # Circular buffer management
│   │   └── audio_network.c     # WebSocket/HTTP streaming
│   ├── ui/
│   │   ├── ui_init.c           # LVGL setup, screen creation
│   │   ├── ui_idle.c           # Idle screen UI
│   │   ├── ui_listening.c      # Listening screen + waveform
│   │   ├── ui_processing.c     # Processing screen + spinner
│   │   ├── ui_speaking.c       # Speaking screen + output waveform
│   │   ├── ui_settings.c       # Settings screen
│   │   └── ui_waveform.c       # Waveform drawing functions
│   ├── touch/
│   │   ├── touch_cst816d.c     # Touch controller driver
│   │   └── touch_gestures.c    # Gesture recognition
│   ├── network/
│   │   └── wifi_manager.c      # WiFi connection management
│   ├── power/
│   │   ├── battery.c           # Battery monitoring
│   │   └── power_mgmt.c        # Sleep modes
│   └── state_machine.c         # Voice assistant state machine
├── components/
│   └── lvgl/                   # LVGL library (ESP-IDF component)
├── CMakeLists.txt
└── sdkconfig                   # ESP-IDF configuration
```

### Main Entry Point

```c
// main/main.c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"
#include "nvs_flash.h"
#include "driver/gpio.h"

static const char *TAG = "VOICE_ASSISTANT";

void app_main(void) {
    ESP_LOGI(TAG, "Voice Assistant Starting...");

    // Initialize hardware
    nvs_flash_init();               // Non-volatile storage
    gpio_install_isr_service(0);    // GPIO interrupts

    // Power management
    battery_init();

    // Display and touch
    lcd_init();
    touch_init();
    ui_init();

    // Audio pipeline
    audio_init_microphone();
    audio_init_speaker();
    audio_buffer_init(&mic_buffer);
    audio_buffer_init(&spk_buffer);

    // Network
    wifi_init();
    websocket_init();

    // State machine
    state_machine_init();

    // Create FreeRTOS tasks
    xTaskCreatePinnedToCore(lvgl_task, "LVGL", 8192, NULL, 5, NULL, 1);
    xTaskCreatePinnedToCore(audio_mic_task, "MIC", 4096, NULL, 10, NULL, 0);
    xTaskCreatePinnedToCore(audio_speaker_task, "SPK", 4096, NULL, 10, NULL, 0);
    xTaskCreatePinnedToCore(lvgl_waveform_task, "WAVE", 4096, NULL, 4, NULL, 1);
    xTaskCreatePinnedToCore(state_machine_task, "STATE", 4096, NULL, 7, NULL, 0);

    ESP_LOGI(TAG, "Voice Assistant Running!");
}
```

---

## Testing Plan

### Phase 1: Hardware Validation
- [ ] LCD display working (show test pattern)
- [ ] Touch controller responding (log touch coordinates)
- [ ] Buzzer working (play test tone)
- [ ] WiFi connecting (check IP address)
- [ ] Battery reading (log voltage)
- [ ] RTC working (log time)
- [ ] IMU working (log accelerometer values)

### Phase 2: Audio Pipeline
- [ ] I2S microphone reading audio (log levels)
- [ ] Audio streaming to Heimdall server
- [ ] I2S speaker playing audio (test tone)
- [ ] TTS audio playback from server
- [ ] Audio buffer management (no overflows)

### Phase 3: LVGL UI
- [ ] Idle screen displays correctly
- [ ] State transitions smooth
- [ ] Waveform renders at 30 FPS
- [ ] Touch gestures recognized
- [ ] Settings screen functional
- [ ] Status bar updates correctly

### Phase 4: Integration
- [ ] Wake word detection triggers listening state
- [ ] Waveform shows mic input in real-time
- [ ] Processing state shows after speech ends
- [ ] TTS response plays with output waveform
- [ ] Touch cancel works in all states
- [ ] Battery indicator accurate

### Phase 5: Optimization
- [ ] Memory usage stable (no leaks)
- [ ] CPU usage acceptable (<80% average)
- [ ] WiFi latency <100ms
- [ ] Audio latency <200ms end-to-end
- [ ] Display framerate stable (30 FPS)
- [ ] Battery life >4 hours continuous

---

## Bill of Materials (BOM)

| Component | Part Number | Quantity | Unit Price | Total |
|-----------|-------------|----------|------------|-------|
| ESP32-S3-Touch-LCD-1.69 | Waveshare | 1 | $12.00 | $12.00 |
| I2S MEMS Microphone | INMP441 | 1 | $3.50 | $3.50 |
| I2S Amplifier | MAX98357A | 1 | $3.50 | $3.50 |
| Speaker (3W 8Ω) | Generic | 1 | $5.00 | $5.00 |
| LiPo Battery (1000mAh) | 503040 JST 1.25 | 1 | $7.00 | $7.00 |
| MicroSD Card (8GB) | SanDisk | 1 | $5.00 | $5.00 |
| Breadboard + Wires | Generic | 1 | $5.00 | $5.00 |
| **Total** | | | | **$41.00** |

**Optional:**
- Enclosure/Case (3D printed or project box): $5-10
- Backup battery: $7
- USB-C cable: $3

**Grand Total with Options:** ~$56-61

---

## References & Resources

### LVGL Audio Visualization Examples

- **Music Player with FFT Spectrum**
  - [Instructables Guide](https://www.instructables.com/Design-Music-Player-UI-With-LVGL/)
  - Source: https://github.com/moononournation/LVGL_Music_Player.git
  - Shows FFT-based audio visualization on LVGL canvas
- **LVGL Audio FFT Spectrum (Xiao S3)**
  - [GitHub: genvex/LVGL_Audio_FFT_Spectrum_xiaoS3_oled](https://github.com/genvex/LVGL_Audio_FFT_Spectrum_xiaoS3_oled)
  - Real-time FFT visualization using low-level LVGL drawing
- **LVGL Audio FFT Spectrum**
  - [GitHub: imliubo/LVGL_Audio_FFT_Spectrum](https://github.com/imliubo/LVGL_Audio_FFT_Spectrum)
  - Alternative FFT spectrum implementation
- **Moving Waveform Discussion**
  - [LVGL Forum Thread](https://forum.lvgl.io/t/best-method-to-display-a-moving-waveform/17361)
  - Tips on efficiently displaying moving waveforms

### ESP32-S3 Resources

- **Waveshare Wiki**
  - https://www.waveshare.com/wiki/ESP32-S3-LCD-1.69
- **LVGL ESP32 Port**
  - [GitHub: lvgl/lv_port_esp32](https://github.com/lvgl/lv_port_esp32)
- **ESP-IDF Documentation**
  - https://docs.espressif.com/projects/esp-idf/en/latest/

### Voice Assistant Project

- **Mycroft Precise Documentation**
  - https://github.com/MycroftAI/mycroft-precise
- **Whisper OpenAI**
  - https://github.com/openai/whisper
- **Piper TTS**
  - https://github.com/rhasspy/piper

---

## Next Steps

1. **Order Hardware** - ESP32-S3-Touch-LCD + audio components (~$41)
2. **Setup ESP-IDF** - Install ESP-IDF v5.3.1+ on development machine
3. **Clone Examples** - Get LVGL audio visualization examples for reference
4. **Start Simple** - Begin with LCD + LVGL test (no audio)
5. **Add Audio** - Wire I2S mic, test audio streaming
6. **Waveform MVP** - Get basic waveform rendering working
7. **Full Integration** - Connect to Heimdall voice server
8. **Polish** - Add touch controls, settings, battery support

---

**Version:** 1.0
**Created:** 2026-01-01
**Status:** Specification Complete, Ready for Implementation
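
---

## Appendix: State Transition Sketch

The State Machine diagram and the Gesture Actions by State list describe the transitions, but the specification gives no code for `state_machine_task`. The block below is a minimal sketch of the transition logic only, written as a pure function so it can be checked on a host machine. The `STATE_*` names follow the task code earlier in the document (`STATE_PROCESSING` is inferred from the diagram); the `va_state_t` and `va_event_t` types, the `EVT_*` event names, and `va_next_state()` are hypothetical names introduced for illustration, and treating end of playback as the SPEAKING to IDLE trigger is an assumption, since the diagram leaves that arrow unlabeled.

```c
// Hypothetical state and event types; only the STATE_* names come from the spec
typedef enum {
    STATE_IDLE,
    STATE_LISTENING,
    STATE_PROCESSING,
    STATE_SPEAKING
} va_state_t;

typedef enum {
    EVT_WAKE_WORD,       // wake word detected
    EVT_END_OF_SPEECH,   // user stopped talking
    EVT_RESPONSE_READY,  // server response is available
    EVT_PLAYBACK_DONE,   // assumed trigger for SPEAKING -> IDLE
    EVT_TAP              // single tap on the touch screen
} va_event_t;

// Returns the next state for a given state/event pair.
// Events that do not apply to the current state leave it unchanged.
va_state_t va_next_state(va_state_t state, va_event_t event) {
    // A tap cancels listening/processing or skips the response;
    // in IDLE it only wakes the display, so the state is unchanged
    if (event == EVT_TAP && state != STATE_IDLE) {
        return STATE_IDLE;
    }

    switch (state) {
    case STATE_IDLE:
        return (event == EVT_WAKE_WORD) ? STATE_LISTENING : state;
    case STATE_LISTENING:
        return (event == EVT_END_OF_SPEECH) ? STATE_PROCESSING : state;
    case STATE_PROCESSING:
        return (event == EVT_RESPONSE_READY) ? STATE_SPEAKING : state;
    case STATE_SPEAKING:
        return (event == EVT_PLAYBACK_DONE) ? STATE_IDLE : state;
    }
    return state;
}
```

On the device, a task could block on a FreeRTOS queue of events, call `va_next_state()`, and switch LVGL screens only when the returned state differs from the current one. Keeping the transition table free of hardware calls is a design choice that makes it easy to unit test before any audio or display code exists.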