pyr0ball 173f7f37d4 feat: import mycroft-precise work as Minerva foundation

Ports prior voice assistant research and prototypes from devl/Devops
into the Minerva repo. Includes:

- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed
  (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap

2026-04-06 22:21:12 -07:00


ESP32-S3-Touch-LCD Voice Assistant - Technical Specification

Date: 2026-01-01
Hardware: Waveshare ESP32-S3-Touch-LCD-1.69
Display: 240×280 ST7789V2 with Capacitive Touch
Framework: ESP-IDF v5.3.1+ with LVGL 8.4.0+
Purpose: Voice assistant endpoint with real-time audio waveform visualization

Overview

Voice assistant client for ESP32-S3 with integrated LVGL-based visual feedback showing:

Real-time audio waveform during listening
Wake word detection animation
Processing/thinking state
Response state with audio output visualization
Touch controls for volume, sensitivity, settings

Architecture:

┌─────────────────────────────────┐
│  ESP32-S3-Touch-LCD-1.69        │
│                                 │
│  ┌──────────────────────────┐  │
│  │   LVGL UI (240×280)      │  │
│  │   - Waveform Canvas      │  │
│  │   - State Indicators     │  │──┐
│  │   - Touch Controls       │  │  │
│  └──────────────────────────┘  │  │
│                                 │  │
│  ┌──────────────────────────┐  │  │ WiFi
│  │   Audio Pipeline         │  │  │ Audio Stream
│  │   - I2S Mic Input        │  │  │
│  │   - I2S Speaker Output   │  │──┤
│  │   - Buffer Management    │  │  │
│  └──────────────────────────┘  │  │
│                                 │  │
│  ┌──────────────────────────┐  │  │
│  │   State Machine          │  │  │
│  │   - Idle → Listening     │  │  │
│  │   - Processing → Speaking│  │──┘
│  └──────────────────────────┘  │
└─────────────────────────────────┘
         │
         │ TCP/HTTP
         ↓
┌─────────────────────────────────┐
│  Heimdall Voice Server          │
│  (10.1.10.71:3006)              │
│                                 │
│  - Mycroft Precise Wake Word    │
│  - Whisper STT                  │
│  - Home Assistant Integration   │
│  - Piper TTS                    │
└─────────────────────────────────┘

Visual States & UI Design

State Machine

        ┌─────────┐
        │  IDLE   │ ◄──────────────┐
        └────┬────┘                │
             │                     │
    Wake Word Detected             │
             │                     │
             ↓                     │
      ┌──────────┐                │
      │LISTENING │                │
      └────┬─────┘                │
           │                      │
   End of Speech                  │
           │                      │
           ↓                      │
    ┌───────────┐                │
    │PROCESSING │                │
    └─────┬─────┘                │
          │                      │
    Response Ready               │
          │                      │
          ↓                      │
    ┌──────────┐                │
    │ SPEAKING │ ───────────────┘
    └──────────┘
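
The transitions in the diagram can be sketched as a lookup table. This is a minimal, host-runnable sketch rather than the project's state machine: the event names are hypothetical labels for the arrows, and EVT_PLAYBACK_DONE is an assumed trigger for the unlabeled SPEAKING → IDLE arrow.

```c
#include <assert.h>

/* States from the diagram; STATE_COUNT is a helper for table sizing. */
typedef enum {
    STATE_IDLE, STATE_LISTENING, STATE_PROCESSING, STATE_SPEAKING, STATE_COUNT
} va_state_t;

/* Hypothetical event names, one per arrow in the diagram. */
typedef enum {
    EVT_WAKE_WORD,       /* "Wake Word Detected" */
    EVT_END_OF_SPEECH,   /* "End of Speech" */
    EVT_RESPONSE_READY,  /* "Response Ready" */
    EVT_PLAYBACK_DONE,   /* assumed trigger for SPEAKING -> IDLE */
    EVT_COUNT
} va_event_t;

/* k_next[state][event]; events that do not apply leave the state unchanged. */
static const va_state_t k_next[STATE_COUNT][EVT_COUNT] = {
    /* IDLE       */ { STATE_LISTENING,  STATE_IDLE,       STATE_IDLE,       STATE_IDLE },
    /* LISTENING  */ { STATE_LISTENING,  STATE_PROCESSING, STATE_LISTENING,  STATE_LISTENING },
    /* PROCESSING */ { STATE_PROCESSING, STATE_PROCESSING, STATE_SPEAKING,   STATE_PROCESSING },
    /* SPEAKING   */ { STATE_SPEAKING,   STATE_SPEAKING,   STATE_SPEAKING,   STATE_IDLE },
};

va_state_t va_next_state(va_state_t s, va_event_t e) {
    assert(s < STATE_COUNT && e < EVT_COUNT);
    return k_next[s][e];
}
```

Keeping the table as data makes it easy to add the touch-cancel paths later without touching the dispatch function.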

Visual Feedback Per State

1. IDLE State

Display:

Subtle pulsing ring animation (like Google Home)
Time display from RTC
Status icons (WiFi strength, battery level)
Dim backlight (30-50%)

Colors:

Background: Dark blue (#001F3F)
Pulse ring: Cyan (#00BFFF)
Text: White (#FFFFFF)

LVGL Widgets:

lv_obj_t *idle_screen;
lv_obj_t *pulse_ring;      // Arc widget, animated rotation
lv_obj_t *time_label;      // Label with RTC time
lv_obj_t *status_bar;      // Container for icons

Animation:

Slow pulse: 2-second breathing cycle
Rotation: 360° over 10 seconds
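
One way to express the 2-second breathing cycle is a triangle wave over elapsed time. The sketch below is plain C so it can be checked on a host; the opacity limits are assumptions, not values from this spec.

```c
#include <assert.h>
#include <stdint.h>

#define PULSE_PERIOD_MS 2000u  /* 2-second breathing cycle from the spec */
#define PULSE_OPA_MIN   64u    /* assumed dimmest opacity (0-255 scale) */
#define PULSE_OPA_MAX   255u   /* assumed brightest opacity */

/* Triangle wave: rises for the first half of the period, falls for the second. */
uint8_t pulse_opacity(uint32_t elapsed_ms) {
    uint32_t half  = PULSE_PERIOD_MS / 2u;
    uint32_t phase = elapsed_ms % PULSE_PERIOD_MS;
    uint32_t ramp  = (phase < half) ? phase : (PULSE_PERIOD_MS - phase); /* 0..half */
    uint32_t opa   = PULSE_OPA_MIN + (PULSE_OPA_MAX - PULSE_OPA_MIN) * ramp / half;
    assert(opa <= 255u);
    return (uint8_t)opa;
}
```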

2. LISTENING State

Display:

Real-time audio waveform visualization
Bright backlight (100%)
"Listening..." text
Cancel button (touch)

Waveform Visualization:

Option A: Canvas-Based Waveform (Recommended)

Use LVGL lv_canvas for custom drawing
Draw waveform from audio buffer samples
Scrolling waveform (left-to-right)
Update rate: 30-60 FPS

Option B: Bar Chart Spectrum

Use lv_chart with bar type
FFT-based spectrum analyzer
8-16 bars for frequency bins
Update rate: 15-30 FPS

Colors:

Background: Dark gray (#1A1A1A)
Waveform: Green (#00FF00)
Peak indicators: Yellow (#FFFF00)
Clipping: Red (#FF0000)
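
The color scheme implies a per-sample classification. A minimal sketch follows; the thresholds for "peak" and "clipping" are assumptions, since the spec does not define them.

```c
#include <assert.h>
#include <stdint.h>

#define COLOR_WAVEFORM 0x00FF00u  /* green  */
#define COLOR_PEAK     0xFFFF00u  /* yellow */
#define COLOR_CLIPPING 0xFF0000u  /* red    */

/* Assumed thresholds: "peak" at ~70% of full scale, "clipping" at ~98%. */
#define PEAK_THRESHOLD 23000
#define CLIP_THRESHOLD 32112

uint32_t waveform_color_for_sample(int16_t sample) {
    int32_t mag = sample;           /* widen first: -32768 cannot be negated as int16_t */
    if (mag < 0) mag = -mag;
    assert(mag <= 32768);
    if (mag >= CLIP_THRESHOLD) return COLOR_CLIPPING;
    if (mag >= PEAK_THRESHOLD) return COLOR_PEAK;
    return COLOR_WAVEFORM;
}
```

The returned value can be passed to lv_color_hex() when setting the line descriptor color.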

LVGL Implementation:

// Canvas-based waveform
lv_obj_t *listening_screen;
lv_obj_t *waveform_canvas;    // 240×180 canvas
lv_obj_t *listening_label;    // "Listening..."
lv_obj_t *cancel_btn;         // Touch to cancel

// Waveform buffer (circular buffer)
#define WAVEFORM_WIDTH 240
#define WAVEFORM_HEIGHT 180
#define WAVEFORM_CENTER (WAVEFORM_HEIGHT / 2)
int16_t waveform_buffer[WAVEFORM_WIDTH];
uint16_t waveform_index = 0;

// Drawing function (called from the LVGL waveform task while LVGL is locked)
void draw_waveform(lv_obj_t *canvas, int16_t *audio_samples, size_t count) {
    static lv_point_t points[WAVEFORM_WIDTH];

    lv_canvas_fill_bg(canvas, lv_color_hex(0x1A1A1A), LV_OPA_COVER);

    lv_draw_line_dsc_t line_dsc;
    lv_draw_line_dsc_init(&line_dsc);
    line_dsc.color = lv_color_hex(0x00FF00);
    line_dsc.width = 2;

    if (count > WAVEFORM_WIDTH) count = WAVEFORM_WIDTH;
    if (count < 2) return;

    // Scale ±32768 to ±(WAVEFORM_CENTER - 1) so full-scale audio stays on the canvas
    for (size_t x = 0; x < count; x++) {
        int32_t offset = ((int32_t)audio_samples[x] * (WAVEFORM_CENTER - 1)) / 32768;
        points[x].x = (lv_coord_t)x;
        points[x].y = (lv_coord_t)(WAVEFORM_CENTER + offset);
    }

    // Draw the whole waveform as one polyline
    lv_canvas_draw_line(canvas, points, count, &line_dsc);
}

// Audio callback (I2S task)
void audio_i2s_callback(int16_t *samples, size_t count) {
    // Downsample audio for waveform display. The step is at least 1 so the
    // loop always advances, even when count < WAVEFORM_WIDTH.
    size_t step = count / WAVEFORM_WIDTH;
    if (step == 0) step = 1;
    for (size_t i = 0; i < count; i += step) {
        waveform_buffer[waveform_index] = samples[i];
        waveform_index = (waveform_index + 1) % WAVEFORM_WIDTH;
    }

    // Trigger LVGL update (use event or flag)
    xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
}
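
A 16-bit sample spans ±32768, while the canvas is only WAVEFORM_HEIGHT = 180 rows tall, so the sample-to-row mapping needs a scale factor tied to WAVEFORM_CENTER plus a clamp. The helper below is a host-runnable sketch of that arithmetic; sample_to_y is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define WAVEFORM_HEIGHT 180
#define WAVEFORM_CENTER (WAVEFORM_HEIGHT / 2)

/* Map a signed 16-bit sample to a canvas row in [0, WAVEFORM_HEIGHT - 1]. */
int16_t sample_to_y(int16_t sample) {
    /* Scale +/-32768 to +/-(WAVEFORM_CENTER - 1); 32-bit math avoids overflow. */
    int32_t offset = ((int32_t)sample * (WAVEFORM_CENTER - 1)) / 32768;
    int32_t y = WAVEFORM_CENTER + offset;
    if (y < 0) y = 0;                                    /* defensive clamp */
    if (y > WAVEFORM_HEIGHT - 1) y = WAVEFORM_HEIGHT - 1;
    assert(y >= 0 && y < WAVEFORM_HEIGHT);
    return (int16_t)y;
}
```

With these constants silence lands on row 90 and full-scale samples land on rows 1 and 178, so the trace never leaves the canvas.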

Touch Controls:

Tap anywhere: Cancel listening
Swipe down: Lower sensitivity
Swipe up: Increase sensitivity

3. PROCESSING State

Display:

Animated spinner/thinking indicator
"Processing..." text
Waveform fades out smoothly

Animation:

Circular spinner with gradient
Rotation: 360° per 1 second
Pulsing opacity

Colors:

Background: Dark gray (#1A1A1A)
Spinner: Blue (#0080FF)
Text: Light gray (#CCCCCC)

LVGL Implementation:

lv_obj_t *processing_screen;
lv_obj_t *spinner;           // lv_spinner widget
lv_obj_t *processing_label;  // "Processing..."

// lv_anim exec callbacks take (void *var, int32_t value). lv_obj_set_style_opa
// also needs a style selector, so it is wrapped here.
static void fade_opa_cb(void *obj, int32_t opa) {
    lv_obj_set_style_opa((lv_obj_t *)obj, (lv_opa_t)opa, 0);
}

// Transition from listening to processing
void transition_to_processing(void) {
    // Fade out waveform
    lv_anim_t fade_out;
    lv_anim_init(&fade_out);
    lv_anim_set_var(&fade_out, waveform_canvas);
    lv_anim_set_values(&fade_out, LV_OPA_COVER, LV_OPA_TRANSP);
    lv_anim_set_time(&fade_out, 300);
    lv_anim_set_exec_cb(&fade_out, fade_opa_cb);
    lv_anim_start(&fade_out);

    // Show spinner after fade
    lv_timer_t *timer = lv_timer_create(show_spinner_callback, 300, NULL);
    lv_timer_set_repeat_count(timer, 1);
}

4. SPEAKING State

Display:

Audio output waveform (TTS playback visualization)
"Speaking..." or response text snippet
Volume indicator

Waveform:

Same canvas as LISTENING but different color
Shows output audio being played
Synchronized with speaker output

Colors:

Background: Dark gray (#1A1A1A)
Waveform: Blue (#0080FF)
Text: White (#FFFFFF)

LVGL Implementation:

lv_obj_t *speaking_screen;
lv_obj_t *output_waveform_canvas;  // Same size as input waveform
lv_obj_t *response_label;          // Show part of response text
lv_obj_t *volume_bar;              // lv_bar widget for volume level

// Similar drawing to listening state, but fed from speaker buffer
void draw_output_waveform(lv_obj_t *canvas, int16_t *speaker_samples, size_t count) {
    // Same logic as input waveform, different color
    lv_draw_line_dsc_t line_dsc;
    lv_draw_line_dsc_init(&line_dsc);
    line_dsc.color = lv_color_hex(0x0080FF);
    line_dsc.width = 2;
    // ... draw logic
}

Touch Controls:

Tap: Skip response (go back to idle)
Volume slider: Adjust speaker volume

Additional UI Elements

Status Bar (All States)

Location: Top 20 pixels

Contents:

WiFi icon + signal strength
Battery icon + percentage
Time (from RTC)
Mute icon (if muted)

LVGL Implementation:

lv_obj_t *status_bar;
lv_obj_t *wifi_icon;
lv_obj_t *battery_icon;
lv_obj_t *time_label;
lv_obj_t *mute_icon;

// Update every second
void update_status_bar(lv_timer_t *timer) {
    // Update WiFi strength
    int8_t rssi = wifi_get_rssi();
    lv_img_set_src(wifi_icon, get_wifi_icon(rssi));

    // Update battery
    uint8_t battery_pct = battery_get_percentage();
    lv_img_set_src(battery_icon, get_battery_icon(battery_pct));

    // Update time from RTC
    rtc_time_t time;
    pcf85063_get_time(&time);
    lv_label_set_text_fmt(time_label, "%02d:%02d", time.hour, time.min);
}

// Create timer for status bar updates
lv_timer_create(update_status_bar, 1000, NULL);
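
The get_wifi_icon() helper is not defined in this spec. One plausible building block is a mapping from RSSI to a bar count, sketched below; the dBm thresholds and the function name are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed thresholds in dBm: each one reached adds a bar. */
static const int8_t k_rssi_thresholds[] = { -85, -75, -65, -55 };

/* Map RSSI in dBm to 0-4 signal bars. */
uint8_t wifi_bars_from_rssi(int8_t rssi_dbm) {
    uint8_t bars = 0;
    size_t n = sizeof(k_rssi_thresholds) / sizeof(k_rssi_thresholds[0]);
    for (size_t i = 0; i < n; i++) {
        if (rssi_dbm >= k_rssi_thresholds[i]) bars++;
    }
    assert(bars <= 4);
    return bars;
}
```

The bar count can then index an array of icon sources, which keeps the threshold policy separate from the image assets.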

Settings Screen (Touch Access)

Trigger: Long-press on idle screen

Contents:

Volume slider
Brightness slider
Wake word sensitivity slider
WiFi settings button
About/Info button

LVGL Implementation:

lv_obj_t *settings_screen;
lv_obj_t *volume_slider;
lv_obj_t *brightness_slider;
lv_obj_t *sensitivity_slider;
lv_obj_t *wifi_btn;
lv_obj_t *about_btn;
lv_obj_t *back_btn;

// Slider event handler
static void slider_event_cb(lv_event_t *e) {
    lv_obj_t *slider = lv_event_get_target(e);
    int32_t value = lv_slider_get_value(slider);

    if (slider == volume_slider) {
        set_speaker_volume(value);
    } else if (slider == brightness_slider) {
        set_backlight_brightness(value);
    } else if (slider == sensitivity_slider) {
        set_wake_word_sensitivity(value);
    }
}

Audio Pipeline Integration

I2S Configuration

Microphone (INMP441):

#define I2S_MIC_NUM         I2S_NUM_0
#define I2S_MIC_BCLK_PIN    GPIO_NUM_4   // Verify with board schematic
#define I2S_MIC_WS_PIN      GPIO_NUM_5
#define I2S_MIC_DIN_PIN     GPIO_NUM_6
#define I2S_MIC_SAMPLE_RATE 16000
#define I2S_MIC_BITS        16
#define I2S_MIC_CHANNELS    1

i2s_config_t i2s_mic_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_RX,
    .sample_rate = I2S_MIC_SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 256,
    .use_apll = false,
    .tx_desc_auto_clear = false,
    .fixed_mclk = 0
};

i2s_pin_config_t i2s_mic_pins = {
    .bck_io_num = I2S_MIC_BCLK_PIN,
    .ws_io_num = I2S_MIC_WS_PIN,
    .data_out_num = I2S_PIN_NO_CHANGE,
    .data_in_num = I2S_MIC_DIN_PIN
};

void audio_init_microphone(void) {
    i2s_driver_install(I2S_MIC_NUM, &i2s_mic_config, 0, NULL);
    i2s_set_pin(I2S_MIC_NUM, &i2s_mic_pins);
    i2s_zero_dma_buffer(I2S_MIC_NUM);
}

Speaker (MAX98357A I2S Amp):

#define I2S_SPK_NUM         I2S_NUM_1
#define I2S_SPK_BCLK_PIN    GPIO_NUM_7   // Verify with board schematic
#define I2S_SPK_WS_PIN      GPIO_NUM_8
#define I2S_SPK_DOUT_PIN    GPIO_NUM_9
#define I2S_SPK_SAMPLE_RATE 16000
#define I2S_SPK_BITS        16
#define I2S_SPK_CHANNELS    1

i2s_config_t i2s_spk_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_TX,
    .sample_rate = I2S_SPK_SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 256,
    .use_apll = false,
    .tx_desc_auto_clear = true,
    .fixed_mclk = 0
};

i2s_pin_config_t i2s_spk_pins = {
    .bck_io_num = I2S_SPK_BCLK_PIN,
    .ws_io_num = I2S_SPK_WS_PIN,
    .data_out_num = I2S_SPK_DOUT_PIN,
    .data_in_num = I2S_PIN_NO_CHANGE
};

void audio_init_speaker(void) {
    i2s_driver_install(I2S_SPK_NUM, &i2s_spk_config, 0, NULL);
    i2s_set_pin(I2S_SPK_NUM, &i2s_spk_pins);
    i2s_zero_dma_buffer(I2S_SPK_NUM);
}
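
The DMA settings above also fix how much audio can sit in the driver before the application sees it. The sketch below works through that arithmetic, assuming dma_buf_len counts frames, as the legacy I2S driver documents it; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Duration in milliseconds of `frames` audio frames at `sample_rate_hz`. */
uint32_t frames_to_ms(uint32_t frames, uint32_t sample_rate_hz) {
    assert(sample_rate_hz > 0);
    return (uint32_t)(((uint64_t)frames * 1000u) / sample_rate_hz);
}

/* Worst-case audio held in the DMA ring, i.e. every buffer full. */
uint32_t dma_ring_latency_ms(uint32_t buf_count, uint32_t buf_len_frames,
                             uint32_t sample_rate_hz) {
    return frames_to_ms(buf_count * buf_len_frames, sample_rate_hz);
}
```

With dma_buf_count = 8, dma_buf_len = 256, and a 16 kHz sample rate, one buffer holds 16 ms of audio and the full ring holds 128 ms, which is worth weighing against the 200 ms end-to-end latency target in the Testing Plan.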

Audio Buffer Management

Circular Buffer for Waveform:

#define AUDIO_BUFFER_SIZE 2048
#define WAVEFORM_DECIMATION 8  // Downsample for display

typedef struct {
    int16_t samples[AUDIO_BUFFER_SIZE];
    uint16_t write_idx;
    uint16_t read_idx;
    SemaphoreHandle_t mutex;
} audio_buffer_t;

audio_buffer_t mic_buffer;
audio_buffer_t spk_buffer;

void audio_buffer_init(audio_buffer_t *buf) {
    memset(buf->samples, 0, sizeof(buf->samples));
    buf->write_idx = 0;
    buf->read_idx = 0;
    buf->mutex = xSemaphoreCreateMutex();
}

void audio_buffer_write(audio_buffer_t *buf, int16_t *samples, size_t count) {
    xSemaphoreTake(buf->mutex, portMAX_DELAY);
    for (size_t i = 0; i < count; i++) {
        buf->samples[buf->write_idx] = samples[i];
        buf->write_idx = (buf->write_idx + 1) % AUDIO_BUFFER_SIZE;
    }
    xSemaphoreGive(buf->mutex);
}

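// Get downsampled samples for waveform display.
// The window ends at the most recent sample, so out[out_count - 1] is the newest.
// Requires out_count * WAVEFORM_DECIMATION <= AUDIO_BUFFER_SIZE.
void audio_buffer_get_waveform(audio_buffer_t *buf, int16_t *out, size_t out_count) {
    xSemaphoreTake(buf->mutex, portMAX_DELAY);
    size_t span = out_count * WAVEFORM_DECIMATION;
    size_t start = (buf->write_idx + AUDIO_BUFFER_SIZE - span) % AUDIO_BUFFER_SIZE;
    for (size_t i = 0; i < out_count; i++) {
        size_t src_idx = (start + (i + 1) * WAVEFORM_DECIMATION - 1) % AUDIO_BUFFER_SIZE;
        out[i] = buf->samples[src_idx];
    }
    xSemaphoreGive(buf->mutex);
}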
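
The index arithmetic of a decimated read is easier to check off-target. The sketch below is a host-runnable version of the same ring buffer with the mutex left out; ring_t and its functions are hypothetical names, and the read is arranged so that the last output is always the newest sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE  2048   /* matches AUDIO_BUFFER_SIZE */
#define DECIMATION 8      /* matches WAVEFORM_DECIMATION */

typedef struct {
    int16_t samples[RING_SIZE];
    size_t  write_idx;    /* next slot to write; the oldest sample once full */
} ring_t;

void ring_init(ring_t *r) {
    memset(r, 0, sizeof(*r));
}

void ring_write(ring_t *r, const int16_t *in, size_t count) {
    for (size_t i = 0; i < count; i++) {
        r->samples[r->write_idx] = in[i];
        r->write_idx = (r->write_idx + 1) % RING_SIZE;
    }
}

/* Copy every DECIMATION-th sample so that out[out_count - 1] is the newest one. */
void ring_latest_decimated(const ring_t *r, int16_t *out, size_t out_count) {
    assert(out_count >= 1 && (out_count - 1) * DECIMATION < RING_SIZE);
    size_t newest = (r->write_idx + RING_SIZE - 1) % RING_SIZE;
    for (size_t i = 0; i < out_count; i++) {
        size_t back = (out_count - 1 - i) * DECIMATION;   /* samples behind newest */
        out[i] = r->samples[(newest + RING_SIZE - back) % RING_SIZE];
    }
}
```

Adding RING_SIZE before subtracting keeps the unsigned index from wrapping below zero.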

Audio Streaming Task

Microphone Input Task:

void audio_mic_task(void *pvParameters) {
    int16_t i2s_buffer[256];
    size_t bytes_read;

    while (1) {
        // Read from I2S microphone
        i2s_read(I2S_MIC_NUM, i2s_buffer, sizeof(i2s_buffer), &bytes_read, portMAX_DELAY);
        size_t samples_read = bytes_read / sizeof(int16_t);

        if (current_state == STATE_LISTENING) {
            // Write to circular buffer for waveform display
            audio_buffer_write(&mic_buffer, i2s_buffer, samples_read);

            // Send to Heimdall server via WiFi
            audio_send_to_server(i2s_buffer, samples_read);

            // Trigger waveform update
            xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
        }
    }
}

Speaker Output Task:

void audio_speaker_task(void *pvParameters) {
    int16_t i2s_buffer[256];
    size_t bytes_written;

    while (1) {
        // Receive audio from Heimdall server
        size_t samples_received = audio_receive_from_server(i2s_buffer, 256);

        if (samples_received > 0 && current_state == STATE_SPEAKING) {
            // Write to circular buffer for waveform display
            audio_buffer_write(&spk_buffer, i2s_buffer, samples_received);

            // Play through I2S speaker
            i2s_write(I2S_SPK_NUM, i2s_buffer, samples_received * sizeof(int16_t),
                     &bytes_written, portMAX_DELAY);

            // Trigger waveform update
            xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
        } else {
            vTaskDelay(pdMS_TO_TICKS(10));
        }
    }
}

LVGL Update Task

Waveform Rendering Task:

void lvgl_waveform_task(void *pvParameters) {
    int16_t waveform_samples[WAVEFORM_WIDTH];

    while (1) {
        // Wait for waveform update event
        EventBits_t bits = xEventGroupWaitBits(ui_event_group, WAVEFORM_UPDATE_BIT,
                                               pdTRUE, pdFALSE, pdMS_TO_TICKS(50));

        if (bits & WAVEFORM_UPDATE_BIT) {
            if (current_state == STATE_LISTENING) {
                // Get downsampled mic data
                audio_buffer_get_waveform(&mic_buffer, waveform_samples, WAVEFORM_WIDTH);

                // Draw on LVGL canvas (must lock LVGL)
                lvgl_lock();
                draw_waveform(waveform_canvas, waveform_samples, WAVEFORM_WIDTH);
                lvgl_unlock();

            } else if (current_state == STATE_SPEAKING) {
                // Get downsampled speaker data
                audio_buffer_get_waveform(&spk_buffer, waveform_samples, WAVEFORM_WIDTH);

                lvgl_lock();
                draw_output_waveform(output_waveform_canvas, waveform_samples, WAVEFORM_WIDTH);
                lvgl_unlock();
            }
        }
    }
}

Touch Gesture Integration

Touch Controller (CST816D)

Gestures Supported:

Single tap
Long press
Swipe up/down/left/right

Implementation:

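#define TOUCH_I2C_NUM       I2C_NUM_0
// Verify with board schematic: GPIO6, GPIO7, and GPIO9 are also used by the
// example I2S pin assignments under "I2S Configuration", so one set must change
#define TOUCH_SDA_PIN       GPIO_NUM_6
#define TOUCH_SCL_PIN       GPIO_NUM_7
#define TOUCH_INT_PIN       GPIO_NUM_9
#define TOUCH_RST_PIN       GPIO_NUM_10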

typedef enum {
    GESTURE_NONE = 0,
    GESTURE_TAP,
    GESTURE_LONG_PRESS,
    GESTURE_SWIPE_UP,
    GESTURE_SWIPE_DOWN,
    GESTURE_SWIPE_LEFT,
    GESTURE_SWIPE_RIGHT
} touch_gesture_t;

void touch_init(void) {
    // I2C init for CST816D
    i2c_config_t conf = {
        .mode = I2C_MODE_MASTER,
        .sda_io_num = TOUCH_SDA_PIN,
        .scl_io_num = TOUCH_SCL_PIN,
        .sda_pullup_en = GPIO_PULLUP_ENABLE,
        .scl_pullup_en = GPIO_PULLUP_ENABLE,
        .master.clk_speed = 100000,
    };
    i2c_param_config(TOUCH_I2C_NUM, &conf);
    i2c_driver_install(TOUCH_I2C_NUM, conf.mode, 0, 0, 0);

    // Reset touch controller
    gpio_set_direction(TOUCH_RST_PIN, GPIO_MODE_OUTPUT);
    gpio_set_level(TOUCH_RST_PIN, 0);
    vTaskDelay(pdMS_TO_TICKS(10));
    gpio_set_level(TOUCH_RST_PIN, 1);
    vTaskDelay(pdMS_TO_TICKS(50));

    // Configure interrupt pin
    gpio_set_direction(TOUCH_INT_PIN, GPIO_MODE_INPUT);
    gpio_set_intr_type(TOUCH_INT_PIN, GPIO_INTR_NEGEDGE);
    gpio_install_isr_service(0);
    gpio_isr_handler_add(TOUCH_INT_PIN, touch_isr_handler, NULL);
}

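touch_gesture_t touch_read_gesture(void) {
    uint8_t reg = 0x01;  // Gesture ID register
    uint8_t data[8] = {0};
    // Write the register address, then read back 8 bytes starting there
    esp_err_t err = i2c_master_write_read_device(TOUCH_I2C_NUM, CST816D_ADDR, &reg, 1,
                                                 data, sizeof(data), pdMS_TO_TICKS(100));
    if (err != ESP_OK) return GESTURE_NONE;
    // NOTE: the raw code is defined by the controller and may not match the enum
    // order above; map it explicitly per the CST816D datasheet
    return (touch_gesture_t)data[0];
}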
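
The gesture byte reported by the controller uses its own code values, so it needs an explicit mapping onto touch_gesture_t. The sketch below assumes the raw codes published in open-source CST816-family drivers; they are not taken from this spec and should be verified against the CST816D datasheet and the panel orientation. parse_gesture and the RAW_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    GESTURE_NONE = 0, GESTURE_TAP, GESTURE_LONG_PRESS,
    GESTURE_SWIPE_UP, GESTURE_SWIPE_DOWN, GESTURE_SWIPE_LEFT, GESTURE_SWIPE_RIGHT
} touch_gesture_t;

/* ASSUMED raw gesture codes (CST816-family open-source drivers). */
enum {
    RAW_SWIPE_UP = 0x01, RAW_SWIPE_DOWN = 0x02, RAW_SWIPE_LEFT = 0x03,
    RAW_SWIPE_RIGHT = 0x04, RAW_SINGLE_CLICK = 0x05, RAW_LONG_PRESS = 0x0C
};

/* regs[0] is the gesture ID byte, i.e. the first byte read from register 0x01. */
touch_gesture_t parse_gesture(const uint8_t *regs, size_t len) {
    assert(regs != NULL && len >= 1);
    switch (regs[0]) {
    case RAW_SINGLE_CLICK: return GESTURE_TAP;
    case RAW_LONG_PRESS:   return GESTURE_LONG_PRESS;
    case RAW_SWIPE_UP:     return GESTURE_SWIPE_UP;
    case RAW_SWIPE_DOWN:   return GESTURE_SWIPE_DOWN;
    case RAW_SWIPE_LEFT:   return GESTURE_SWIPE_LEFT;
    case RAW_SWIPE_RIGHT:  return GESTURE_SWIPE_RIGHT;
    default:               return GESTURE_NONE;  /* no gesture or unsupported code */
    }
}
```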

Gesture Actions by State

IDLE State:

Tap: Wake up display (if dimmed)
Long Press: Open settings screen
Swipe Up: Show more info (weather, calendar)

LISTENING State:

Tap: Cancel listening, return to idle
Swipe Down: Lower wake word sensitivity
Swipe Up: Raise wake word sensitivity

SPEAKING State:

Tap: Skip response, return to idle
Swipe Left/Right: Volume down/up

PROCESSING State:

Tap: Cancel processing (if possible)
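
The per-state gesture lists above amount to a dispatch function from (state, gesture) to an action. A minimal sketch, with hypothetical action names that each correspond to one line of the lists:

```c
#include <assert.h>

typedef enum {
    STATE_IDLE, STATE_LISTENING, STATE_PROCESSING, STATE_SPEAKING, STATE_COUNT
} va_state_t;

typedef enum {
    GESTURE_NONE = 0, GESTURE_TAP, GESTURE_LONG_PRESS,
    GESTURE_SWIPE_UP, GESTURE_SWIPE_DOWN, GESTURE_SWIPE_LEFT, GESTURE_SWIPE_RIGHT
} touch_gesture_t;

/* Hypothetical action names. */
typedef enum {
    ACTION_NONE, ACTION_WAKE_DISPLAY, ACTION_OPEN_SETTINGS, ACTION_SHOW_INFO,
    ACTION_CANCEL_TO_IDLE, ACTION_SENSITIVITY_DOWN, ACTION_SENSITIVITY_UP,
    ACTION_VOLUME_DOWN, ACTION_VOLUME_UP
} ui_action_t;

ui_action_t action_for_gesture(va_state_t state, touch_gesture_t g) {
    assert(state < STATE_COUNT);
    switch (state) {
    case STATE_IDLE:
        if (g == GESTURE_TAP)         return ACTION_WAKE_DISPLAY;
        if (g == GESTURE_LONG_PRESS)  return ACTION_OPEN_SETTINGS;
        if (g == GESTURE_SWIPE_UP)    return ACTION_SHOW_INFO;
        break;
    case STATE_LISTENING:
        if (g == GESTURE_TAP)         return ACTION_CANCEL_TO_IDLE;
        if (g == GESTURE_SWIPE_DOWN)  return ACTION_SENSITIVITY_DOWN;
        if (g == GESTURE_SWIPE_UP)    return ACTION_SENSITIVITY_UP;
        break;
    case STATE_SPEAKING:
        if (g == GESTURE_TAP)         return ACTION_CANCEL_TO_IDLE;
        if (g == GESTURE_SWIPE_LEFT)  return ACTION_VOLUME_DOWN;
        if (g == GESTURE_SWIPE_RIGHT) return ACTION_VOLUME_UP;
        break;
    case STATE_PROCESSING:
        if (g == GESTURE_TAP)         return ACTION_CANCEL_TO_IDLE;
        break;
    default:
        break;
    }
    return ACTION_NONE;  /* gesture has no meaning in this state */
}
```

Returning an action instead of calling UI code directly keeps the mapping testable without LVGL or the touch driver.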

Network Communication

WiFi Configuration

Connection:

#define WIFI_SSID           "YourNetworkName"
#define WIFI_PASSWORD       "YourPassword"
#define SERVER_URL          "http://10.1.10.71:3006"

void wifi_init(void) {
    esp_netif_init();
    esp_event_loop_create_default();
    esp_netif_create_default_wifi_sta();

    wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
    esp_wifi_init(&cfg);

    wifi_config_t wifi_config = {
        .sta = {
            .ssid = WIFI_SSID,
            .password = WIFI_PASSWORD,
        },
    };

    esp_wifi_set_mode(WIFI_MODE_STA);
    esp_wifi_set_config(WIFI_IF_STA, &wifi_config);
    esp_wifi_start();
    esp_wifi_connect();
}

Server Communication Protocol

Endpoints:

GET /health - Server health check
POST /audio/stream - Stream audio to server (multipart)
GET /audio/tts - Receive TTS audio response
GET /wake-word/status - Check wake word detection status

Audio Streaming (WebSockets Recommended):

#include "esp_websocket_client.h"

esp_websocket_client_handle_t ws_client;

void websocket_init(void) {
    esp_websocket_client_config_t ws_cfg = {
        .uri = "ws://10.1.10.71:3006/ws/audio",
        .buffer_size = 2048,
    };

    ws_client = esp_websocket_client_init(&ws_cfg);
    esp_websocket_register_events(ws_client, WEBSOCKET_EVENT_ANY,
                                   websocket_event_handler, NULL);
    esp_websocket_client_start(ws_client);
}

void audio_send_to_server(int16_t *samples, size_t count) {
    if (esp_websocket_client_is_connected(ws_client)) {
        esp_websocket_client_send_bin(ws_client, (char*)samples,
                                     count * sizeof(int16_t), portMAX_DELAY);
    }
}

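// esp_websocket_client has no blocking receive call: incoming frames arrive as
// WEBSOCKET_EVENT_DATA in websocket_event_handler. This sketch assumes the handler
// copies each binary payload into rx_stream, a FreeRTOS stream buffer (hypothetical name).
size_t audio_receive_from_server(int16_t *out_buffer, size_t max_samples) {
    // Receive audio from server (blocking with timeout)
    size_t len = xStreamBufferReceive(rx_stream, out_buffer,
                                      max_samples * sizeof(int16_t), pdMS_TO_TICKS(100));
    return len / sizeof(int16_t);
}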

Alternative: HTTP Chunked Transfer (Simpler):

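void audio_stream_http(void) {
    esp_http_client_config_t config = {
        .url = "http://10.1.10.71:3006/audio/stream",
        .method = HTTP_METHOD_POST,
    };
    esp_http_client_handle_t client = esp_http_client_init(&config);

    // Set headers
    esp_http_client_set_header(client, "Content-Type", "audio/pcm");
    esp_http_client_set_header(client, "Transfer-Encoding", "chunked");

    // -1 = chunked mode
    if (esp_http_client_open(client, -1) != ESP_OK) {
        esp_http_client_cleanup(client);
        return;
    }

    // Stream audio chunks
    int16_t buffer[256];
    char chunk_hdr[16];
    while (current_state == STATE_LISTENING) {
        // Read from mic
        size_t bytes_read;
        i2s_read(I2S_MIC_NUM, buffer, sizeof(buffer), &bytes_read, portMAX_DELAY);
        if (bytes_read == 0) continue;

        // Send to server. Assumption: esp_http_client_write sends raw bytes, so
        // each chunk is framed by hand as "<hex length>\r\n<data>\r\n".
        int hdr_len = snprintf(chunk_hdr, sizeof(chunk_hdr), "%x\r\n", (unsigned)bytes_read);
        esp_http_client_write(client, chunk_hdr, hdr_len);
        esp_http_client_write(client, (char*)buffer, bytes_read);
        esp_http_client_write(client, "\r\n", 2);
    }

    // A zero-length chunk ends the body
    esp_http_client_write(client, "0\r\n\r\n", 5);
    esp_http_client_close(client);
    esp_http_client_cleanup(client);
}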
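
HTTP/1.1 chunked transfer encoding frames each block of body data as a hexadecimal length line, the data, and a trailing CRLF, and ends the body with a zero-length chunk. The helper below sketches the header formatting in plain C, assuming the HTTP client writes raw bytes and leaves framing to the caller (worth confirming in the esp_http_client documentation); chunk_header is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_TRAILER "\r\n"        /* follows every chunk's payload */
#define CHUNK_END     "0\r\n\r\n"   /* zero-length chunk that ends the body */

/* Write the header preceding a chunk of `payload_len` bytes: hex length + CRLF.
 * Returns the header length, or 0 if `out` is too small. */
size_t chunk_header(char *out, size_t cap, size_t payload_len) {
    assert(out != NULL);
    int n = snprintf(out, cap, "%zx\r\n", payload_len);
    return (n > 0 && (size_t)n < cap) ? (size_t)n : 0;
}
```

A 512-byte block of PCM, for example, is preceded by the five bytes "200\r\n".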

Power Management

Battery Monitoring

ETA6098 Charging Chip:

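#define BATTERY_ADC_CHANNEL ADC1_CHANNEL_0  // GPIO1 (example)
#define BATTERY_FULL_MV     4200
#define BATTERY_EMPTY_MV    3300
// ASSUMPTION: the battery reaches the ADC through a resistor divider, since 4.2 V
// is above the ADC input range. 2 is an example ratio; take the real one from the schematic.
#define BATTERY_DIVIDER_RATIO 2

static esp_adc_cal_characteristics_t adc_chars;

void battery_init(void) {
    adc1_config_width(ADC_WIDTH_BIT_12);
    adc1_config_channel_atten(BATTERY_ADC_CHANNEL, ADC_ATTEN_DB_11);
    // Calibration data used by esp_adc_cal_raw_to_voltage (1100 mV default Vref)
    esp_adc_cal_characterize(ADC_UNIT_1, ADC_ATTEN_DB_11, ADC_WIDTH_BIT_12, 1100, &adc_chars);
}

uint8_t battery_get_percentage(void) {
    int adc_reading = adc1_get_raw(BATTERY_ADC_CHANNEL);
    // Voltage at the ADC pin, scaled back up to the battery voltage
    int voltage_mv = esp_adc_cal_raw_to_voltage(adc_reading, &adc_chars) * BATTERY_DIVIDER_RATIO;

    if (voltage_mv >= BATTERY_FULL_MV) return 100;
    if (voltage_mv <= BATTERY_EMPTY_MV) return 0;

    return ((voltage_mv - BATTERY_EMPTY_MV) * 100) / (BATTERY_FULL_MV - BATTERY_EMPTY_MV);
}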

bool battery_is_charging(void) {
    // Check SYS_OUT pin (GPIO36) - high when charging
    gpio_set_direction(GPIO_NUM_36, GPIO_MODE_INPUT);
    return gpio_get_level(GPIO_NUM_36);
}
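
The percentage calculation is a clamped linear map and can be checked without hardware. The sketch below separates it from the ADC read; the divider ratio parameters are an assumption about the board, and a linear map is only a rough estimate because LiPo discharge curves are not linear.

```c
#include <assert.h>
#include <stdint.h>

#define BATTERY_FULL_MV  4200
#define BATTERY_EMPTY_MV 3300

/* Convert the voltage seen at the ADC pin into battery millivolts.
 * divider_num/divider_den is the (assumed) resistor-divider ratio, e.g. 2/1. */
int32_t battery_mv_from_pin(int32_t pin_mv, int32_t divider_num, int32_t divider_den) {
    assert(divider_den > 0);
    return pin_mv * divider_num / divider_den;
}

/* Linear map from [EMPTY, FULL] millivolts to 0-100 %, clamped at both ends. */
uint8_t battery_percent(int32_t battery_mv) {
    if (battery_mv >= BATTERY_FULL_MV)  return 100;
    if (battery_mv <= BATTERY_EMPTY_MV) return 0;
    return (uint8_t)(((battery_mv - BATTERY_EMPTY_MV) * 100) /
                     (BATTERY_FULL_MV - BATTERY_EMPTY_MV));
}
```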

Low Power Modes

Deep Sleep When Idle (Optional):

#define IDLE_TIMEOUT_MS 300000  // 5 minutes

void enter_deep_sleep(void) {
    // Save state to RTC memory
    RTC_DATA_ATTR static uint32_t boot_count = 0;
    boot_count++;

    // Configure wake sources
    esp_sleep_enable_ext0_wakeup(TOUCH_INT_PIN, 0);  // Wake on touch
    esp_sleep_enable_timer_wakeup(3600 * 1000000ULL); // Wake every hour

    // Turn off display
    gpio_set_level(LCD_BL_PIN, 0);

    // Enter deep sleep
    esp_deep_sleep_start();
}
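
IDLE_TIMEOUT_MS still needs a check that decides when to call enter_deep_sleep(). A minimal sketch of that check, written so it stays correct when a 32-bit millisecond counter wraps; idle_timeout_expired is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IDLE_TIMEOUT_MS 300000u  /* 5 minutes, from the spec */

static_assert(IDLE_TIMEOUT_MS < UINT32_MAX / 2u,
              "timeout must fit well inside the counter range");

/* True once IDLE_TIMEOUT_MS has elapsed since the last user activity.
 * Unsigned subtraction gives the right difference across a counter wrap. */
bool idle_timeout_expired(uint32_t now_ms, uint32_t last_activity_ms) {
    return (uint32_t)(now_ms - last_activity_ms) >= IDLE_TIMEOUT_MS;
}
```

The caller would record last_activity_ms on every touch or wake word event and poll this function from the state machine task while in the IDLE state.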

Performance Optimization

LVGL Performance

Buffer Configuration:

#define LVGL_SCREEN_PIXELS (240 * 280)                // Full screen, in pixels
#define LVGL_BUFFER_PIXELS (LVGL_SCREEN_PIXELS / 10)  // 1/10 screen per buffer

static lv_color_t buf_1[LVGL_BUFFER_PIXELS];  // 1/10 screen buffer
static lv_color_t buf_2[LVGL_BUFFER_PIXELS];  // Double buffering

lv_disp_draw_buf_t draw_buf;
// The size argument is a pixel count, not a byte count
lv_disp_draw_buf_init(&draw_buf, buf_1, buf_2, LVGL_BUFFER_PIXELS);
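
lv_disp_draw_buf_init takes its size argument in pixels, so it helps to keep pixel counts and byte counts separate when budgeting RAM. The sketch below works through the numbers for this display; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LCD_H_RES 240
#define LCD_V_RES 280

/* Pixels in a partial draw buffer covering 1/fraction of the screen. */
size_t draw_buf_pixels(size_t fraction) {
    assert(fraction > 0);
    return ((size_t)LCD_H_RES * LCD_V_RES) / fraction;
}

/* RAM for `buffers` draw buffers at `bytes_per_pixel` (2 for RGB565). */
size_t draw_buf_bytes(size_t fraction, size_t bytes_per_pixel, size_t buffers) {
    return draw_buf_pixels(fraction) * bytes_per_pixel * buffers;
}
```

For a 1/10 screen buffer this gives 6720 pixels, or 13440 bytes per buffer at 16-bit color and 26880 bytes for the double-buffered pair.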

Task Priority:

#define LVGL_TASK_PRIORITY      5
#define AUDIO_MIC_TASK_PRIORITY 10  // Higher priority for audio
#define AUDIO_SPK_TASK_PRIORITY 10
#define WIFI_TASK_PRIORITY      8
#define WAVEFORM_TASK_PRIORITY  4   // Lower priority for visuals

void app_main(void) {
    // Create tasks with priorities
    xTaskCreatePinnedToCore(lvgl_task, "LVGL", 8192, NULL, LVGL_TASK_PRIORITY, NULL, 1);
    xTaskCreatePinnedToCore(audio_mic_task, "MIC", 4096, NULL, AUDIO_MIC_TASK_PRIORITY, NULL, 0);
    xTaskCreatePinnedToCore(audio_speaker_task, "SPK", 4096, NULL, AUDIO_SPK_TASK_PRIORITY, NULL, 0);
    xTaskCreatePinnedToCore(lvgl_waveform_task, "WAVE", 4096, NULL, WAVEFORM_TASK_PRIORITY, NULL, 1);
}

Reduce Waveform Update Rate:

// Only update waveform at 30 FPS, not every audio sample
#define WAVEFORM_UPDATE_MS 33  // ~30 FPS

void lvgl_waveform_task(void *pvParameters) {
    TickType_t last_update = xTaskGetTickCount();

    while (1) {
        TickType_t now = xTaskGetTickCount();
        if ((now - last_update) >= pdMS_TO_TICKS(WAVEFORM_UPDATE_MS)) {
            // Update waveform
            last_update = now;
        }
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}

Memory Management

PSRAM Usage:

// Allocate large buffers in PSRAM (8MB available)
#define AUDIO_LARGE_BUFFER_SIZE (16000 * 10)  // 10 seconds at 16kHz

int16_t *audio_history = heap_caps_malloc(AUDIO_LARGE_BUFFER_SIZE * sizeof(int16_t),
                                          MALLOC_CAP_SPIRAM);

// Check if allocation succeeded
if (audio_history == NULL) {
    ESP_LOGE(TAG, "Failed to allocate PSRAM buffer");
}

Heap Monitoring:

// Needs <inttypes.h> for PRIu32: the heap-size calls return uint32_t or size_t, not int
void log_memory_stats(void) {
    ESP_LOGI(TAG, "Free heap: %" PRIu32 " bytes", esp_get_free_heap_size());
    ESP_LOGI(TAG, "Free PSRAM: %zu bytes", heap_caps_get_free_size(MALLOC_CAP_SPIRAM));
    ESP_LOGI(TAG, "Min free heap: %" PRIu32 " bytes", esp_get_minimum_free_heap_size());
}

Example Code Structure

File Organization

esp32_voice_assistant/
├── main/
│   ├── main.c                  # Entry point, task creation
│   ├── audio/
│   │   ├── audio_input.c       # I2S microphone handling
│   │   ├── audio_output.c      # I2S speaker handling
│   │   ├── audio_buffer.c      # Circular buffer management
│   │   └── audio_network.c     # WebSocket/HTTP streaming
│   ├── ui/
│   │   ├── ui_init.c           # LVGL setup, screen creation
│   │   ├── ui_idle.c           # Idle screen UI
│   │   ├── ui_listening.c      # Listening screen + waveform
│   │   ├── ui_processing.c     # Processing screen + spinner
│   │   ├── ui_speaking.c       # Speaking screen + output waveform
│   │   ├── ui_settings.c       # Settings screen
│   │   └── ui_waveform.c       # Waveform drawing functions
│   ├── touch/
│   │   ├── touch_cst816d.c     # Touch controller driver
│   │   └── touch_gestures.c    # Gesture recognition
│   ├── network/
│   │   └── wifi_manager.c      # WiFi connection management
│   ├── power/
│   │   ├── battery.c           # Battery monitoring
│   │   └── power_mgmt.c        # Sleep modes
│   └── state_machine.c         # Voice assistant state machine
├── components/
│   └── lvgl/                   # LVGL library (ESP-IDF component)
├── CMakeLists.txt
└── sdkconfig                   # ESP-IDF configuration

Main Entry Point

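// main/main.c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"
#include "nvs_flash.h"
#include "driver/gpio.h"
// Project headers for the audio, UI, touch, network, and power modules are omitted here

static const char *TAG = "VOICE_ASSISTANT";

void app_main(void) {
    ESP_LOGI(TAG, "Voice Assistant Starting...");

    // Initialize hardware
    ESP_ERROR_CHECK(nvs_flash_init());  // Non-volatile storage (required by WiFi)
    gpio_install_isr_service(0);        // GPIO interrupts (install once; touch_init also calls this)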

    // Power management
    battery_init();

    // Display and touch
    lcd_init();
    touch_init();
    ui_init();

    // Audio pipeline
    audio_init_microphone();
    audio_init_speaker();
    audio_buffer_init(&mic_buffer);
    audio_buffer_init(&spk_buffer);

    // Network
    wifi_init();
    websocket_init();

    // State machine
    state_machine_init();

    // Create FreeRTOS tasks
    xTaskCreatePinnedToCore(lvgl_task, "LVGL", 8192, NULL, 5, NULL, 1);
    xTaskCreatePinnedToCore(audio_mic_task, "MIC", 4096, NULL, 10, NULL, 0);
    xTaskCreatePinnedToCore(audio_speaker_task, "SPK", 4096, NULL, 10, NULL, 0);
    xTaskCreatePinnedToCore(lvgl_waveform_task, "WAVE", 4096, NULL, 4, NULL, 1);
    xTaskCreatePinnedToCore(state_machine_task, "STATE", 4096, NULL, 7, NULL, 0);

    ESP_LOGI(TAG, "Voice Assistant Running!");
}

Testing Plan

Phase 1: Hardware Validation

LCD display working (show test pattern)
Touch controller responding (log touch coordinates)
Buzzer working (play test tone)
WiFi connecting (check IP address)
Battery reading (log voltage)
RTC working (log time)
IMU working (log accelerometer values)

Phase 2: Audio Pipeline

I2S microphone reading audio (log levels)
Audio streaming to Heimdall server
I2S speaker playing audio (test tone)
TTS audio playback from server
Audio buffer management (no overflows)

Phase 3: LVGL UI

Idle screen displays correctly
State transitions smooth
Waveform renders at 30 FPS
Touch gestures recognized
Settings screen functional
Status bar updates correctly

Phase 4: Integration

Wake word detection triggers listening state
Waveform shows mic input in real-time
Processing state shows after speech ends
TTS response plays with output waveform
Touch cancel works in all states
Battery indicator accurate

Phase 5: Optimization

Memory usage stable (no leaks)
CPU usage acceptable (<80% average)
WiFi latency <100ms
Audio latency <200ms end-to-end
Display framerate stable (30 FPS)
Battery life >4 hours continuous

Bill of Materials (BOM)

Component	Part Number	Quantity	Unit Price	Total
ESP32-S3-Touch-LCD-1.69	Waveshare	1	$12.00	$12.00
I2S MEMS Microphone	INMP441	1	$3.50	$3.50
I2S Amplifier	MAX98357A	1	$3.50	$3.50
Speaker (3W 8Ω)	Generic	1	$5.00	$5.00
LiPo Battery (1000mAh)	503040 JST 1.25	1	$7.00	$7.00
MicroSD Card (8GB)	SanDisk	1	$5.00	$5.00
Breadboard + Wires	Generic	1	$5.00	$5.00
Total				$41.00

Optional:

Enclosure/Case (3D printed or project box): $5-10
Backup battery: $7
USB-C cable: $3

Grand Total with Options: ~$56-61

References & Resources

LVGL Audio Visualization Examples

Music Player with FFT Spectrum - Instructables Guide
- Source: https://github.com/moononournation/LVGL_Music_Player.git
- Shows FFT-based audio visualization on LVGL canvas
LVGL Audio FFT Spectrum (Xiao S3) - GitHub: genvex/LVGL_Audio_FFT_Spectrum_xiaoS3_oled
- Real-time FFT visualization using low-level LVGL drawing
LVGL Audio FFT Spectrum - GitHub: imliubo/LVGL_Audio_FFT_Spectrum
- Alternative FFT spectrum implementation
Moving Waveform Discussion - LVGL Forum Thread
- Tips on efficiently displaying moving waveforms

ESP32-S3 Resources

Waveshare Wiki - https://www.waveshare.com/wiki/ESP32-S3-LCD-1.69
LVGL ESP32 Port - GitHub: lvgl/lv_port_esp32
ESP-IDF Documentation - https://docs.espressif.com/projects/esp-idf/en/latest/

Voice Assistant Project

Mycroft Precise Documentation - https://github.com/MycroftAI/mycroft-precise
Whisper OpenAI - https://github.com/openai/whisper
Piper TTS - https://github.com/rhasspy/piper

Next Steps

Order Hardware - ESP32-S3-Touch-LCD + audio components (~$41)
Setup ESP-IDF - Install ESP-IDF v5.3.1+ on development machine
Clone Examples - Get LVGL audio visualization examples for reference
Start Simple - Begin with LCD + LVGL test (no audio)
Add Audio - Wire I2S mic, test audio streaming
Waveform MVP - Get basic waveform rendering working
Full Integration - Connect to Heimdall voice server
Polish - Add touch controls, settings, battery support

Version: 1.0
Created: 2026-01-01
Status: Specification Complete, Ready for Implementation
