minerva/docs/ESP32_S3_VOICE_ASSISTANT_SPEC.md
pyr0ball 173f7f37d4 feat: import mycroft-precise work as Minerva foundation
Ports prior voice assistant research and prototypes from devl/Devops
into the Minerva repo. Includes:

- docs/: architecture, wake word guides, ESP32-S3 spec, hardware buying guide
- scripts/: voice_server.py, voice_server_enhanced.py, setup scripts
- hardware/maixduino/: edge device scripts with WiFi credentials scrubbed
  (replaced hardcoded password with secrets.py pattern)
- config/.env.example: server config template
- .gitignore: excludes .env, secrets.py, model blobs, ELF firmware
- CLAUDE.md: Minerva product context and connection to cf-voice roadmap
2026-04-06 22:21:12 -07:00


# ESP32-S3-Touch-LCD Voice Assistant - Technical Specification

**Date:** 2026-01-01

**Hardware:** Waveshare ESP32-S3-Touch-LCD-1.69

**Display:** 240×280 ST7789V2 with capacitive touch

**Framework:** ESP-IDF v5.3.1+ with LVGL 8.4.0+

**Purpose:** Voice assistant endpoint with real-time audio waveform visualization

---
## Overview
Voice assistant client for ESP32-S3 with integrated LVGL-based visual feedback showing:
- Real-time audio waveform during listening
- Wake word detection animation
- Processing/thinking state
- Response state with audio output visualization
- Touch controls for volume, sensitivity, settings

**Architecture:**
```
┌─────────────────────────────────────┐
│       ESP32-S3-Touch-LCD-1.69       │
│                                     │
│  ┌───────────────────────────────┐  │
│  │ LVGL UI (240×280)             │  │
│  │ - Waveform Canvas             │  │
│  │ - State Indicators            │  │
│  │ - Touch Controls              │  │
│  └───────────────────────────────┘  │
│                                     │
│  ┌───────────────────────────────┐  │
│  │ Audio Pipeline                │  │
│  │ - I2S Mic Input               │  │
│  │ - I2S Speaker Output          │  │
│  │ - Buffer Management           │  │
│  └───────────────────────────────┘  │
│                                     │
│  ┌───────────────────────────────┐  │
│  │ State Machine                 │  │
│  │ - Idle → Listening            │  │
│  │ - Processing → Speaking       │  │
│  └───────────────────────────────┘  │
└──────────────────┬──────────────────┘
                   │ WiFi audio stream (TCP/HTTP)
                   │ mic audio up, TTS audio down
┌──────────────────┴──────────────────┐
│        Heimdall Voice Server        │
│          (10.1.10.71:3006)          │
│                                     │
│  - Mycroft Precise Wake Word        │
│  - Whisper STT                      │
│  - Home Assistant Integration       │
│  - Piper TTS                        │
└─────────────────────────────────────┘
```
---
## Visual States & UI Design
### State Machine
```
┌─────────┐
│  IDLE   │ ◄─────────────┐
└────┬────┘               │
     │                    │
Wake Word Detected        │
     │                    │
     ↓                    │
┌───────────┐             │
│ LISTENING │             │
└─────┬─────┘             │
      │                   │
End of Speech             │
      │                   │
      ↓                   │
┌────────────┐            │
│ PROCESSING │            │
└──────┬─────┘            │
       │                  │
Response Ready            │
       │                  │
       ↓                  │
┌──────────┐              │
│ SPEAKING │ ─────────────┘
└──────────┘
```
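The diagram maps onto a small, pure transition function that the state machine task could call for each incoming event. A minimal sketch, assuming hypothetical event names (`EVT_*`) and modelling the tap-to-cancel gestures from "Gesture Actions by State" as an `EVT_CANCEL` event that returns to IDLE from any state:

```c
/* Hypothetical state and event enums mirroring the diagram. */
typedef enum { STATE_IDLE, STATE_LISTENING, STATE_PROCESSING, STATE_SPEAKING } va_state_t;

typedef enum {
    EVT_WAKE_WORD,      /* server reports the wake word */
    EVT_END_OF_SPEECH,  /* end of the user's utterance */
    EVT_RESPONSE_READY, /* TTS audio is available */
    EVT_PLAYBACK_DONE,  /* speaker finished the response */
    EVT_CANCEL          /* user tap, valid in every state */
} va_event_t;

/* Returns the next state, or the current one if the event does not apply. */
va_state_t va_next_state(va_state_t s, va_event_t e) {
    if (e == EVT_CANCEL) {
        return STATE_IDLE;
    }
    switch (s) {
    case STATE_IDLE:       return (e == EVT_WAKE_WORD)      ? STATE_LISTENING  : s;
    case STATE_LISTENING:  return (e == EVT_END_OF_SPEECH)  ? STATE_PROCESSING : s;
    case STATE_PROCESSING: return (e == EVT_RESPONSE_READY) ? STATE_SPEAKING   : s;
    case STATE_SPEAKING:   return (e == EVT_PLAYBACK_DONE)  ? STATE_IDLE       : s;
    }
    return s;
}
```

Because the function touches neither LVGL nor I2S, it can be unit-tested on a host; the caller compares the old and new states and switches screens or starts and stops streams only when they differ.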
### Visual Feedback Per State
#### 1. IDLE State
**Display:**
- Subtle pulsing ring animation (like Google Home)
- Time display from RTC
- Status icons (WiFi strength, battery level)
- Dim backlight (30-50%)

**Colors:**
- Background: Dark blue (#001F3F)
- Pulse ring: Cyan (#00BFFF)
- Text: White (#FFFFFF)

**LVGL Widgets:**
```c
lv_obj_t *idle_screen;
lv_obj_t *pulse_ring; // Arc widget, animated rotation
lv_obj_t *time_label; // Label with RTC time
lv_obj_t *status_bar; // Container for icons
```
**Animation:**
- Slow pulse: 2-second breathing cycle
- Rotation: 360° over 10 seconds
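
Both timings can be computed from elapsed milliseconds, for example from `lv_tick_get()` inside an `lv_timer` callback that passes the results to `lv_arc_set_rotation()` and an opacity style setter. A minimal sketch; the linear triangle wave used for the breathing effect is an assumption (an eased curve would look softer), and the opacity bounds are left to the caller:

```c
#include <stdint.h>

#define IDLE_ROTATION_PERIOD_MS 10000u /* 360 degrees over 10 seconds */
#define IDLE_BREATH_PERIOD_MS 2000u    /* 2-second breathing cycle */

/* Ring rotation in whole degrees (0..359). */
uint16_t idle_rotation_deg(uint32_t elapsed_ms) {
    return (uint16_t)((elapsed_ms % IDLE_ROTATION_PERIOD_MS) * 360u / IDLE_ROTATION_PERIOD_MS);
}

/* Linear "breathing": opacity ramps from min_opa up to max_opa and back
 * down once per period. Assumes max_opa >= min_opa. */
uint8_t idle_breath_opa(uint32_t elapsed_ms, uint8_t min_opa, uint8_t max_opa) {
    const uint32_t half = IDLE_BREATH_PERIOD_MS / 2u;
    uint32_t t = elapsed_ms % IDLE_BREATH_PERIOD_MS;
    uint32_t ramp = (t < half) ? t : IDLE_BREATH_PERIOD_MS - t; /* 0..half */
    return (uint8_t)(min_opa + (uint32_t)(max_opa - min_opa) * ramp / half);
}
```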
---
#### 2. LISTENING State
**Display:**
- Real-time audio waveform visualization
- Bright backlight (100%)
- "Listening..." text
- Cancel button (touch)

**Waveform Visualization:**

**Option A: Canvas-Based Waveform (Recommended)**
- Use LVGL `lv_canvas` for custom drawing
- Draw waveform from audio buffer samples
- Scrolling waveform (left-to-right)
- Update rate: 30-60 FPS

**Option B: Bar Chart Spectrum**
- Use `lv_chart` with bar type
- FFT-based spectrum analyzer
- 8-16 bars for frequency bins
- Update rate: 15-30 FPS

**Colors:**
- Background: Dark gray (#1A1A1A)
- Waveform: Green (#00FF00)
- Peak indicators: Yellow (#FFFF00)
- Clipping: Red (#FF0000)

**LVGL Implementation:**
```c
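// Canvas-based waveform
lv_obj_t *listening_screen;
lv_obj_t *waveform_canvas;  // 240×180 canvas; at 16-bit color it needs a
                            // 240*180*2 = 86,400-byte pixel buffer (PSRAM is an option)
lv_obj_t *listening_label;  // "Listening..."
lv_obj_t *cancel_btn;       // Touch to cancel
// Waveform buffer (circular buffer)
#define WAVEFORM_WIDTH 240
#define WAVEFORM_HEIGHT 180
#define WAVEFORM_CENTER (WAVEFORM_HEIGHT / 2)
int16_t waveform_buffer[WAVEFORM_WIDTH];
uint16_t waveform_index = 0;
// Map a 16-bit sample to a canvas row: positive samples go up, full scale
// spans the canvas height, and the result is clamped to stay on the canvas
static lv_coord_t sample_to_y(int16_t sample) {
    int32_t y = WAVEFORM_CENTER - ((int32_t)sample * (WAVEFORM_HEIGHT / 2)) / 32768;
    if (y < 0) y = 0;
    if (y > WAVEFORM_HEIGHT - 1) y = WAVEFORM_HEIGHT - 1;
    return (lv_coord_t)y;
}
// Drawing function: call only from the UI task with the LVGL lock held,
// never directly from the I2S/audio task
void draw_waveform(lv_obj_t *canvas, int16_t *audio_samples, size_t count) {
    lv_canvas_fill_bg(canvas, lv_color_hex(0x1A1A1A), LV_OPA_COVER);
    lv_draw_line_dsc_t line_dsc;
    lv_draw_line_dsc_init(&line_dsc);
    line_dsc.color = lv_color_hex(0x00FF00);
    line_dsc.width = 2;
    // Draw the samples passed in, one per pixel column
    size_t n = (count < WAVEFORM_WIDTH) ? count : WAVEFORM_WIDTH;
    for (size_t x = 0; x + 1 < n; x++) {
        lv_point_t points[] = {
            {(lv_coord_t)x, sample_to_y(audio_samples[x])},
            {(lv_coord_t)(x + 1), sample_to_y(audio_samples[x + 1])},
        };
        lv_canvas_draw_line(canvas, points, 2, &line_dsc);
    }
}
// Audio callback (I2S task): decimate into the display buffer
void audio_i2s_callback(int16_t *samples, size_t count) {
    size_t step = count / WAVEFORM_WIDTH;
    if (step == 0) step = 1; // blocks shorter than 240 samples would otherwise loop forever
    for (size_t i = 0; i < count; i += step) {
        waveform_buffer[waveform_index] = samples[i];
        waveform_index = (waveform_index + 1) % WAVEFORM_WIDTH;
    }
    // Wake the UI task; it does the drawing under the LVGL lock
    xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
}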
```
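Taking every Nth sample, as `audio_i2s_callback()` does, can skip short transients entirely, so the waveform looks quieter than the audio really is. A common alternative for waveform displays, swapped in here as an illustration rather than taken from this spec, is min/max decimation: keep the lowest and highest sample of each bucket and draw a vertical line between them in that column. For 2048 buffered samples and 240 columns that is 8 samples per column, with the last 128 ignored. A portable sketch with hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Signal extent in one pixel column: draw a vertical line min..max. */
typedef struct {
    int16_t min;
    int16_t max;
} wave_column_t;

/* Split `count` samples into `columns` equal buckets and keep each bucket's
 * min and max, so short peaks survive downsampling. Samples left over after
 * the last full bucket are ignored. With fewer samples than columns, each
 * column shows one sample (or 0 once the samples run out). */
void waveform_minmax_decimate(const int16_t *samples, size_t count,
                              wave_column_t *out, size_t columns) {
    size_t per_col = (columns > 0) ? count / columns : 0;
    for (size_t c = 0; c < columns; c++) {
        if (per_col == 0) {
            int16_t v = (c < count) ? samples[c] : 0;
            out[c].min = v;
            out[c].max = v;
            continue;
        }
        const int16_t *bucket = samples + c * per_col;
        int16_t lo = bucket[0];
        int16_t hi = bucket[0];
        for (size_t i = 1; i < per_col; i++) {
            if (bucket[i] < lo) lo = bucket[i];
            if (bucket[i] > hi) hi = bucket[i];
        }
        out[c].min = lo;
        out[c].max = hi;
    }
}
```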
**Touch Controls:**
- Tap anywhere: Cancel listening
- Swipe down: Lower sensitivity
- Swipe up: Increase sensitivity
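
The LISTENING palette reserves yellow for peaks and red for clipping but does not say when each applies. A minimal sketch of picking the waveform color from a block's peak magnitude; both thresholds are assumptions (about -6 dBFS for "peak", within about 1% of full scale for "clipping") to be tuned against real captures:

```c
#include <stddef.h>
#include <stdint.h>

/* LISTENING palette as 0xRRGGBB values for lv_color_hex(). */
#define WAVE_COLOR_NORMAL 0x00FF00u /* green */
#define WAVE_COLOR_PEAK 0xFFFF00u   /* yellow */
#define WAVE_COLOR_CLIP 0xFF0000u   /* red */

/* Assumed thresholds on |sample|: ~-6 dBFS and ~99% of full scale. */
#define WAVE_PEAK_THRESHOLD 16384
#define WAVE_CLIP_THRESHOLD 32440

/* Choose the waveform color from the largest magnitude in a block. */
uint32_t waveform_level_color(const int16_t *samples, size_t count) {
    int32_t peak = 0;
    for (size_t i = 0; i < count; i++) {
        int32_t mag = (samples[i] < 0) ? -(int32_t)samples[i] : samples[i];
        if (mag > peak) peak = mag;
    }
    if (peak >= WAVE_CLIP_THRESHOLD) return WAVE_COLOR_CLIP;
    if (peak >= WAVE_PEAK_THRESHOLD) return WAVE_COLOR_PEAK;
    return WAVE_COLOR_NORMAL;
}
```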
---
#### 3. PROCESSING State
**Display:**
- Animated spinner/thinking indicator
- "Processing..." text
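- Waveform fades out smoothly

**Animation:**
- Circular spinner with gradient
- Rotation: 360° per 1 second
- Pulsing opacity

**Colors:**
- Background: Dark gray (#1A1A1A)
- Spinner: Blue (#0080FF)
- Text: Light gray (#CCCCCC)

**LVGL Implementation:**
```c
lv_obj_t *processing_screen;
lv_obj_t *spinner;          // lv_spinner widget, created hidden, e.g.
                            // lv_spinner_create(parent, 1000, 60) for one turn per second
lv_obj_t *processing_label; // "Processing..."
// lv_anim exec callbacks take (void *var, int32_t value), so the
// three-argument style setter needs a small wrapper
static void anim_set_opa_cb(void *obj, int32_t value) {
    lv_obj_set_style_opa((lv_obj_t *)obj, (lv_opa_t)value, 0);
}
// Runs once the fade-out animation has finished
static void show_spinner_cb(lv_anim_t *a) {
    LV_UNUSED(a);
    lv_obj_clear_flag(spinner, LV_OBJ_FLAG_HIDDEN);
}
// Transition from listening to processing
void transition_to_processing(void) {
    // Fade out waveform over 300 ms
    lv_anim_t fade_out;
    lv_anim_init(&fade_out);
    lv_anim_set_var(&fade_out, waveform_canvas);
    lv_anim_set_values(&fade_out, LV_OPA_COVER, LV_OPA_TRANSP);
    lv_anim_set_time(&fade_out, 300);
    lv_anim_set_exec_cb(&fade_out, anim_set_opa_cb);
    // Show the spinner exactly when the fade completes
    lv_anim_set_ready_cb(&fade_out, show_spinner_cb);
    lv_anim_start(&fade_out);
}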
```
---
#### 4. SPEAKING State
**Display:**
- Audio output waveform (TTS playback visualization)
- "Speaking..." or response text snippet
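- Volume indicator

**Waveform:**
- Same canvas as LISTENING but different color
- Shows output audio being played
- Synchronized with speaker output

**Colors:**
- Background: Dark gray (#1A1A1A)
- Waveform: Blue (#0080FF)
- Text: White (#FFFFFF)

**LVGL Implementation:**
```c
lv_obj_t *speaking_screen;
lv_obj_t *output_waveform_canvas; // Same size as input waveform (240×180)
lv_obj_t *response_label;         // Show part of response text
lv_obj_t *volume_bar;             // lv_bar widget for volume level
// Same drawing logic as the listening waveform, fed from the speaker buffer
// and drawn in blue; call from the UI task with the LVGL lock held
void draw_output_waveform(lv_obj_t *canvas, int16_t *speaker_samples, size_t count) {
    lv_canvas_fill_bg(canvas, lv_color_hex(0x1A1A1A), LV_OPA_COVER);
    lv_draw_line_dsc_t line_dsc; // must be declared locally; it is not a global
    lv_draw_line_dsc_init(&line_dsc);
    line_dsc.color = lv_color_hex(0x0080FF);
    line_dsc.width = 2;
    for (size_t x = 0; x + 1 < count && x + 1 < 240; x++) {
        // Full scale spans ±90 px around row 90; clamp to rows 0..179
        lv_coord_t y1 = LV_CLAMP(0, 90 - speaker_samples[x] * 90 / 32768, 179);
        lv_coord_t y2 = LV_CLAMP(0, 90 - speaker_samples[x + 1] * 90 / 32768, 179);
        lv_point_t points[] = {{(lv_coord_t)x, y1}, {(lv_coord_t)(x + 1), y2}};
        lv_canvas_draw_line(canvas, points, 2, &line_dsc);
    }
}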
```
**Touch Controls:**
- Tap: Skip response (go back to idle)
- Volume slider: Adjust speaker volume
---
### Additional UI Elements
#### Status Bar (All States)
**Location:** Top 20 pixels

**Contents:**
- WiFi icon + signal strength
- Battery icon + percentage
- Time (from RTC)
- Mute icon (if muted)

**LVGL Implementation:**
```c
lv_obj_t *status_bar;
lv_obj_t *wifi_icon;
lv_obj_t *battery_icon;
lv_obj_t *time_label;
lv_obj_t *mute_icon;
// Update every second (runs in the LVGL task via lv_timer)
void update_status_bar(lv_timer_t *timer) {
    // Update WiFi strength: RSSI (dBm) of the associated access point.
    // The call fails when not connected, reported here as -127.
    wifi_ap_record_t ap_info;
    int8_t rssi = (esp_wifi_sta_get_ap_info(&ap_info) == ESP_OK) ? ap_info.rssi : -127;
    lv_img_set_src(wifi_icon, get_wifi_icon(rssi));
    // Update battery
    uint8_t battery_pct = battery_get_percentage();
    lv_img_set_src(battery_icon, get_battery_icon(battery_pct));
    // Update time from RTC
    rtc_time_t time;
    pcf85063_get_time(&time);
    lv_label_set_text_fmt(time_label, "%02d:%02d", time.hour, time.min);
}
// Create the timer once during UI init (e.g. from ui_init()); LVGL calls
// can't run at file scope. Function name is illustrative.
void status_bar_start(void) {
    lv_timer_create(update_status_bar, 1000, NULL);
}
```
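`get_wifi_icon()` is not defined in this spec. One way to implement it is to bucket the RSSI into signal bars and index an icon array with the result. A minimal sketch; the dBm thresholds are assumptions, not values from this spec or from ESP-IDF, and a very low reading such as -127 dBm (a convenient "not connected" sentinel) falls into the zero-bar bucket:

```c
#include <stdint.h>

/* Map RSSI in dBm to 0..4 signal bars. Thresholds are assumptions. */
uint8_t wifi_rssi_to_bars(int8_t rssi_dbm) {
    if (rssi_dbm >= -55) return 4; /* excellent */
    if (rssi_dbm >= -67) return 3; /* good */
    if (rssi_dbm >= -75) return 2; /* fair */
    if (rssi_dbm >= -85) return 1; /* weak */
    return 0;                      /* unusable or not connected */
}
```

`get_wifi_icon(rssi)` could then return `wifi_icons[wifi_rssi_to_bars(rssi)]` from a five-entry array of image descriptors (a hypothetical array, not defined in this spec).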
#### Settings Screen (Touch Access)
**Trigger:** Long-press on idle screen

**Contents:**
- Volume slider
- Brightness slider
- Wake word sensitivity slider
- WiFi settings button
- About/Info button

**LVGL Implementation:**
```c
lv_obj_t *settings_screen;
lv_obj_t *volume_slider;
lv_obj_t *brightness_slider;
lv_obj_t *sensitivity_slider;
lv_obj_t *wifi_btn;
lv_obj_t *about_btn;
lv_obj_t *back_btn;
// Slider event handler. Register it on each slider when the screen is built:
// lv_obj_add_event_cb(volume_slider, slider_event_cb, LV_EVENT_VALUE_CHANGED, NULL);
static void slider_event_cb(lv_event_t *e) {
    lv_obj_t *slider = lv_event_get_target(e);
    int32_t value = lv_slider_get_value(slider);
    if (slider == volume_slider) {
        set_speaker_volume(value);
    } else if (slider == brightness_slider) {
        set_backlight_brightness(value);
    } else if (slider == sensitivity_slider) {
        set_wake_word_sensitivity(value);
    }
}
```
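`set_backlight_brightness()` has to turn the 0-100 slider value into a PWM duty. A minimal sketch, assuming the backlight is dimmed by a PWM channel with a configurable resolution (for example an LEDC channel, whose duty would then go to `ledc_set_duty()` and `ledc_update_duty()`; check how this board actually drives the backlight). The `min_pct` floor is an added assumption that keeps the screen readable, in line with the 30-50% idle dimming above:

```c
#include <stdint.h>

/* Map a 0-100 slider value to a PWM duty for an N-bit channel
 * (resolution_bits assumed to be 1..20). Values below min_pct are raised
 * to it, so the slider cannot dim the backlight below that floor. */
uint32_t backlight_pct_to_duty(int32_t pct, uint8_t resolution_bits, int32_t min_pct) {
    if (pct < min_pct) pct = min_pct;
    if (pct < 0) pct = 0;
    if (pct > 100) pct = 100;
    uint32_t max_duty = (1u << resolution_bits) - 1u;
    return ((uint32_t)pct * max_duty + 50u) / 100u; /* rounded to nearest */
}
```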
---
## Audio Pipeline Integration
### I2S Configuration
**Microphone (INMP441):**
```c
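// Legacy I2S driver API: still shipped in ESP-IDF v5.x but deprecated in
// favor of "driver/i2s_std.h"; the legacy and new drivers can't be mixed.
#include "driver/i2s.h"
#define I2S_MIC_NUM I2S_NUM_0
#define I2S_MIC_BCLK_PIN GPIO_NUM_4 // Verify with board schematic
#define I2S_MIC_WS_PIN GPIO_NUM_5
#define I2S_MIC_DIN_PIN GPIO_NUM_6
#define I2S_MIC_SAMPLE_RATE 16000
// The INMP441 sends 24-bit samples in 32-bit slots; if 16-bit reads look
// wrong, read 32-bit words and shift each right by 16 to get 16-bit PCM.
#define I2S_MIC_BITS 16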
#define I2S_MIC_CHANNELS 1
i2s_config_t i2s_mic_config = {
.mode = I2S_MODE_MASTER | I2S_MODE_RX,
.sample_rate = I2S_MIC_SAMPLE_RATE,
.bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
.channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
.communication_format = I2S_COMM_FORMAT_STAND_I2S,
.intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
.dma_buf_count = 8,
.dma_buf_len = 256,
.use_apll = false,
.tx_desc_auto_clear = false,
.fixed_mclk = 0
};
i2s_pin_config_t i2s_mic_pins = {
.bck_io_num = I2S_MIC_BCLK_PIN,
.ws_io_num = I2S_MIC_WS_PIN,
.data_out_num = I2S_PIN_NO_CHANGE,
.data_in_num = I2S_MIC_DIN_PIN
};
void audio_init_microphone(void) {
i2s_driver_install(I2S_MIC_NUM, &i2s_mic_config, 0, NULL);
i2s_set_pin(I2S_MIC_NUM, &i2s_mic_pins);
i2s_zero_dma_buffer(I2S_MIC_NUM);
}
```
**Speaker (MAX98357A I2S Amp):**
```c
#define I2S_SPK_NUM I2S_NUM_1
#define I2S_SPK_BCLK_PIN GPIO_NUM_7 // Verify with board schematic
#define I2S_SPK_WS_PIN GPIO_NUM_8
#define I2S_SPK_DOUT_PIN GPIO_NUM_9
#define I2S_SPK_SAMPLE_RATE 16000
#define I2S_SPK_BITS 16
#define I2S_SPK_CHANNELS 1
i2s_config_t i2s_spk_config = {
.mode = I2S_MODE_MASTER | I2S_MODE_TX,
.sample_rate = I2S_SPK_SAMPLE_RATE,
.bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
.channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
.communication_format = I2S_COMM_FORMAT_STAND_I2S,
.intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
.dma_buf_count = 8,
.dma_buf_len = 256,
.use_apll = false,
.tx_desc_auto_clear = true,
.fixed_mclk = 0
};
i2s_pin_config_t i2s_spk_pins = {
.bck_io_num = I2S_SPK_BCLK_PIN,
.ws_io_num = I2S_SPK_WS_PIN,
.data_out_num = I2S_SPK_DOUT_PIN,
.data_in_num = I2S_PIN_NO_CHANGE
};
void audio_init_speaker(void) {
i2s_driver_install(I2S_SPK_NUM, &i2s_spk_config, 0, NULL);
i2s_set_pin(I2S_SPK_NUM, &i2s_spk_pins);
i2s_zero_dma_buffer(I2S_SPK_NUM);
}
```
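The DMA settings above also fix how much audio can sit in each driver's buffers, which matters against the <200 ms end-to-end latency target in the testing plan. A quick helper for the arithmetic, assuming `dma_buf_len` counts frames (one sample per channel):

```c
#include <stdint.h>

/* Audio held by a full I2S DMA ring, in milliseconds.
 * Assumes dma_buf_len counts frames (one sample per channel). */
uint32_t i2s_dma_buffer_ms(uint32_t dma_buf_count, uint32_t dma_buf_len_frames,
                           uint32_t sample_rate_hz) {
    return dma_buf_count * dma_buf_len_frames * 1000u / sample_rate_hz;
}

/* Raw PCM data rate: sample rate x bytes per sample x channels. */
uint32_t pcm_bytes_per_second(uint32_t sample_rate_hz, uint32_t bits_per_sample,
                              uint32_t channels) {
    return sample_rate_hz * (bits_per_sample / 8u) * channels;
}
```

With 8 × 256 frames at 16 kHz, a full ring holds 128 ms of audio, so a backed-up speaker queue alone could use most of the 200 ms budget; fewer or shorter DMA buffers trade that latency for a higher risk of underruns. Streaming 16 kHz, 16-bit mono PCM needs 32,000 bytes/s (256 kbit/s) in each direction.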
### Audio Buffer Management
**Circular Buffer for Waveform:**
```c
#include <string.h>          // memset
#include "freertos/FreeRTOS.h"
#include "freertos/semphr.h" // SemaphoreHandle_t, mutexes
#define AUDIO_BUFFER_SIZE 2048
#define WAVEFORM_DECIMATION 8 // Downsample for display
typedef struct {
    int16_t samples[AUDIO_BUFFER_SIZE];
    uint16_t write_idx;
    uint16_t read_idx;
    SemaphoreHandle_t mutex;
} audio_buffer_t;
audio_buffer_t mic_buffer;
audio_buffer_t spk_buffer;
void audio_buffer_init(audio_buffer_t *buf) {
    memset(buf->samples, 0, sizeof(buf->samples));
    buf->write_idx = 0;
    buf->read_idx = 0;
    buf->mutex = xSemaphoreCreateMutex();
}
void audio_buffer_write(audio_buffer_t *buf, int16_t *samples, size_t count) {
    xSemaphoreTake(buf->mutex, portMAX_DELAY);
    for (size_t i = 0; i < count; i++) {
        buf->samples[buf->write_idx] = samples[i];
        buf->write_idx = (buf->write_idx + 1) % AUDIO_BUFFER_SIZE;
    }
    xSemaphoreGive(buf->mutex);
}
// Get downsampled samples for waveform display: the newest
// out_count * WAVEFORM_DECIMATION samples, oldest first, so the right edge
// of the waveform is the most recent audio.
// Requires out_count * WAVEFORM_DECIMATION <= AUDIO_BUFFER_SIZE (240 * 8 = 1920).
void audio_buffer_get_waveform(audio_buffer_t *buf, int16_t *out, size_t out_count) {
    xSemaphoreTake(buf->mutex, portMAX_DELAY);
    size_t span = out_count * WAVEFORM_DECIMATION;
    size_t start = (buf->write_idx + AUDIO_BUFFER_SIZE - span) % AUDIO_BUFFER_SIZE;
    for (size_t i = 0; i < out_count; i++) {
        size_t src_idx = (start + i * WAVEFORM_DECIMATION) % AUDIO_BUFFER_SIZE;
        out[i] = buf->samples[src_idx];
    }
    xSemaphoreGive(buf->mutex);
}
```
### Audio Streaming Task
**Microphone Input Task:**
```c
void audio_mic_task(void *pvParameters) {
    int16_t i2s_buffer[256];
    size_t bytes_read;
    while (1) {
        // Read from I2S microphone (256 samples = 16 ms at 16 kHz)
        i2s_read(I2S_MIC_NUM, i2s_buffer, sizeof(i2s_buffer), &bytes_read, portMAX_DELAY);
        size_t samples_read = bytes_read / sizeof(int16_t);
        // The architecture runs wake word detection (Mycroft Precise) on the
        // server, which only hears the wake word if audio also streams in
        // STATE_IDLE; gating on LISTENING alone needs an on-device wake word engine.
        if (current_state == STATE_LISTENING) {
            // Write to circular buffer for waveform display
            audio_buffer_write(&mic_buffer, i2s_buffer, samples_read);
            // Send to Heimdall server via WiFi
            audio_send_to_server(i2s_buffer, samples_read);
            // Trigger waveform update
            xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
        }
    }
}
```
**Speaker Output Task:**
```c
void audio_speaker_task(void *pvParameters) {
int16_t i2s_buffer[256];
size_t bytes_written;
while (1) {
// Receive audio from Heimdall server
size_t samples_received = audio_receive_from_server(i2s_buffer, 256);
if (samples_received > 0 && current_state == STATE_SPEAKING) {
// Write to circular buffer for waveform display
audio_buffer_write(&spk_buffer, i2s_buffer, samples_received);
// Play through I2S speaker
i2s_write(I2S_SPK_NUM, i2s_buffer, samples_received * sizeof(int16_t),
&bytes_written, portMAX_DELAY);
// Trigger waveform update
xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
} else {
vTaskDelay(pdMS_TO_TICKS(10));
}
}
}
```
### LVGL Update Task
**Waveform Rendering Task:**
```c
void lvgl_waveform_task(void *pvParameters) {
int16_t waveform_samples[WAVEFORM_WIDTH];
while (1) {
// Wait for waveform update event
EventBits_t bits = xEventGroupWaitBits(ui_event_group, WAVEFORM_UPDATE_BIT,
pdTRUE, pdFALSE, pdMS_TO_TICKS(50));
if (bits & WAVEFORM_UPDATE_BIT) {
if (current_state == STATE_LISTENING) {
// Get downsampled mic data
audio_buffer_get_waveform(&mic_buffer, waveform_samples, WAVEFORM_WIDTH);
// Draw on LVGL canvas (must lock LVGL)
lvgl_lock();
draw_waveform(waveform_canvas, waveform_samples, WAVEFORM_WIDTH);
lvgl_unlock();
} else if (current_state == STATE_SPEAKING) {
// Get downsampled speaker data
audio_buffer_get_waveform(&spk_buffer, waveform_samples, WAVEFORM_WIDTH);
lvgl_lock();
draw_output_waveform(output_waveform_canvas, waveform_samples, WAVEFORM_WIDTH);
lvgl_unlock();
}
}
}
}
```
---
## Touch Gesture Integration
### Touch Controller (CST816D)
**Gestures Supported:**
- Single tap
- Long press
- Swipe up/down/left/right

**Implementation:**
```c
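// PLACEHOLDER pins: as written, SDA/SCL/INT collide with the I2S pins above
// (GPIO6 = mic DIN, GPIO7 = speaker BCLK, GPIO9 = speaker DOUT).
// Take the real touch assignments from the Waveshare schematic.
#define TOUCH_I2C_NUM I2C_NUM_0
#define TOUCH_SDA_PIN GPIO_NUM_6
#define TOUCH_SCL_PIN GPIO_NUM_7
#define TOUCH_INT_PIN GPIO_NUM_9
#define TOUCH_RST_PIN GPIO_NUM_10
#define CST816D_ADDR 0x15 // 7-bit I2C address used by the CST816 family; verify
typedef enum {
    GESTURE_NONE = 0,
    GESTURE_TAP,
    GESTURE_LONG_PRESS,
    GESTURE_SWIPE_UP,
    GESTURE_SWIPE_DOWN,
    GESTURE_SWIPE_LEFT,
    GESTURE_SWIPE_RIGHT
} touch_gesture_t;
void touch_init(void) {
    // I2C init for CST816D
    i2c_config_t conf = {
        .mode = I2C_MODE_MASTER,
        .sda_io_num = TOUCH_SDA_PIN,
        .scl_io_num = TOUCH_SCL_PIN,
        .sda_pullup_en = GPIO_PULLUP_ENABLE,
        .scl_pullup_en = GPIO_PULLUP_ENABLE,
        .master.clk_speed = 100000,
    };
    i2c_param_config(TOUCH_I2C_NUM, &conf);
    i2c_driver_install(TOUCH_I2C_NUM, conf.mode, 0, 0, 0);
    // Reset touch controller
    gpio_set_direction(TOUCH_RST_PIN, GPIO_MODE_OUTPUT);
    gpio_set_level(TOUCH_RST_PIN, 0);
    vTaskDelay(pdMS_TO_TICKS(10));
    gpio_set_level(TOUCH_RST_PIN, 1);
    vTaskDelay(pdMS_TO_TICKS(50));
    // Configure interrupt pin. The GPIO ISR service is installed once in
    // app_main(); calling gpio_install_isr_service() again returns
    // ESP_ERR_INVALID_STATE. touch_isr_handler should only wake a task:
    // I2C transactions cannot run inside an ISR.
    gpio_set_direction(TOUCH_INT_PIN, GPIO_MODE_INPUT);
    gpio_set_intr_type(TOUCH_INT_PIN, GPIO_INTR_NEGEDGE);
    gpio_isr_handler_add(TOUCH_INT_PIN, touch_isr_handler, NULL);
}
touch_gesture_t touch_read_gesture(void) {
    uint8_t reg = 0x01; // GestureID register
    uint8_t gesture_id = 0;
    // Write the register address, then read one byte back
    if (i2c_master_write_read_device(TOUCH_I2C_NUM, CST816D_ADDR, &reg, 1,
                                     &gesture_id, 1, pdMS_TO_TICKS(100)) != ESP_OK) {
        return GESTURE_NONE;
    }
    // The chip's codes don't follow this enum's order, so map them explicitly.
    // Values follow the CST816S register map; verify on the CST816D, and note
    // that panel orientation can swap the swipe directions.
    switch (gesture_id) {
    case 0x01: return GESTURE_SWIPE_UP;
    case 0x02: return GESTURE_SWIPE_DOWN;
    case 0x03: return GESTURE_SWIPE_LEFT;
    case 0x04: return GESTURE_SWIPE_RIGHT;
    case 0x05: return GESTURE_TAP;
    case 0x0C: return GESTURE_LONG_PRESS;
    default:   return GESTURE_NONE;
    }
}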
```
### Gesture Actions by State
**IDLE State:**
- **Tap:** Wake up display (if dimmed)
- **Long Press:** Open settings screen
- **Swipe Up:** Show more info (weather, calendar)

**LISTENING State:**
- **Tap:** Cancel listening, return to idle
- **Swipe Down:** Lower wake word sensitivity
- **Swipe Up:** Raise wake word sensitivity

**SPEAKING State:**
- **Tap:** Skip response, return to idle
- **Swipe Left/Right:** Volume down/up

**PROCESSING State:**
- **Tap:** Cancel processing (if possible)
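
The per-state gesture lists fit naturally into a lookup table indexed by state and gesture, which keeps the rules in one place and turns unlisted combinations into a no-op. A minimal sketch with hypothetical enum and action names that mirror the lists:

```c
/* Hypothetical enums; names mirror the gesture lists in this section. */
typedef enum { ST_IDLE, ST_LISTENING, ST_PROCESSING, ST_SPEAKING, ST_COUNT } ui_state_t;
typedef enum { G_NONE, G_TAP, G_LONG_PRESS, G_SWIPE_UP, G_SWIPE_DOWN,
               G_SWIPE_LEFT, G_SWIPE_RIGHT, G_COUNT } gesture_t;
typedef enum {
    ACT_NONE, ACT_WAKE_DISPLAY, ACT_OPEN_SETTINGS, ACT_SHOW_INFO,
    ACT_CANCEL_TO_IDLE, ACT_SENSITIVITY_DOWN, ACT_SENSITIVITY_UP,
    ACT_VOLUME_DOWN, ACT_VOLUME_UP
} ui_action_t;

/* One row per state, one column per gesture; unlisted pairs stay ACT_NONE. */
static const ui_action_t gesture_actions[ST_COUNT][G_COUNT] = {
    [ST_IDLE] = {
        [G_TAP] = ACT_WAKE_DISPLAY,
        [G_LONG_PRESS] = ACT_OPEN_SETTINGS,
        [G_SWIPE_UP] = ACT_SHOW_INFO,
    },
    [ST_LISTENING] = {
        [G_TAP] = ACT_CANCEL_TO_IDLE,
        [G_SWIPE_DOWN] = ACT_SENSITIVITY_DOWN,
        [G_SWIPE_UP] = ACT_SENSITIVITY_UP,
    },
    [ST_PROCESSING] = { [G_TAP] = ACT_CANCEL_TO_IDLE },
    [ST_SPEAKING] = {
        [G_TAP] = ACT_CANCEL_TO_IDLE,
        [G_SWIPE_LEFT] = ACT_VOLUME_DOWN,
        [G_SWIPE_RIGHT] = ACT_VOLUME_UP,
    },
};

ui_action_t gesture_to_action(ui_state_t state, gesture_t gesture) {
    if ((unsigned)state >= ST_COUNT || (unsigned)gesture >= G_COUNT) return ACT_NONE;
    return gesture_actions[state][gesture];
}
```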
---
## Network Communication
### WiFi Configuration
**Connection:**
```c
#define WIFI_SSID "YourNetworkName"
#define WIFI_PASSWORD "YourPassword"
#define SERVER_URL "http://10.1.10.71:3006"
// Requires nvs_flash_init() to have run first (see app_main)
void wifi_init(void) {
    esp_netif_init();
    esp_event_loop_create_default();
    esp_netif_create_default_wifi_sta();
    wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
    esp_wifi_init(&cfg);
    wifi_config_t wifi_config = {
        .sta = {
            .ssid = WIFI_SSID,
            .password = WIFI_PASSWORD,
        },
    };
    esp_wifi_set_mode(WIFI_MODE_STA);
    esp_wifi_set_config(WIFI_IF_STA, &wifi_config);
    esp_wifi_start();
    // Connecting straight after start usually works; robust code calls
    // esp_wifi_connect() from a WIFI_EVENT_STA_START handler and again on
    // WIFI_EVENT_STA_DISCONNECTED to reconnect.
    esp_wifi_connect();
}
```
### Server Communication Protocol
**Endpoints:**
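- `GET /health` - Server health check
- `POST /audio/stream` - Stream audio to server (the HTTP example below sends raw 16-bit PCM with chunked transfer encoding rather than multipart)
- `GET /audio/tts` - Receive TTS audio response
- `GET /wake-word/status` - Check wake word detection status
- `ws://10.1.10.71:3006/ws/audio` - WebSocket audio stream used by the example below

**Audio Streaming (WebSockets Recommended):**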
```c
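// esp_websocket_client ships as a managed component for ESP-IDF v5.x
// (espressif/esp_websocket_client), not as part of the core framework
#include "esp_websocket_client.h"
#include "freertos/stream_buffer.h"
esp_websocket_client_handle_t ws_client;
static StreamBufferHandle_t tts_stream; // TTS PCM bytes received from the server
// The client has no blocking receive call: incoming data arrives as
// WEBSOCKET_EVENT_DATA events, so queue binary payloads for the speaker task
static void websocket_event_handler(void *handler_args, esp_event_base_t base,
                                    int32_t event_id, void *event_data) {
    esp_websocket_event_data_t *data = (esp_websocket_event_data_t *)event_data;
    if (event_id == WEBSOCKET_EVENT_DATA && data->op_code == 0x02) { // binary frame
        xStreamBufferSend(tts_stream, data->data_ptr, data->data_len, 0); // drop if full
    }
}
void websocket_init(void) {
    tts_stream = xStreamBufferCreate(8192, sizeof(int16_t));
    esp_websocket_client_config_t ws_cfg = {
        .uri = "ws://10.1.10.71:3006/ws/audio",
        .buffer_size = 2048,
    };
    ws_client = esp_websocket_client_init(&ws_cfg);
    esp_websocket_register_events(ws_client, WEBSOCKET_EVENT_ANY,
                                  websocket_event_handler, NULL);
    esp_websocket_client_start(ws_client);
}
void audio_send_to_server(int16_t *samples, size_t count) {
    if (esp_websocket_client_is_connected(ws_client)) {
        esp_websocket_client_send_bin(ws_client, (const char *)samples,
                                      count * sizeof(int16_t), portMAX_DELAY);
    }
}
size_t audio_receive_from_server(int16_t *out_buffer, size_t max_samples) {
    // Wait up to 100 ms for queued TTS audio. Assumes the server sends whole
    // 16-bit samples; a stray odd byte would need carrying to the next read.
    size_t bytes = xStreamBufferReceive(tts_stream, out_buffer,
                                        max_samples * sizeof(int16_t), pdMS_TO_TICKS(100));
    return bytes / sizeof(int16_t);
}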
```
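Binary WebSocket frames, and any byte-oriented buffer between the network and the speaker, are not guaranteed to end on a 16-bit sample boundary. A minimal sketch of a reassembler that carries a stray byte over to the next chunk, assuming the server sends little-endian 16-bit PCM (an assumption; confirm the server's byte order). The type and function names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Holds an odd trailing byte until the next chunk arrives. */
typedef struct {
    uint8_t carry;
    bool has_carry;
} pcm_joiner_t;

/* Convert `len` bytes of little-endian 16-bit PCM into samples, writing at
 * most `max_out` of them; returns the count written. Size `out` for
 * (len + 1) / 2 samples: samples beyond max_out are dropped. */
size_t pcm_join(pcm_joiner_t *j, const uint8_t *in, size_t len,
                int16_t *out, size_t max_out) {
    size_t n = 0;
    size_t i = 0;
    if (j->has_carry && len > 0) {
        /* Low byte came from the previous chunk, high byte starts this one. */
        if (n < max_out) out[n++] = (int16_t)(uint16_t)(j->carry | (in[0] << 8));
        j->has_carry = false;
        i = 1;
    }
    for (; i + 1 < len; i += 2) {
        if (n < max_out) out[n++] = (int16_t)(uint16_t)(in[i] | (in[i + 1] << 8));
    }
    if (i < len) { /* one byte left over: keep it for the next call */
        j->carry = in[i];
        j->has_carry = true;
    }
    return n;
}
```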
**Alternative: HTTP Chunked Transfer (Simpler):**
```c
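#include <stdio.h> // snprintf
#include "esp_http_client.h"
// The Transfer-Encoding header announces chunked encoding, but
// esp_http_client_write() sends raw bytes, so each chunk needs its own
// "<hex length>\r\n<data>\r\n" framing
static int http_write_chunk(esp_http_client_handle_t client, const char *data, int len) {
    char hdr[16];
    int hdr_len = snprintf(hdr, sizeof(hdr), "%x\r\n", len);
    if (esp_http_client_write(client, hdr, hdr_len) < 0) return -1;
    if (esp_http_client_write(client, data, len) < 0) return -1;
    return esp_http_client_write(client, "\r\n", 2);
}
// Use instead of (not alongside) audio_mic_task: both read the same I2S port
void audio_stream_http(void) {
    esp_http_client_config_t config = {
        .url = "http://10.1.10.71:3006/audio/stream",
        .method = HTTP_METHOD_POST,
    };
    esp_http_client_handle_t client = esp_http_client_init(&config);
    // Set headers
    esp_http_client_set_header(client, "Content-Type", "audio/pcm");
    esp_http_client_set_header(client, "Transfer-Encoding", "chunked");
    if (esp_http_client_open(client, -1) != ESP_OK) { // -1 = chunked mode
        esp_http_client_cleanup(client);
        return;
    }
    // Stream audio chunks while listening
    int16_t buffer[256];
    while (current_state == STATE_LISTENING) {
        size_t bytes_read = 0;
        i2s_read(I2S_MIC_NUM, buffer, sizeof(buffer), &bytes_read, portMAX_DELAY);
        if (bytes_read > 0 && http_write_chunk(client, (const char *)buffer, (int)bytes_read) < 0) {
            break; // connection dropped
        }
    }
    esp_http_client_write(client, "0\r\n\r\n", 5); // terminating zero-length chunk
    esp_http_client_fetch_headers(client);         // read the server's response
    esp_http_client_close(client);
    esp_http_client_cleanup(client);
}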
```
---
## Power Management
### Battery Monitoring
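**ETA6098 Charging Chip (battery voltage read through the ADC):**
```c
// Legacy ADC APIs: deprecated in ESP-IDF v5.x in favor of
// "esp_adc/adc_oneshot.h" and "esp_adc/adc_cali.h", but still available
#include <stdbool.h>
#include "driver/adc.h"
#include "driver/gpio.h"
#include "esp_adc_cal.h"
#define BATTERY_ADC_CHANNEL ADC1_CHANNEL_0 // GPIO1 (example)
#define BATTERY_DIVIDER_RATIO 2 // PLACEHOLDER: the cell voltage must be divided down
                                // to the ADC's input range; take the ratio from the schematic
#define BATTERY_FULL_MV 4200
#define BATTERY_EMPTY_MV 3300
static esp_adc_cal_characteristics_t adc_chars;
void battery_init(void) {
    adc1_config_width(ADC_WIDTH_BIT_12);
    adc1_config_channel_atten(BATTERY_ADC_CHANNEL, ADC_ATTEN_DB_11);
    // Calibration data used by esp_adc_cal_raw_to_voltage()
    esp_adc_cal_characterize(ADC_UNIT_1, ADC_ATTEN_DB_11, ADC_WIDTH_BIT_12, 1100, &adc_chars);
}
uint8_t battery_get_percentage(void) {
    int adc_reading = adc1_get_raw(BATTERY_ADC_CHANNEL);
    // Voltage at the ADC pin, scaled back up to the battery voltage
    int voltage_mv = (int)esp_adc_cal_raw_to_voltage(adc_reading, &adc_chars) * BATTERY_DIVIDER_RATIO;
    if (voltage_mv >= BATTERY_FULL_MV) return 100;
    if (voltage_mv <= BATTERY_EMPTY_MV) return 0;
    return ((voltage_mv - BATTERY_EMPTY_MV) * 100) / (BATTERY_FULL_MV - BATTERY_EMPTY_MV);
}
bool battery_is_charging(void) {
    // PLACEHOLDER pin and polarity ("SYS_OUT high when charging" is unverified).
    // On ESP32-S3 parts with octal PSRAM, such as the 8 MB ESP32-S3R8, GPIO35-37
    // are used by the PSRAM interface, so GPIO36 is likely unavailable.
    // Confirm the charger status pin on the schematic.
    gpio_set_direction(GPIO_NUM_36, GPIO_MODE_INPUT);
    return gpio_get_level(GPIO_NUM_36);
}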
```
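LiPo voltage does not fall linearly with remaining charge: the curve is fairly flat through the middle of the discharge, so the straight-line mapping in `battery_get_percentage()` misreports mid-range charge. A common alternative is a lookup table with linear interpolation between points. A minimal sketch; the table values are placeholders for illustration, not measurements of the 1000mAh cell in the BOM, and should be replaced with points logged during a full discharge:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int mv;      /* battery voltage in millivolts */
    uint8_t pct; /* state of charge at that voltage */
} battery_point_t;

/* PLACEHOLDER curve for illustration only, sorted by ascending voltage. */
static const battery_point_t battery_curve[] = {
    {3300, 0}, {3700, 20}, {3800, 50}, {4000, 80}, {4200, 100},
};

/* Piecewise-linear interpolation between the two nearest curve points. */
uint8_t battery_pct_from_mv(int mv) {
    const size_t n = sizeof(battery_curve) / sizeof(battery_curve[0]);
    if (mv <= battery_curve[0].mv) return battery_curve[0].pct;
    if (mv >= battery_curve[n - 1].mv) return battery_curve[n - 1].pct;
    for (size_t i = 1; i < n; i++) {
        if (mv <= battery_curve[i].mv) {
            const battery_point_t *a = &battery_curve[i - 1];
            const battery_point_t *b = &battery_curve[i];
            return (uint8_t)(a->pct + (mv - a->mv) * (b->pct - a->pct) / (b->mv - a->mv));
        }
    }
    return battery_curve[n - 1].pct; /* not reached for a sorted table */
}
```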
### Low Power Modes
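**Deep Sleep When Idle (Optional):** In deep sleep the CPU, WiFi link, and microphone stream are off, so the device cannot hear the wake word; only the touch interrupt or the hourly timer wakes it.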
```c
#define IDLE_TIMEOUT_MS 300000 // 5 minutes
void enter_deep_sleep(void) {
    // Counter kept in RTC memory survives deep sleep
    RTC_DATA_ATTR static uint32_t boot_count = 0;
    boot_count++;
    // Configure wake sources (ext0 needs an RTC-capable GPIO: GPIO0-21 on ESP32-S3)
    esp_sleep_enable_ext0_wakeup(TOUCH_INT_PIN, 0);   // Wake on touch (INT low)
    esp_sleep_enable_timer_wakeup(3600 * 1000000ULL); // Wake every hour (in µs)
    // Turn off display backlight (LCD_BL_PIN is a board-specific placeholder)
    gpio_set_level(LCD_BL_PIN, 0);
    // Enter deep sleep
    esp_deep_sleep_start();
}
```
---
## Performance Optimization
### LVGL Performance
**Buffer Configuration:**
```c
#define LVGL_SCREEN_PIXELS (240 * 280) // Full screen, in pixels
#define LVGL_BUF_PIXELS (LVGL_SCREEN_PIXELS / 10) // 1/10 screen: 6,720 px (13,440 bytes at 16-bit color)
static lv_color_t buf_1[LVGL_BUF_PIXELS];
static lv_color_t buf_2[LVGL_BUF_PIXELS]; // Double buffering
lv_disp_draw_buf_t draw_buf;
// Size is given in pixels, not bytes; call this from the display init code
lv_disp_draw_buf_init(&draw_buf, buf_1, buf_2, LVGL_BUF_PIXELS);
```
**Task Priority:**
```c
#define LVGL_TASK_PRIORITY 5
#define AUDIO_MIC_TASK_PRIORITY 10 // Higher priority for audio
#define AUDIO_SPK_TASK_PRIORITY 10
#define WIFI_TASK_PRIORITY 8
#define WAVEFORM_TASK_PRIORITY 4 // Lower priority for visuals
void app_main(void) {
// Create tasks with priorities
xTaskCreatePinnedToCore(lvgl_task, "LVGL", 8192, NULL, LVGL_TASK_PRIORITY, NULL, 1);
xTaskCreatePinnedToCore(audio_mic_task, "MIC", 4096, NULL, AUDIO_MIC_TASK_PRIORITY, NULL, 0);
xTaskCreatePinnedToCore(audio_speaker_task, "SPK", 4096, NULL, AUDIO_SPK_TASK_PRIORITY, NULL, 0);
xTaskCreatePinnedToCore(lvgl_waveform_task, "WAVE", 4096, NULL, WAVEFORM_TASK_PRIORITY, NULL, 1);
}
```
**Reduce Waveform Update Rate:**
```c
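// Only redraw the waveform at ~30 FPS, not on every audio block
// (a 256-sample block at 16 kHz arrives every 16 ms, ~62 times per second)
#define WAVEFORM_UPDATE_MS 33 // ~30 FPS; with ESP-IDF's default 100 Hz tick
                              // this rounds down to 3 ticks (30 ms)
// Fixed-rate variant of lvgl_waveform_task
void lvgl_waveform_task(void *pvParameters) {
    TickType_t last_wake = xTaskGetTickCount();
    while (1) {
        // Block until one period after the previous wake-up; unlike polling
        // with vTaskDelay(), drawing time does not accumulate as drift
        xTaskDelayUntil(&last_wake, pdMS_TO_TICKS(WAVEFORM_UPDATE_MS));
        // Update waveform: fetch samples and draw under the LVGL lock
    }
}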
```
### Memory Management
**PSRAM Usage:**
```c
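#include "esp_heap_caps.h"
#include "esp_log.h"
// Allocate large buffers in PSRAM (8MB available). This must run inside a
// function (e.g. during audio init): a call is not a valid file-scope initializer.
#define AUDIO_LARGE_BUFFER_SIZE (16000 * 10) // 10 seconds at 16kHz (320,000 bytes)
static int16_t *audio_history;
void audio_history_alloc(void) { // illustrative name
    audio_history = heap_caps_malloc(AUDIO_LARGE_BUFFER_SIZE * sizeof(int16_t),
                                     MALLOC_CAP_SPIRAM);
    // Check if allocation succeeded
    if (audio_history == NULL) {
        ESP_LOGE(TAG, "Failed to allocate PSRAM buffer");
    }
}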
```
**Heap Monitoring:**
```c
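#include <inttypes.h> // PRIu32
#include "esp_heap_caps.h"
#include "esp_system.h"
void log_memory_stats(void) {
    // esp_get_*_heap_size() return uint32_t; heap_caps_get_free_size() returns size_t
    ESP_LOGI(TAG, "Free heap: %" PRIu32 " bytes", esp_get_free_heap_size());
    ESP_LOGI(TAG, "Free PSRAM: %zu bytes", heap_caps_get_free_size(MALLOC_CAP_SPIRAM));
    ESP_LOGI(TAG, "Min free heap: %" PRIu32 " bytes", esp_get_minimum_free_heap_size());
}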
```
---
## Example Code Structure
### File Organization
```
esp32_voice_assistant/
├── main/
│ ├── main.c # Entry point, task creation
│ ├── audio/
│ │ ├── audio_input.c # I2S microphone handling
│ │ ├── audio_output.c # I2S speaker handling
│ │ ├── audio_buffer.c # Circular buffer management
│ │ └── audio_network.c # WebSocket/HTTP streaming
│ ├── ui/
│ │ ├── ui_init.c # LVGL setup, screen creation
│ │ ├── ui_idle.c # Idle screen UI
│ │ ├── ui_listening.c # Listening screen + waveform
│ │ ├── ui_processing.c # Processing screen + spinner
│ │ ├── ui_speaking.c # Speaking screen + output waveform
│ │ ├── ui_settings.c # Settings screen
│ │ └── ui_waveform.c # Waveform drawing functions
│ ├── touch/
│ │ ├── touch_cst816d.c # Touch controller driver
│ │ └── touch_gestures.c # Gesture recognition
│ ├── network/
│ │ └── wifi_manager.c # WiFi connection management
│ ├── power/
│ │ ├── battery.c # Battery monitoring
│ │ └── power_mgmt.c # Sleep modes
│ └── state_machine.c # Voice assistant state machine
├── components/
│ └── lvgl/ # LVGL library (ESP-IDF component)
├── CMakeLists.txt
└── sdkconfig # ESP-IDF configuration
```
### Main Entry Point
```c
// main/main.c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"
#include "nvs_flash.h"
#include "driver/gpio.h"
static const char *TAG = "VOICE_ASSISTANT";
void app_main(void) {
    ESP_LOGI(TAG, "Voice Assistant Starting...");
    // Initialize hardware
    // Non-volatile storage (required by WiFi); erase and retry if the
    // partition is full or was written by a newer NVS version
    esp_err_t err = nvs_flash_init();
    if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
        ESP_ERROR_CHECK(nvs_flash_erase());
        err = nvs_flash_init();
    }
    ESP_ERROR_CHECK(err);
    gpio_install_isr_service(0); // GPIO interrupts (install once, here only)
    // Power management
    battery_init();
    // Display and touch
    lcd_init();
    touch_init();
    ui_init();
    // Audio pipeline
    audio_init_microphone();
    audio_init_speaker();
    audio_buffer_init(&mic_buffer);
    audio_buffer_init(&spk_buffer);
    // Network
    wifi_init();
    websocket_init();
    // State machine
    state_machine_init();
    // Create FreeRTOS tasks
    xTaskCreatePinnedToCore(lvgl_task, "LVGL", 8192, NULL, 5, NULL, 1);
    xTaskCreatePinnedToCore(audio_mic_task, "MIC", 4096, NULL, 10, NULL, 0);
    xTaskCreatePinnedToCore(audio_speaker_task, "SPK", 4096, NULL, 10, NULL, 0);
    xTaskCreatePinnedToCore(lvgl_waveform_task, "WAVE", 4096, NULL, 4, NULL, 1);
    xTaskCreatePinnedToCore(state_machine_task, "STATE", 4096, NULL, 7, NULL, 0);
    ESP_LOGI(TAG, "Voice Assistant Running!");
}
```
---
## Testing Plan
### Phase 1: Hardware Validation
- [ ] LCD display working (show test pattern)
- [ ] Touch controller responding (log touch coordinates)
- [ ] Buzzer working (play test tone)
- [ ] WiFi connecting (check IP address)
- [ ] Battery reading (log voltage)
- [ ] RTC working (log time)
- [ ] IMU working (log accelerometer values)
### Phase 2: Audio Pipeline
- [ ] I2S microphone reading audio (log levels)
- [ ] Audio streaming to Heimdall server
- [ ] I2S speaker playing audio (test tone)
- [ ] TTS audio playback from server
- [ ] Audio buffer management (no overflows)
### Phase 3: LVGL UI
- [ ] Idle screen displays correctly
- [ ] State transitions smooth
- [ ] Waveform renders at 30 FPS
- [ ] Touch gestures recognized
- [ ] Settings screen functional
- [ ] Status bar updates correctly
### Phase 4: Integration
- [ ] Wake word detection triggers listening state
- [ ] Waveform shows mic input in real-time
- [ ] Processing state shows after speech ends
- [ ] TTS response plays with output waveform
- [ ] Touch cancel works in all states
- [ ] Battery indicator accurate
### Phase 5: Optimization
- [ ] Memory usage stable (no leaks)
- [ ] CPU usage acceptable (<80% average)
- [ ] WiFi latency <100ms
- [ ] Audio latency <200ms end-to-end
- [ ] Display framerate stable (30 FPS)
- [ ] Battery life >4 hours continuous
---
## Bill of Materials (BOM)
| Component | Part Number | Quantity | Unit Price | Total |
|-----------|-------------|----------|------------|-------|
| ESP32-S3-Touch-LCD-1.69 | Waveshare | 1 | $12.00 | $12.00 |
| I2S MEMS Microphone | INMP441 | 1 | $3.50 | $3.50 |
| I2S Amplifier | MAX98357A | 1 | $3.50 | $3.50 |
| Speaker (3W 8Ω) | Generic | 1 | $5.00 | $5.00 |
| LiPo Battery (1000mAh) | 503040 JST 1.25 | 1 | $7.00 | $7.00 |
| MicroSD Card (8GB) | SanDisk | 1 | $5.00 | $5.00 |
| Breadboard + Wires | Generic | 1 | $5.00 | $5.00 |
| **Total** | | | | **$41.00** |

**Optional:**
- Enclosure/Case (3D printed or project box): $5-10
- Backup battery: $7
- USB-C cable: $3

**Grand Total with Options:** ~$56-61

---
## References & Resources
### LVGL Audio Visualization Examples
- **Music Player with FFT Spectrum** - [Instructables Guide](https://www.instructables.com/Design-Music-Player-UI-With-LVGL/)
- Source: https://github.com/moononournation/LVGL_Music_Player.git
- Shows FFT-based audio visualization on LVGL canvas
- **LVGL Audio FFT Spectrum (Xiao S3)** - [GitHub: genvex/LVGL_Audio_FFT_Spectrum_xiaoS3_oled](https://github.com/genvex/LVGL_Audio_FFT_Spectrum_xiaoS3_oled)
- Real-time FFT visualization using low-level LVGL drawing
- **LVGL Audio FFT Spectrum** - [GitHub: imliubo/LVGL_Audio_FFT_Spectrum](https://github.com/imliubo/LVGL_Audio_FFT_Spectrum)
- Alternative FFT spectrum implementation
- **Moving Waveform Discussion** - [LVGL Forum Thread](https://forum.lvgl.io/t/best-method-to-display-a-moving-waveform/17361)
- Tips on efficiently displaying moving waveforms
### ESP32-S3 Resources
- **Waveshare Wiki** - https://www.waveshare.com/wiki/ESP32-S3-LCD-1.69
- **LVGL ESP32 Port** - [GitHub: lvgl/lv_port_esp32](https://github.com/lvgl/lv_port_esp32)
- **ESP-IDF Documentation** - https://docs.espressif.com/projects/esp-idf/en/latest/
### Voice Assistant Project
- **Mycroft Precise Documentation** - https://github.com/MycroftAI/mycroft-precise
- **Whisper OpenAI** - https://github.com/openai/whisper
- **Piper TTS** - https://github.com/rhasspy/piper
---
## Next Steps
1. **Order Hardware** - ESP32-S3-Touch-LCD + audio components (~$41)
2. **Setup ESP-IDF** - Install ESP-IDF v5.3.1+ on development machine
3. **Clone Examples** - Get LVGL audio visualization examples for reference
4. **Start Simple** - Begin with LCD + LVGL test (no audio)
5. **Add Audio** - Wire I2S mic, test audio streaming
6. **Waveform MVP** - Get basic waveform rendering working
7. **Full Integration** - Connect to Heimdall voice server
8. **Polish** - Add touch controls, settings, battery support
---
**Version:** 1.0

**Created:** 2026-01-01

**Status:** Specification Complete, Ready for Implementation