ESP32-S3-Touch-LCD Voice Assistant - Technical Specification
Date: 2026-01-01
Hardware: Waveshare ESP32-S3-Touch-LCD-1.69
Display: 240×280 ST7789V2 with Capacitive Touch
Framework: ESP-IDF v5.3.1+ with LVGL 8.4.0+
Purpose: Voice assistant endpoint with real-time audio waveform visualization
Overview
Voice assistant client for ESP32-S3 with integrated LVGL-based visual feedback showing:
- Real-time audio waveform during listening
- Wake word detection animation
- Processing/thinking state
- Response state with audio output visualization
- Touch controls for volume, sensitivity, settings
Architecture:
```
┌─────────────────────────────────┐
│  ESP32-S3-Touch-LCD-1.69        │
│                                 │
│  ┌──────────────────────────┐   │
│  │ LVGL UI (240×280)        │   │
│  │  - Waveform Canvas       │   │
│  │  - State Indicators      │   │──┐
│  │  - Touch Controls        │   │  │
│  └──────────────────────────┘   │  │
│                                 │  │
│  ┌──────────────────────────┐   │  │ WiFi
│  │ Audio Pipeline           │   │  │ Audio Stream
│  │  - I2S Mic Input         │   │  │
│  │  - I2S Speaker Output    │   │──┤
│  │  - Buffer Management     │   │  │
│  └──────────────────────────┘   │  │
│                                 │  │
│  ┌──────────────────────────┐   │  │
│  │ State Machine            │   │  │
│  │  - Idle → Listening      │   │  │
│  │  - Processing → Speaking │   │──┘
│  └──────────────────────────┘   │
└─────────────────────────────────┘
                 │
                 │ TCP/HTTP
                 ↓
┌─────────────────────────────────┐
│  Heimdall Voice Server          │
│  (10.1.10.71:3006)              │
│                                 │
│  - Mycroft Precise Wake Word    │
│  - Whisper STT                  │
│  - Home Assistant Integration   │
│  - Piper TTS                    │
└─────────────────────────────────┘
```
Visual States & UI Design
State Machine
```
┌─────────┐
│  IDLE   │ ◄──────────────────┐
└────┬────┘                    │
     │                         │
 Wake Word Detected            │
     │                         │
     ↓                         │
┌───────────┐                  │
│ LISTENING │                  │
└─────┬─────┘                  │
      │                        │
  End of Speech                │
      │                        │
      ↓                        │
┌────────────┐                 │
│ PROCESSING │                 │
└─────┬──────┘                 │
      │                        │
  Response Ready               │
      │                        │
      ↓                        │
┌──────────┐                   │
│ SPEAKING │ ──────────────────┘
└──────────┘
```
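The diagram's transitions fit in a small table-driven function. The C sketch below makes some assumptions. The event names (EV_WAKE_WORD, EV_END_OF_SPEECH, EV_RESPONSE_READY, EV_PLAYBACK_DONE, EV_CANCEL) and next_state() are hypothetical. The unlabeled SPEAKING → IDLE edge is assumed to fire when playback finishes. EV_CANCEL models the touch cancel/skip gestures described later and returns to IDLE from any state.

```c
#include <assert.h>

typedef enum { STATE_IDLE, STATE_LISTENING, STATE_PROCESSING, STATE_SPEAKING } va_state_t;

// Hypothetical events, one per arrow in the diagram plus a touch cancel
typedef enum {
    EV_WAKE_WORD,       // IDLE -> LISTENING
    EV_END_OF_SPEECH,   // LISTENING -> PROCESSING
    EV_RESPONSE_READY,  // PROCESSING -> SPEAKING
    EV_PLAYBACK_DONE,   // SPEAKING -> IDLE (assumed trigger for the unlabeled edge)
    EV_CANCEL           // any state -> IDLE (tap to cancel / skip)
} va_event_t;

// Returns the next state; events the current state does not expect leave it unchanged.
va_state_t next_state(va_state_t s, va_event_t ev) {
    if (ev == EV_CANCEL) return STATE_IDLE;
    switch (s) {
    case STATE_IDLE:       return ev == EV_WAKE_WORD      ? STATE_LISTENING  : s;
    case STATE_LISTENING:  return ev == EV_END_OF_SPEECH  ? STATE_PROCESSING : s;
    case STATE_PROCESSING: return ev == EV_RESPONSE_READY ? STATE_SPEAKING   : s;
    case STATE_SPEAKING:   return ev == EV_PLAYBACK_DONE  ? STATE_IDLE       : s;
    }
    assert(0 && "unknown state");
    return STATE_IDLE;
}
```

Unexpected events are ignored. A RESPONSE_READY that arrives after the user has cancelled therefore cannot restart playback.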
Visual Feedback Per State
1. IDLE State
Display:
- Subtle pulsing ring animation (like Google Home)
- Time display from RTC
- Status icons (WiFi strength, battery level)
- Dim backlight (30-50%)
Colors:
- Background: Dark blue (#001F3F)
- Pulse ring: Cyan (#00BFFF)
- Text: White (#FFFFFF)
LVGL Widgets:
```c
lv_obj_t *idle_screen;
lv_obj_t *pulse_ring;   // Arc widget, animated rotation
lv_obj_t *time_label;   // Label with RTC time
lv_obj_t *status_bar;   // Container for icons
```
Animation:
- Slow pulse: 2-second breathing cycle
- Rotation: 360° over 10 seconds
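Both idle animations are periodic functions of elapsed time. The sketch below covers only the timing math. It assumes a triangle-wave breathing curve and a 60-255 opacity range; the spec gives neither. pulse_opacity() and ring_rotation_deg() are hypothetical helpers. In LVGL 8 the same effect would more typically come from lv_anim with playback and LV_ANIM_REPEAT_INFINITE.

```c
#include <assert.h>
#include <stdint.h>

#define PULSE_PERIOD_MS    2000u   // 2-second breathing cycle (from the spec)
#define ROTATION_PERIOD_MS 10000u  // 360 degrees over 10 seconds (from the spec)
#define PULSE_OPA_MIN      60u     // assumed dimmest opacity, 0-255 scale
#define PULSE_OPA_MAX      255u

static_assert(PULSE_OPA_MIN < PULSE_OPA_MAX, "opacity range must be non-empty");

// Triangle-wave opacity: ramps up for the first half of each cycle, back down for the second.
uint8_t pulse_opacity(uint32_t elapsed_ms) {
    uint32_t t = elapsed_ms % PULSE_PERIOD_MS;
    uint32_t half = PULSE_PERIOD_MS / 2;
    uint32_t ramp = (t < half) ? t : (PULSE_PERIOD_MS - t);  // 0 -> half -> 0
    return (uint8_t)(PULSE_OPA_MIN + (PULSE_OPA_MAX - PULSE_OPA_MIN) * ramp / half);
}

// Arc start angle in degrees, advancing linearly and wrapping every ROTATION_PERIOD_MS.
uint16_t ring_rotation_deg(uint32_t elapsed_ms) {
    return (uint16_t)((elapsed_ms % ROTATION_PERIOD_MS) * 360u / ROTATION_PERIOD_MS);
}
```

A periodic lv_timer could apply these values with lv_arc_set_rotation() and lv_obj_set_style_opa().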
2. LISTENING State
Display:
- Real-time audio waveform visualization
- Bright backlight (100%)
- "Listening..." text
- Cancel button (touch)
Waveform Visualization:
Option A: Canvas-Based Waveform (Recommended)
- Use LVGL `lv_canvas` for custom drawing
- Draw waveform from audio buffer samples
- Scrolling waveform (left-to-right)
- Update rate: 30-60 FPS
Option B: Bar Chart Spectrum
- Use `lv_chart` with bar type
- FFT-based spectrum analyzer
- 8-16 bars for frequency bins
- Update rate: 15-30 FPS
Colors:
- Background: Dark gray (#1A1A1A)
- Waveform: Green (#00FF00)
- Peak indicators: Yellow (#FFFF00)
- Clipping: Red (#FF0000)
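The peak and clipping colors imply a level measurement on each block of microphone samples. The sketch below makes two assumptions. measure_block() is a hypothetical helper run on each 256-sample I2S read. The clipping threshold, just below full scale, is an assumed value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CLIP_THRESHOLD 32735  // assumed: within ~0.1% of int16 full scale counts as clipped

typedef struct {
    int32_t peak;   // largest absolute sample value in the block (drives the yellow marker)
    bool clipped;   // true if any sample reached CLIP_THRESHOLD (drives the red marker)
} level_info_t;

level_info_t measure_block(const int16_t *samples, size_t count) {
    assert(samples != NULL || count == 0);
    level_info_t info = {0, false};
    for (size_t i = 0; i < count; i++) {
        // Widen before abs(): -32768 has no positive int16_t counterpart
        int32_t mag = abs((int32_t)samples[i]);
        if (mag > info.peak) info.peak = mag;
        if (mag >= CLIP_THRESHOLD) info.clipped = true;
    }
    return info;
}
```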
LVGL Implementation:
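```c
// Canvas-based waveform
lv_obj_t *listening_screen;
lv_obj_t *waveform_canvas;   // 240×180 canvas (buffer assigned with lv_canvas_set_buffer)
lv_obj_t *listening_label;   // "Listening..."
lv_obj_t *cancel_btn;        // Touch to cancel

// Waveform buffer (circular buffer)
#define WAVEFORM_WIDTH  240
#define WAVEFORM_HEIGHT 180
#define WAVEFORM_CENTER (WAVEFORM_HEIGHT / 2)
// Divisor mapping the full int16 range (±32768) onto ±WAVEFORM_CENTER pixels.
// Dividing by 256 would give ±128 px, which overruns a 180 px canvas.
#define WAVEFORM_SCALE  (32768 / WAVEFORM_CENTER)

int16_t waveform_buffer[WAVEFORM_WIDTH];
uint16_t waveform_index = 0;

// Drawing function: call from the LVGL context with the LVGL lock held, not from the audio callback
void draw_waveform(lv_obj_t *canvas, const int16_t *audio_samples, size_t count) {
    lv_canvas_fill_bg(canvas, lv_color_hex(0x1A1A1A), LV_OPA_COVER);

    lv_draw_line_dsc_t line_dsc;
    lv_draw_line_dsc_init(&line_dsc);
    line_dsc.color = lv_color_hex(0x00FF00);
    line_dsc.width = 2;

    // Build one polyline from the passed-in samples and draw it in a single call
    static lv_point_t points[WAVEFORM_WIDTH];
    size_t n = count < WAVEFORM_WIDTH ? count : WAVEFORM_WIDTH;
    for (size_t x = 0; x < n; x++) {
        int32_t y = WAVEFORM_CENTER - audio_samples[x] / WAVEFORM_SCALE;  // positive samples draw upward
        if (y < 0) y = 0;
        if (y >= WAVEFORM_HEIGHT) y = WAVEFORM_HEIGHT - 1;
        points[x].x = (lv_coord_t)x;
        points[x].y = (lv_coord_t)y;
    }
    if (n >= 2) {
        lv_canvas_draw_line(canvas, points, n, &line_dsc);
    }
}

// Audio callback (I2S task)
void audio_i2s_callback(const int16_t *samples, size_t count) {
    // Downsample audio for waveform display. Keep step >= 1 so a block shorter than
    // WAVEFORM_WIDTH samples cannot produce a zero step (an infinite loop).
    size_t step = count / WAVEFORM_WIDTH;
    if (step == 0) step = 1;
    for (size_t i = 0; i < count; i += step) {
        waveform_buffer[waveform_index] = samples[i];
        waveform_index = (waveform_index + 1) % WAVEFORM_WIDTH;
    }
    // Signal the UI task (ui_event_group and WAVEFORM_UPDATE_BIT are created elsewhere)
    xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
}
```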
Touch Controls:
- Tap anywhere: Cancel listening
- Swipe down: Lower sensitivity
- Swipe up: Increase sensitivity
3. PROCESSING State
Display:
- Animated spinner/thinking indicator
- "Processing..." text
- Waveform fades out smoothly
Animation:
- Circular spinner with gradient
- Rotation: 360° per 1 second
- Pulsing opacity
Colors:
- Background: Dark gray (#1A1A1A)
- Spinner: Blue (#0080FF)
- Text: Light gray (#CCCCCC)
LVGL Implementation:
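```c
lv_obj_t *processing_screen;
lv_obj_t *spinner;           // lv_spinner widget, e.g. lv_spinner_create(parent, 1000, 60): one turn per second
lv_obj_t *processing_label;  // "Processing..."

// lv_anim exec callbacks take (void *, int32_t), so wrap the 3-argument style setter
static void anim_set_opa(void *obj, int32_t v) {
    lv_obj_set_style_opa((lv_obj_t *)obj, (lv_opa_t)v, LV_PART_MAIN);
}

// Runs when the fade-out completes; spinner and label are created hidden
static void show_spinner(lv_anim_t *a) {
    (void)a;
    lv_obj_clear_flag(spinner, LV_OBJ_FLAG_HIDDEN);
    lv_obj_clear_flag(processing_label, LV_OBJ_FLAG_HIDDEN);
}

// Transition from listening to processing
void transition_to_processing(void) {
    // Fade out waveform over 300 ms
    lv_anim_t fade_out;
    lv_anim_init(&fade_out);
    lv_anim_set_var(&fade_out, waveform_canvas);
    lv_anim_set_values(&fade_out, LV_OPA_COVER, LV_OPA_TRANSP);
    lv_anim_set_time(&fade_out, 300);
    lv_anim_set_exec_cb(&fade_out, anim_set_opa);
    // Show spinner once the fade finishes (no separate one-shot timer needed)
    lv_anim_set_ready_cb(&fade_out, show_spinner);
    lv_anim_start(&fade_out);
}
```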
4. SPEAKING State
Display:
- Audio output waveform (TTS playback visualization)
- "Speaking..." or response text snippet
- Volume indicator
Waveform:
- Same canvas as LISTENING but different color
- Shows output audio being played
- Synchronized with speaker output
Colors:
- Background: Dark gray (#1A1A1A)
- Waveform: Blue (#0080FF)
- Text: White (#FFFFFF)
LVGL Implementation:
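```c
lv_obj_t *speaking_screen;
lv_obj_t *output_waveform_canvas;  // Same size as input waveform
lv_obj_t *response_label;          // Show part of response text
lv_obj_t *volume_bar;              // lv_bar widget for volume level
// Same drawing as the listening state, fed from the speaker buffer, in blue
void draw_output_waveform(lv_obj_t *canvas, const int16_t *speaker_samples, size_t count) {
    lv_canvas_fill_bg(canvas, lv_color_hex(0x1A1A1A), LV_OPA_COVER);
    lv_draw_line_dsc_t line_dsc;  // declared locally: it is not shared with draw_waveform()
    lv_draw_line_dsc_init(&line_dsc);
    line_dsc.color = lv_color_hex(0x0080FF);
    line_dsc.width = 2;
    static lv_point_t points[WAVEFORM_WIDTH];
    size_t n = count < WAVEFORM_WIDTH ? count : WAVEFORM_WIDTH;
    for (size_t x = 0; x < n; x++) {
        int32_t y = WAVEFORM_CENTER - speaker_samples[x] / (32768 / WAVEFORM_CENTER);
        points[x].x = (lv_coord_t)x;
        points[x].y = (lv_coord_t)(y < 0 ? 0 : (y >= WAVEFORM_HEIGHT ? WAVEFORM_HEIGHT - 1 : y));
    }
    if (n >= 2) lv_canvas_draw_line(canvas, points, n, &line_dsc);
}
```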
Touch Controls:
- Tap: Skip response (go back to idle)
- Volume slider: Adjust speaker volume
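The MAX98357A sets its gain in hardware through a pin and has no volume register. The volume slider therefore most likely has to scale samples in software before i2s_write(). The sketch below assumes a 0-100 slider value and a linear curve; apply_volume() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Scale int16 PCM in place by volume_pct (0-100) using a Q15 gain; 100 leaves samples unchanged.
void apply_volume(int16_t *samples, size_t count, uint8_t volume_pct) {
    assert(volume_pct <= 100);
    int32_t gain_q15 = (int32_t)volume_pct * 32768 / 100;  // 100% -> 32768 (unity)
    for (size_t i = 0; i < count; i++) {
        // Gain never exceeds unity, so the scaled value always fits back into int16_t
        samples[i] = (int16_t)((int32_t)samples[i] * gain_q15 / 32768);
    }
}
```

Perceived loudness is roughly logarithmic, so a linear curve concentrates most of the audible change at the low end of the slider. A dB-based mapping is a common refinement.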
Additional UI Elements
Status Bar (All States)
Location: Top 20 pixels
Contents:
- WiFi icon + signal strength
- Battery icon + percentage
- Time (from RTC)
- Mute icon (if muted)
LVGL Implementation:
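```c
lv_obj_t *status_bar;
lv_obj_t *wifi_icon;
lv_obj_t *battery_icon;
lv_obj_t *status_time_label;  // distinct name: the idle screen already declares time_label
lv_obj_t *mute_icon;

// Update every second (lv_timer callback, runs in the LVGL task)
void update_status_bar(lv_timer_t *timer) {
    (void)timer;
    // Update WiFi strength from the currently associated AP
    wifi_ap_record_t ap_info;
    if (esp_wifi_sta_get_ap_info(&ap_info) == ESP_OK) {
        lv_img_set_src(wifi_icon, get_wifi_icon(ap_info.rssi));  // get_wifi_icon(): project helper
    }
    // Update battery
    uint8_t battery_pct = battery_get_percentage();
    lv_img_set_src(battery_icon, get_battery_icon(battery_pct));  // get_battery_icon(): project helper
    // Update time from the PCF85063 RTC (rtc_time_t and pcf85063_get_time() come from the project's RTC driver)
    rtc_time_t time;
    pcf85063_get_time(&time);
    lv_label_set_text_fmt(status_time_label, "%02d:%02d", time.hour, time.min);
}

// Create the timer once, after lv_init() and screen creation (not at file scope)
void status_bar_start(void) {
    lv_timer_create(update_status_bar, 1000, NULL);
}
```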
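get_wifi_icon() is not defined in this spec. One way to implement it is to bucket the RSSI into signal bars and pick an icon per bar count. In the sketch below, the dBm thresholds are assumptions to tune and wifi_bars_from_rssi() is hypothetical. The icon lookup itself is left out.

```c
#include <assert.h>
#include <stdint.h>

#define WIFI_MAX_BARS 4

// Hypothetical thresholds in dBm: reaching threshold i lights bar i + 1
static const int8_t RSSI_BAR_THRESHOLDS[] = { -85, -75, -65, -55 };

static_assert(sizeof(RSSI_BAR_THRESHOLDS) / sizeof(RSSI_BAR_THRESHOLDS[0]) == WIFI_MAX_BARS,
              "one threshold per bar");

// Number of bars (0..WIFI_MAX_BARS) for an RSSI reading such as wifi_ap_record_t.rssi
uint8_t wifi_bars_from_rssi(int8_t rssi) {
    uint8_t bars = 0;
    for (int i = 0; i < WIFI_MAX_BARS; i++) {
        if (rssi >= RSSI_BAR_THRESHOLDS[i]) bars = (uint8_t)(i + 1);
    }
    return bars;
}
```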
Settings Screen (Touch Access)
Trigger: Long-press on idle screen
Contents:
- Volume slider
- Brightness slider
- Wake word sensitivity slider
- WiFi settings button
- About/Info button
LVGL Implementation:
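```c
lv_obj_t *settings_screen;
lv_obj_t *volume_slider;
lv_obj_t *brightness_slider;
lv_obj_t *sensitivity_slider;
lv_obj_t *wifi_btn;
lv_obj_t *about_btn;
lv_obj_t *back_btn;

// Slider event handler
static void slider_event_cb(lv_event_t *e) {
    lv_obj_t *slider = lv_event_get_target(e);
    int32_t value = lv_slider_get_value(slider);
    if (slider == volume_slider) {
        set_speaker_volume(value);
    } else if (slider == brightness_slider) {
        set_backlight_brightness(value);
    } else if (slider == sensitivity_slider) {
        set_wake_word_sensitivity(value);
    }
}

// Register the handler once the sliders exist; it fires on every value change
void settings_register_events(void) {
    lv_obj_add_event_cb(volume_slider, slider_event_cb, LV_EVENT_VALUE_CHANGED, NULL);
    lv_obj_add_event_cb(brightness_slider, slider_event_cb, LV_EVENT_VALUE_CHANGED, NULL);
    lv_obj_add_event_cb(sensitivity_slider, slider_event_cb, LV_EVENT_VALUE_CHANGED, NULL);
}
```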
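set_backlight_brightness() is also left undefined. This sketch assumes the backlight is PWM-driven through the ESP-IDF LEDC peripheral; check how the board actually wires the backlight. Under that assumption, the slider percentage has to be converted to a duty value. The sketch also assumes 13-bit LEDC resolution and a 5% floor so the panel never goes fully dark. backlight_duty_from_pct() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BL_DUTY_BITS 13u                          // assumed LEDC timer resolution
#define BL_DUTY_MAX  ((1u << BL_DUTY_BITS) - 1u)  // 8191
#define BL_MIN_PCT   5                            // assumed floor so the panel never goes fully dark

static_assert(BL_MIN_PCT >= 0 && BL_MIN_PCT <= 100, "floor must be a percentage");

// Map a slider value (lv_slider_get_value() returns int32_t) to an LEDC duty, clamped to [floor, 100%]
uint32_t backlight_duty_from_pct(int32_t pct) {
    if (pct < BL_MIN_PCT) pct = BL_MIN_PCT;
    if (pct > 100) pct = 100;
    return (uint32_t)pct * BL_DUTY_MAX / 100u;
}
```

The result would be passed to ledc_set_duty() and then applied with ledc_update_duty(). The idle-state dimming (30-50%) could reuse the same helper.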
Audio Pipeline Integration
I2S Configuration
Microphone (INMP441):
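```c
#include "driver/i2s.h"  // Legacy I2S driver: builds on ESP-IDF v5.x but is deprecated in favor of driver/i2s_std.h

#define I2S_MIC_NUM         I2S_NUM_0
// Placeholder pins, verify with the board schematic. GPIO6 is also assigned to
// TOUCH_SDA_PIN in the touch section, so one of the two assignments must change.
#define I2S_MIC_BCLK_PIN    GPIO_NUM_4
#define I2S_MIC_WS_PIN      GPIO_NUM_5
#define I2S_MIC_DIN_PIN     GPIO_NUM_6
#define I2S_MIC_SAMPLE_RATE 16000
#define I2S_MIC_BITS        16
#define I2S_MIC_CHANNELS    1

// The INMP441 outputs 24-bit samples in 32-bit slots (its datasheet calls for 64 SCK
// cycles per WS period). If 16-bit reads return noise or very low levels, switch to
// I2S_BITS_PER_SAMPLE_32BIT and convert to 16-bit in software.
i2s_config_t i2s_mic_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_RX,
    .sample_rate = I2S_MIC_SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,  // assumes the INMP441 L/R pin is tied to GND
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 256,  // in frames: 256 mono samples = 16 ms at 16 kHz
    .use_apll = false,
    .tx_desc_auto_clear = false,
    .fixed_mclk = 0
};

i2s_pin_config_t i2s_mic_pins = {
    .bck_io_num = I2S_MIC_BCLK_PIN,
    .ws_io_num = I2S_MIC_WS_PIN,
    .data_out_num = I2S_PIN_NO_CHANGE,
    .data_in_num = I2S_MIC_DIN_PIN
};

void audio_init_microphone(void) {
    ESP_ERROR_CHECK(i2s_driver_install(I2S_MIC_NUM, &i2s_mic_config, 0, NULL));
    ESP_ERROR_CHECK(i2s_set_pin(I2S_MIC_NUM, &i2s_mic_pins));
    i2s_zero_dma_buffer(I2S_MIC_NUM);
}
```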
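If the microphone is switched to 32-bit slots, each INMP441 sample arrives as 24-bit two's complement in the top bits of the slot. It must be converted before entering the 16-bit pipeline. inmp441_to_pcm16() in the sketch below is hypothetical. The shift amount is a tuning assumption: a smaller shift trades headroom for gain, so the result is clamped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Convert 32-bit I2S slots (24-bit sample, MSB-aligned) to 16-bit PCM.
// shift = 16 keeps the top 16 bits; each step below 16 adds ~6 dB of gain.
void inmp441_to_pcm16(const int32_t *raw, int16_t *out, size_t count, unsigned shift) {
    assert(shift >= 8 && shift <= 16);  // the low 8 bits of each slot carry no audio
    for (size_t i = 0; i < count; i++) {
        int32_t v = raw[i] >> shift;    // arithmetic shift for negative values on GCC/Clang
        if (v > INT16_MAX) v = INT16_MAX;
        if (v < INT16_MIN) v = INT16_MIN;
        out[i] = (int16_t)v;
    }
}
```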
Speaker (MAX98357A I2S Amp):
```c
#define I2S_SPK_NUM         I2S_NUM_1
// Placeholder pins, verify with the board schematic. GPIO7 and GPIO9 are also
// assigned to TOUCH_SCL_PIN and TOUCH_INT_PIN in the touch section.
#define I2S_SPK_BCLK_PIN    GPIO_NUM_7
#define I2S_SPK_WS_PIN      GPIO_NUM_8
#define I2S_SPK_DOUT_PIN    GPIO_NUM_9
#define I2S_SPK_SAMPLE_RATE 16000
#define I2S_SPK_BITS        16
#define I2S_SPK_CHANNELS    1

i2s_config_t i2s_spk_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_TX,
    .sample_rate = I2S_SPK_SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 256,
    .use_apll = false,
    .tx_desc_auto_clear = true,  // output silence instead of repeating stale data on underrun
    .fixed_mclk = 0
};

i2s_pin_config_t i2s_spk_pins = {
    .bck_io_num = I2S_SPK_BCLK_PIN,
    .ws_io_num = I2S_SPK_WS_PIN,
    .data_out_num = I2S_SPK_DOUT_PIN,
    .data_in_num = I2S_PIN_NO_CHANGE
};

void audio_init_speaker(void) {
    ESP_ERROR_CHECK(i2s_driver_install(I2S_SPK_NUM, &i2s_spk_config, 0, NULL));
    ESP_ERROR_CHECK(i2s_set_pin(I2S_SPK_NUM, &i2s_spk_pins));
    i2s_zero_dma_buffer(I2S_SPK_NUM);
}
```
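In the legacy driver, dma_buf_len counts frames, not bytes. Each configuration above therefore buffers 8 × 256 = 2048 frames. At 16 kHz that is 16 ms per DMA buffer and 128 ms for the whole ring. The mic task must drain buffers faster than that to avoid drops. A full speaker ring can add up to 128 ms of playback delay, which counts against the <200 ms end-to-end target in the test plan. The arithmetic is shown below as two hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

// Microseconds of audio held by one DMA buffer of `frames` frames
uint32_t dma_buf_us(uint32_t frames, uint32_t sample_rate) {
    assert(sample_rate > 0);
    return (uint32_t)((uint64_t)frames * 1000000u / sample_rate);
}

// Microseconds held by the whole DMA ring (dma_buf_count buffers)
uint32_t dma_ring_us(uint32_t buf_count, uint32_t frames, uint32_t sample_rate) {
    return buf_count * dma_buf_us(frames, sample_rate);
}
```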
Audio Buffer Management
Circular Buffer for Waveform:
```c
#include <string.h>
#include "freertos/FreeRTOS.h"
#include "freertos/semphr.h"

#define AUDIO_BUFFER_SIZE   2048
#define WAVEFORM_DECIMATION 8  // Downsample for display: 240 points × 8 = 1920 samples (~120 ms at 16 kHz)

typedef struct {
    int16_t samples[AUDIO_BUFFER_SIZE];
    uint16_t write_idx;  // Next slot to write; also the oldest sample once the buffer has wrapped
    uint16_t read_idx;   // Not used by the waveform path below
    SemaphoreHandle_t mutex;
} audio_buffer_t;

audio_buffer_t mic_buffer;
audio_buffer_t spk_buffer;

void audio_buffer_init(audio_buffer_t *buf) {
    memset(buf->samples, 0, sizeof(buf->samples));
    buf->write_idx = 0;
    buf->read_idx = 0;
    buf->mutex = xSemaphoreCreateMutex();
}

void audio_buffer_write(audio_buffer_t *buf, const int16_t *samples, size_t count) {
    xSemaphoreTake(buf->mutex, portMAX_DELAY);
    for (size_t i = 0; i < count; i++) {
        buf->samples[buf->write_idx] = samples[i];
        buf->write_idx = (buf->write_idx + 1) % AUDIO_BUFFER_SIZE;
    }
    xSemaphoreGive(buf->mutex);
}

// Get downsampled samples for waveform display, oldest first.
// Requires out_count * WAVEFORM_DECIMATION <= AUDIO_BUFFER_SIZE (at most one lap of the ring).
void audio_buffer_get_waveform(audio_buffer_t *buf, int16_t *out, size_t out_count) {
    xSemaphoreTake(buf->mutex, portMAX_DELAY);
    for (size_t i = 0; i < out_count; i++) {
        size_t src_idx = (buf->write_idx + (i * WAVEFORM_DECIMATION)) % AUDIO_BUFFER_SIZE;
        out[i] = buf->samples[src_idx];
    }
    xSemaphoreGive(buf->mutex);
}
```
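The index arithmetic is the subtle part of audio_buffer_get_waveform(). Once the buffer has filled, starting at write_idx reads the oldest sample first. The read must also span at most one lap of the ring: 240 × 8 = 1920 ≤ 2048, about 120 ms at 16 kHz. The sketch below reproduces that logic with small, hypothetical sizes and no mutex, so the behaviour can be checked by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE       16  // small sizes so the result is easy to verify
#define RB_DECIMATION 2

typedef struct {
    int16_t samples[RB_SIZE];
    size_t write_idx;  // next slot to write == oldest sample after the ring has wrapped
} ring_t;

void ring_write(ring_t *rb, const int16_t *in, size_t count) {
    for (size_t i = 0; i < count; i++) {
        rb->samples[rb->write_idx] = in[i];
        rb->write_idx = (rb->write_idx + 1) % RB_SIZE;
    }
}

// Oldest to newest, every RB_DECIMATION-th sample, like audio_buffer_get_waveform()
void ring_get_decimated(const ring_t *rb, int16_t *out, size_t out_count) {
    assert(out_count * RB_DECIMATION <= RB_SIZE);  // never read past one lap of the ring
    for (size_t i = 0; i < out_count; i++) {
        out[i] = rb->samples[(rb->write_idx + i * RB_DECIMATION) % RB_SIZE];
    }
}
```

After writing 0..19 into a 16-slot ring, write_idx is 4 and slot 4 holds the oldest surviving value (4). The decimated read therefore yields 4, 6, 8, ... 18.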
Audio Streaming Task
Microphone Input Task:
```c
void audio_mic_task(void *pvParameters) {
    int16_t i2s_buffer[256];
    size_t bytes_read;
    while (1) {
        // Read from I2S microphone (blocks until one DMA buffer is available)
        i2s_read(I2S_MIC_NUM, i2s_buffer, sizeof(i2s_buffer), &bytes_read, portMAX_DELAY);
        size_t samples_read = bytes_read / sizeof(int16_t);
        // The wake word is detected on the server (Mycroft Precise on Heimdall), so audio
        // must also stream while IDLE; otherwise the wake word can never be heard.
        if (current_state == STATE_IDLE || current_state == STATE_LISTENING) {
            // Send to Heimdall server via WiFi
            audio_send_to_server(i2s_buffer, samples_read);
        }
        if (current_state == STATE_LISTENING) {
            // Write to circular buffer for waveform display
            audio_buffer_write(&mic_buffer, i2s_buffer, samples_read);
            // Trigger waveform update
            xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
        }
    }
}
```
Speaker Output Task:
```c
void audio_speaker_task(void *pvParameters) {
    int16_t i2s_buffer[256];
    size_t bytes_written;
    while (1) {
        // Receive audio from Heimdall server
        size_t samples_received = audio_receive_from_server(i2s_buffer, 256);
        if (samples_received > 0 && current_state == STATE_SPEAKING) {
            // Write to circular buffer for waveform display
            audio_buffer_write(&spk_buffer, i2s_buffer, samples_received);
            // Play through I2S speaker
            i2s_write(I2S_SPK_NUM, i2s_buffer, samples_received * sizeof(int16_t),
                      &bytes_written, portMAX_DELAY);
            // Trigger waveform update
            xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
        } else {
            vTaskDelay(pdMS_TO_TICKS(10));
        }
    }
}
```
LVGL Update Task
Waveform Rendering Task:
```c
void lvgl_waveform_task(void *pvParameters) {
    int16_t waveform_samples[WAVEFORM_WIDTH];
    while (1) {
        // Wait for waveform update event (clear on exit, 50 ms timeout)
        EventBits_t bits = xEventGroupWaitBits(ui_event_group, WAVEFORM_UPDATE_BIT,
                                               pdTRUE, pdFALSE, pdMS_TO_TICKS(50));
        if (bits & WAVEFORM_UPDATE_BIT) {
            if (current_state == STATE_LISTENING) {
                // Get downsampled mic data
                audio_buffer_get_waveform(&mic_buffer, waveform_samples, WAVEFORM_WIDTH);
                // Draw on LVGL canvas. LVGL is not thread-safe, so hold the project's LVGL
                // mutex (lvgl_lock()/lvgl_unlock(), e.g. wrappers around esp_lvgl_port's
                // lvgl_port_lock()/lvgl_port_unlock()).
                lvgl_lock();
                draw_waveform(waveform_canvas, waveform_samples, WAVEFORM_WIDTH);
                lvgl_unlock();
            } else if (current_state == STATE_SPEAKING) {
                // Get downsampled speaker data
                audio_buffer_get_waveform(&spk_buffer, waveform_samples, WAVEFORM_WIDTH);
                lvgl_lock();
                draw_output_waveform(output_waveform_canvas, waveform_samples, WAVEFORM_WIDTH);
                lvgl_unlock();
            }
        }
    }
}
```
Touch Gesture Integration
Touch Controller (CST816D)
Gestures Supported:
- Single tap
- Long press
- Swipe up/down/left/right
Implementation:
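```c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "driver/i2c.h"
#include "driver/gpio.h"

#define TOUCH_I2C_NUM I2C_NUM_0
// Placeholder pins, verify with the board schematic: GPIO6, GPIO7 and GPIO9 are also
// assigned to I2S above (mic DIN, speaker BCLK, speaker DOUT), so the two sets conflict.
#define TOUCH_SDA_PIN GPIO_NUM_6
#define TOUCH_SCL_PIN GPIO_NUM_7
#define TOUCH_INT_PIN GPIO_NUM_9
#define TOUCH_RST_PIN GPIO_NUM_10
#define CST816D_ADDR  0x15  // 7-bit I2C address used by common CST816-family drivers; verify

typedef enum {
    GESTURE_NONE = 0,
    GESTURE_TAP,
    GESTURE_LONG_PRESS,
    GESTURE_SWIPE_UP,
    GESTURE_SWIPE_DOWN,
    GESTURE_SWIPE_LEFT,
    GESTURE_SWIPE_RIGHT
} touch_gesture_t;

void touch_init(void) {
    // I2C init for CST816D
    i2c_config_t conf = {
        .mode = I2C_MODE_MASTER,
        .sda_io_num = TOUCH_SDA_PIN,
        .scl_io_num = TOUCH_SCL_PIN,
        .sda_pullup_en = GPIO_PULLUP_ENABLE,
        .scl_pullup_en = GPIO_PULLUP_ENABLE,
        .master.clk_speed = 100000,
    };
    i2c_param_config(TOUCH_I2C_NUM, &conf);
    i2c_driver_install(TOUCH_I2C_NUM, conf.mode, 0, 0, 0);
    // Reset touch controller
    gpio_set_direction(TOUCH_RST_PIN, GPIO_MODE_OUTPUT);
    gpio_set_level(TOUCH_RST_PIN, 0);
    vTaskDelay(pdMS_TO_TICKS(10));
    gpio_set_level(TOUCH_RST_PIN, 1);
    vTaskDelay(pdMS_TO_TICKS(50));
    // Configure interrupt pin (the controller pulls INT low when it has data)
    gpio_set_direction(TOUCH_INT_PIN, GPIO_MODE_INPUT);
    gpio_set_intr_type(TOUCH_INT_PIN, GPIO_INTR_NEGEDGE);
    // gpio_install_isr_service() is called once in app_main(); a second call returns ESP_ERR_INVALID_STATE
    gpio_isr_handler_add(TOUCH_INT_PIN, touch_isr_handler, NULL);  // touch_isr_handler: project ISR, void (void *arg)
}

touch_gesture_t touch_read_gesture(void) {
    uint8_t reg = 0x01;  // GestureID register
    uint8_t data[8] = {0};
    // Write the register address, then read. i2c_master_read_from_device() has no register argument.
    if (i2c_master_write_read_device(TOUCH_I2C_NUM, CST816D_ADDR, &reg, 1, data, sizeof(data),
                                     pdMS_TO_TICKS(100)) != ESP_OK) {
        return GESTURE_NONE;
    }
    // The chip's gesture codes do not follow touch_gesture_t's order, so translate them.
    // Codes as used by common open-source CST816S drivers; confirm against the CST816D datasheet.
    switch (data[0]) {
        case 0x01: return GESTURE_SWIPE_UP;
        case 0x02: return GESTURE_SWIPE_DOWN;
        case 0x03: return GESTURE_SWIPE_LEFT;
        case 0x04: return GESTURE_SWIPE_RIGHT;
        case 0x05: return GESTURE_TAP;
        case 0x0C: return GESTURE_LONG_PRESS;
        default:   return GESTURE_NONE;
    }
}
```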
Gesture Actions by State
IDLE State:
- Tap: Wake up display (if dimmed)
- Long Press: Open settings screen
- Swipe Up: Show more info (weather, calendar)
LISTENING State:
- Tap: Cancel listening, return to idle
- Swipe Down: Lower wake word sensitivity
- Swipe Up: Raise wake word sensitivity
SPEAKING State:
- Tap: Skip response, return to idle
- Swipe Left/Right: Volume down/up
PROCESSING State:
- Tap: Cancel processing (if possible)
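The per-state rules above amount to a lookup from (state, gesture) to an action. In the sketch below, the enum and action names are hypothetical and mirror touch_gesture_t and the four states. The UI would carry out whatever action is returned.

```c
#include <assert.h>

typedef enum { STATE_IDLE, STATE_LISTENING, STATE_PROCESSING, STATE_SPEAKING } va_state_t;
typedef enum { GESTURE_NONE, GESTURE_TAP, GESTURE_LONG_PRESS, GESTURE_SWIPE_UP,
               GESTURE_SWIPE_DOWN, GESTURE_SWIPE_LEFT, GESTURE_SWIPE_RIGHT } touch_gesture_t;
typedef enum { ACT_NONE, ACT_WAKE_DISPLAY, ACT_OPEN_SETTINGS, ACT_SHOW_INFO, ACT_CANCEL,
               ACT_SENSITIVITY_DOWN, ACT_SENSITIVITY_UP, ACT_SKIP_RESPONSE,
               ACT_VOLUME_DOWN, ACT_VOLUME_UP } ui_action_t;

// Map a gesture to an action according to the per-state table; unlisted combinations do nothing.
ui_action_t gesture_action(va_state_t state, touch_gesture_t g) {
    switch (state) {
    case STATE_IDLE:
        if (g == GESTURE_TAP) return ACT_WAKE_DISPLAY;
        if (g == GESTURE_LONG_PRESS) return ACT_OPEN_SETTINGS;
        if (g == GESTURE_SWIPE_UP) return ACT_SHOW_INFO;
        break;
    case STATE_LISTENING:
        if (g == GESTURE_TAP) return ACT_CANCEL;
        if (g == GESTURE_SWIPE_DOWN) return ACT_SENSITIVITY_DOWN;
        if (g == GESTURE_SWIPE_UP) return ACT_SENSITIVITY_UP;
        break;
    case STATE_SPEAKING:
        if (g == GESTURE_TAP) return ACT_SKIP_RESPONSE;
        if (g == GESTURE_SWIPE_LEFT) return ACT_VOLUME_DOWN;
        if (g == GESTURE_SWIPE_RIGHT) return ACT_VOLUME_UP;
        break;
    case STATE_PROCESSING:
        if (g == GESTURE_TAP) return ACT_CANCEL;  // "if possible": the server may already be replying
        break;
    default:
        assert(0 && "unknown state");
        break;
    }
    return ACT_NONE;
}
```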
Network Communication
WiFi Configuration
Connection:
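```c
#include "esp_wifi.h"
#include "esp_netif.h"
#include "esp_event.h"

// Placeholders: keep real credentials out of version control (e.g. a git-ignored header or menuconfig)
#define WIFI_SSID     "YourNetworkName"
#define WIFI_PASSWORD "YourPassword"
#define SERVER_URL    "http://10.1.10.71:3006"

// Requires nvs_flash_init() to have run first (app_main() does this)
void wifi_init(void) {
    ESP_ERROR_CHECK(esp_netif_init());
    ESP_ERROR_CHECK(esp_event_loop_create_default());
    esp_netif_create_default_wifi_sta();
    wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
    ESP_ERROR_CHECK(esp_wifi_init(&cfg));
    wifi_config_t wifi_config = {
        .sta = {
            .ssid = WIFI_SSID,
            .password = WIFI_PASSWORD,
        },
    };
    ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_STA));
    ESP_ERROR_CHECK(esp_wifi_set_config(WIFI_IF_STA, &wifi_config));
    ESP_ERROR_CHECK(esp_wifi_start());
    // The ESP-IDF station example calls esp_wifi_connect() from a WIFI_EVENT_STA_START handler
    // and retries on WIFI_EVENT_STA_DISCONNECTED; this sketch has no reconnect logic.
    esp_wifi_connect();
}
```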
Server Communication Protocol
Endpoints:
- GET /health - Server health check
- POST /audio/stream - Stream audio to server (multipart)
- GET /audio/tts - Receive TTS audio response
- GET /wake-word/status - Check wake word detection status
Audio Streaming (WebSockets Recommended):
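```c
// esp_websocket_client ships as a component (espressif/esp_websocket_client) for ESP-IDF v5
#include "esp_websocket_client.h"
#include "freertos/FreeRTOS.h"
#include "freertos/stream_buffer.h"

esp_websocket_client_handle_t ws_client;
static StreamBufferHandle_t rx_audio;  // TTS audio from the server, drained by audio_speaker_task

// The client is event-driven: received frames arrive here (there is no blocking recv() call)
static void websocket_event_handler(void *arg, esp_event_base_t base, int32_t event_id, void *event_data) {
    esp_websocket_event_data_t *ev = (esp_websocket_event_data_t *)event_data;
    if (event_id == WEBSOCKET_EVENT_DATA && ev->data_len > 0 &&
        (ev->op_code == 0x02 || ev->op_code == 0x00)) {  // binary frames and their continuations
        xStreamBufferSend(rx_audio, ev->data_ptr, (size_t)ev->data_len, 0);  // drop data if the buffer is full
    }
}

void websocket_init(void) {
    rx_audio = xStreamBufferCreate(16 * 1024, sizeof(int16_t));
    esp_websocket_client_config_t ws_cfg = {
        .uri = "ws://10.1.10.71:3006/ws/audio",
        .buffer_size = 2048,
    };
    ws_client = esp_websocket_client_init(&ws_cfg);
    esp_websocket_register_events(ws_client, WEBSOCKET_EVENT_ANY,
                                  websocket_event_handler, NULL);
    esp_websocket_client_start(ws_client);
}

void audio_send_to_server(int16_t *samples, size_t count) {
    if (esp_websocket_client_is_connected(ws_client)) {
        // Bounded timeout so a stalled connection cannot block the mic task and overflow I2S DMA
        esp_websocket_client_send_bin(ws_client, (const char *)samples,
                                      count * sizeof(int16_t), pdMS_TO_TICKS(50));
    }
}

size_t audio_receive_from_server(int16_t *out_buffer, size_t max_samples) {
    // Wait up to 100 ms for audio queued by the event handler
    size_t len = xStreamBufferReceive(rx_audio, out_buffer,
                                      max_samples * sizeof(int16_t), pdMS_TO_TICKS(100));
    return len / sizeof(int16_t);
}
```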
Alternative: HTTP Chunked Transfer (Simpler):
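```c
#include <stdio.h>
#include "esp_http_client.h"

// Reads the mic directly, so use it instead of audio_mic_task, not alongside it
void audio_stream_http(void) {
    esp_http_client_config_t config = {
        .url = "http://10.1.10.71:3006/audio/stream",
        .method = HTTP_METHOD_POST,
    };
    esp_http_client_handle_t client = esp_http_client_init(&config);
    // Set headers
    esp_http_client_set_header(client, "Content-Type", "audio/pcm");
    esp_http_client_set_header(client, "Transfer-Encoding", "chunked");
    if (esp_http_client_open(client, -1) != ESP_OK) {  // -1 = length unknown (chunked mode)
        esp_http_client_cleanup(client);
        return;
    }
    // Stream audio chunks
    int16_t buffer[256];
    char chunk_hdr[12];
    while (current_state == STATE_LISTENING) {
        // Read from mic
        size_t bytes_read;
        i2s_read(I2S_MIC_NUM, buffer, sizeof(buffer), &bytes_read, portMAX_DELAY);
        if (bytes_read == 0) continue;  // a zero-size chunk would end the body early
        // esp_http_client_write() sends bytes as-is, so add chunk framing: "<hex size>\r\n<data>\r\n"
        int hdr_len = snprintf(chunk_hdr, sizeof(chunk_hdr), "%X\r\n", (unsigned)bytes_read);
        esp_http_client_write(client, chunk_hdr, hdr_len);
        esp_http_client_write(client, (const char *)buffer, (int)bytes_read);
        esp_http_client_write(client, "\r\n", 2);
    }
    esp_http_client_write(client, "0\r\n\r\n", 5);  // terminating zero-size chunk
    esp_http_client_fetch_headers(client);          // read the server's response status and headers
    esp_http_client_close(client);
    esp_http_client_cleanup(client);
}
```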
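HTTP/1.1 chunked transfer coding frames each chunk in four parts: the size in hexadecimal, CRLF, the data, and CRLF. The body ends with a zero-size chunk. The sketch below shows that framing as portable C. frame_chunk() is a hypothetical helper that builds one chunk in a caller-supplied buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Frame one chunk as "<size in hex>\r\n<data>\r\n" into out.
// Returns bytes written, or -1 if out is too small. len == 0 yields the end marker "0\r\n\r\n".
int frame_chunk(char *out, size_t cap, const void *data, size_t len) {
    assert(out != NULL && data != NULL);
    int hdr = snprintf(out, cap, "%zX\r\n", len);
    if (hdr < 0 || (size_t)hdr + len + 2 > cap) return -1;
    memcpy(out + hdr, data, len);
    memcpy(out + hdr + len, "\r\n", 2);
    return hdr + (int)len + 2;
}
```

One 256-sample mic read is 512 bytes, so its chunk header is "200\r\n".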
Power Management
Battery Monitoring
ETA6098 Charging Chip:
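```c
#include "driver/adc.h"   // Legacy ADC driver: deprecated in ESP-IDF v5 (replaced by esp_adc/adc_oneshot.h)
#include "esp_adc_cal.h"  // Legacy calibration API, likewise deprecated
#include "driver/gpio.h"

#define BATTERY_ADC_CHANNEL ADC1_CHANNEL_0  // GPIO1 (example)
#define BATTERY_DIVIDER  3     // Assumption: the ADC pin sees battery voltage / 3; use the schematic's ratio
#define BATTERY_FULL_MV  4200
#define BATTERY_EMPTY_MV 3300

static esp_adc_cal_characteristics_t adc_chars;

void battery_init(void) {
    adc1_config_width(ADC_WIDTH_BIT_12);
    // ~0-3.1 V input range; named ADC_ATTEN_DB_12 in newer ESP-IDF releases
    adc1_config_channel_atten(BATTERY_ADC_CHANNEL, ADC_ATTEN_DB_11);
    esp_adc_cal_characterize(ADC_UNIT_1, ADC_ATTEN_DB_11, ADC_WIDTH_BIT_12, 1100, &adc_chars);
}

uint8_t battery_get_percentage(void) {
    int adc_reading = adc1_get_raw(BATTERY_ADC_CHANNEL);
    // Voltage at the ADC pin, scaled back up to the battery voltage
    int voltage_mv = (int)esp_adc_cal_raw_to_voltage(adc_reading, &adc_chars) * BATTERY_DIVIDER;
    if (voltage_mv >= BATTERY_FULL_MV) return 100;
    if (voltage_mv <= BATTERY_EMPTY_MV) return 0;
    return ((voltage_mv - BATTERY_EMPTY_MV) * 100) / (BATTERY_FULL_MV - BATTERY_EMPTY_MV);
}

bool battery_is_charging(void) {
    // Check SYS_OUT pin (GPIO36) - high when charging. Verify this pin: on ESP32-S3 modules
    // with octal PSRAM, GPIO33-37 are reserved for the PSRAM interface.
    gpio_set_direction(GPIO_NUM_36, GPIO_MODE_INPUT);
    return gpio_get_level(GPIO_NUM_36);
}
```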
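The ADC cannot read a 4.2 V cell directly at this attenuation, because its range tops out around 3.1 V. The board must therefore scale the battery voltage down before the pin. The sketch below works through the percentage calculation. The divider ratio is an assumption to replace with the schematic's value, and battery_pct_from_pin_mv() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BATTERY_FULL_MV  4200
#define BATTERY_EMPTY_MV 3300

// Linear state-of-charge estimate from the ADC pin voltage and the divider ratio
uint8_t battery_pct_from_pin_mv(uint32_t pin_mv, uint32_t divider) {
    assert(divider >= 1);
    uint32_t batt_mv = pin_mv * divider;  // undo the resistor divider
    if (batt_mv >= BATTERY_FULL_MV) return 100;
    if (batt_mv <= BATTERY_EMPTY_MV) return 0;
    return (uint8_t)((batt_mv - BATTERY_EMPTY_MV) * 100 / (BATTERY_FULL_MV - BATTERY_EMPTY_MV));
}
```

With a 3:1 divider, 1250 mV at the pin means 3750 mV at the cell, which reads as 50%. A linear map is only a rough estimate. LiPo voltage is not linear in remaining charge, and it sags under load such as speaker playback.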
Low Power Modes
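Deep Sleep When Idle (Optional):
Deep sleep also stops the microphone stream. Server-side wake word detection is therefore unavailable until a touch or the hourly timer wakes the device.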
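```c
#include "esp_sleep.h"
#include "esp_attr.h"
#include "driver/gpio.h"

#define IDLE_TIMEOUT_MS 300000  // 5 minutes

// Kept in RTC slow memory, so it survives deep sleep (must have static storage duration)
RTC_DATA_ATTR static uint32_t boot_count = 0;

void enter_deep_sleep(void) {
    // Save state to RTC memory
    boot_count++;
    // Configure wake sources (ext0 needs an RTC-capable pin: GPIO0-21 on ESP32-S3)
    esp_sleep_enable_ext0_wakeup(TOUCH_INT_PIN, 0);    // Wake on touch (INT is active low)
    esp_sleep_enable_timer_wakeup(3600 * 1000000ULL);  // Wake every hour
    // Turn off display (LCD_BL_PIN comes from the LCD driver; the pin may need
    // gpio_hold_en() to stay low while the chip sleeps)
    gpio_set_level(LCD_BL_PIN, 0);
    // Enter deep sleep: does not return, the chip reboots into app_main() on wake
    esp_deep_sleep_start();
}
```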
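IDLE_TIMEOUT_MS is defined above, but nothing checks it. The sketch below shows one way to do that check. It assumes a free-running millisecond counter, for example esp_timer_get_time() / 1000 truncated to 32 bits, which wraps after about 49.7 days. It also assumes a last-activity timestamp that is refreshed on touch or wake word. idle_timeout_elapsed() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IDLE_TIMEOUT_MS 300000u  // 5 minutes, as above

// Keep the timeout far below the 2^32 ms counter period so elapsed times stay unambiguous
static_assert(IDLE_TIMEOUT_MS < UINT32_MAX / 2, "timeout must be well inside the counter range");

// True once IDLE_TIMEOUT_MS has passed since last_activity_ms. Unsigned subtraction
// yields the correct elapsed time even after the 32-bit counter wraps around.
bool idle_timeout_elapsed(uint32_t now_ms, uint32_t last_activity_ms) {
    return (uint32_t)(now_ms - last_activity_ms) >= IDLE_TIMEOUT_MS;
}
```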
Performance Optimization
LVGL Performance
Buffer Configuration:
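```c
#define LVGL_SCREEN_PX (240 * 280)           // Full screen, in pixels
#define LVGL_BUF_PX    (LVGL_SCREEN_PX / 10)  // 1/10 screen per buffer (6,720 px)

// lv_color_t is 2 bytes at 16-bit color depth, so each buffer is 13,440 bytes.
// (Sizing the arrays from a byte count, 240 * 280 * 2, would make each one 1/5 of the screen.)
static lv_color_t buf_1[LVGL_BUF_PX];
static lv_color_t buf_2[LVGL_BUF_PX];  // Double buffering
static lv_disp_draw_buf_t draw_buf;

// Call after lv_init(); the last argument is a pixel count, not bytes
void lvgl_buffers_init(void) {
    lv_disp_draw_buf_init(&draw_buf, buf_1, buf_2, LVGL_BUF_PX);
}
```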
Task Priority:
```c
#define LVGL_TASK_PRIORITY      5
#define AUDIO_MIC_TASK_PRIORITY 10  // Higher priority for audio
#define AUDIO_SPK_TASK_PRIORITY 10
#define WIFI_TASK_PRIORITY      8   // Not used below
#define WAVEFORM_TASK_PRIORITY  4   // Lower priority for visuals

// Excerpt showing task creation only; the complete app_main() is under "Main Entry Point"
void app_main(void) {
    // Create tasks with priorities: audio pinned to core 0, UI to core 1
    xTaskCreatePinnedToCore(lvgl_task, "LVGL", 8192, NULL, LVGL_TASK_PRIORITY, NULL, 1);
    xTaskCreatePinnedToCore(audio_mic_task, "MIC", 4096, NULL, AUDIO_MIC_TASK_PRIORITY, NULL, 0);
    xTaskCreatePinnedToCore(audio_speaker_task, "SPK", 4096, NULL, AUDIO_SPK_TASK_PRIORITY, NULL, 0);
    xTaskCreatePinnedToCore(lvgl_waveform_task, "WAVE", 4096, NULL, WAVEFORM_TASK_PRIORITY, NULL, 1);
}
```
Reduce Waveform Update Rate:
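```c
// Only update waveform at ~30 FPS, not on every audio buffer
#define WAVEFORM_UPDATE_MS 33  // ~30 FPS

// Fixed-rate variant of lvgl_waveform_task: use it instead of the event-driven
// version above, not alongside it (both define the same function).
void lvgl_waveform_task(void *pvParameters) {
    TickType_t last_wake = xTaskGetTickCount();
    while (1) {
        // Block until WAVEFORM_UPDATE_MS after the previous wake-up, so the rate does not drift.
        // At the default CONFIG_FREERTOS_HZ of 100, pdMS_TO_TICKS(33) rounds down to 3 ticks (30 ms).
        vTaskDelayUntil(&last_wake, pdMS_TO_TICKS(WAVEFORM_UPDATE_MS));
        // Update waveform: fetch samples and redraw the canvas under the LVGL lock
    }
}
```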
Memory Management
PSRAM Usage:
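```c
#include "esp_heap_caps.h"
#include "esp_log.h"

static const char *TAG = "MEM";

// Allocate large buffers in PSRAM (8MB available; PSRAM must be enabled in menuconfig)
#define AUDIO_LARGE_BUFFER_SIZE (16000 * 10)  // 10 seconds at 16kHz = 320,000 bytes of int16_t

static int16_t *audio_history;

// heap_caps_malloc() is a runtime call, so allocate inside a function (e.g. from app_main())
bool audio_history_alloc(void) {
    audio_history = heap_caps_malloc(AUDIO_LARGE_BUFFER_SIZE * sizeof(int16_t), MALLOC_CAP_SPIRAM);
    // Check if allocation succeeded
    if (audio_history == NULL) {
        ESP_LOGE(TAG, "Failed to allocate PSRAM buffer");
        return false;
    }
    return true;
}
```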
Heap Monitoring:
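```c
#include <inttypes.h>
#include "esp_system.h"
#include "esp_heap_caps.h"
#include "esp_log.h"

void log_memory_stats(void) {
    // Matching format specifiers: uint32_t -> PRIu32, size_t -> %zu (%d triggers -Wformat errors)
    ESP_LOGI(TAG, "Free heap: %" PRIu32 " bytes", esp_get_free_heap_size());
    ESP_LOGI(TAG, "Free PSRAM: %zu bytes", heap_caps_get_free_size(MALLOC_CAP_SPIRAM));
    ESP_LOGI(TAG, "Min free heap: %" PRIu32 " bytes", esp_get_minimum_free_heap_size());
}
```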
Example Code Structure
File Organization
```
esp32_voice_assistant/
├── main/
│   ├── main.c                  # Entry point, task creation
│   ├── audio/
│   │   ├── audio_input.c       # I2S microphone handling
│   │   ├── audio_output.c      # I2S speaker handling
│   │   ├── audio_buffer.c      # Circular buffer management
│   │   └── audio_network.c     # WebSocket/HTTP streaming
│   ├── ui/
│   │   ├── ui_init.c           # LVGL setup, screen creation
│   │   ├── ui_idle.c           # Idle screen UI
│   │   ├── ui_listening.c      # Listening screen + waveform
│   │   ├── ui_processing.c     # Processing screen + spinner
│   │   ├── ui_speaking.c       # Speaking screen + output waveform
│   │   ├── ui_settings.c       # Settings screen
│   │   └── ui_waveform.c       # Waveform drawing functions
│   ├── touch/
│   │   ├── touch_cst816d.c     # Touch controller driver
│   │   └── touch_gestures.c    # Gesture recognition
│   ├── network/
│   │   └── wifi_manager.c      # WiFi connection management
│   ├── power/
│   │   ├── battery.c           # Battery monitoring
│   │   └── power_mgmt.c        # Sleep modes
│   └── state_machine.c         # Voice assistant state machine
├── components/
│   └── lvgl/                   # LVGL library (ESP-IDF component)
├── CMakeLists.txt
└── sdkconfig                   # ESP-IDF configuration
```
Main Entry Point
```c
// main/main.c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"
#include "nvs_flash.h"
#include "driver/gpio.h"
// Project headers declaring lcd_init(), ui_init(), lvgl_task(), state_machine_*() etc. omitted

static const char *TAG = "VOICE_ASSISTANT";

void app_main(void) {
    ESP_LOGI(TAG, "Voice Assistant Starting...");
    // Initialize hardware
    // Non-volatile storage (required by WiFi); erase and retry if the partition is full or outdated
    esp_err_t err = nvs_flash_init();
    if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
        ESP_ERROR_CHECK(nvs_flash_erase());
        err = nvs_flash_init();
    }
    ESP_ERROR_CHECK(err);
    gpio_install_isr_service(0);  // GPIO interrupts: install once, before touch_init()
    // Power management
    battery_init();
    // Display and touch
    lcd_init();
    touch_init();
    ui_init();
    // Audio pipeline
    audio_init_microphone();
    audio_init_speaker();
    audio_buffer_init(&mic_buffer);
    audio_buffer_init(&spk_buffer);
    // Network
    wifi_init();
    websocket_init();
    // State machine
    state_machine_init();
    // Create FreeRTOS tasks (priorities as in the Task Priority table; STATE at 7)
    xTaskCreatePinnedToCore(lvgl_task, "LVGL", 8192, NULL, 5, NULL, 1);
    xTaskCreatePinnedToCore(audio_mic_task, "MIC", 4096, NULL, 10, NULL, 0);
    xTaskCreatePinnedToCore(audio_speaker_task, "SPK", 4096, NULL, 10, NULL, 0);
    xTaskCreatePinnedToCore(lvgl_waveform_task, "WAVE", 4096, NULL, 4, NULL, 1);
    xTaskCreatePinnedToCore(state_machine_task, "STATE", 4096, NULL, 7, NULL, 0);
    ESP_LOGI(TAG, "Voice Assistant Running!");
}
```
Testing Plan
Phase 1: Hardware Validation
- LCD display working (show test pattern)
- Touch controller responding (log touch coordinates)
- Buzzer working (play test tone)
- WiFi connecting (check IP address)
- Battery reading (log voltage)
- RTC working (log time)
- IMU working (log accelerometer values)
Phase 2: Audio Pipeline
- I2S microphone reading audio (log levels)
- Audio streaming to Heimdall server
- I2S speaker playing audio (test tone)
- TTS audio playback from server
- Audio buffer management (no overflows)
Phase 3: LVGL UI
- Idle screen displays correctly
- State transitions smooth
- Waveform renders at 30 FPS
- Touch gestures recognized
- Settings screen functional
- Status bar updates correctly
Phase 4: Integration
- Wake word detection triggers listening state
- Waveform shows mic input in real-time
- Processing state shows after speech ends
- TTS response plays with output waveform
- Touch cancel works in all states
- Battery indicator accurate
Phase 5: Optimization
- Memory usage stable (no leaks)
- CPU usage acceptable (<80% average)
- WiFi latency <100ms
- Audio latency <200ms end-to-end
- Display framerate stable (30 FPS)
- Battery life >4 hours continuous
Bill of Materials (BOM)
| Component | Part Number / Vendor | Quantity | Unit Price | Total |
|---|---|---|---|---|
| ESP32-S3-Touch-LCD-1.69 | Waveshare | 1 | $12.00 | $12.00 |
| I2S MEMS Microphone | INMP441 | 1 | $3.50 | $3.50 |
| I2S Amplifier | MAX98357A | 1 | $3.50 | $3.50 |
| Speaker (3W 8Ω) | Generic | 1 | $5.00 | $5.00 |
| LiPo Battery (1000mAh) | 503040 JST 1.25 | 1 | $7.00 | $7.00 |
| MicroSD Card (8GB) | SanDisk | 1 | $5.00 | $5.00 |
| Breadboard + Wires | Generic | 1 | $5.00 | $5.00 |
| Total | | | | $41.00 |
Optional:
- Enclosure/Case (3D printed or project box): $5-10
- Backup battery: $7
- USB-C cable: $3
Grand Total with Options: ~$56-61
References & Resources
LVGL Audio Visualization Examples
- Music Player with FFT Spectrum - Instructables Guide
  - Source: https://github.com/moononournation/LVGL_Music_Player.git
  - Shows FFT-based audio visualization on LVGL canvas
- LVGL Audio FFT Spectrum (Xiao S3) - GitHub: genvex/LVGL_Audio_FFT_Spectrum_xiaoS3_oled
  - Real-time FFT visualization using low-level LVGL drawing
- LVGL Audio FFT Spectrum - GitHub: imliubo/LVGL_Audio_FFT_Spectrum
  - Alternative FFT spectrum implementation
- Moving Waveform Discussion - LVGL Forum Thread
  - Tips on efficiently displaying moving waveforms
ESP32-S3 Resources
- Waveshare Wiki - https://www.waveshare.com/wiki/ESP32-S3-LCD-1.69
- LVGL ESP32 Port - GitHub: lvgl/lv_port_esp32
- ESP-IDF Documentation - https://docs.espressif.com/projects/esp-idf/en/latest/
Voice Assistant Project
- Mycroft Precise Documentation - https://github.com/MycroftAI/mycroft-precise
- Whisper OpenAI - https://github.com/openai/whisper
- Piper TTS - https://github.com/rhasspy/piper
Next Steps
- Order Hardware - ESP32-S3-Touch-LCD + audio components (~$41)
- Setup ESP-IDF - Install ESP-IDF v5.3.1+ on development machine
- Clone Examples - Get LVGL audio visualization examples for reference
- Start Simple - Begin with LCD + LVGL test (no audio)
- Add Audio - Wire I2S mic, test audio streaming
- Waveform MVP - Get basic waveform rendering working
- Full Integration - Connect to Heimdall voice server
- Polish - Add touch controls, settings, battery support
Version: 1.0
Created: 2026-01-01
Status: Specification Complete, Ready for Implementation