# ESP32-S3-Touch-LCD Voice Assistant - Technical Specification

**Date:** 2026-01-01
**Hardware:** Waveshare ESP32-S3-Touch-LCD-1.69
**Display:** 240×280 ST7789V2 with Capacitive Touch
**Framework:** ESP-IDF v5.3.1+ with LVGL 8.4.0+
**Purpose:** Voice assistant endpoint with real-time audio waveform visualization

---

## Overview

Voice assistant client for ESP32-S3 with integrated LVGL-based visual feedback showing:
- Real-time audio waveform during listening
- Wake word detection animation
- Processing/thinking state
- Response state with audio output visualization
- Touch controls for volume, sensitivity, settings

**Architecture:**
```
┌─────────────────────────────────┐
│   ESP32-S3-Touch-LCD-1.69       │
│                                 │
│  ┌──────────────────────────┐   │
│  │  LVGL UI (240×280)       │   │
│  │  - Waveform Canvas       │   │
│  │  - State Indicators      │   │──┐
│  │  - Touch Controls        │   │  │
│  └──────────────────────────┘   │  │
│                                 │  │
│  ┌──────────────────────────┐   │  │ WiFi
│  │  Audio Pipeline          │   │  │ Audio Stream
│  │  - I2S Mic Input         │   │  │
│  │  - I2S Speaker Output    │   │──┤
│  │  - Buffer Management     │   │  │
│  └──────────────────────────┘   │  │
│                                 │  │
│  ┌──────────────────────────┐   │  │
│  │  State Machine           │   │  │
│  │  - Idle → Listening      │   │  │
│  │  - Processing → Speaking │   │──┘
│  └──────────────────────────┘   │
└─────────────────────────────────┘
                 │
                 │ TCP/HTTP
                 ↓
┌─────────────────────────────────┐
│   Heimdall Voice Server         │
│   (10.1.10.71:3006)             │
│                                 │
│   - Mycroft Precise Wake Word   │
│   - Whisper STT                 │
│   - Home Assistant Integration  │
│   - Piper TTS                   │
└─────────────────────────────────┘
```

---

## Visual States & UI Design

### State Machine

```
     ┌─────────┐
     │  IDLE   │  ◄──────────────┐
     └────┬────┘                 │
          │                      │
   Wake Word Detected            │
          │                      │
          ↓                      │
     ┌──────────┐                │
     │LISTENING │                │
     └────┬─────┘                │
          │                      │
     End of Speech               │
          │                      │
          ↓                      │
     ┌───────────┐               │
     │PROCESSING │               │
     └─────┬─────┘               │
           │                     │
    Response Ready               │
           │                     │
           ↓                     │
     ┌──────────┐                │
     │ SPEAKING │ ───────────────┘
     └──────────┘
```

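The transitions in the diagram can be sketched as a pure function, which keeps the state logic testable off-device. This is a minimal sketch, not the project's implementation: `STATE_LISTENING` and `STATE_SPEAKING` appear in the task code of this spec, while `va_state_t`, `va_event_t`, `va_next_state()`, and the event names are hypothetical. The diagram does not label the SPEAKING → IDLE edge, so `EVENT_PLAYBACK_DONE` is an assumption.

```c
// Hypothetical state and event types for the four-state machine.
typedef enum {
    STATE_IDLE,
    STATE_LISTENING,
    STATE_PROCESSING,
    STATE_SPEAKING
} va_state_t;

typedef enum {
    EVENT_WAKE_WORD,       // "Wake Word Detected"
    EVENT_END_OF_SPEECH,   // "End of Speech"
    EVENT_RESPONSE_READY,  // "Response Ready"
    EVENT_PLAYBACK_DONE    // Assumed trigger for SPEAKING -> IDLE
} va_event_t;

// Return the next state. Events that do not apply to the current
// state leave it unchanged.
va_state_t va_next_state(va_state_t state, va_event_t event) {
    switch (state) {
    case STATE_IDLE:
        return (event == EVENT_WAKE_WORD) ? STATE_LISTENING : state;
    case STATE_LISTENING:
        return (event == EVENT_END_OF_SPEECH) ? STATE_PROCESSING : state;
    case STATE_PROCESSING:
        return (event == EVENT_RESPONSE_READY) ? STATE_SPEAKING : state;
    case STATE_SPEAKING:
        return (event == EVENT_PLAYBACK_DONE) ? STATE_IDLE : state;
    }
    return state;
}
```

Keeping the transition table free of LVGL and FreeRTOS calls means a task can apply side effects (screen changes, backlight) after the new state is known.
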
### Visual Feedback Per State

#### 1. IDLE State
**Display:**
- Subtle pulsing ring animation (like Google Home)
- Time display from RTC
- Status icons (WiFi strength, battery level)
- Dim backlight (30-50%)

**Colors:**
- Background: Dark blue (#001F3F)
- Pulse ring: Cyan (#00BFFF)
- Text: White (#FFFFFF)

**LVGL Widgets:**
```c
lv_obj_t *idle_screen;
lv_obj_t *pulse_ring;   // Arc widget, animated rotation
lv_obj_t *time_label;   // Label with RTC time
lv_obj_t *status_bar;   // Container for icons
```

**Animation:**
- Slow pulse: 2-second breathing cycle
- Rotation: 360° over 10 seconds

---

#### 2. LISTENING State
**Display:**
- Real-time audio waveform visualization
- Bright backlight (100%)
- "Listening..." text
- Cancel button (touch)

**Waveform Visualization:**

**Option A: Canvas-Based Waveform (Recommended)**
- Use LVGL `lv_canvas` for custom drawing
- Draw waveform from audio buffer samples
- Scrolling waveform (left-to-right)
- Update rate: 30-60 FPS

**Option B: Bar Chart Spectrum**
- Use `lv_chart` with bar type
- FFT-based spectrum analyzer
- 8-16 bars for frequency bins
- Update rate: 15-30 FPS

**Colors:**
- Background: Dark gray (#1A1A1A)
- Waveform: Green (#00FF00)
- Peak indicators: Yellow (#FFFF00)
- Clipping: Red (#FF0000)

**LVGL Implementation:**
```c
// Canvas-based waveform
lv_obj_t *listening_screen;
lv_obj_t *waveform_canvas;   // 240×180 canvas
lv_obj_t *listening_label;   // "Listening..."
lv_obj_t *cancel_btn;        // Touch to cancel

// Waveform buffer (circular buffer)
#define WAVEFORM_WIDTH  240
#define WAVEFORM_HEIGHT 180
#define WAVEFORM_CENTER (WAVEFORM_HEIGHT / 2)
int16_t waveform_buffer[WAVEFORM_WIDTH];
uint16_t waveform_index = 0;

// Drawing function (call from the waveform task with the LVGL lock held,
// not directly from the I2S task). Pass waveform_buffer or the samples
// returned by audio_buffer_get_waveform() as audio_samples.
void draw_waveform(lv_obj_t *canvas, int16_t *audio_samples, size_t count) {
    lv_canvas_fill_bg(canvas, lv_color_hex(0x1A1A1A), LV_OPA_COVER);

    lv_draw_line_dsc_t line_dsc;
    lv_draw_line_dsc_init(&line_dsc);
    line_dsc.color = lv_color_hex(0x00FF00);
    line_dsc.width = 2;

    // Draw waveform line, one segment per pixel column
    size_t n = (count < WAVEFORM_WIDTH) ? count : WAVEFORM_WIDTH;
    for (size_t x = 0; x + 1 < n; x++) {
        // Scale full-scale samples to ±(WAVEFORM_CENTER - 1) rows so that
        // peaks stay inside the canvas
        int32_t y1 = WAVEFORM_CENTER + (audio_samples[x] * (WAVEFORM_CENTER - 1)) / 32768;
        int32_t y2 = WAVEFORM_CENTER + (audio_samples[x + 1] * (WAVEFORM_CENTER - 1)) / 32768;

        lv_point_t points[] = {
            {(lv_coord_t)x, (lv_coord_t)y1},
            {(lv_coord_t)(x + 1), (lv_coord_t)y2}
        };
        lv_canvas_draw_line(canvas, points, 2, &line_dsc);
    }
}

// Audio callback (I2S task)
void audio_i2s_callback(int16_t *samples, size_t count) {
    // Downsample audio for waveform display. Keep the step at 1 or more:
    // count / WAVEFORM_WIDTH is 0 when count < WAVEFORM_WIDTH, which would
    // otherwise loop forever.
    size_t step = count / WAVEFORM_WIDTH;
    if (step == 0) {
        step = 1;
    }
    for (size_t i = 0; i < count; i += step) {
        waveform_buffer[waveform_index] = samples[i];
        waveform_index = (waveform_index + 1) % WAVEFORM_WIDTH;
    }

    // Trigger LVGL update (use event or flag)
    xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
}
```

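Mapping a signed 16-bit sample to a canvas row needs both scaling and clamping so that loud input stays inside the 180-pixel-high canvas. The helper below is a sketch under stated assumptions: `waveform_sample_to_y()` and its `gain_percent` parameter are hypothetical names, and the 180-row height is taken from `WAVEFORM_HEIGHT` in this spec.

```c
#include <stdint.h>

#define WAVEFORM_HEIGHT 180
#define WAVEFORM_CENTER (WAVEFORM_HEIGHT / 2)

// Map a signed 16-bit sample to a canvas row in [0, WAVEFORM_HEIGHT - 1].
// gain_percent = 100 maps a full-scale sample to the canvas edge;
// larger values zoom in and rely on the clamp below.
int waveform_sample_to_y(int16_t sample, int gain_percent) {
    // Scale ±32768 down to ±(WAVEFORM_CENTER - 1) rows.
    int32_t offset = ((int32_t)sample * (WAVEFORM_CENTER - 1)) / 32768;
    offset = (offset * gain_percent) / 100;

    int32_t y = WAVEFORM_CENTER + offset;
    if (y < 0) {
        y = 0;
    }
    if (y > WAVEFORM_HEIGHT - 1) {
        y = WAVEFORM_HEIGHT - 1;
    }
    return (int)y;
}
```

With a gain of 100, silence lands on row 90 and full-scale samples land on rows 1 and 178. A hit on row 0 or 179 at higher gain could drive the red clipping color listed above.
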
**Touch Controls:**
- Tap anywhere: Cancel listening
- Swipe down: Lower sensitivity
- Swipe up: Increase sensitivity

---

#### 3. PROCESSING State
**Display:**
- Animated spinner/thinking indicator
- "Processing..." text
- Waveform fades out smoothly

**Animation:**
- Circular spinner with gradient
- Rotation: 360° per 1 second
- Pulsing opacity

**Colors:**
- Background: Dark gray (#1A1A1A)
- Spinner: Blue (#0080FF)
- Text: Light gray (#CCCCCC)

**LVGL Implementation:**
```c
lv_obj_t *processing_screen;
lv_obj_t *spinner;            // lv_spinner widget
lv_obj_t *processing_label;   // "Processing..."

// lv_anim exec callbacks take (void *var, int32_t value), so wrap the
// three-argument style setter instead of passing it directly
static void waveform_set_opa(void *obj, int32_t opa) {
    lv_obj_set_style_opa((lv_obj_t *)obj, (lv_opa_t)opa, 0);
}

// Transition from listening to processing
void transition_to_processing(void) {
    // Fade out waveform
    lv_anim_t fade_out;
    lv_anim_init(&fade_out);
    lv_anim_set_var(&fade_out, waveform_canvas);
    lv_anim_set_values(&fade_out, LV_OPA_COVER, LV_OPA_TRANSP);
    lv_anim_set_time(&fade_out, 300);
    lv_anim_set_exec_cb(&fade_out, waveform_set_opa);
    lv_anim_start(&fade_out);

    // Show spinner after fade
    lv_timer_t *timer = lv_timer_create(show_spinner_callback, 300, NULL);
    lv_timer_set_repeat_count(timer, 1);
}
```

---

#### 4. SPEAKING State
**Display:**
- Audio output waveform (TTS playback visualization)
- "Speaking..." or response text snippet
- Volume indicator

**Waveform:**
- Same canvas as LISTENING but different color
- Shows output audio being played
- Synchronized with speaker output

**Colors:**
- Background: Dark gray (#1A1A1A)
- Waveform: Blue (#0080FF)
- Text: White (#FFFFFF)

**LVGL Implementation:**
```c
lv_obj_t *speaking_screen;
lv_obj_t *output_waveform_canvas;  // Same size as input waveform
lv_obj_t *response_label;          // Show part of response text
lv_obj_t *volume_bar;              // lv_bar widget for volume level

// Similar drawing to listening state, but fed from speaker buffer
void draw_output_waveform(lv_obj_t *canvas, int16_t *speaker_samples, size_t count) {
    // Same logic as input waveform, different color
    lv_draw_line_dsc_t line_dsc;
    lv_draw_line_dsc_init(&line_dsc);
    line_dsc.color = lv_color_hex(0x0080FF);
    line_dsc.width = 2;
    // ... draw logic
}
```

**Touch Controls:**
- Tap: Skip response (go back to idle)
- Volume slider: Adjust speaker volume

---

### Additional UI Elements

#### Status Bar (All States)
**Location:** Top 20 pixels
**Contents:**
- WiFi icon + signal strength
- Battery icon + percentage
- Time (from RTC)
- Mute icon (if muted)

**LVGL Implementation:**
```c
lv_obj_t *status_bar;
lv_obj_t *wifi_icon;
lv_obj_t *battery_icon;
lv_obj_t *time_label;
lv_obj_t *mute_icon;

// Update every second
void update_status_bar(lv_timer_t *timer) {
    // Update WiFi strength
    int8_t rssi = wifi_get_rssi();
    lv_img_set_src(wifi_icon, get_wifi_icon(rssi));

    // Update battery
    uint8_t battery_pct = battery_get_percentage();
    lv_img_set_src(battery_icon, get_battery_icon(battery_pct));

    // Update time from RTC
    rtc_time_t time;
    pcf85063_get_time(&time);
    lv_label_set_text_fmt(time_label, "%02d:%02d", time.hour, time.min);
}

// Create timer for status bar updates
lv_timer_create(update_status_bar, 1000, NULL);
```

#### Settings Screen (Touch Access)
**Trigger:** Long-press on idle screen
**Contents:**
- Volume slider
- Brightness slider
- Wake word sensitivity slider
- WiFi settings button
- About/Info button

**LVGL Implementation:**
```c
lv_obj_t *settings_screen;
lv_obj_t *volume_slider;
lv_obj_t *brightness_slider;
lv_obj_t *sensitivity_slider;
lv_obj_t *wifi_btn;
lv_obj_t *about_btn;
lv_obj_t *back_btn;

// Slider event handler
static void slider_event_cb(lv_event_t *e) {
    lv_obj_t *slider = lv_event_get_target(e);
    int32_t value = lv_slider_get_value(slider);

    if (slider == volume_slider) {
        set_speaker_volume(value);
    } else if (slider == brightness_slider) {
        set_backlight_brightness(value);
    } else if (slider == sensitivity_slider) {
        set_wake_word_sensitivity(value);
    }
}
```

---

## Audio Pipeline Integration

### I2S Configuration

**Microphone (INMP441):**
```c
// Legacy I2S driver API (driver/i2s.h). It still builds on ESP-IDF v5.x but is
// deprecated there in favor of driver/i2s_std.h.
#include "driver/i2s.h"

// Placeholder pins: verify with board schematic. GPIO 6 is also used as
// TOUCH_SDA_PIN in the touch section, so one of the two must change.
#define I2S_MIC_NUM         I2S_NUM_0
#define I2S_MIC_BCLK_PIN    GPIO_NUM_4
#define I2S_MIC_WS_PIN      GPIO_NUM_5
#define I2S_MIC_DIN_PIN     GPIO_NUM_6
#define I2S_MIC_SAMPLE_RATE 16000
#define I2S_MIC_BITS        16
#define I2S_MIC_CHANNELS    1

i2s_config_t i2s_mic_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_RX,
    .sample_rate = I2S_MIC_SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 256,
    .use_apll = false,
    .tx_desc_auto_clear = false,
    .fixed_mclk = 0
};

i2s_pin_config_t i2s_mic_pins = {
    .mck_io_num = I2S_PIN_NO_CHANGE,   // Leave unset and MCLK defaults to GPIO 0
    .bck_io_num = I2S_MIC_BCLK_PIN,
    .ws_io_num = I2S_MIC_WS_PIN,
    .data_out_num = I2S_PIN_NO_CHANGE,
    .data_in_num = I2S_MIC_DIN_PIN
};

void audio_init_microphone(void) {
    i2s_driver_install(I2S_MIC_NUM, &i2s_mic_config, 0, NULL);
    i2s_set_pin(I2S_MIC_NUM, &i2s_mic_pins);
    i2s_zero_dma_buffer(I2S_MIC_NUM);
}
```

**Speaker (MAX98357A I2S Amp):**
```c
// Placeholder pins: verify with board schematic. GPIO 7 and GPIO 9 are also
// used as TOUCH_SCL_PIN and TOUCH_INT_PIN in the touch section.
#define I2S_SPK_NUM         I2S_NUM_1
#define I2S_SPK_BCLK_PIN    GPIO_NUM_7
#define I2S_SPK_WS_PIN      GPIO_NUM_8
#define I2S_SPK_DOUT_PIN    GPIO_NUM_9
#define I2S_SPK_SAMPLE_RATE 16000
#define I2S_SPK_BITS        16
#define I2S_SPK_CHANNELS    1

i2s_config_t i2s_spk_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_TX,
    .sample_rate = I2S_SPK_SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 256,
    .use_apll = false,
    .tx_desc_auto_clear = true,
    .fixed_mclk = 0
};

i2s_pin_config_t i2s_spk_pins = {
    .mck_io_num = I2S_PIN_NO_CHANGE,   // Leave unset and MCLK defaults to GPIO 0
    .bck_io_num = I2S_SPK_BCLK_PIN,
    .ws_io_num = I2S_SPK_WS_PIN,
    .data_out_num = I2S_SPK_DOUT_PIN,
    .data_in_num = I2S_PIN_NO_CHANGE
};

void audio_init_speaker(void) {
    i2s_driver_install(I2S_SPK_NUM, &i2s_spk_config, 0, NULL);
    i2s_set_pin(I2S_SPK_NUM, &i2s_spk_pins);
    i2s_zero_dma_buffer(I2S_SPK_NUM);
}
```

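The DMA settings above determine how much audio the driver buffers and how much data the network link has to carry. The arithmetic can be sketched as follows; `i2s_dma_latency_ms()` and `pcm_bytes_per_second()` are hypothetical helper names, and the calculation assumes `dma_buf_len` counts frames with one sample per frame for mono.

```c
#include <stdint.h>

// Milliseconds of audio held by the whole I2S DMA ring.
uint32_t i2s_dma_latency_ms(uint32_t dma_buf_count, uint32_t dma_buf_len,
                            uint32_t sample_rate_hz) {
    return (dma_buf_count * dma_buf_len * 1000u) / sample_rate_hz;
}

// Raw PCM byte rate that must be streamed to or from the server.
uint32_t pcm_bytes_per_second(uint32_t sample_rate_hz, uint32_t bits_per_sample,
                              uint32_t channels) {
    return sample_rate_hz * (bits_per_sample / 8u) * channels;
}
```

For the values in this spec (8 buffers of 256 frames at 16 kHz, 16-bit mono), the ring holds 128 ms of audio, a single buffer holds 16 ms, and the stream runs at 32,000 bytes per second. The 128 ms figure is worth comparing against the 200 ms end-to-end latency target in the testing plan.
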
### Audio Buffer Management

**Circular Buffer for Waveform:**
```c
#define AUDIO_BUFFER_SIZE   2048
#define WAVEFORM_DECIMATION 8   // Downsample for display

typedef struct {
    int16_t samples[AUDIO_BUFFER_SIZE];
    uint16_t write_idx;
    uint16_t read_idx;
    SemaphoreHandle_t mutex;
} audio_buffer_t;

audio_buffer_t mic_buffer;
audio_buffer_t spk_buffer;

void audio_buffer_init(audio_buffer_t *buf) {
    memset(buf->samples, 0, sizeof(buf->samples));
    buf->write_idx = 0;
    buf->read_idx = 0;
    buf->mutex = xSemaphoreCreateMutex();
}

void audio_buffer_write(audio_buffer_t *buf, int16_t *samples, size_t count) {
    xSemaphoreTake(buf->mutex, portMAX_DELAY);
    for (size_t i = 0; i < count; i++) {
        buf->samples[buf->write_idx] = samples[i];
        buf->write_idx = (buf->write_idx + 1) % AUDIO_BUFFER_SIZE;
    }
    xSemaphoreGive(buf->mutex);
}

// Get downsampled samples for waveform display
void audio_buffer_get_waveform(audio_buffer_t *buf, int16_t *out, size_t out_count) {
    xSemaphoreTake(buf->mutex, portMAX_DELAY);
    for (size_t i = 0; i < out_count; i++) {
        size_t src_idx = (buf->write_idx + (i * WAVEFORM_DECIMATION)) % AUDIO_BUFFER_SIZE;
        out[i] = buf->samples[src_idx];
    }
    xSemaphoreGive(buf->mutex);
}
```

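The decimated read works because, once the ring has wrapped, `write_idx` points at the oldest sample, so stepping forward from it walks the audio in time order. The host-side sketch below shows that behavior without FreeRTOS; `ring_t`, `ring_write()`, and `ring_get_waveform()` are hypothetical stand-ins for the mutex-protected versions above, and the clamp on `out_count` is an added assumption to stop the read from lapping the ring.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE  2048   // Same value as AUDIO_BUFFER_SIZE
#define DECIMATION 8      // Same value as WAVEFORM_DECIMATION

typedef struct {
    int16_t samples[RING_SIZE];
    size_t write_idx;     // Next slot to overwrite, i.e. the oldest sample
} ring_t;

void ring_write(ring_t *r, const int16_t *in, size_t count) {
    for (size_t i = 0; i < count; i++) {
        r->samples[r->write_idx] = in[i];
        r->write_idx = (r->write_idx + 1) % RING_SIZE;
    }
}

// Copy every DECIMATION-th sample, oldest first, into out.
// Returns the number of samples written, capped so that the read
// never covers more than one full pass over the ring.
size_t ring_get_waveform(const ring_t *r, int16_t *out, size_t out_count) {
    if (out_count * DECIMATION > RING_SIZE) {
        out_count = RING_SIZE / DECIMATION;
    }
    for (size_t i = 0; i < out_count; i++) {
        out[i] = r->samples[(r->write_idx + i * DECIMATION) % RING_SIZE];
    }
    return out_count;
}
```

With 240 output points and a decimation of 8, the display covers 1,920 of the 2,048 buffered samples, which is 120 ms of audio at 16 kHz.
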
### Audio Streaming Task

**Microphone Input Task:**
```c
void audio_mic_task(void *pvParameters) {
    int16_t i2s_buffer[256];
    size_t bytes_read;

    while (1) {
        // Read from I2S microphone
        i2s_read(I2S_MIC_NUM, i2s_buffer, sizeof(i2s_buffer), &bytes_read, portMAX_DELAY);
        size_t samples_read = bytes_read / sizeof(int16_t);

        if (current_state == STATE_LISTENING) {
            // Write to circular buffer for waveform display
            audio_buffer_write(&mic_buffer, i2s_buffer, samples_read);

            // Send to Heimdall server via WiFi
            audio_send_to_server(i2s_buffer, samples_read);

            // Trigger waveform update
            xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
        }
    }
}
```

**Speaker Output Task:**
```c
void audio_speaker_task(void *pvParameters) {
    int16_t i2s_buffer[256];
    size_t bytes_written;

    while (1) {
        // Receive audio from Heimdall server
        size_t samples_received = audio_receive_from_server(i2s_buffer, 256);

        if (samples_received > 0 && current_state == STATE_SPEAKING) {
            // Write to circular buffer for waveform display
            audio_buffer_write(&spk_buffer, i2s_buffer, samples_received);

            // Play through I2S speaker
            i2s_write(I2S_SPK_NUM, i2s_buffer, samples_received * sizeof(int16_t),
                      &bytes_written, portMAX_DELAY);

            // Trigger waveform update
            xEventGroupSetBits(ui_event_group, WAVEFORM_UPDATE_BIT);
        } else {
            vTaskDelay(pdMS_TO_TICKS(10));
        }
    }
}
```

### LVGL Update Task

**Waveform Rendering Task:**
```c
void lvgl_waveform_task(void *pvParameters) {
    int16_t waveform_samples[WAVEFORM_WIDTH];

    while (1) {
        // Wait for waveform update event
        EventBits_t bits = xEventGroupWaitBits(ui_event_group, WAVEFORM_UPDATE_BIT,
                                               pdTRUE, pdFALSE, pdMS_TO_TICKS(50));

        if (bits & WAVEFORM_UPDATE_BIT) {
            if (current_state == STATE_LISTENING) {
                // Get downsampled mic data
                audio_buffer_get_waveform(&mic_buffer, waveform_samples, WAVEFORM_WIDTH);

                // Draw on LVGL canvas (must lock LVGL)
                lvgl_lock();
                draw_waveform(waveform_canvas, waveform_samples, WAVEFORM_WIDTH);
                lvgl_unlock();

            } else if (current_state == STATE_SPEAKING) {
                // Get downsampled speaker data
                audio_buffer_get_waveform(&spk_buffer, waveform_samples, WAVEFORM_WIDTH);

                lvgl_lock();
                draw_output_waveform(output_waveform_canvas, waveform_samples, WAVEFORM_WIDTH);
                lvgl_unlock();
            }
        }
    }
}
```

---

## Touch Gesture Integration

### Touch Controller (CST816D)

**Gestures Supported:**
- Single tap
- Long press
- Swipe up/down/left/right

**Implementation:**
```c
// Verify with board schematic: GPIO 6, 7, and 9 are also listed as
// placeholder I2S pins in the audio section.
#define TOUCH_I2C_NUM I2C_NUM_0
#define TOUCH_SDA_PIN GPIO_NUM_6
#define TOUCH_SCL_PIN GPIO_NUM_7
#define TOUCH_INT_PIN GPIO_NUM_9
#define TOUCH_RST_PIN GPIO_NUM_10

typedef enum {
    GESTURE_NONE = 0,
    GESTURE_TAP,
    GESTURE_LONG_PRESS,
    GESTURE_SWIPE_UP,
    GESTURE_SWIPE_DOWN,
    GESTURE_SWIPE_LEFT,
    GESTURE_SWIPE_RIGHT
} touch_gesture_t;

void touch_init(void) {
    // I2C init for CST816D
    i2c_config_t conf = {
        .mode = I2C_MODE_MASTER,
        .sda_io_num = TOUCH_SDA_PIN,
        .scl_io_num = TOUCH_SCL_PIN,
        .sda_pullup_en = GPIO_PULLUP_ENABLE,
        .scl_pullup_en = GPIO_PULLUP_ENABLE,
        .master.clk_speed = 100000,
    };
    i2c_param_config(TOUCH_I2C_NUM, &conf);
    i2c_driver_install(TOUCH_I2C_NUM, conf.mode, 0, 0, 0);

    // Reset touch controller
    gpio_set_direction(TOUCH_RST_PIN, GPIO_MODE_OUTPUT);
    gpio_set_level(TOUCH_RST_PIN, 0);
    vTaskDelay(pdMS_TO_TICKS(10));
    gpio_set_level(TOUCH_RST_PIN, 1);
    vTaskDelay(pdMS_TO_TICKS(50));

    // Configure interrupt pin
    gpio_set_direction(TOUCH_INT_PIN, GPIO_MODE_INPUT);
    gpio_set_intr_type(TOUCH_INT_PIN, GPIO_INTR_NEGEDGE);
    gpio_install_isr_service(0);
    gpio_isr_handler_add(TOUCH_INT_PIN, touch_isr_handler, NULL);
}

touch_gesture_t touch_read_gesture(void) {
    uint8_t reg = 0x01;   // Gesture register
    uint8_t data[8];

    // Write the register address, then read back starting at that register.
    // i2c_master_read_from_device() takes no register argument, so the
    // combined write/read call is used here.
    esp_err_t err = i2c_master_write_read_device(TOUCH_I2C_NUM, CST816D_ADDR, &reg, 1,
                                                 data, sizeof(data), pdMS_TO_TICKS(100));
    if (err != ESP_OK) {
        return GESTURE_NONE;
    }

    // NOTE: this cast assumes the chip's gesture IDs match touch_gesture_t.
    // Check the CST816D datasheet and map the raw value if they differ.
    return (touch_gesture_t)data[0];
}
```

### Gesture Actions by State

**IDLE State:**
- **Tap:** Wake up display (if dimmed)
- **Long Press:** Open settings screen
- **Swipe Up:** Show more info (weather, calendar)

**LISTENING State:**
- **Tap:** Cancel listening, return to idle
- **Swipe Down:** Lower wake word sensitivity
- **Swipe Up:** Raise wake word sensitivity

**SPEAKING State:**
- **Tap:** Skip response, return to idle
- **Swipe Left/Right:** Volume down/up

**PROCESSING State:**
- **Tap:** Cancel processing (if possible)

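The per-state gesture list above maps naturally onto a lookup table indexed by state and gesture. The sketch below is one way to encode it, not the project's implementation: `ui_action_t`, the `ACTION_*` names, `STATE_COUNT`, `GESTURE_COUNT`, and `gesture_to_action()` are hypothetical, and the enums are redeclared here only so the block compiles on its own.

```c
typedef enum {
    STATE_IDLE, STATE_LISTENING, STATE_PROCESSING, STATE_SPEAKING,
    STATE_COUNT
} va_state_t;

typedef enum {
    GESTURE_NONE, GESTURE_TAP, GESTURE_LONG_PRESS,
    GESTURE_SWIPE_UP, GESTURE_SWIPE_DOWN,
    GESTURE_SWIPE_LEFT, GESTURE_SWIPE_RIGHT,
    GESTURE_COUNT
} touch_gesture_t;

typedef enum {
    ACTION_NONE,            // Gesture has no meaning in this state
    ACTION_WAKE_DISPLAY,
    ACTION_OPEN_SETTINGS,
    ACTION_SHOW_INFO,
    ACTION_CANCEL_TO_IDLE,  // Cancel or skip, then return to idle
    ACTION_SENSITIVITY_DOWN,
    ACTION_SENSITIVITY_UP,
    ACTION_VOLUME_DOWN,
    ACTION_VOLUME_UP
} ui_action_t;

// Entries not listed default to ACTION_NONE (zero-initialized).
static const ui_action_t gesture_actions[STATE_COUNT][GESTURE_COUNT] = {
    [STATE_IDLE] = {
        [GESTURE_TAP]        = ACTION_WAKE_DISPLAY,
        [GESTURE_LONG_PRESS] = ACTION_OPEN_SETTINGS,
        [GESTURE_SWIPE_UP]   = ACTION_SHOW_INFO,
    },
    [STATE_LISTENING] = {
        [GESTURE_TAP]        = ACTION_CANCEL_TO_IDLE,
        [GESTURE_SWIPE_DOWN] = ACTION_SENSITIVITY_DOWN,
        [GESTURE_SWIPE_UP]   = ACTION_SENSITIVITY_UP,
    },
    [STATE_PROCESSING] = {
        [GESTURE_TAP]        = ACTION_CANCEL_TO_IDLE,
    },
    [STATE_SPEAKING] = {
        [GESTURE_TAP]         = ACTION_CANCEL_TO_IDLE,
        [GESTURE_SWIPE_LEFT]  = ACTION_VOLUME_DOWN,
        [GESTURE_SWIPE_RIGHT] = ACTION_VOLUME_UP,
    },
};

ui_action_t gesture_to_action(va_state_t state, touch_gesture_t gesture) {
    if ((unsigned)state >= STATE_COUNT || (unsigned)gesture >= GESTURE_COUNT) {
        return ACTION_NONE;
    }
    return gesture_actions[state][gesture];
}
```

A table keeps the gesture policy in one place, so adding a state or remapping a swipe does not touch the touch driver or the screen code.
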
---

## Network Communication

### WiFi Configuration

**Connection:**
```c
#define WIFI_SSID     "YourNetworkName"
#define WIFI_PASSWORD "YourPassword"
#define SERVER_URL    "http://10.1.10.71:3006"

void wifi_init(void) {
    esp_netif_init();
    esp_event_loop_create_default();
    esp_netif_create_default_wifi_sta();

    wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
    esp_wifi_init(&cfg);

    wifi_config_t wifi_config = {
        .sta = {
            .ssid = WIFI_SSID,
            .password = WIFI_PASSWORD,
        },
    };

    esp_wifi_set_mode(WIFI_MODE_STA);
    esp_wifi_set_config(WIFI_IF_STA, &wifi_config);
    esp_wifi_start();
    esp_wifi_connect();
}
```

### Server Communication Protocol

**Endpoints:**
- `GET /health` - Server health check
- `POST /audio/stream` - Stream audio to server (multipart)
- `GET /audio/tts` - Receive TTS audio response
- `GET /wake-word/status` - Check wake word detection status

**Audio Streaming (WebSockets Recommended):**
```c
#include "freertos/FreeRTOS.h"
#include "freertos/stream_buffer.h"
#include "esp_websocket_client.h"

esp_websocket_client_handle_t ws_client;

// Assumption: websocket_event_handler copies each WEBSOCKET_EVENT_DATA
// payload into this stream buffer with xStreamBufferSend()
static StreamBufferHandle_t ws_rx_stream;

void websocket_init(void) {
    esp_websocket_client_config_t ws_cfg = {
        .uri = "ws://10.1.10.71:3006/ws/audio",
        .buffer_size = 2048,
    };

    ws_rx_stream = xStreamBufferCreate(4096, 1);

    ws_client = esp_websocket_client_init(&ws_cfg);
    esp_websocket_register_events(ws_client, WEBSOCKET_EVENT_ANY,
                                  websocket_event_handler, NULL);
    esp_websocket_client_start(ws_client);
}

void audio_send_to_server(int16_t *samples, size_t count) {
    if (esp_websocket_client_is_connected(ws_client)) {
        esp_websocket_client_send_bin(ws_client, (char *)samples,
                                      count * sizeof(int16_t), portMAX_DELAY);
    }
}

size_t audio_receive_from_server(int16_t *out_buffer, size_t max_samples) {
    // esp_websocket_client delivers incoming frames through events rather than
    // a blocking receive call, so read what the event handler queued
    // (blocking with timeout)
    size_t len = xStreamBufferReceive(ws_rx_stream, out_buffer,
                                      max_samples * sizeof(int16_t), pdMS_TO_TICKS(100));
    return len / sizeof(int16_t);
}
```

**Alternative: HTTP Chunked Transfer (Simpler):**
```c
void audio_stream_http(void) {
    esp_http_client_config_t config = {
        .url = "http://10.1.10.71:3006/audio/stream",
        .method = HTTP_METHOD_POST,
    };
    esp_http_client_handle_t client = esp_http_client_init(&config);

    // Set headers
    esp_http_client_set_header(client, "Content-Type", "audio/pcm");

    // -1 = chunked mode; this also sets the Transfer-Encoding: chunked header
    esp_http_client_open(client, -1);

    // Stream audio chunks
    int16_t buffer[256];
    while (current_state == STATE_LISTENING) {
        // Read from mic
        size_t bytes_read;
        i2s_read(I2S_MIC_NUM, buffer, sizeof(buffer), &bytes_read, portMAX_DELAY);
        if (bytes_read == 0) {
            continue;   // A zero-length chunk would end the stream
        }

        // Send to server. esp_http_client_write() sends raw bytes, so the
        // chunk framing ("<hex length>\r\n<data>\r\n") is written by hand.
        char header[16];
        int header_len = snprintf(header, sizeof(header), "%x\r\n", (unsigned)bytes_read);
        esp_http_client_write(client, header, header_len);
        esp_http_client_write(client, (char *)buffer, bytes_read);
        esp_http_client_write(client, "\r\n", 2);
    }

    // Terminating zero-length chunk
    esp_http_client_write(client, "0\r\n\r\n", 5);

    esp_http_client_close(client);
    esp_http_client_cleanup(client);
}
```

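In HTTP/1.1 chunked transfer encoding, each chunk on the wire is a hexadecimal length line, the payload, and a trailing CRLF, and a zero-length chunk ends the body. A bounds-checked helper for the length line might look like the sketch below; `http_chunk_header()` is a hypothetical name, not part of `esp_http_client`.

```c
#include <stddef.h>
#include <stdio.h>

// Write the chunk-size line ("<hex length>\r\n") into out.
// Returns the number of bytes written (excluding the terminating NUL),
// or -1 if out is too small to hold the whole line.
int http_chunk_header(char *out, size_t out_size, size_t payload_len) {
    int n = snprintf(out, out_size, "%zx\r\n", payload_len);
    if (n < 0 || (size_t)n >= out_size) {
        return -1;
    }
    return n;
}
```

For a 512-byte audio buffer the header is `200\r\n`, so the framing overhead is 7 bytes per chunk (5 for the header, 2 for the trailing CRLF).
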
---

## Power Management

### Battery Monitoring

**ETA6098 Charging Chip:**
```c
// Legacy ADC API; deprecated in ESP-IDF v5.x but still available
#include "driver/adc.h"
#include "esp_adc_cal.h"

#define BATTERY_ADC_CHANNEL ADC1_CHANNEL_0   // GPIO1 (example)
#define BATTERY_FULL_MV     4200
#define BATTERY_EMPTY_MV    3300

static esp_adc_cal_characteristics_t adc_chars;

void battery_init(void) {
    adc1_config_width(ADC_WIDTH_BIT_12);
    adc1_config_channel_atten(BATTERY_ADC_CHANNEL, ADC_ATTEN_DB_11);
    // Fill in the calibration data used by esp_adc_cal_raw_to_voltage()
    esp_adc_cal_characterize(ADC_UNIT_1, ADC_ATTEN_DB_11, ADC_WIDTH_BIT_12,
                             1100, &adc_chars);
}

uint8_t battery_get_percentage(void) {
    int adc_reading = adc1_get_raw(BATTERY_ADC_CHANNEL);
    // NOTE: this is the voltage at the ADC pin. The pin cannot read 4.2 V
    // directly; if the board uses a resistor divider, scale voltage_mv by
    // the divider ratio (see schematic) before comparing.
    int voltage_mv = esp_adc_cal_raw_to_voltage(adc_reading, &adc_chars);

    if (voltage_mv >= BATTERY_FULL_MV) return 100;
    if (voltage_mv <= BATTERY_EMPTY_MV) return 0;

    return ((voltage_mv - BATTERY_EMPTY_MV) * 100) / (BATTERY_FULL_MV - BATTERY_EMPTY_MV);
}

bool battery_is_charging(void) {
    // Check SYS_OUT pin (GPIO36) - high when charging
    gpio_set_direction(GPIO_NUM_36, GPIO_MODE_INPUT);
    return gpio_get_level(GPIO_NUM_36);
}
```

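The percentage calculation is a linear map between the empty and full voltages, applied to the cell voltage rather than the voltage at the ADC pin. The sketch below separates the two steps so they can be checked on a host. `battery_pin_to_cell_mv()` and `battery_mv_to_percent()` are hypothetical names, and the 2:1 divider used in the example is an assumption, not a value taken from the board schematic.

```c
#include <stdint.h>

#define BATTERY_FULL_MV  4200
#define BATTERY_EMPTY_MV 3300

// Convert the voltage measured at the ADC pin to the cell voltage.
// divider_num / divider_den is the divider ratio, for example 2/1 for
// two equal resistors (assumed here, verify against the schematic).
int battery_pin_to_cell_mv(int pin_mv, int divider_num, int divider_den) {
    return (pin_mv * divider_num) / divider_den;
}

// Linear map of cell voltage to 0-100 %, clamped at both ends.
uint8_t battery_mv_to_percent(int cell_mv) {
    if (cell_mv >= BATTERY_FULL_MV) {
        return 100;
    }
    if (cell_mv <= BATTERY_EMPTY_MV) {
        return 0;
    }
    return (uint8_t)(((cell_mv - BATTERY_EMPTY_MV) * 100) /
                     (BATTERY_FULL_MV - BATTERY_EMPTY_MV));
}
```

A LiPo discharge curve is not linear, so this map is only a rough indicator; a lookup table of voltage against capacity would track the cell more closely.
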
### Low Power Modes

**Deep Sleep When Idle (Optional):**
```c
#define IDLE_TIMEOUT_MS 300000   // 5 minutes

void enter_deep_sleep(void) {
    // Save state to RTC memory
    RTC_DATA_ATTR static uint32_t boot_count = 0;
    boot_count++;

    // Configure wake sources
    esp_sleep_enable_ext0_wakeup(TOUCH_INT_PIN, 0);     // Wake on touch
    esp_sleep_enable_timer_wakeup(3600 * 1000000ULL);   // Wake every hour

    // Turn off display
    gpio_set_level(LCD_BL_PIN, 0);

    // Enter deep sleep
    esp_deep_sleep_start();
}
```

---

## Performance Optimization

### LVGL Performance

**Buffer Configuration:**
```c
#define LVGL_SCREEN_PIXELS (240 * 280)                 // Full screen, in pixels
#define LVGL_BUFFER_PIXELS (LVGL_SCREEN_PIXELS / 10)   // 1/10 screen per buffer

static lv_color_t buf_1[LVGL_BUFFER_PIXELS];   // 1/10 screen buffer
static lv_color_t buf_2[LVGL_BUFFER_PIXELS];   // Double buffering

lv_disp_draw_buf_t draw_buf;
// The size argument is a pixel count, not a byte count
lv_disp_draw_buf_init(&draw_buf, buf_1, buf_2, LVGL_BUFFER_PIXELS);
```

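Draw buffer sizing is easy to get wrong because LVGL counts pixels while memory budgets count bytes. The arithmetic can be sketched as below; `draw_buffer_pixels()` and `draw_buffer_bytes()` are hypothetical helpers, and the 2 bytes per pixel figure assumes RGB565 (`LV_COLOR_DEPTH` 16).

```c
#include <stddef.h>

#define LCD_WIDTH       240
#define LCD_HEIGHT      280
#define BYTES_PER_PIXEL 2     // RGB565 (assumption: LV_COLOR_DEPTH 16)

// Pixels in one draw buffer covering 1/screen_fraction of the display.
size_t draw_buffer_pixels(size_t screen_fraction) {
    return ((size_t)LCD_WIDTH * LCD_HEIGHT) / screen_fraction;
}

// Bytes of RAM used by one such draw buffer.
size_t draw_buffer_bytes(size_t screen_fraction) {
    return draw_buffer_pixels(screen_fraction) * BYTES_PER_PIXEL;
}
```

A 1/10-screen buffer is 6,720 pixels or 13,440 bytes, so two of them cost 26,880 bytes of internal RAM, compared with 134,400 bytes for a single full-screen buffer.
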
**Task Priority:**
```c
#define LVGL_TASK_PRIORITY      5
#define AUDIO_MIC_TASK_PRIORITY 10   // Higher priority for audio
#define AUDIO_SPK_TASK_PRIORITY 10
#define WIFI_TASK_PRIORITY      8
#define WAVEFORM_TASK_PRIORITY  4    // Lower priority for visuals

void app_main(void) {
    // Create tasks with priorities
    xTaskCreatePinnedToCore(lvgl_task, "LVGL", 8192, NULL, LVGL_TASK_PRIORITY, NULL, 1);
    xTaskCreatePinnedToCore(audio_mic_task, "MIC", 4096, NULL, AUDIO_MIC_TASK_PRIORITY, NULL, 0);
    xTaskCreatePinnedToCore(audio_speaker_task, "SPK", 4096, NULL, AUDIO_SPK_TASK_PRIORITY, NULL, 0);
    xTaskCreatePinnedToCore(lvgl_waveform_task, "WAVE", 4096, NULL, WAVEFORM_TASK_PRIORITY, NULL, 1);
}
```

**Reduce Waveform Update Rate:**
```c
// Only update waveform at 30 FPS, not every audio sample
#define WAVEFORM_UPDATE_MS 33   // ~30 FPS

void lvgl_waveform_task(void *pvParameters) {
    TickType_t last_update = xTaskGetTickCount();

    while (1) {
        TickType_t now = xTaskGetTickCount();
        if ((now - last_update) >= pdMS_TO_TICKS(WAVEFORM_UPDATE_MS)) {
            // Update waveform
            last_update = now;
        }
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}
```

### Memory Management

**PSRAM Usage:**
```c
// Allocate large buffers in PSRAM (8MB available)
#define AUDIO_LARGE_BUFFER_SIZE (16000 * 10)   // 10 seconds at 16kHz

int16_t *audio_history = heap_caps_malloc(AUDIO_LARGE_BUFFER_SIZE * sizeof(int16_t),
                                          MALLOC_CAP_SPIRAM);

// Check if allocation succeeded
if (audio_history == NULL) {
    ESP_LOGE(TAG, "Failed to allocate PSRAM buffer");
}
```

**Heap Monitoring:**
```c
#include <inttypes.h>

void log_memory_stats(void) {
    // The heap-size getters return uint32_t and size_t, so %d does not match
    // and fails the format check in ESP-IDF v5 builds
    ESP_LOGI(TAG, "Free heap: %" PRIu32 " bytes", esp_get_free_heap_size());
    ESP_LOGI(TAG, "Free PSRAM: %zu bytes", heap_caps_get_free_size(MALLOC_CAP_SPIRAM));
    ESP_LOGI(TAG, "Min free heap: %" PRIu32 " bytes", esp_get_minimum_free_heap_size());
}
```

---

## Example Code Structure

### File Organization

```
esp32_voice_assistant/
├── main/
│   ├── main.c                  # Entry point, task creation
│   ├── audio/
│   │   ├── audio_input.c       # I2S microphone handling
│   │   ├── audio_output.c      # I2S speaker handling
│   │   ├── audio_buffer.c      # Circular buffer management
│   │   └── audio_network.c     # WebSocket/HTTP streaming
│   ├── ui/
│   │   ├── ui_init.c           # LVGL setup, screen creation
│   │   ├── ui_idle.c           # Idle screen UI
│   │   ├── ui_listening.c      # Listening screen + waveform
│   │   ├── ui_processing.c     # Processing screen + spinner
│   │   ├── ui_speaking.c       # Speaking screen + output waveform
│   │   ├── ui_settings.c       # Settings screen
│   │   └── ui_waveform.c       # Waveform drawing functions
│   ├── touch/
│   │   ├── touch_cst816d.c     # Touch controller driver
│   │   └── touch_gestures.c    # Gesture recognition
│   ├── network/
│   │   └── wifi_manager.c      # WiFi connection management
│   ├── power/
│   │   ├── battery.c           # Battery monitoring
│   │   └── power_mgmt.c        # Sleep modes
│   └── state_machine.c         # Voice assistant state machine
├── components/
│   └── lvgl/                   # LVGL library (ESP-IDF component)
├── CMakeLists.txt
└── sdkconfig                   # ESP-IDF configuration
```

### Main Entry Point

```c
// main/main.c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"
#include "nvs_flash.h"
#include "driver/gpio.h"

static const char *TAG = "VOICE_ASSISTANT";

void app_main(void) {
    ESP_LOGI(TAG, "Voice Assistant Starting...");

    // Initialize hardware
    nvs_flash_init();              // Non-volatile storage
    // GPIO interrupts. Install the ISR service once: a second call (for
    // example from touch_init()) returns ESP_ERR_INVALID_STATE.
    gpio_install_isr_service(0);

    // Power management
    battery_init();

    // Display and touch
    lcd_init();
    touch_init();
    ui_init();

    // Audio pipeline
    audio_init_microphone();
    audio_init_speaker();
    audio_buffer_init(&mic_buffer);
    audio_buffer_init(&spk_buffer);

    // Network
    wifi_init();
    websocket_init();

    // State machine
    state_machine_init();

    // Create FreeRTOS tasks
    xTaskCreatePinnedToCore(lvgl_task, "LVGL", 8192, NULL, 5, NULL, 1);
    xTaskCreatePinnedToCore(audio_mic_task, "MIC", 4096, NULL, 10, NULL, 0);
    xTaskCreatePinnedToCore(audio_speaker_task, "SPK", 4096, NULL, 10, NULL, 0);
    xTaskCreatePinnedToCore(lvgl_waveform_task, "WAVE", 4096, NULL, 4, NULL, 1);
    xTaskCreatePinnedToCore(state_machine_task, "STATE", 4096, NULL, 7, NULL, 0);

    ESP_LOGI(TAG, "Voice Assistant Running!");
}
```

---

## Testing Plan

### Phase 1: Hardware Validation
- [ ] LCD display working (show test pattern)
- [ ] Touch controller responding (log touch coordinates)
- [ ] Buzzer working (play test tone)
- [ ] WiFi connecting (check IP address)
- [ ] Battery reading (log voltage)
- [ ] RTC working (log time)
- [ ] IMU working (log accelerometer values)

### Phase 2: Audio Pipeline
- [ ] I2S microphone reading audio (log levels)
- [ ] Audio streaming to Heimdall server
- [ ] I2S speaker playing audio (test tone)
- [ ] TTS audio playback from server
- [ ] Audio buffer management (no overflows)

### Phase 3: LVGL UI
- [ ] Idle screen displays correctly
- [ ] State transitions smooth
- [ ] Waveform renders at 30 FPS
- [ ] Touch gestures recognized
- [ ] Settings screen functional
- [ ] Status bar updates correctly

### Phase 4: Integration
- [ ] Wake word detection triggers listening state
- [ ] Waveform shows mic input in real-time
- [ ] Processing state shows after speech ends
- [ ] TTS response plays with output waveform
- [ ] Touch cancel works in all states
- [ ] Battery indicator accurate

### Phase 5: Optimization
- [ ] Memory usage stable (no leaks)
- [ ] CPU usage acceptable (<80% average)
- [ ] WiFi latency <100ms
- [ ] Audio latency <200ms end-to-end
- [ ] Display framerate stable (30 FPS)
- [ ] Battery life >4 hours continuous

---

## Bill of Materials (BOM)

| Component | Part Number | Quantity | Unit Price | Total |
|-----------|-------------|----------|------------|-------|
| ESP32-S3-Touch-LCD-1.69 | Waveshare | 1 | $12.00 | $12.00 |
| I2S MEMS Microphone | INMP441 | 1 | $3.50 | $3.50 |
| I2S Amplifier | MAX98357A | 1 | $3.50 | $3.50 |
| Speaker (3W 8Ω) | Generic | 1 | $5.00 | $5.00 |
| LiPo Battery (1000mAh) | 503040 JST 1.25 | 1 | $7.00 | $7.00 |
| MicroSD Card (8GB) | SanDisk | 1 | $5.00 | $5.00 |
| Breadboard + Wires | Generic | 1 | $5.00 | $5.00 |
| **Total** | | | | **$41.00** |

**Optional:**
- Enclosure/Case (3D printed or project box): $5-10
- Backup battery: $7
- USB-C cable: $3

**Grand Total with Options:** ~$56-61

---

## References & Resources

### LVGL Audio Visualization Examples
- **Music Player with FFT Spectrum** - [Instructables Guide](https://www.instructables.com/Design-Music-Player-UI-With-LVGL/)
  - Source: https://github.com/moononournation/LVGL_Music_Player.git
  - Shows FFT-based audio visualization on LVGL canvas

- **LVGL Audio FFT Spectrum (Xiao S3)** - [GitHub: genvex/LVGL_Audio_FFT_Spectrum_xiaoS3_oled](https://github.com/genvex/LVGL_Audio_FFT_Spectrum_xiaoS3_oled)
  - Real-time FFT visualization using low-level LVGL drawing

- **LVGL Audio FFT Spectrum** - [GitHub: imliubo/LVGL_Audio_FFT_Spectrum](https://github.com/imliubo/LVGL_Audio_FFT_Spectrum)
  - Alternative FFT spectrum implementation

- **Moving Waveform Discussion** - [LVGL Forum Thread](https://forum.lvgl.io/t/best-method-to-display-a-moving-waveform/17361)
  - Tips on efficiently displaying moving waveforms

### ESP32-S3 Resources
- **Waveshare Wiki** - https://www.waveshare.com/wiki/ESP32-S3-LCD-1.69
- **LVGL ESP32 Port** - [GitHub: lvgl/lv_port_esp32](https://github.com/lvgl/lv_port_esp32)
- **ESP-IDF Documentation** - https://docs.espressif.com/projects/esp-idf/en/latest/

### Voice Assistant Project
- **Mycroft Precise Documentation** - https://github.com/MycroftAI/mycroft-precise
- **Whisper OpenAI** - https://github.com/openai/whisper
- **Piper TTS** - https://github.com/rhasspy/piper

---

## Next Steps

1. **Order Hardware** - ESP32-S3-Touch-LCD + audio components (~$41)
2. **Setup ESP-IDF** - Install ESP-IDF v5.3.1+ on development machine
3. **Clone Examples** - Get LVGL audio visualization examples for reference
4. **Start Simple** - Begin with LCD + LVGL test (no audio)
5. **Add Audio** - Wire I2S mic, test audio streaming
6. **Waveform MVP** - Get basic waveform rendering working
7. **Full Integration** - Connect to Heimdall voice server
8. **Polish** - Add touch controls, settings, battery support

---

**Version:** 1.0
**Created:** 2026-01-01
**Status:** Specification Complete, Ready for Implementation