DIY voice assistant on an ESP32, seems to work much like the official Assist voice mode

hehe.1536 · Posted 2025-2-20 23:01:51

Last edited by hehe.1536 on 2025-2-18 18:56

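  My setup: STT is the one a forum expert shared earlier, running in Docker on OpenWrt on a D525 box. TTS is edge tts. The conversation agent is 火天大有, shared by a forum expert, with Home Assistant as the first agent and Zhipu Qingyan (智谱清言) as the second.

  Hardware: ESP32-S3 N16R8, INMP441 microphone, MAX98357A amplifier, and whatever speaker I had lying around.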

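Roughly how it works: the open-source micro_wake_word handles wake word detection. After waking, the recorded audio is sent to HA and ends up at conversation.process, and HA processes it according to your voice assistant settings. It is much like using HA's voice assistant on a phone.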

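  Wake word accuracy is decent and sensitivity is sufficient. Wake distance depends on the environment; in a quiet bedroom it basically always works. For simple commands handled by Home Assistant the latency is about five seconds. When the request goes to Zhipu Qingyan it is about ten seconds, sometimes faster, sometimes slower. I suspect this depends on the performance of the soft router running my STT, and better hardware should be a lot faster. edge tts also seems rather slow.

The status LED and other usability details are still half-finished. My skills are limited, so I hope some expert sees this and can help optimize it. If there is a more mature solution, please point me to it; I searched for a long time and found nothing.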

[Demo video link] https://www.bilibili.com/video/B ... d9ec7ad304739133d75

esphome:
  name: esp32-voice-assist
  friendly_name: ESP32 Voice Assist
  platformio_options:
    board_build.flash_mode: dio

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
# Enable logging
logger:

ota:
  - platform: esphome
    password: "xxxx"

wifi:
  ssid: "xxxxxx"     # Wi-Fi SSID
  password: "xxxxx"  # Wi-Fi password

captive_portal:

web_server:
  port: 80

# Enable the API (used to talk to Home Assistant)
api:
  encryption:
    key: "xxxxxxx"
  on_client_connected:
    then:
      - delay: 50ms
      - light.turn_off: led_status
      - micro_wake_word.start
  on_client_disconnected:
    then:
      - voice_assistant.stop

# Status LED (optional, shows the assistant's state)
light:
  - platform: esp32_rmt_led_strip
    id: led_status
    pin: GPIO48
    num_leds: 1
    rgb_order: GRB
    rmt_channel: 0
    chipset: ws2812
    name: "状态灯"  # "Status LED"
    effects:
      - pulse:
          name: "Fast Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
          min_brightness: 0%
          max_brightness: 100%

# I2S audio input and output buses
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO5   # INMP441 left/right clock (WS)
    i2s_bclk_pin: GPIO6    # INMP441 bit clock
  - id: i2s_out
    i2s_lrclk_pin: GPIO11  # MAX98357A left/right clock
    i2s_bclk_pin: GPIO12   # MAX98357A bit clock

# Microphone (an I2S digital microphone, here an INMP441)
microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO4    # INMP441 data line (SD)
    channel: left         # use the left channel
    pdm: false            # not a PDM (pulse-density modulation) microphone
    i2s_audio_id: i2s_in  # use the i2s_in bus
    bits_per_sample: 32bit

# Speaker (connected to a MAX98357A)
speaker:
  platform: i2s_audio
  id: spk1
  i2s_audio_id: i2s_out
  dac_type: external
  i2s_dout_pin: GPIO13  # MAX98357A data input (DIN)
  channel: mono         # mono output

# Wake word detection, using an officially provided pre-trained model (okay_nabu here)
micro_wake_word:
  models:
    - model: "https://raw.githubusercontent.com/esphome/micro-wake-word-models/main/models/v2/okay_nabu.json"
  on_wake_word_detected:
    - voice_assistant.start
    - light.turn_on:
        id: led_status
        red: 30%
        green: 30%
        blue: 70%
        brightness: 60%
        effect: Fast Pulse

# Voice assistant: capture audio from the microphone, run it through STT, then pass the recognized text to HA's conversation system
voice_assistant:
  id: va
  microphone: mic
  speaker: spk1
  noise_suppression_level: 2.0
  volume_multiplier: 4.0
  # When STT finishes, submit the recognized text to HA's conversation.process service
  on_stt_end:
    then:
      - homeassistant.service:
          service: conversation.process
          data:
            text: !lambda |-
              return x.c_str();
  on_error:
    then:
      - micro_wake_word.start
  on_end:
    then:
      - light.turn_off: led_status
      - wait_until:
          condition:
            - lambda: 'return !id(va).is_running();'
      - micro_wake_word.start
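
Since the LED feedback is described as half-finished, here is a rough sketch of one way to give it more distinct states. It assumes the voice_assistant triggers on_listening and on_tts_start are available in your ESPHome version; the colors are arbitrary picks, not something from the original config. Treat it as a fragment to merge into the existing voice_assistant: block rather than a complete configuration.

```yaml
# Fragment only: merge these keys into the voice_assistant: block above.
voice_assistant:
  on_listening:
    # Audio is being streamed to HA: pulse blue (color is an arbitrary choice)
    - light.turn_on:
        id: led_status
        red: 0%
        green: 0%
        blue: 100%
        brightness: 60%
        effect: "Fast Pulse"
  on_tts_start:
    # A reply is about to be spoken: steady green, pulse effect stopped
    - light.turn_on:
        id: led_status
        red: 0%
        green: 100%
        blue: 0%
        brightness: 60%
        effect: none
```

The existing on_end handler already turns led_status off, so the LED should return to dark once the reply has finished.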

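Compiling with ESPHome needs unrestricted (global proxy) network access. In my case, running the proxy in global mode on the PC alone was not enough; it only went through once I enabled the global proxy on the soft router.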

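1. INMP441 microphone wiring

VCC (INMP441) -> ESP32 3.3V supply
GND (INMP441) -> ESP32 GND
WS/LRCLK (INMP441) -> ESP32 GPIO5
BCLK (INMP441) -> ESP32 GPIO6
SD (INMP441) -> ESP32 GPIO4
Note: my microphone was salvaged from an earlier XiaoZhi AI (小智ai) build. That project requires the microphone's GND and L/R pins to be shorted together, so mine are shorted and it works fine. If yours does not work, try shorting them. (On the INMP441, tying L/R to GND selects the left channel, which matches channel: left in the config.)

2. MAX98357A speaker wiring

VIN (MAX98357A) -> 3.3V (or whatever supply voltage your module specifies)
GND (MAX98357A) -> ESP32 GND
LRC/LRCLK (MAX98357A) -> ESP32 GPIO11
BCLK (MAX98357A) -> ESP32 GPIO12
DIN (MAX98357A) -> ESP32 GPIO13

On the output terminals, mind the speaker's polarity.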
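
If you wire things differently, one option is to gather the pins in a substitutions block so the wiring list and the config cannot drift apart. This is only a sketch: the substitution names (mic_ws_pin and so on) are made up for illustration, and the GPIO numbers are simply the ones from the wiring lists above.

```yaml
# Hypothetical substitution names; GPIO numbers copied from the wiring lists.
substitutions:
  mic_ws_pin: GPIO5     # INMP441 WS/LRCLK
  mic_bclk_pin: GPIO6   # INMP441 BCLK
  mic_sd_pin: GPIO4     # INMP441 SD
  amp_lrc_pin: GPIO11   # MAX98357A LRC
  amp_bclk_pin: GPIO12  # MAX98357A BCLK
  amp_din_pin: GPIO13   # MAX98357A DIN

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: ${mic_ws_pin}
    i2s_bclk_pin: ${mic_bclk_pin}
  - id: i2s_out
    i2s_lrclk_pin: ${amp_lrc_pin}
    i2s_bclk_pin: ${amp_bclk_pin}
```

The microphone and speaker blocks would then use i2s_din_pin: ${mic_sd_pin} and i2s_dout_pin: ${amp_din_pin} in place of the literal GPIO numbers.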

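hehe.1536 · Posted 2025-3-19 09:18:12

a_dongde posted on 2025-3-18 10:03
I compiled it with the add-on in the web UI. An AI told me to change
esp32:
  board: esp32-s3-devkitc-1

Strange, the framework really is esp-idf. The model download error is covered in this post: it is a network problem, so it depends on whether you have a way around that.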

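a_dongde · Posted 2025-3-18 10:03:03

Last edited by a_dongde on 2025-3-18 23:52

hehe.1536 posted on 2025-3-18 09:01
If you are compiling on a PC and hit this kind of error, I recommend deleting the demo file and starting over; dependency conflicts happen easily ...

I compiled it with the add-on in the web UI. An AI told me to change this:
esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf  # switch this field to the Arduino framework
After that the error went away, but then I got an error about the wake word model URL: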

INFO ESPHome 2025.2.2
INFO Reading configuration /config/xiaoyi.yaml...
Failed config

micro_wake_word: [source /config/xiaoyi.yaml:91]
  models:
    -
      Not a valid model name, local path, http(s) url, or github shorthand.
      model: |-
        https://raw.githubusercontent.co ... s/v2/okay_nabu.json
  on_wake_word_detected:
    - voice_assistant.start
    - light.turn_on:
        id: led_status
        red: 30%
        green: 30%
        blue: 70%
        brightness: 60%
        effect: Fast Pulse
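
The validation message names four accepted forms for model. Note that the URL in the log contains " ... " in the middle; if that shortened string is what ended up in the YAML, it would not be a valid URL regardless of the network. A minimal sketch using the model-name form instead, assuming okay_nabu is accepted as a model name by your ESPHome version:

```yaml
micro_wake_word:
  models:
    # Model name instead of the full manifest URL (assumed to resolve to the
    # same official okay_nabu model)
    - model: okay_nabu
```

As far as I know the model files are still downloaded at compile time, so the network requirement described in the first post still applies.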

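hehe.1536 · Posted 2025-3-18 09:01:04

a_dongde posted on 2025-3-18 08:46
Compile error: INFO ESPHome 2025.2.2
INFO Reading configuration /config/esphome/xiaoyi.yaml...
Failed con ...

If you are compiling on a PC and hit this kind of error, I recommend deleting the demo file and starting over; dependency conflicts happen easily.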

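hehe.1536 · Posted 2025-3-18 08:58:54

If you are compiling on a PC and hit this kind of error, I recommend deleting the demo file and starting over; dependency conflicts happen easily.

a_dongde · Posted 2025-3-18 08:46:34

Compile error:
INFO ESPHome 2025.2.2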
INFO Reading configuration /config/esphome/xiaoyi.yaml...
Failed config

light.esp32_rmt_led_strip: [source /config/esphome/xiaoyi.yaml:43]
  platform: esp32_rmt_led_strip
  id: led_status
  pin: GPIO48
  num_leds: 1
  rgb_order: GRB

  This feature is not available for the IDF framework version 5.
  rmt_channel: 0
  chipset: ws2812
  name: 状态灯
  effects:
    - pulse:
        name: Fast Pulse
        transition_length: 0.5s
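
In this log the message "This feature is not available for the IDF framework version 5." is printed directly above rmt_channel: 0, which suggests that option is what the validator rejects. A sketch of the light block with that one line dropped, on the assumption that ESPHome picks the RMT channel itself on ESP-IDF 5 builds (everything else is unchanged from the first post):

```yaml
light:
  - platform: esp32_rmt_led_strip
    id: led_status
    pin: GPIO48
    num_leds: 1
    rgb_order: GRB
    # rmt_channel removed: assumed to be the option the IDF 5 message refers to
    chipset: ws2812
    name: "状态灯"
    effects:
      - pulse:
          name: "Fast Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
          min_brightness: 0%
          max_brightness: 100%
```

This keeps type: esp-idf in place, so there is no need to switch the whole project to the Arduino framework just to get past this error.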

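hehe.1536 · Posted 2025-3-14 10:55:28

vanniuner posted on 2025-3-12 09:18
Isn't the biggest advantage that HAV can run locally, while the Xiao Ai speaker has to be online (cloud AI)?

Can I run this voice assistant f ...

For fully local operation, I think the current problem is that there is no small model tuned specifically for smart home use.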

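hehe.1536 · Posted 2025-3-14 10:53:31

a_dongde posted on 2025-3-12 10:33
I got a ReSpeaker dev board. Would that be better?

From what I can see the bottleneck is the software; the ESP32 and that microphone are capable enough.

a_dongde · Posted 2025-3-12 10:33:31

I got a ReSpeaker dev board. Would that be better?

vanniuner · Posted 2025-3-12 09:18:02

Isn't the biggest advantage that HAV can run locally, while the Xiao Ai speaker has to be online (cloud AI)?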

Can I run this voice assistant fully locally?
Yes, provided your language is supported and you have hardware powerful enough to run local text-to-speech and speech-to-text models at a speed that is acceptable to you. Speech-to-text is the main limiting factor for many languages to run locally, as it has mixed results and often requires powerful hardware.

We recommend using at least an Intel N100 or equivalent processor; this will allow you to use OpenAI’s Whisper Base model for speech-to-text locally. This model runs reasonably fast for languages that have large public datasets to train on, such as English and Spanish. However, for languages with less data available, you will need Whisper’s Small or Large models that require significantly more powerful hardware to run. For some languages, no public datasets exist yet for local models to be trained on by OpenAI, and until they exist and they train models, you will not be able to run those languages fully locally.
