Using a StreamAssist bug to get continuous conversation after one wake word

itispip · posted on 2024-7-7 19:27:54

Last edited by itispip on 2024-7-7 19:29

AlexxIT's excellent StreamAssist can turn any camera that provides an RTSP stream into a microphone for the Home Assistant voice assistant.

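If you don't need the prompt sound (the beep it plays after you say the wake word, to show it is ready), the integration works very well. It does have one bug, though: as soon as you let it beep, that beep interrupts the conversation.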

AlexxIT already knows about this bug, but says he is too busy to fix it.

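The surprise is that you can exploit this bug: write an automation that takes over the interrupted conversation, and you get continuous conversation from a single wake-up. Surprised? This is something even Home Assistant has not officially opened up, and something XiaoAI officially offers only on its high-end devices with a screen, costing over 400 yuan. One bug and it's yours!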

The code is below. Here is what you need to do:

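1. Use a large language model as the conversation agent.
2. In the LLM's prompt, require it to end every reply with "?" unless the user indicates they no longer need it. (My code continues the conversation when the text returned by the conversation intent ends with "?". You could instead never end the conversation until you say some keyword, for example "退下" ("dismissed"), "滚开" ("get lost"), or "去死" ("drop dead").)
3. In StreamAssist, configure an "stt media start" prompt sound. This is what activates the bug.
4. In Settings, create a Boolean helper named continue_conversation (if you pick a different name, change the matching name in the code yourself). It flags when the continuous conversation's Repeat ... Until loop should exit. I tried recording the state in a variable, but no matter how the variable changed, Until did not seem to re-evaluate the template's value, so I had to fall back on a helper. If anyone knows what I got wrong that stopped Repeat ... Until from terminating on the variable's value, please tell me.
5. Save the code below as a script.
6. Write an automation that watches for the normal conversation being successfully interrupted, and calls this script as soon as that happens.
7. The parts highlighted in red in the original post (the player entity media_player.kodi and the RTSP source rtsp://192.168.2.34/unicast) must be changed to match your environment. The parts highlighted in yellow (the resp_last_char template, the "?" test, and input_boolean.continue_conversation) can stay as they are unless you want to change items 2 and 4 above.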
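For item 4, the helper can also be declared in YAML rather than through the UI. This is only a minimal sketch, assuming you edit configuration.yaml by hand and keep the name continue_conversation; the display name is illustrative.

```yaml
# configuration.yaml (assumption: helpers are managed in YAML, not the UI)
input_boolean:
  continue_conversation:
    name: Continue conversation   # illustrative display name
    initial: false                # start with the loop flag off after a restart
```

After a restart this should appear as input_boolean.continue_conversation, the entity the script turns on and off.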

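There is one key to a good continuous-conversation experience: the media player must return to the idle state promptly after the TTS audio finishes playing. If it never reports the state change, you will be left waiting awkwardly for a long time. Test for yourself whether your media player is responsive: have it play an audio file about 1 second long (for example, the beep mentioned above) and check whether its icon switches from solid to outline immediately after playback ends.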
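The same check can be scripted instead of watching the icon. A rough sketch, assuming the player is media_player.kodi and that a short file named beep.mp3 exists in the local media folder (both are placeholders, as is the 10-second limit). It also assumes the play_media call returns before playback ends; if your player blocks until the audio finishes, the wait will simply time out.

```yaml
alias: Test media player idle latency
sequence:
  - service: media_player.play_media
    target:
      entity_id: media_player.kodi   # assumption: change to your own player
    data:
      # hypothetical file; any roughly 1-second clip will do
      media_content_id: media-source://media_source/local/beep.mp3
      media_content_type: music
  - wait_for_trigger:
      - platform: state
        entity_id: media_player.kodi
        to: idle
    timeout:
      seconds: 10                    # arbitrary upper bound for the test
    continue_on_timeout: true
  - service: persistent_notification.create
    data:
      title: Media player idle test
      # wait.trigger is none when the wait timed out
      message: >-
        {{ 'Player returned to idle in time' if wait.trigger is not none
           else 'Player did not return to idle within 10 s' }}
mode: single
```

If the notification reports a timeout, the conversation loop will probably stall on the same player.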

alias: AI ask questions
sequence:
  - alias: Ask me a question
    service: stream_assist.run
    data:
      player_entity_id: media_player.kodi  ## change to your own player
      assist:
        start_stage: intent
        end_stage: tts
        intent_input: ask me "怎么了?"  # "怎么了?" means "What's up?"
        conversation_id: 01HV8SNMZXEHKJYYR38GD2Y27G
    response_variable: cv_response
  - alias: Loop listen & execute until continue_conversation is off
    repeat:
      sequence:
        - alias: Sequence to flag for continue conversation or not
          sequence:
            - variables:
                resp_type: >-
                  {{
                  cv_response['intent-end'].data.intent_output.response.response_type
                  }}
                resp_last_input_speech: "{{ cv_response['intent-start'].data.intent_input }}"
                resp_last_output_speech: >-
                  {{
                  cv_response['intent-end'].data.intent_output.response.speech.plain.speech
                  }}
                resp_last_char: "{{ resp_last_output_speech[-4:] }}"
              alias: >-
                Get resp_type, initialize resp_last_speeches (input & output),
                resp_last_char and continuous flag
            - if:
                - condition: template
                  value_template: "{{ ('?' in resp_last_char) or ('？' in resp_last_char) }}"
                  alias: Test whether the last speech ended with a "?"
              then:
                - service: input_boolean.turn_on
                  metadata: {}
                  data: {}
                  target:
                    entity_id: input_boolean.continue_conversation
              else:
                - service: input_boolean.turn_off
                  metadata: {}
                  data: {}
                  target:
                    entity_id: input_boolean.continue_conversation
              enabled: true
              alias: Decide whether to set the continue flag "ON" or "OFF"
        - alias: Listen to user if continue flag "ON"
          if:
            - condition: state
              entity_id: input_boolean.continue_conversation
              state: "on"
          then:
            - wait_for_trigger:
                - platform: state
                  entity_id:
                    - media_player.kodi  ## change to your own player
                  to: idle
                  enabled: true
              timeout:
                hours: 0
                minutes: 1
                seconds: 0
                milliseconds: 0
              enabled: true
              continue_on_timeout: true
            - alias: Listen and execute user speech
              service: stream_assist.run
              data:
                stream_source: rtsp://192.168.2.34/unicast  ## change to your own RTSP source
                player_entity_id: media_player.kodi  ## change to your own player
                assist:
                  start_stage: stt
                  end_stage: tts
                  conversation_id: 01HV8SNMZXEHKJYYR38GD2Y27G
                pipeline_id: 01gzv2xe278epc0svn0cgm3p3p
              enabled: true
              response_variable: cv_response
      until:
        - condition: state
          entity_id: input_boolean.continue_conversation
          state: "off"
          enabled: true
mode: single
description: ""
fields: {}
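Item 2 mentions ending the conversation on a keyword instead of the "?" rule. Here is a rough sketch of that variant: a condition that could replace the template condition inside the script's if block. It relies on the resp_last_input_speech variable the script already sets; the stop-word list is illustrative, so swap in your own words.

```yaml
# Possible replacement for the "?" template condition in the script's `if:`.
# The flag stays on unless the user's last utterance contains a stop word.
- condition: template
  alias: Continue unless the user said a stop word
  value_template: >-
    {% set stop_words = ['退下', 'stop', 'goodbye'] %}
    {{ stop_words
       | select('in', resp_last_input_speech | lower)
       | list | count == 0 }}
```

With this variant the LLM prompt no longer has to force a trailing "?", but the loop only ends once a stop word is recognized, so choose words your speech-to-text engine transcribes reliably.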

隔壁的王叔叔 · posted on 2024-7-7 21:30:39

This looks pretty advanced. I'll give it a try too. Thanks for sharing.

motoyu · posted on 2024-7-7 23:14:59

I was tinkering with exactly this a while back. Showing my support.

DDDear · posted on 2024-7-8 08:37:11

Did you hook it up with a camera or with IP Webcam?

itispip · posted on 2024-7-8 23:17:29

DDDear posted on 2024-7-8 08:37
Did you hook it up with a camera or with IP Webcam?

Anything that supports RTSP.
