Custom Agent Control

2026-06-29

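Custom agent control lets a client send agent instance control requests directly through ZEGO Express SDK room signaling, without calling server-side APIs.

With this feature, developers can implement the following controls on the client side:

Invoke TTS proactively: have the agent read a piece of text aloud. Suited to fixed scripts such as welcome messages and reminders.
Invoke LLM proactively: give the agent a piece of text and have it reason with the LLM and reply. Suited to proactive speech that needs conversation context.
Interrupt the agent: immediately interrupt the agent's ongoing TTS/LLM process.
Control listening: in walkie-talkie mode, make the agent start or stop listening to a specified user.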

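Typical use cases include:

Playing fixed scripts such as welcome messages and timeout reminders.
Starting a topic proactively when the conversation goes quiet, to improve the interaction.
Interrupting the agent at a key moment and immediately announcing important content.

Applicable versions: v2.13.0 and later.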

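Prerequisites

You have completed the basic flow described in Quick Start.

Usage

Option 1: Use the open-source kit (recommended)

Use the ZEGO AI Agent Action client kit to integrate custom agent control quickly.

The kit supports five platforms (Web, Android, iOS Swift, iOS Objective-C, and Flutter) and exposes the same methods on all of them: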

sendAgentInstanceTTS
sendAgentInstanceLLM
interruptAgentInstance
startListening
stopListening

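Integration steps

Copy the agentaction/ directory for your target platform into your project.
Create a Client instance.
In the sender callback, call Express callExperimentalAPI and pass through the formatedJson generated by the kit.
In the onRecvExperimentalAPI callback, hand the response content to the kit's handleRoomChannelMessage method.

Import the kit

Copy the agentaction/ directory for your target platform into your project. The directory layout per platform is:

Web: agentaction/web/
Android: agentaction/android/
iOS Swift: agentaction/ios_swift/
iOS Objective-C: agentaction/ios_oc/
Flutter: agentaction/flutter/

Create a Client instance

Create the kit's Client instance when the application initializes.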

Web

import ZegoAIAgentAction from './agentaction';

// zg is an already created ZegoExpressEngine instance
const client = new ZegoAIAgentAction.ZegoAIAgentActionClient({
  roomId: 'room_001',
  userId: 'client_A',
  agentUserId: '<AgentUserId>',
  // sender: passes the formatedJson assembled by the kit through to Express
  sender: async (params, formatedJson) => {
    try {
      await zg.callExperimentalAPI(formatedJson);
      return { errorCode: ZegoAIAgentAction.ErrorCodes.SUCCESS, seq: params.seq };
    } catch (e) {
      return { errorCode: ZegoAIAgentAction.ErrorCodes.SEND_FAILED, seq: params.seq };
    }
  },
  onResponse: (response) => console.log('recv', response.action, response.code),
  onError: (error) => console.error('error', error.action, error.code)
});

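The constructor parameters for each platform are described below. Because Android and Objective-C do not support default parameter values, all parameters must be passed explicitly on those platforms (for regular agents, pass empty values for the digital human parameters).

Parameter	Description	Example value
roomId	Business room ID, identical to the ZEGO audio/video room ID.	`room_001`
agentUserId	userID of the target agent. For regular agents, use `rtcInfo.agentUserId`. In digital human scenarios the kit derives it as `ai_agent_<agentInstanceId>` and ignores this parameter.	`<AgentUserId>`
userId	ID of the current end user, used to generate the local tracing Seq.	`client_A`
agentInstanceId	Agent instance ID. Required in digital human scenarios. In regular scenarios it must still be passed on Android / OC (as `null` / `nil`), can be omitted on Swift / Flutter (defaults to nil), and does not exist on Web.	`null` (the instance ID for digital humans)
isDigitalHuman	Whether this is a digital human call, which determines whether the `ai_agent_<instanceId>` concatenation rule applies. For regular scenarios pass `false` / `NO` or omit it.	`false` (`true` for digital humans)
deviceId	Device ID, used to build the Seq. If empty, the kit generates one.	`null` (auto-generated when empty)
timeoutMs	Default timeout for a single request, in ms. Defaults to 5000.	`5000`
sender	Send callback implemented by the application. It passes the `formatedJson` assembled by the kit through to Express `callExperimentalAPI`.	No fixed value; see the per-platform sender implementation above
onResponse	Global response callback, fired each time a PaaS response is received (optional).	No fixed value; see the per-platform onResponse implementation above
onError	Global error callback, fired on send failure, timeout, or active cancellation (optional).	No fixed value; see the per-platform onError implementation above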

Send control requests

Call the corresponding method to send a control request as needed (the examples start with sendAgentInstanceTTS):

Web

// Example: invoke TTS proactively
const tts = new ZegoAIAgentAction.Protobuf.SendAgentInstanceTTSParams();
tts.setText('Hello, and welcome to the ZEGO AI Agent service');
tts.setAddHistory(true);
tts.setPriority('Medium');
tts.setSamePriorityOption('ClearAndInterrupt');
client.sendAgentInstanceTTS(tts).then(r => console.log('TTS ok', r.seq));

// Example: invoke LLM proactively
const llm = new ZegoAIAgentAction.Protobuf.SendAgentInstanceLLMParams();
llm.setText('The user has been silent for a while. Please say something proactively.');
client.sendAgentInstanceLLM(llm).then(r => console.log('LLM ok', r.seq));

// Example: interrupt the agent
client.interruptAgentInstance().then(r => console.log('Interrupt ok', r.seq));

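Note

<AgentUserId> is a placeholder. The actual value depends on the scenario; see the "params.user_list parameter" section below for details.

Receive responses

Pass the data from the ZEGO Express SDK onRecvExperimentalAPI callback straight through to the kit for processing.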

Web

// Listen for the Express experimental API callback and hand the response to the kit for parsing
zg.on('recvExperimentalAPI', (payload) => {
    client.handleRoomChannelMessage(payload);
});

// Some Web Express versions require explicitly enabling room channel message reception
zg.callExperimentalAPI({ method: 'onRecvRoomChannelMessage', params: {} });

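For detailed integration steps, see the demo for each platform: Web | Android | iOS Swift | iOS Objective-C | Flutter.

Option 2: Custom implementation

If you prefer not to use the open-source kit, you can implement agent control yourself. The mechanism is as follows: the client sends a room channel message (msg_type=20) to the server through callExperimentalAPI of the ZEGO Express SDK; the server processes it and returns a response as a room channel message (msg_type=22); the client receives that response through onRecvExperimentalAPI.

Send commands with callExperimentalAPI

The following shows how each platform sends control commands with the ZEGO Express SDK. For complete, runnable sample code, see ai_agent_quick_start (agentaction).

Use callExperimentalAPI of the Express SDK to send a room channel message (msg_type=20). The call on each platform is as follows:

params.user_list parameter

user_list on Android, iOS, and Flutter, and toUserIDList on Web, both carry the userID of the target agent (pass exactly one).

Note

In digital human calls, the target agent userID passed in user_list (toUserIDList) differs from the regular 1-on-1 agent scenario. The client must follow these rules: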

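Scenario	Target agent userID
Regular agent (created through Create Agent Instance)	Same as the value of the `RTC.AgentUserId` field in the request parameters, for example `agent_123456`
Digital human agent (created through Create Digital Human Agent Instance)	Generated from the template `ai_agent_<AgentInstanceId>`, where AgentInstanceId is Data.AgentInstanceId in the response parameters, for example `ai_agent_1912122918452641792`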
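
The rule in the table above can be captured in a small helper. This is only a sketch: `resolveTargetAgentUserId` and its option names are hypothetical and are not part of the Express SDK or the kit. Only the `ai_agent_<AgentInstanceId>` template and the `RTC.AgentUserId` rule come from the table.

```javascript
// Hypothetical helper: picks the userID to place in user_list / toUserIDList.
function resolveTargetAgentUserId({ isDigitalHuman, agentUserId, agentInstanceId }) {
  if (isDigitalHuman) {
    if (!agentInstanceId) {
      throw new Error('agentInstanceId is required for digital human agents');
    }
    // Digital human agent: ai_agent_<AgentInstanceId>
    return `ai_agent_${agentInstanceId}`;
  }
  if (!agentUserId) {
    throw new Error('agentUserId is required for regular agents');
  }
  // Regular agent: same value as RTC.AgentUserId in the create-instance request
  return agentUserId;
}

console.log(resolveTargetAgentUserId({ isDigitalHuman: false, agentUserId: 'agent_123456' }));
// prints "agent_123456"
console.log(resolveTargetAgentUserId({ isDigitalHuman: true, agentInstanceId: '1912122918452641792' }));
// prints "ai_agent_1912122918452641792"
```

Failing fast on a missing ID is a design choice of this sketch; it surfaces a misconfigured call before a request is sent to the wrong user.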

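params.msg_content parameter

msg_content is a JSON string with the following fields:

Field	Type	Description
Action	String	Operation type, such as `SendAgentInstanceTTS`, `SendAgentInstanceLLM`, `InterruptAgentInstance`, `StartListening`, or `StopListening`. See Action.
Seq	String	Business tracing identifier generated by the client. It must be unique across multiple users and devices in the same room. Recommended format: `user_id:device_id:local_seq`. It is returned unchanged in onRecvExperimentalAPI.
Params	Object	Parameters for the given Action. See Params.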
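
As an illustration of the three fields above, the following sketch builds a msg_content string. `createSeqGenerator` and `buildMsgContent` are hypothetical names; only the Action / Seq / Params field names and the `user_id:device_id:local_seq` format come from the table. Params is left empty here purely for illustration, since the per-Action options are documented separately.

```javascript
// Hypothetical helper: returns a function that yields user_id:device_id:local_seq values.
function createSeqGenerator(userId, deviceId) {
  let localSeq = 0; // increases by one per request within this client instance
  return () => `${userId}:${deviceId}:${++localSeq}`;
}

// Hypothetical helper: msg_content must be a JSON string, not an object.
function buildMsgContent(action, seq, params = {}) {
  return JSON.stringify({ Action: action, Seq: seq, Params: params });
}

const nextSeq = createSeqGenerator('client_A', 'device_1');
const msgContent = buildMsgContent('InterruptAgentInstance', nextSeq());
console.log(msgContent);
// prints {"Action":"InterruptAgentInstance","Seq":"client_A:device_1:1","Params":{}}
```

A per-instance counter keeps Seq values unique only as long as the user_id and device_id pair is itself unique in the room, which is why the recommended format includes both.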

Action

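The Action field of the JSON string in the msg_content parameter of callExperimentalAPI takes the values below. Through room signaling, the client supports the following five control capabilities:

Action	Capability	Corresponding server API
`SendAgentInstanceTTS`	Invoke TTS proactively so that the agent reads a piece of text aloud.	Custom TTS call
`SendAgentInstanceLLM`	Invoke LLM proactively so that the agent reasons over the text and replies.	Custom LLM call
`InterruptAgentInstance`	Interrupt the agent and stop the current TTS/LLM process.	Interrupt agent instance
`StartListening`	The agent starts listening to the specified user.	Agent starts listening
`StopListening`	The agent stops listening to the specified user.	Agent stops listening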

Params

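In the JSON string of the msg_content parameter of callExperimentalAPI, the Params field takes different options for each Action. Each tab below describes the Params for one Action:

Receive responses by listening to onRecvExperimentalAPI

After processing the request, the server delivers the response as a room channel message (msg_type=22), and the client receives it in the onRecvExperimentalAPI callback of the ZEGO Express SDK. The callback content is a JSON string on native platforms and an object on Web. Its msg_content/msgContent field is itself a JSON string, so it must be parsed a second time to obtain the response body. Finally, use the Seq in the response body to match the response to the original request.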
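
The two-step parsing described above might look like the following sketch. `parseAgentResponse` is a hypothetical helper, and the outer shape of the callback payload is an assumption: the code looks for msg_content or msgContent either at the top level or under a `params` object, which should be checked against the actual payload on your platform.

```javascript
// Hypothetical helper: returns the parsed response body, or null if it cannot be parsed.
function parseAgentResponse(raw) {
  try {
    // First parse: native platforms deliver a JSON string, Web delivers an object.
    const outer = typeof raw === 'string' ? JSON.parse(raw) : raw;
    // Assumption: the message fields sit under `params`, or directly on the outer object.
    const container = outer.params ?? outer;
    const inner = container.msg_content ?? container.msgContent;
    // Second parse: msg_content / msgContent is itself a JSON string.
    return typeof inner === 'string' ? JSON.parse(inner) : null;
  } catch {
    return null; // malformed payloads are ignored rather than thrown
  }
}

const nativePayload = JSON.stringify({
  params: { msg_content: JSON.stringify({ Seq: 'client_A:device_1:1' }) },
});
const webPayload = { params: { msgContent: JSON.stringify({ Seq: 'client_A:device_1:2' }) } };

console.log(parseAgentResponse(nativePayload).Seq); // prints "client_A:device_1:1"
console.log(parseAgentResponse(webPayload).Seq); // prints "client_A:device_1:2"
```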

Notes

Protocol differences between Web and native platforms

Difference	Android / iOS native	Web
Send method name	`liveroom.room.send_room_channel_message`	`sendRoomChannelMessage`
Receive callback method name	`liveroom.room.on_recive_room_channel_message`	`onRecvRoomChannelMessage`
Room ID field	`room_id`	`roomID`
Recipient list field	`user_list`	`toUserIDList`
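
One way to keep these differences out of business code is a lookup table keyed by platform. The sketch below is partial and hypothetical: `PROTOCOL` and `buildTargetFields` are illustrative names, the `{ method, params }` envelope and the array-of-strings recipient list are assumptions, and the message type and content fields are omitted because their Web names are not listed in the table.

```javascript
// Method and field names copied from the table above; everything else is assumed.
const PROTOCOL = {
  native: {
    sendMethod: 'liveroom.room.send_room_channel_message',
    recvMethod: 'liveroom.room.on_recive_room_channel_message',
    roomIdField: 'room_id',
    userListField: 'user_list',
  },
  web: {
    sendMethod: 'sendRoomChannelMessage',
    recvMethod: 'onRecvRoomChannelMessage',
    roomIdField: 'roomID',
    userListField: 'toUserIDList',
  },
};

// Hypothetical helper: fills in only the platform-dependent parts of a request.
function buildTargetFields(platform, roomId, agentUserId) {
  const names = PROTOCOL[platform];
  if (!names) throw new Error(`unknown platform: ${platform}`);
  return {
    method: names.sendMethod,
    params: {
      [names.roomIdField]: roomId,
      [names.userListField]: [agentUserId], // exactly one target agent
    },
  };
}

console.log(buildTargetFields('web', 'room_001', 'agent_123456'));
console.log(buildTargetFields('native', 'room_001', 'agent_123456'));
```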

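Tracing and retransmission deduplication

The client generates Seq in the format user_id:device_id:local_seq.
The client maintains a request table keyed by Seq and, when several responses arrive for the same Seq, processes only the first valid one.
Business idempotency is handled by the server; the client does not need to deal with it.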
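
A request table of the kind described above could be sketched as follows. `PendingRequests` is a hypothetical class, not the kit's implementation; the 5000 ms default mirrors the kit's timeoutMs default, and the response object is assumed to be the already parsed response body carrying a Seq field.

```javascript
// Hypothetical request table: one entry per in-flight Seq, removed on first response or timeout.
class PendingRequests {
  constructor(timeoutMs = 5000) {
    this.timeoutMs = timeoutMs;
    this.pending = new Map(); // Seq -> { resolve, timer }
  }

  // Registers a Seq and returns a promise that settles on response or timeout.
  track(seq) {
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(seq);
        reject(new Error(`request ${seq} timed out`));
      }, this.timeoutMs);
      this.pending.set(seq, { resolve, timer });
    });
  }

  // Returns true if the response was consumed, false for unknown or duplicate Seq values.
  handleResponse(response) {
    const entry = this.pending.get(response.Seq);
    if (!entry) return false;
    clearTimeout(entry.timer);
    this.pending.delete(response.Seq);
    entry.resolve(response);
    return true;
  }
}

const requests = new PendingRequests();
const seq = 'client_A:device_1:1';
requests.track(seq).then((res) => console.log('handled', res.Seq));
console.log(requests.handleResponse({ Seq: seq })); // prints true (first response consumed)
console.log(requests.handleResponse({ Seq: seq })); // prints false (duplicate ignored)
```

Deleting the entry as soon as the first response arrives is what makes later responses for the same Seq fall through as duplicates.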