The Round Mechanism and Callback Tracing

2026-05-12

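Overview

A Round is the unique identifier of one AI Agent interaction chain. Each time a user speaks or an API call is made, the server generates a Round value. All subsequent related callbacks (ASR recognition, LLM reply, TTS playback, status changes, interruption events, and so on) carry that Round value, so your service can trace exactly what happened during that interaction.

Use cases: AI companion chat, voice chat rooms, digital humans, intelligent customer service, and other scenarios that need to trace the complete conversation chain. Whether the user speaks or your service actively triggers an AI reply, Round ties all callbacks together, which makes interruptions, queuing, and similar complex cases easier to handle.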

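Core concept: what a Round is

Definition of Round

A Round is an ascending sequence number that the server generates for each complete interaction. Values are never repeated.

One interaction = the complete chain from "trigger" to "end"
Trigger sources: the user speaks, a call to SendAgentInstanceLLM, or a call to SendAgentInstanceTTS
End marker: the AI finishes its reply for this round (TTS playback completes or the LLM finishes returning)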

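What Round is for

Round runs through the entire interaction lifecycle, and every callback event carries a Round field:

Trigger → Round N starts
  ├─ ASRResult (Round: N)        // ASR recognition result
  ├─ LLMResult (Round: N)        // LLM reply content
  ├─ AgentInstanceStatus (Round: N)  // status change (listening → thinking → speaking → idle)
  ├─ AgentInstanceMetaInfo (Round: N) // metadata (voice, emotion, etc.)
  └─ End → Round N completes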
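
Because every callback carries the same Round, correlating them comes down to grouping by that value. The following is a minimal sketch under stated assumptions: the events are hypothetical, simplified (round, event name) pairs, not real callback payloads.

```python
from collections import defaultdict

# Hypothetical, simplified events as (round, event_name) pairs.
# Real callbacks carry more fields; only the Round matters for grouping.
events = [
    (5, "ASRResult"),
    (5, "AgentInstanceStatus"),
    (6, "AgentInstanceStatus"),
    (5, "LLMResult"),
    (6, "AgentInstanceMetaInfo"),
]


def group_by_round(events):
    """Collect event names per Round, keeping arrival order within a Round."""
    grouped = defaultdict(list)
    for round_id, name in events:
        grouped[round_id].append(name)
    return dict(grouped)


timeline = group_by_round(events)
for round_id in sorted(timeline):
    print(round_id, timeline[round_id])
```

Interleaved callbacks from two Rounds end up in separate lists, so each interaction can be inspected on its own.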

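When a Round is generated

Trigger source	When the Round is generated	Notes
User speaks	Start of user speech is detected	The Round is already determined when ASR starts recognizing
SendAgentInstanceLLM	API request succeeds	The server assigns a Round as soon as it receives the request
SendAgentInstanceTTS	API request succeeds	The server assigns a Round as soon as it receives the request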

Functional diagram

Data flow overview

┌─────────────────────────────────────────────────────────────────┐
│                         Trigger source                          │
├───────────────────────┬─────────────────────────────────────────┤
│   User speaks         │   Business service calls an API         │
│   (RTC room audio)    │   SendAgentInstanceLLM/TTS              │
└───────────┬───────────┴────────────────┬────────────────────────┘
            │                            │
            ▼                            ▼
    ┌────────────────────────────────────────┐
    │    ZEGO AI Agent server                │
    │    Generates Round (ascending number)  │
    └────────────────────────────────────────┘
                    │
        ┌───────────┼───────────┬───────────┐
        │           │           │           │
        ▼           ▼           ▼           ▼
  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
  │   ASR    │ │   LLM    │ │   TTS    │ │  Status  │
  │  Result  │ │  Result  │ │  Status  │ │  Change  │
  │ Round: N │ │ Round: N │ │ Round: N │ │ Round: N │
  └──────────┘ └──────────┘ └──────────┘ └──────────┘
        │           │           │           │
        └───────────┴───────────┴───────────┘
                    │
                    ▼
        ┌───────────────────────────────────┐
        │   Business server callbacks       │
        │   Correlate all events by Round   │
        └───────────────────────────────────┘

Comparison of the two data flows

[Trigger 1: the user speaks]
User: "What's the weather like today?" (Round 5)
  ↓
Server: ASR recognition → LLM thinking → TTS playback
  ↓
Callback sequence:
  - ASRResult (Round: 5, text: "What's the weather like today?")
  - AgentInstanceStatus (Round: 5, status: "Thinking")
  - LLMResult (Round: 5, text: "It's sunny today...")
  - AgentInstanceStatus (Round: 5, status: "Speaking")
  - AgentInstanceStatus (Round: 5, status: "Idle")

[Trigger 2: the business service calls an API]
Business service: SendAgentInstanceTTS("welcome message") (Round 6)
  ↓
Server: TTS playback directly
  ↓
Callback sequence:
  - AgentInstanceStatus (Round: 6, status: "Speaking")
  - AgentInstanceMetaInfo (Round: 6, voice: "zh_female")
  - AgentInstanceStatus (Round: 6, status: "Idle")
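
Both sequences above end with an AgentInstanceStatus of "Idle". One way to use that, sketched here under assumptions, is to treat a Round as finished once its latest status is "Idle". The dictionary keys (`event`, `round`, `status`) are hypothetical simplifications for illustration, not the real payload layout.

```python
def finished_rounds(callbacks):
    """Return the Rounds whose most recent AgentInstanceStatus is "Idle"."""
    last_status = {}
    for cb in callbacks:
        if cb["event"] == "AgentInstanceStatus":
            last_status[cb["round"]] = cb["status"]
    return sorted(r for r, status in last_status.items() if status == "Idle")


# Round 5 has run to completion; Round 6 is still speaking.
callbacks = [
    {"event": "ASRResult", "round": 5},
    {"event": "AgentInstanceStatus", "round": 5, "status": "Thinking"},
    {"event": "LLMResult", "round": 5},
    {"event": "AgentInstanceStatus", "round": 5, "status": "Speaking"},
    {"event": "AgentInstanceStatus", "round": 5, "status": "Idle"},
    {"event": "AgentInstanceStatus", "round": 6, "status": "Speaking"},
]

print(finished_rounds(callbacks))
```

This relies on the status callback being enabled. If it is not, another end marker would be needed.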

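Getting the Round across the full chain

Client SDK callbacks

Important: The client can obtain Round information through the experimental API callback of the ZEGO Express SDK and use it to drive status-switching UI (such as "Listening", "Thinking", and "Speaking").

Cmd types and Data fields

Cmd	Type	Data fields	Description
1	User speaking status	SpeakStatus, UserId	SpeakStatus: 1 = speech started, 2 = speech ended
3	ASR text	Text, MessageId, UserId, StartFlag, EndFlag	ASR recognition results, delivered incrementally
4	LLM response	Text, MessageId, EndFlag	LLM reply, delivered incrementally
6	Agent status	Status, OldStatus, Reason	Status: 0 = idle, 1 = listening, 2 = thinking, 3 = speaking
102	Metadata	Object	Metadata (voice, emotion, action, etc.)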
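
As an illustration of how the Cmd 6 row could drive a status UI, here is a minimal sketch. It assumes the message has already been received as a JSON string; how it is obtained from the SDK is not shown. The label text and the empty `Reason` value are placeholders, not documented values.

```python
import json

# Status codes come from the Cmd 6 row of the table; the label text is illustrative.
STATUS_LABELS = {0: "Idle", 1: "Listening", 2: "Thinking", 3: "Speaking"}


def ui_label(raw_message: str):
    """Return a UI label for a Cmd 6 (agent status) message, or None for other Cmds."""
    msg = json.loads(raw_message)
    if msg.get("Cmd") != 6:
        return None
    status = msg.get("Data", {}).get("Status")
    return STATUS_LABELS.get(status, "Unknown")


# Hypothetical sample message; the Reason value is a placeholder.
sample = json.dumps({
    "Round": 510359002,
    "Cmd": 6,
    "Data": {"Status": 2, "OldStatus": 1, "Reason": ""},
})
print(ui_label(sample))  # Thinking
```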

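Callback fields (key fields)

Common fields of all callback events:

{
  "Timestamp": 1765510379,           // timestamp in seconds
  "TimestampMs": 1765510379113,      // timestamp in milliseconds
  "SeqId": 278800715,                // packet sequence number; ordered, but not guaranteed to be contiguous
  "Round": 510359002,                // [core] conversation round; generated in ascending order, not guaranteed to be contiguous
  "Cmd": 1,                          // command type
  "Data": { ... }                    // payload; depends on Cmd (see the Cmd table above)
}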
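
Since SeqId is ordered and Round is ascending, a client could use both to discard late packets. The sketch below shows one possible policy, not documented SDK behavior: it assumes SeqId is comparable across all Cmd types and that messages from a superseded Round can be safely ignored by the UI. The class name `RoundFilter` is hypothetical.

```python
class RoundFilter:
    """Drop messages that are duplicated, reordered, or from an older Round."""

    def __init__(self):
        self.latest_round = -1
        self.last_seq = -1

    def accept(self, msg: dict) -> bool:
        if msg["SeqId"] <= self.last_seq:
            return False  # duplicate or reordered packet
        self.last_seq = msg["SeqId"]
        if msg["Round"] < self.latest_round:
            return False  # belongs to a Round that a newer one has superseded
        self.latest_round = msg["Round"]
        return True


f = RoundFilter()
results = [
    f.accept({"SeqId": 10, "Round": 5}),
    f.accept({"SeqId": 12, "Round": 6}),
    f.accept({"SeqId": 11, "Round": 6}),  # arrives late
    f.accept({"SeqId": 13, "Round": 5}),  # older Round
    f.accept({"SeqId": 14, "Round": 6}),
]
print(results)
```

Gaps in SeqId are deliberately not treated as errors, because the field is not guaranteed to be contiguous.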

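Detailed documentation

For the complete API description, see: Agent Instance SDK-side Callbacks

Server-side callbacks

Callback types

Callback event	When it fires	Purpose
ASRResult	Recognition of the user's speech completes	Get the ASR-recognized text
LLMResult	The LLM generates a reply	Get the AI reply content
AgentInstanceStatus	The status changes	Track the AI's current status (listening/thinking/speaking/idle)
AgentInstanceMetaInfo	Playback starts	Get metadata (voice, emotion, action, etc.)
Interrupted	The AI is interrupted	Handle interruption logic
UserSpeakAction	The user starts/stops speaking	Detect user speech
AgentSpeakAction	The AI starts/stops speaking	Detect AI speech
UserAudioData	The user starts speaking	Get the audio data for the corresponding round
Exception	An error occurs	Error handling

Callback fields (key fields)

Common fields of all callback events: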

{
  "Event": "ASRResult",               // event type
  "RoomId": "room_123",               // room ID
  "AgentUserId": "agent_user_001",    // agent user ID
  "Sequence": 1234567890,             // event sequence number (globally increasing)
  "Timestamp": 1746619200000,         // timestamp (milliseconds)
  "Data": {                           // event-specific data
    "Round": 5,                       // [core] round number
    ...
  }
}
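
Note that in server callbacks the Round sits inside Data rather than at the top level. A minimal parsing sketch follows. The `RoundEvent` type and `parse_callback` function are hypothetical names, and the sample body contains only the common fields shown above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RoundEvent:
    event: str
    room_id: str
    sequence: int
    round_id: int


def parse_callback(body: str) -> RoundEvent:
    """Extract the correlation fields from a server callback body."""
    payload = json.loads(body)
    return RoundEvent(
        event=payload["Event"],
        room_id=payload["RoomId"],
        sequence=payload["Sequence"],
        round_id=payload["Data"]["Round"],  # Round is nested under Data
    )


body = json.dumps({
    "Event": "ASRResult",
    "RoomId": "room_123",
    "AgentUserId": "agent_user_001",
    "Sequence": 1234567890,
    "Timestamp": 1746619200000,
    "Data": {"Round": 5},
})
evt = parse_callback(body)
print(evt.event, evt.round_id)
```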

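Detailed documentation

For the complete description, see: Server-side Callbacks

Example: getting the Round across the full chain

FAQ

Q1: When is a Round generated?

A: A Round is generated each time a new interaction is triggered, including:

The user starts speaking
A call to SendAgentInstanceLLM
A call to SendAgentInstanceTTS

In an interruption, the new interaction triggers a new Round and the old Round terminates.

Q2: Can a Round value repeat?

A: No. Rounds are generated in ascending order and never repeat.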

Q3: How do I debug Round correlation problems?

A: Log the Round value of every callback and review the logs grouped by Round:

[Round 5] ASRResult: "user's question"
[Round 5] LLMResult: "AI's reply"
[Round 5] Status: Speaking → Idle

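If events for a Round appear to be missing, check the items below. Round values are not guaranteed to be contiguous, so a gap between Round numbers is not by itself a sign of loss.

Whether the callbacks are enabled (CallbackConfig)
Whether the network is healthy (whether callbacks are being lost)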
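
A small helper can produce this Round-prefixed log format. This is only a sketch: the (round, event, detail) tuples are hypothetical, and a real service would feed it from its callback handler.

```python
def format_round_logs(records):
    """Format (round, event, detail) records as log lines grouped by Round.

    sorted() is stable, so arrival order is preserved within each Round.
    """
    ordered = sorted(records, key=lambda rec: rec[0])
    return [f"[Round {rnd}] {event}: {detail}" for rnd, event, detail in ordered]


records = [
    (5, "ASRResult", '"user question"'),
    (6, "Status", "Idle → Speaking"),
    (5, "LLMResult", '"AI reply"'),
    (5, "Status", "Speaking → Idle"),
]
for line in format_round_logs(records):
    print(line)
```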

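Q4: What is the difference between Round and Sequence?

Round: identifies "which interaction"; all events of the same interaction share the same Round
Sequence: identifies "the global order of events"; it increases strictly across all events

Use Round to correlate the events of one interaction, and use Sequence to detect lost events.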
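
The following sketch checks the "strictly increasing" property of Sequence on the receiving side. It reports duplicated or out-of-order values only. It does not infer loss from gaps, because the text above does not state that consecutive events differ by exactly 1. The function name is hypothetical.

```python
def find_order_violations(sequences):
    """Return (highest_seen, current) pairs where Sequence did not strictly increase."""
    violations = []
    highest = None
    for seq in sequences:
        if highest is not None and seq <= highest:
            violations.append((highest, seq))
        else:
            highest = seq
    return violations


received = [100, 101, 105, 103, 106, 106]
print(find_order_violations(received))
```

A late or duplicated value suggests the callback was retried or reordered in transit. Whether it also hints at a lost event depends on how Sequence is allocated, which would need to be confirmed against the server callback documentation.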

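Q5: After an interruption, will the interrupted Round still receive LLMResult?

A: No. If the LLM is still generating when the interruption occurs, the server stops generation and does not send the LLMResult callback.

After receiving the Interrupted event, clean up any pending state for the interrupted Round.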
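
The cleanup can be sketched as a small tracker of Rounds that are still waiting for a reply. This is an illustration under assumptions: the class and method names are hypothetical, and it assumes the Round carried by the Interrupted event identifies the interrupted Round, which should be confirmed against the server callback documentation.

```python
class PendingRounds:
    """Track Rounds that have an ASRResult but no LLMResult yet."""

    def __init__(self):
        self._waiting = set()

    def on_event(self, event: str, round_id: int) -> None:
        if event == "ASRResult":
            self._waiting.add(round_id)
        elif event in ("LLMResult", "Interrupted"):
            # An interrupted Round will never deliver LLMResult, so stop waiting.
            self._waiting.discard(round_id)

    def waiting(self):
        return sorted(self._waiting)


tracker = PendingRounds()
tracker.on_event("ASRResult", 5)
tracker.on_event("LLMResult", 5)
tracker.on_event("ASRResult", 6)
before = tracker.waiting()
tracker.on_event("Interrupted", 6)
after = tracker.waiting()
print(before, after)
```

Without the Interrupted branch, Round 6 would stay in the waiting set indefinitely.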

Appendix: callback configuration example

When creating an Agent instance, enable the relevant callbacks:

{
  "Action": "CreateAgentInstance",
  "AppId": 1234567890,
  "AgentId": "agent_001",
  "UserId": "user_001",
  "RTC": {
    "RoomId": "room_123",
    "AgentStreamId": "stream_001",
    "AgentUserId": "agent_user_001",
    "UserStreamId": "user_stream_001"
  },
  "CallbackConfig": {
    "ASRResult": 1,              // enable the ASR callback
    "LLMResult": 1,              // enable the LLM callback
    "Interrupted": 1,            // enable the interruption callback
    "UserSpeakAction": 1,        // enable the user speaking callback
    "AgentSpeakAction": 1,       // enable the AI speaking callback
    "AgentInstanceStatus": 1     // enable the status callback
  }
}
