AI paper index

Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models

2026-06-09 · arXiv: 2606.11167

One-line summary

An AI research paper on Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models.

Engineering notes

Engineering notes will be added by the aipentium editorial team.

Chinese explanation / 中文解读

中文解读待补充:本站会优先为大语言模型、生成式AI、ChatGPT相关技术、计算机视觉、深度学习等高价值论文补充中文说明。

Original abstract

Full-duplex spoken dialogue models can listen and speak simultaneously, making them a promising architecture for natural conversation. However, current models are trained solely with supervised learning through token-level likelihood maximization, which does not directly optimize interaction-level behaviors, causing interactivity issues such as excessive silence and ill-timed turn-taking. Recent work has applied reinforcement learning (RL) to improve interactivity, but existing methods address only a limited set of interactive behaviors in their rewards. In this work, we propose a post-training alignment method that comprehensively improves the interactivity of full-duplex spoken dialogue models through RL. We address the four canonical axes of interactivity: pause handling, turn-taking, backchanneling, and user interruption. For each axis, we extract short audio segments from human conversation corpora and optimize the model with axis-specific reward functions. An extra LLM-based reward for response quality prevents semantic degradation. We apply our method to two open-source models, Moshi and PersonaPlex, demonstrating consistent improvements in interactivity on both offline evaluation with pre-recorded audio and real-time multi-turn dialogue evaluation.

5.0Engineering value
7.0Research novelty
4.0Business relevance

Links and sources

Need this topic turned into a technical roadmap?

aipentium can prepare a custom AI literature review, code map, dataset map, and B2B technology assessment.

Request B2B AI research

Comments

No comments yet. Be the first to share your thoughts on this paper.
Login or register to leave a comment