RESPOND: Responsive Engagement Strategy for Predictive Orchestration and Dialogue
- Meng-Chen Lee
- Costas Panay
- Javier Hernandez
- Sean Andrist
- Dan Bohus
- Anatoly Churikov
- Andrew D. Wilson
The majority of voice-based conversational agents still rely on pause-and-respond turn-taking, leaving interactions sounding stiff and robotic. We present RESPOND (Responsive Engagement Strategy for Predictive Orchestration and Dialogue), a framework that brings two staples of human conversation to agents: timely backchannels (“mm-hmm,” “right”) and proactive turn claims that can contribute relevant content before the speaker yields the conversational floor. Built on streaming ASR (Automatic Speech Recognition) and incremental semantics, RESPOND continuously predicts both when and how to interject, enabling fluid, listener-aware dialogue. A defining feature is its designer-facing controllability: two orthogonal dials, Backchannel Intensity (frequency of acknowledgments) and Turn Claim Aggressiveness (depth and assertiveness of early contributions), can be tuned to match the etiquette of contexts ranging from rapid ideation to reflective counseling. By coupling predictive orchestration with explicit control, RESPOND offers a practical path toward conversational agents that adapt their conversational footprint to social expectations, advancing the design of more natural and engaging voice interfaces.
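To make the two dials concrete, here is a minimal sketch of how they might gate an agent's behavior on each incremental ASR update. Everything in it is an assumption for illustration and is not taken from the RESPOND implementation: the names `Dials` and `decide`, the 0-to-1 dial range, the linear dial-to-threshold mapping, and the idea that an upstream predictor supplies the probabilities `p_backchannel` and `p_turn_claim`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dials:
    """Hypothetical designer-facing settings, each assumed to lie in [0, 1]."""
    backchannel_intensity: float      # how often to acknowledge
    turn_claim_aggressiveness: float  # how readily to contribute early


def decide(p_backchannel: float, p_turn_claim: float, dials: Dials) -> str:
    """Pick an action for one incremental update.

    p_backchannel and p_turn_claim are assumed to come from an upstream
    predictor running over streaming ASR output. Returns 'turn_claim',
    'backchannel', or 'wait'.
    """
    # Assumption: a higher dial setting lowers the confidence needed to act.
    bc_threshold = 1.0 - dials.backchannel_intensity
    tc_threshold = 1.0 - dials.turn_claim_aggressiveness

    # A dial at zero disables that behavior entirely.
    if dials.turn_claim_aggressiveness > 0 and p_turn_claim >= tc_threshold:
        return "turn_claim"
    if dials.backchannel_intensity > 0 and p_backchannel >= bc_threshold:
        return "backchannel"
    return "wait"


# Two illustrative presets; the values are made up, not tuned settings.
counseling = Dials(backchannel_intensity=0.8, turn_claim_aggressiveness=0.1)
ideation = Dials(backchannel_intensity=0.3, turn_claim_aggressiveness=0.8)
```

Because the two dials are independent inputs, the same predictor outputs can yield a quiet acknowledgment under the `counseling` preset and an early contribution under the `ideation` preset, which is the kind of context-dependent etiquette the abstract describes.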