Abstract

People in dialog use a rich set of nonverbal behaviors, including variations in the prosody of their utterances. Such behaviors, often emotion-related, call for appropriate responses, but today's spoken dialog systems lack the ability to provide them. Recent work has shown how to recognize user emotions from prosody and how to express system-side emotions with prosody, but demonstrations of how to combine these functions to improve the user experience have been lacking. Working with a corpus of conversations with students about graduate school, we analyzed the emotional states of the interlocutors, utterance by utterance, using three dimensions: activation, evaluation, and power. We found that the emotional coloring of a speaker's utterance could be largely predicted from the emotion shown by the interlocutor in the immediately preceding utterance. This finding enabled us to build Gracie, the first spoken dialog system that recognizes a user's emotional state from his or her speech and gives a response with appropriate emotional coloring. Evaluation with 36 subjects showed that they felt significantly more rapport with Gracie than with either of two controls. This result shows that dialog systems can tap into this important level of interpersonal interaction using today's technology.

  • Publication date: 2011-12