Multi-cue fusion for emotion recognition in the wild

Yan, Jingwei; Zheng, Wenming*; Cui, Zhen; Tang, Chuangao; Zhang, Tong; Zong, Yuan

doi:10.1016/j.neucom.2018.03.068

Abstract

Emotion recognition has become an active research topic in recent years because of the strong demand for this technology in many practical situations. One challenging task is to recognize the emotion type expressed in a video clip collected in the wild. To address this problem, we propose a multi-cue fusion emotion recognition (MCFER) framework that models human emotions from three complementary cues, i.e., facial texture, facial landmark action, and audio signal, and then fuses them. To capture the dynamic change of facial texture, we employ a cascaded convolutional neural network (CNN) and bidirectional recurrent neural network (BRNN) architecture: the facial image from each frame is first fed into the CNN to extract a high-level texture feature, and the resulting feature sequence is then passed to the BRNN to learn the changes within it. Facial landmark action explicitly models the movement of facial muscles, and an SVM and a CNN are deployed to explore the emotion-related patterns in it. The audio signal is also modeled with a CNN by extracting low-level acoustic features from segmented clips and stacking them as an image-like matrix. We fuse these models at both the feature level and the decision level to further boost overall performance. Experimental results on two challenging databases demonstrate the effectiveness and superiority of the proposed MCFER framework.
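As a rough illustration of what decision-level fusion of the three cues could look like, the sketch below combines per-model class probabilities by a weighted average and picks the highest-scoring class. The abstract does not state the fusion rule, so the weighted average, the function names `fuse_decisions` and `predict`, the weights, and the probability values are all assumptions made for illustration, not the authors' method.

```python
def fuse_decisions(prob_lists, weights):
    """Weighted average of per-model class probability vectors.

    prob_lists: one probability vector per cue (hypothetical model outputs).
    weights: one non-negative weight per cue; normalized to sum to 1 here.
    """
    if len(prob_lists) != len(weights):
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(prob_lists[0])
    fused = [0.0] * n_classes
    for probs, w in zip(prob_lists, weights):
        if len(probs) != n_classes:
            raise ValueError("all models must score the same classes")
        for i, p in enumerate(probs):
            fused[i] += (w / total) * p
    return fused


def predict(prob_lists, weights):
    """Index of the class with the highest fused score."""
    fused = fuse_decisions(prob_lists, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical outputs of the three cue models over three emotion classes.
texture = [0.6, 0.3, 0.1]   # facial texture model (assumed values)
landmark = [0.2, 0.5, 0.3]  # facial landmark action model (assumed values)
audio = [0.3, 0.4, 0.3]     # audio model (assumed values)

label = predict([texture, landmark, audio], weights=[2, 1, 1])
```

With these made-up numbers, weighting the texture model more heavily changes the predicted class compared with equal weights, which is the kind of effect a fusion weight search would exploit.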

Publication date: 2018-10-2
Affiliations: Nanjing University of Science and Technology; Southeast University

