Audiovisual Fusion: Challenges and New Approaches

Katsaggelos Aggelos K; Bahaadini Sara; Molina Rafael

doi:10.1109/JPROC.2015.2459017

Abstract

In this paper, we review recent results on audiovisual (AV) fusion. We also discuss some of the challenges and report on approaches to address them. One important issue in AV fusion is how the modalities interact and influence each other. This review addresses that question in the context of AV speech processing, and especially speech recognition, where the modalities interact but also sometimes appear to desynchronize from each other. An additional issue is that one of the modalities may be missing at test time although it is available at training time; for example, it may be possible to collect AV training data while only having access to audio at test time. We review approaches to this issue from the area of multiview learning, where the goal is to learn a model or representation for each of the modalities separately while taking advantage of the rich multimodal training data. In addition to multiview learning, we discuss the recent application of deep learning (DL) to AV fusion. Finally, we draw conclusions and offer our assessment of the future of AV fusion.

Publication date: 2015-09
Affiliation: Northwestern University
