摘要

Audio-Visual People Diarization (AVPD) is an original framework that simultaneously improves audio, video, and audiovisual diarization results. Following a literature review of people diarization for both audio and video content and their limitations, which includes our own contributions, we describe a proposed method for associating both audio and video information by using co-occurrence matrices and present experiments which were conducted on a corpus containing TV news, TV debates, and movies. Results show the effectiveness of the overall diarization system and confirm the gains audio information can bring to video indexing and vice versa.

  • 出版日期2014-2