Abstract

Broadcast audio transcription remains a challenging problem because of the diversity and complexity of speech and audio signals. Audio segmentation, an essential module in a broadcast audio transcription system, has benefited greatly from advances in deep learning. However, the need for large amounts of labeled training data has become a bottleneck for deep learning-based audio segmentation methods. To tackle this problem, an adapted segmentation method is proposed that selects speech/non-speech segments with high confidence from unlabeled training data to complement the labeled training data. The new method relies on GMM-based speech/non-speech models trained on an utterance-by-utterance basis, using long-term information to choose reliable training data for the speech/non-speech models from the utterances at hand. Experimental results show that this data selection method is a powerful audio segmentation algorithm in its own right. We also observed that deep neural networks trained on data selected by this method are superior to those trained on data chosen by two competing methods. Moreover, better performance can be obtained by combining the deep learning-based audio segmentation method with the adapted data selection method.