A Feature Study for Classification-Based Speech Separation at Low Signal-to-Noise Ratios

Chen Jitong<sup>*</sup>; Wang Yuxuan; Wang DeLiang

doi:10.1109/TASLP.2014.2359159

Abstract

Speech separation can be formulated as a classification problem. In classification-based speech separation, supervised learning is employed to classify time-frequency units as either speech-dominant or noise-dominant. In very low signal-to-noise ratio (SNR) conditions, acoustic features extracted from a mixture are crucial for correct classification. In this study, we systematically evaluate a range of promising features for classification-based separation using six nonstationary noises at the low SNR level of -5 dB, which is chosen with the goal of improving human speech intelligibility in mind. In addition, we propose a new feature called multi-resolution cochleagram (MRCG). The new feature is constructed by combining four cochleagrams at different spectrotemporal resolutions in order to capture both the local and contextual information. Experimental results show that MRCG gives the best classification results among all evaluated features. In addition, our results indicate that auto-regressive moving average (ARMA) filtering, a post-processing technique for improving automatic speech recognition features, also improves many acoustic features for speech separation.
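As a reading aid, the classification targets described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `label_tf_units`, the energy-matrix inputs, and the `lc_db` threshold parameter are all assumptions introduced here. Each time-frequency unit is labeled speech-dominant (1) when its speech energy exceeds its noise energy by the chosen threshold, and noise-dominant (0) otherwise. A supervised classifier would then be trained to predict such labels from features extracted from the mixture.

```python
def label_tf_units(speech_energy, noise_energy, lc_db=0.0):
    """Label each time-frequency unit 1 (speech-dominant) or 0 (noise-dominant).

    speech_energy, noise_energy: 2-D lists indexed [channel][frame] holding
    non-negative energies of the premixed speech and noise (hypothetical inputs).
    lc_db: local SNR threshold in dB (an assumed parameter); 0 dB means a unit
    is speech-dominant when speech energy exceeds noise energy.
    """
    # Convert the dB threshold to a linear energy ratio.
    threshold = 10.0 ** (lc_db / 10.0)
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        # Compare s > threshold * n to avoid dividing by zero noise energy.
        mask.append([1 if s > threshold * n else 0 for s, n in zip(s_row, n_row)])
    return mask


# Toy example: 2 frequency channels by 3 time frames (made-up energies).
speech = [[4.0, 0.5, 2.0], [0.1, 3.0, 1.0]]
noise = [[1.0, 2.0, 2.0], [1.0, 1.0, 0.2]]
mask = label_tf_units(speech, noise)
```

Lowering `lc_db` below 0 dB marks more units as speech-dominant, which is one way such a threshold could be tuned when the overall SNR is very low.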

Publication date: 2014-12
