Abstract

Estimating a wideband spectral envelope when only narrowband speech is available is a challenging task. In this paper, we explore ways to do so within an artificial speech bandwidth extension (ABE) framework. Starting from a typical hidden Markov model (HMM)/Gaussian mixture model baseline scheme, we investigate two types of features, topologies, and regularization approaches of deep neural networks (DNNs) to obtain estimates of wideband spectral envelopes with the smallest cepstral distance to the original ones. To draw realistic conclusions, we use a test database that is acoustically different from the training and validation speech material. Interestingly, a DNN regression approach turns out to outperform all other investigated methods, even though it dispenses with the HMM. Cepstral distance was reduced by 1.18 dB, wideband PESQ was improved by 0.23 MOS points, and a subjective comparison category rating (CCR) listening test showed a significant preference of 1.37 CMOS points for the best DNN ABE approach over narrowband speech.
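The cepstral distance figure can be made concrete with a short computation. The following is a minimal sketch of one commonly used dB-scaled cepstral distance, CD = (10 / ln 10) * sqrt(2 * sum_k (c_k - ĉ_k)^2), averaged over frames. The abstract does not state which CD definition, how many coefficients, or whether c_0 is excluded; those choices and the function names here are assumptions for illustration only.

```python
import math
from typing import Sequence


def cepstral_distance_db(c_ref: Sequence[float], c_est: Sequence[float]) -> float:
    """dB-scaled cepstral distance between two cepstral vectors of one frame.

    Assumption: the caller passes coefficients c_1..c_K only (c_0, the
    energy term, excluded), and both vectors have the same length K.
    """
    if len(c_ref) != len(c_est):
        raise ValueError("cepstral vectors must have equal length")
    # Sum of squared coefficient differences over k = 1..K.
    sq = sum((a - b) ** 2 for a, b in zip(c_ref, c_est))
    # Scale factor 10/ln(10) converts the log-spectral (natural log) distance to dB.
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * sq)


def mean_cepstral_distance_db(
    frames_ref: Sequence[Sequence[float]],
    frames_est: Sequence[Sequence[float]],
) -> float:
    """Average the per-frame cepstral distance over an utterance."""
    if len(frames_ref) != len(frames_est) or not frames_ref:
        raise ValueError("need the same non-zero number of reference and estimated frames")
    total = sum(cepstral_distance_db(r, e) for r, e in zip(frames_ref, frames_est))
    return total / len(frames_ref)


if __name__ == "__main__":
    # Hypothetical toy cepstra (not data from the paper).
    ref = [[0.5, -0.2, 0.1], [0.4, -0.1, 0.05]]
    est = [[0.45, -0.25, 0.1], [0.4, -0.1, 0.0]]
    print(f"mean CD: {mean_cepstral_distance_db(ref, est):.3f} dB")
```

Under this definition, a reduction in CD means the estimated wideband envelope's cepstrum moved closer to the original on average; comparing two ABE systems requires computing it on the same frames with the same coefficient range.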

  • Publication date: 2018-1