Abstract

This paper proposes a voice conversion algorithm that uses compressed sensing to exploit the information shared between consecutive speech frames. Because the vector formed by concatenating several consecutive Line Spectral Pair (LSP) vectors is sparse in the discrete cosine transform (DCT) domain, compressed sensing is used to extract a compressed vector from the concatenated LSPs, and this compressed vector serves as the feature vector for training the conversion function. Evaluations show that, when an appropriate number of speech frames is chosen, the approach improves performance by 3.21% on average over the conventional algorithm based on weighted frequency warping. The experimental results also indicate that a voice conversion system benefits from making full use of inter-frame information, because this information helps the converted speech retain the more stable acoustic properties inherent across frames.
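The feature extraction step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the LSP order (10), the number of concatenated frames (4), the compressed dimension (16), the random Gaussian measurement matrix, the synthetic LSP-like data, and all function names are assumptions chosen only for illustration. The sketch concatenates consecutive frames, checks how much energy the largest DCT coefficients carry, and projects the concatenated vector onto a lower-dimensional compressed vector.

```python
import math
import random


def dct2(x):
    """Orthonormal DCT-II of a real sequence (direct O(n^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def concatenate_frames(frames, start, count):
    """Stack `count` consecutive LSP frames into one long vector."""
    vec = []
    for frame in frames[start:start + count]:
        vec.extend(frame)
    return vec


def gaussian_matrix(rows, cols, seed=0):
    """Random Gaussian measurement matrix (an assumed choice, not from the paper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / math.sqrt(rows) for _ in range(cols)]
            for _ in range(rows)]


def compress(phi, x):
    """Compressed vector y = Phi x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


# Synthetic, slowly varying LSP-like frames (hypothetical values, not real speech).
order, num_frames = 10, 4
frames = [
    [(j + 1) * math.pi / (order + 1) + 0.01 * math.sin(0.3 * t + j)
     for j in range(order)]
    for t in range(num_frames)
]

x = concatenate_frames(frames, 0, num_frames)   # concatenated LSP vector
coeffs = dct2(x)                                # representation in the DCT domain

# Fraction of the total energy held by the 8 largest DCT coefficients.
energy = sum(c * c for c in coeffs)
top = sorted((c * c for c in coeffs), reverse=True)[:8]
energy_ratio = sum(top) / energy

phi = gaussian_matrix(16, len(x))               # 16 measurements of a 40-dim vector
y = compress(phi, x)                            # feature vector for training

print(f"concatenated length: {len(x)}, compressed length: {len(y)}")
print(f"energy in 8 largest DCT coefficients: {energy_ratio:.3f}")
```

In a full system the compressed vectors of the source and target speakers would be used to train the conversion function; that stage, and the recovery of LSPs from the converted compressed vector, are outside this sketch.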

Full Text