摘要

In this paper, a novel parametric prosody coding approach for Mandarin speech is proposed. It employs a hierarchical prosodic model (HPM) as a prosody-generating model in the encoder to analyze the speech prosody of the input utterance to obtain a parametric representation of four prosodic-acoustic features of syllable pitch contour, syllable duration, syllable energy level, and syllable-juncture pause duration for encoding. In the decoder, the four prosodic-acoustic features are reconstructed by a synthesis operation using the decoded HPM parameters. The reconstructed prosodic features are lastly used in an HMM-based speech synthesizer to generate the reconstructed speech. Objective and subjective evaluations showed that the proposed prosody coding approach encoded speech with better quality and lower data rate than the conventional segment-based coding scheme with vector or scalar quantization approach did. The reconstructed speech encoded by the proposed approach has good quality at low data rates of 81.4 and 72.7 bps for speaker-dependent and speaker-independent tasks, respectively. An application of the proposed prosody coding approach to speaking rate conversion by directly changing the HPM parameters to those of a different speaking rate is also illustrated. An informal listening test confirmed that both converted speeches of high and low speaking rate sounded very smooth.

  • 出版日期2018-7-11

全文