摘要

Prosodic structure generation is the key component in improving the intelligibility and naturalness of synthetic speech for a text-to-speech (TTS) system. This paper investigates the problem of automatic segmentation of prosodic word and prosodic phrase, which are two fundamental layers in the hierarchical prosodic structure of Mandarin, and presents a two-stage prosodic structure generation strategy. Conditional random fields (CRF) models are built for both prosodic word and prosodic phrase prediction at the front end with different feature selections. Besides, a transformation-based error-driven learning (TBL) modification module is introduced in the back end to amend the initial prediction. Experiment results show that the approach combining CRF and TBL achieves an F-score of 94.66%.

全文