Discovering salient prosodic cues and their interactions for automatic story segmentation in Mandarin broadcast news

Xie Lei<sup>*</sup>

doi:10.1007/s00530-008-0141-1

Abstract

This paper investigates speech prosody for automatic story segmentation in Mandarin broadcast news. Prosodic cues that are effective in English story segmentation deserve re-investigation, since the lexical tones of Mandarin may complicate the expression of pitch declination and reset. Our data-oriented study shows that story boundaries cannot be clearly discriminated from utterance boundaries by speaker-normalized pitch features, because these features vary widely across Mandarin syllable tones. We therefore propose speaker- and tone-normalized pitch features, which provide clear separation between utterance and story boundaries. Our study also shows that speaker-normalized pause duration is quite effective at separating story boundaries from utterance boundaries, while speaker-normalized speech energy and syllable duration are not. Experiments using decision trees for story boundary detection reinforce the difference between English and Chinese: speaker- and tone-normalized pitch features should be adopted in Mandarin story segmentation. We show that combining different prosodic cues achieves a high F-measure of 93.04%, owing to the complementarity between pause, pitch and energy. Analysis of the decision tree uncovers five major heuristics that show how speakers jointly use pause duration and pitch to separate speech into stories.
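As a rough illustration of what speaker- and tone-normalized pitch could look like, the sketch below z-scores each syllable's pitch within its (speaker, tone) group and also computes an F-measure from precision and recall. The abstract does not give the exact normalization formula, so the z-score choice, the function names `normalize_pitch` and `f_measure`, and the `(speaker, tone, pitch_hz)` tuple layout are all assumptions made for illustration, not the paper's method.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_pitch(syllables):
    """Z-score each pitch value within its (speaker, tone) group.

    `syllables` is a list of (speaker, tone, pitch_hz) tuples. This layout
    and the z-score normalization are assumptions, not taken from the paper.
    """
    # Collect pitch values per (speaker, tone) pair.
    groups = defaultdict(list)
    for speaker, tone, pitch in syllables:
        groups[(speaker, tone)].append(pitch)

    # Mean and population standard deviation for each group.
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}

    normalized = []
    for speaker, tone, pitch in syllables:
        mu, sigma = stats[(speaker, tone)]
        # A group with no spread carries no pitch contrast; map it to 0.
        normalized.append((pitch - mu) / sigma if sigma > 0 else 0.0)
    return normalized


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Normalizing within each tone as well as each speaker removes the pitch offset that the lexical tone itself contributes, so the remaining variation is more likely to reflect declination and reset than tone identity.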

Publication date: 2008-09
Affiliation: Northwestern Polytechnical University
