Abstract

This paper proposes a hybrid refinement scheme that localizes phonetic boundaries more accurately by combining three post-processing techniques: statistical correction, multi-resolution fusion, and predictive models. First, a statistical method based on state-level correction is proposed to improve the segmentation results. The effect of the search range on the statistical correction process is studied, and a state selection scheme is used to enhance the correction results. The paper also examines how the time resolution (i.e., the step size) of the acoustic models affects segmentation accuracy, and proposes a multi-resolution fusion process to further refine the statistically corrected results. Finally, predictive models are designed to improve segmentation accuracy by incorporating various acoustic features and searching around the preliminary boundary with a smaller step size. Applying the hybrid refinement scheme to a well-known corpus yields significant improvements in segmentation accuracy at different tolerances, mean absolute error (MAE), and root-mean-square error (RMSE). Furthermore, a cross-corpus segmentation scenario is examined, in which segmentation results are generated for a new corpus using only a small set of labeled data. Experimental results show that the proposed refinement procedure produces segmentation results comparable to those of well-trained acoustic models obtained from the new corpus.
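
The three evaluation measures named in the abstract (accuracy within a tolerance, MAE, and RMSE) can be illustrated with a minimal sketch. It uses the standard definitions of these measures, not code from the paper. The function name `boundary_metrics`, the millisecond unit, the tolerance values, and the sample boundary times are all assumptions made for illustration.

```python
import math

def boundary_metrics(predicted, reference, tolerances=(5, 20)):
    """Compare predicted and reference boundary times (assumed to be in ms).

    Hypothetical helper: the tolerances are illustrative, not the paper's.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty boundary lists of equal length")

    # Signed deviation of each predicted boundary from its reference boundary.
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)

    # Mean absolute error and root-mean-square error over all boundaries.
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Fraction of boundaries whose absolute error is within each tolerance.
    accuracy = {tol: sum(abs(e) <= tol for e in errors) / n for tol in tolerances}
    return {"mae": mae, "rmse": rmse, "accuracy": accuracy}

# Made-up boundary times in milliseconds, for illustration only.
metrics = boundary_metrics([100, 250, 400, 610], [105, 240, 400, 580])
```

With these made-up values, a single large deviation (30 ms) pulls the RMSE well above the MAE, which is why reporting both measures alongside tolerance-based accuracy gives a fuller picture of boundary quality.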

  • Publication date: 2015-1
  • Affiliation: 南阳理工学院