Abstract

This paper presents an automatic non-native accent assessment approach using phonetic-level posterior and duration features. Instead of conventional Gaussian mixture models (GMMs) trained on MFCCs, the method uses phoneme states as tokens for computing posterior probabilities and zero-order Baum-Welch statistics. Phoneme recognizers for five languages are employed to extract the phonetic-level features. The features from these five recognizers are shown to be complementary in capturing non-native information, and phoneme-duration-based features are the most effective for this task. The final fusion system achieves a Spearman's correlation coefficient of 0.6089 on the test set, outperforming the openSMILE baseline by 43.3%.
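
To make the zero-order statistics concrete, the following is a minimal sketch of the standard definition, in which the statistic for token c is the sum over frames of the posterior P(c | x_t). The abstract does not give the paper's exact formulation, so the function names, the list-of-lists input layout, and the normalization step are assumptions made for illustration only.

```python
from typing import List, Sequence


def zero_order_stats(posteriors: Sequence[Sequence[float]]) -> List[float]:
    """Accumulate zero-order statistics N_c = sum_t P(c | x_t).

    `posteriors` holds one row per frame; each row is a posterior
    distribution over tokens (here, phoneme states). This layout is
    an assumption, not taken from the paper.
    """
    if not posteriors:
        raise ValueError("need at least one frame")
    num_tokens = len(posteriors[0])
    stats = [0.0] * num_tokens
    for frame in posteriors:
        if len(frame) != num_tokens:
            raise ValueError("all frames must have the same number of tokens")
        for c, p in enumerate(frame):
            stats[c] += p
    return stats


def normalize_stats(stats: Sequence[float]) -> List[float]:
    """Scale the statistics to sum to one (hypothetical post-processing)."""
    total = sum(stats)
    if total <= 0:
        raise ValueError("statistics must have a positive sum")
    return [s / total for s in stats]


# Two frames, three phoneme-state tokens (toy values).
frames = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
stats = zero_order_stats(frames)
features = normalize_stats(stats)
```

With one such vector per phoneme recognizer, the per-language vectors could be concatenated or fused at the score level; the abstract does not say which the authors chose.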
