An Optimal Set of Flesh Points on Tongue and Lips for Speech-Movement Classification

Authors: Wang Jun*; Samal Ashok; Rong Panying; Green Jordan R
Source: Journal of Speech, Language, and Hearing Research, 2016, 59(1): 15-26.
DOI: 10.1044/2015_JSLHR-S-14-0112

Abstract

Purpose: The authors sought to determine an optimal set of flesh points on the tongue and lips for classifying speech movements. Method: The authors used electromagnetic articulographs (Carstens AG500 and NDI Wave) to record tongue and lip movements from 13 healthy talkers who articulated 8 vowels, 11 consonants, a phonetically balanced set of words, and a set of short phrases. The authors used a machine-learning classifier (support-vector machine) to classify the speech stimuli on the basis of articulatory movements, then compared the classification accuracies of the flesh-point combinations to determine an optimal set of sensors. Results: When data from 4 sensors (T1: the vicinity between the tongue tip and tongue blade; T4: the tongue-body back; UL: the upper lip; and LL: the lower lip) were combined, phoneme and word classifications were most accurate and were comparable with those of the full set (which also includes T2: the tongue-body front; and T3: the tongue-body middle). Conclusion: The authors identified a 4-sensor set (T1, T4, UL, and LL) that yielded a classification accuracy (91%-95%) equivalent to that obtained using all 6 sensors. These findings provide an empirical basis for selecting sensors and their locations for scientific and emerging clinical applications that incorporate articulatory movements.
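
The comparison of flesh-point combinations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The sample data are synthetic, the function names (`make_toy_data`, `loo_accuracy`, `rank_subsets`) are hypothetical, and a simple nearest-centroid classifier with leave-one-out evaluation stands in for the support-vector machine so that the example needs only the Python standard library. The choice of T1 and LL as the informative sensors in the toy data is arbitrary and does not reproduce the study's results.

```python
import random
from itertools import combinations

# Sensor labels as named in the abstract.
SENSORS = ("T1", "T2", "T3", "T4", "UL", "LL")


def make_toy_data(n_per_class=20, seed=0):
    """Build synthetic two-class samples with one number per sensor.

    Assumption for illustration only: T1 and LL carry class information,
    the other sensors are pure noise.
    """
    rng = random.Random(seed)
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            sample = {s: rng.gauss(0.0, 1.0) for s in SENSORS}
            sample["T1"] += 3.0 * label
            sample["LL"] += 3.0 * label
            data.append((sample, label))
    return data


def loo_accuracy(data, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier
    (a stand-in for the SVM) that sees only the sensors in `subset`."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        centroids = {}
        for label in {lbl for _, lbl in train}:
            rows = [s for s, lbl in train if lbl == label]
            centroids[label] = [sum(r[k] for r in rows) / len(rows) for k in subset]
        vec = [x[k] for k in subset]
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, centroids[c])),
        )
        correct += pred == y
    return correct / len(data)


def rank_subsets(data, sensors=SENSORS):
    """Score every non-empty sensor subset; best accuracy first,
    ties broken in favor of fewer sensors."""
    scored = [
        (subset, loo_accuracy(data, subset))
        for k in range(1, len(sensors) + 1)
        for subset in combinations(sensors, k)
    ]
    return sorted(scored, key=lambda r: (-r[1], len(r[0])))


if __name__ == "__main__":
    ranked = rank_subsets(make_toy_data())
    for subset, acc in ranked[:5]:
        print(subset, round(acc, 3))
```

With 6 sensors there are 63 non-empty subsets, so exhaustive scoring is cheap. Breaking ties in favor of smaller subsets mirrors the goal of finding a reduced sensor set whose accuracy matches that of the full set.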

  • Publication date: 2016-02