Automatic Prediction of Children's Reading Ability for High-Level Literacy Assessment

Authors: Black Matthew P*; Tepperman Joseph; Narayanan Shrikanth S
Source: IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(4): 1015-1028.
DOI:10.1109/TASL.2010.2076389

Abstract

Automatic literacy assessment technology can help children acquire reading skills by providing teachers with valuable feedback in a repeatable, consistent manner. Recent research efforts have concentrated on detecting mispronunciations during word-reading and sentence-reading tasks. These token-level assessments are important since they highlight specific errors made by the child. However, there is also a need for more high-level automatic assessments that capture the overall performance of the children. These high-level assessments can be viewed as an interpretive extension to token-level assessments, and may be more perceptually relevant to teachers and helpful in tracking performance over time. In this paper, we model and predict the overall reading ability of young children reading a list of English words aloud. The data consist of audio recordings, collected in real kindergarten to second grade classrooms from children from native English- and Spanish-speaking households. This research is broken into two main parts. The first part is a user study, in which 11 human evaluators rated the children on their overall reading ability based on the audio recordings. The evaluators were volunteers from diverse backgrounds, seven of whom were native speakers of American English and four of whom were fluent speakers of English as a second language. While none of the evaluators were trained reading experts or licensed teachers, a subset of them were linguists and researchers with experience in automatic literacy assessment. As part of this work, we analyzed the effect of the evaluator's background on inter-evaluator agreement. In the second part, we ran machine learning experiments to predict evaluators' scores using features automatically extracted from the audio. The features were human-inspired and correlated with cues human evaluators stated they used: pronunciation correctness, speaking rate, and fluency.
We investigated various automated methods to verify the correctness of the word pronunciations and to detect disfluencies in the children's speech using held-out annotated data. Using linear regression techniques, we automatically predicted individual evaluators' high-level scores with a mean Pearson correlation coefficient of 0.828, and we predicted average evaluator's scores with correlation 0.946. Both these human-machine agreement statistics exceeded the mean inter-evaluator agreement statistics.
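
The evaluation described above (linear regression on audio-derived features, scored by Pearson correlation against human ratings and compared with inter-evaluator agreement) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the three feature columns are only stand-ins for pronunciation correctness, speaking rate, and fluency, the leave-one-out protocol is not stated in the abstract, and the function names are hypothetical. It does not reproduce the paper's features, data, or reported correlations.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def mean_pairwise_agreement(scores):
    """Mean Pearson correlation over all evaluator pairs.

    scores has shape (n_evaluators, n_children).
    """
    n = scores.shape[0]
    pairs = [
        pearson(scores[i], scores[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return float(np.mean(pairs))


def loo_linear_predictions(X, y):
    """Leave-one-out predictions from ordinary least squares.

    Each child's score is predicted by a model fitted on all other
    children (an assumed protocol, chosen here for simplicity).
    """
    n = len(y)
    A = np.hstack([X, np.ones((n, 1))])  # append an intercept column
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        w, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ w
    return preds


# Synthetic stand-in data: 40 children, 11 evaluators, 3 features.
rng = np.random.default_rng(0)
n_children, n_evaluators = 40, 11
X = rng.normal(size=(n_children, 3))
latent = X @ np.array([1.0, 0.5, 0.7])  # hypothetical "true" ability
scores = latent + rng.normal(scale=0.8, size=(n_evaluators, n_children))

mean_scores = scores.mean(axis=0)
preds = loo_linear_predictions(X, mean_scores)

human_machine_r = pearson(preds, mean_scores)
inter_evaluator_r = mean_pairwise_agreement(scores)
print("human-machine correlation (synthetic):", human_machine_r)
print("mean inter-evaluator correlation (synthetic):", inter_evaluator_r)
```

Comparing `human_machine_r` with `inter_evaluator_r` mirrors the kind of comparison the abstract reports, although the numbers printed here reflect only the synthetic noise settings chosen above.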

  • Publication date: 2011-5