Audio Pattern Recognition of Baby Crying Sound Events

Author: Ntalampiras Stavros*
Source: Journal of the Audio Engineering Society, 2015, 63(5): 358-369.
DOI: 10.17743/jaes.2015.0025

Abstract

This article addresses a problem arising within the paralinguistic audio signal processing domain: that of classifying the state of an infant based on the patterns exhibited by its crying sound events. More specifically, we propose a methodology able to distinguish among the following five states: (a) hungry, (b) uncomfortable (needs a change), (c) needs to burp, (d) in pain, and (e) needs to sleep. A wide variety of audio parameters (Perceptual Linear Prediction, Mel Frequency Cepstral Coefficients, Perceptual Wavelet Packets, Teager Energy Operator, Temporal Modulation) related to the task at hand, along with a series of classification techniques (Multilayer Perceptron, Support Vector Machine, Random Forest, Reservoir Network, Gaussian Mixture Model, Hidden Markov Model), were customized to address the problem reliably. The final implementation exploits a representation of the audio structure comprising a set of descriptors that capture heterogeneous aspects of the signal. Subsequently, we introduce the use of Reservoir Networks for this specific problem, where they demonstrated quite encouraging performance. The final goal of the method is to provide an automatic and non-invasive framework for monitoring infants and for helping inexperienced/trainee pediatricians and/or parents and babysitters to diagnose the infants' pathological status.

  • Publication date: 2015-5