摘要

In this paper, reflected sound of frequency just above the audible range is used to detect speech activity. The active signal used is inaudible to humans, readily generated by the typical audio circuitry and components found in mobile telephones, and is robust to background sounds such as nearby voices. In use, the system relies upon a wideband excitation signal emitted from a loudspeaker located near the lips, which reflects from the mouth region and is then captured by a nearby microphone. The state of the lip opening is evaluated periodically by tracking the resonance patterns in the reflected excitation signal. When the lips are open, deep and complex resonances are formed as energy propagates into and then reflects out from the open mouth and vocal tract, with resonance depth being related to the open lip area. When the lips are closed, these resonance patterns are absent. The presence of the resonances can thus serve as a low complexity detection measure. The technique is evaluated for multiple users in terms of sensitivity to source placement and sensor placement. Voice activity detection performance using this measure is further evaluated in the presence of realistic wideband acoustic background noise, as well as artificially added noise. The system is shown to be relatively insensitive to sensor placement, highly insensitive to background noise, and able to achieve greater than 90% voice activity detection accuracy. The technique is even suitable when a subject is whispering in the presence of much louder multi-speaker babble. The technique has potential for speech-based systems operating in high noise environments as well as in silent speech interfaces, whisper-input systems and voice prostheses for speech-impaired users.