Audio keywords generation for sports video analysis

作者:Xu Min*; Xu Changsheng; Duan Lingyu; Jin Jesse S; Luo Suhuai
来源:ACM Transactions on Multimedia Computing, Communications, and Applications, 2008, 4(2): 11.
DOI:10.1145/1352012.1352015

摘要

Sports video has attracted a global viewership. Research effort in this area has been focused on semantic event detection in sports video to facilitate accessing and browsing. Most of the event detection methods in sports video are based on visual features. However, being a significant component of sports video, audio may also play an important role in semantic event detection. In this paper, we have borrowed the concept of the "keyword" from the text mining domain to define a set of specific audio sounds. These specific audio sounds refer to a set of game-specific sounds with strong relationships to the actions of players, referees, commentators, and audience, which are the reference points for interesting sports events. Unlike low-level features, audio keywords can be considered as a mid-level representation, able to facilitate high-level analysis from the semantic concept point of view. Audio keywords are created from low-level audio features with learning by support vector machines. With the help of video shots, the created audio keywords can be used to detect semantic events in sports video by Hidden Markov Model ( HMM) learning. Experiments on creating audio keywords and, subsequently, event detection based on audio keywords have been very encouraging. Based on the experimental results, we believe that the audio keyword is an effective representation that is able to achieve satisfying results for event detection in sports video. Application in three sports types demonstrates the practicality of the proposed method.

  • 出版日期2008