Action Recognition by Time Series of Retinotopic Appearance and Motion Features

Authors: Daniel Paul Barrett*; Jeffrey Mark Siskind
Source: IEEE Transactions on Circuits and Systems for Video Technology, 2016, 26(12): 2250-2263.
DOI:10.1109/TCSVT.2015.2502839

Abstract

We present a method for recognizing and localizing actions in video by the sequence of changing appearance and motion of the participants. Appearance is modeled by histogram-of-oriented-gradients (HOG) object detectors, while motion is modeled by optical-flow motion-pattern detectors. Sequencing is modeled by a hidden Markov model (HMM) whose output models are these appearance and motion detectors. The HMM and associated detectors are trained simultaneously, learning the sequence of detectors that matches the most distinctive temporal subsequences of the action represented in the training data. Training uses both positive and negative samples of a given action class and requires no annotation of the correspondence between training video frames and the state-conditioned detectors; it proceeds by minimizing a discriminative cost function through gradient descent. Trained models are used to perform recognition and localization by simultaneous detection, tracking, and action recognition. In contrast to many prior methods, our approach learns intuitively meaningful models that represent an action as a sequence of retinotopic models. We demonstrate this by rendering these models on unseen test video. The method was found to perform competitively on three standard datasets, Weizmann, KTH, and UCF Sports, as well as on video from the Defense Advanced Research Projects Agency (DARPA) Mind's Eye program and a newly filmed dataset.
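To make the abstract's HMM-with-detector-emissions idea concrete, the following Python sketch scores a tracked participant's feature sequence with the forward algorithm, treating per-state appearance and motion detector responses as unnormalized log-emission scores. This is not the authors' implementation: the detectors are stubbed out, and names such as hog_score, flow_score, and state_models are illustrative assumptions only.

```python
import numpy as np

def hog_score(appearance_patch, template):
    """Placeholder appearance score: correlation of a HOG-like feature with a state template."""
    return float(np.dot(appearance_patch.ravel(), template.ravel()))

def flow_score(flow_patch, pattern):
    """Placeholder motion score: correlation of an optical-flow patch with a state motion pattern."""
    return float(np.dot(flow_patch.ravel(), pattern.ravel()))

def log_forward(track_feats, log_trans, log_init, state_models):
    """Log-score of a tracked participant's feature sequence under one action HMM.

    track_feats  : list of (appearance_patch, flow_patch) per frame, from the track
    log_trans    : (K, K) log transition matrix
    log_init     : (K,) log initial-state distribution
    state_models : list of K (template, pattern) pairs, one detector pair per HMM state
    """
    K = len(state_models)

    def log_emission(feat, k):
        app, flow = feat
        template, pattern = state_models[k]
        # Assumption: detector responses are used directly as unnormalized log-emissions.
        return hog_score(app, template) + flow_score(flow, pattern)

    # Forward recursion in log space.
    alpha = log_init + np.array([log_emission(track_feats[0], k) for k in range(K)])
    for feat in track_feats[1:]:
        emit = np.array([log_emission(feat, k) for k in range(K)])
        # log-sum-exp over previous states for each current state.
        alpha = emit + np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0)
    return np.logaddexp.reduce(alpha)  # total log-score of the whole sequence
```

Under these assumptions, a track would be labeled by evaluating log_forward against one such model per action class and taking the highest-scoring class; the paper's discriminative gradient-descent training of the detectors and HMM parameters is not shown here.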

  • Publication date: 2016-12