Abstract

Local-based approaches have recently shown great promise for computer vision tasks, and local space-time features have become a popular video representation for human action recognition. In this paper, we focus on recognizing a person's action from appearance and motion information by constructing spatio-temporal features in the "bag of keypoints" paradigm. We evaluate the performance of several local appearance detectors and descriptors together with the Histogram of Oriented Optical Flow (HOOF) as a motion descriptor. The appearance features are SIFT, SURF and Harris-PHOG, each of which is concatenated with the HOOF computed at the interest points. Because no single one of these features is optimal for every need in the recognition task, we present a feature combination representation to improve the recognition rate. Starting from an image representation based on a single feature, we extend it to the case of multiple features from several descriptors. This representation is flexible: other descriptors yielding other feature vectors can be easily incorporated. We use a Support Vector Machine as the classifier and evaluate results with two kernels, the polynomial kernel and the radial basis function kernel. The proposed methods achieve higher or similar accuracies compared with several state-of-the-art categorization methods on the challenging KTH benchmark dataset.
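To make the concatenation step concrete, the following is a minimal sketch of how a motion histogram might be appended to an appearance descriptor at one interest point. It is an illustration under stated assumptions, not the paper's implementation: the function names `hoof` and `combine`, the bin count, and the plain full-circle binning are all hypothetical, and the published HOOF descriptor uses its own binning convention that the abstract does not spell out.

```python
import math


def hoof(flow_vectors, n_bins=8):
    """Simplified orientation histogram of optical flow (assumed form).

    Each flow vector (u, v) votes for the bin containing its direction,
    weighted by its magnitude; the histogram is then L1-normalised.
    """
    hist = [0.0] * n_bins
    bin_width = 2 * math.pi / n_bins
    for u, v in flow_vectors:
        magnitude = math.hypot(u, v)
        if magnitude == 0.0:
            continue  # a zero vector has no direction to vote for
        angle = math.atan2(v, u) % (2 * math.pi)  # map to [0, 2*pi]
        index = min(int(angle / bin_width), n_bins - 1)  # clamp the edge case
        hist[index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def combine(appearance_descriptor, flow_vectors, n_bins=8):
    """Concatenate an appearance descriptor (e.g. a 128-D SIFT vector,
    computed elsewhere) with the flow histogram of the same interest point."""
    return list(appearance_descriptor) + hoof(flow_vectors, n_bins)
```

Under these assumptions, a 128-dimensional appearance vector combined with an 8-bin flow histogram yields a 136-dimensional spatio-temporal feature, which could then be quantized into a bag-of-keypoints histogram and passed to the classifier.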

  • Publication date: 2014-02
