Abstract

Action recognition in static images is challenging. The authors propose mutually incoherent pose bases, which are implicit poselet co-occurrences learned by dictionary training, to describe body pose. The poselets in a pose basis are not constrained in spatial location or number, so a pose basis can describe body pose more flexibly than a k-poselet. In their method, the body pose in an image is represented by a sparse linear combination of pose bases, because pose varies within an action while each image captures only a snapshot from a single viewpoint. The challenge in dictionary training is to stabilise the sparse representation, which is the input to a support vector machine (SVM) for action recognition, because the original pose signal is ambiguous and the dictionary is an overcomplete matrix. Their solution is to add cumulative coherence as a penalty term in the objective function, which induces the pose bases to become mutually incoherent. They evaluate the method on two popular datasets, and the experimental results show that the pose representation achieves encouraging performance in action recognition. Furthermore, they empirically exploit the complementary role of the local pose feature alongside deep convolutional neural network features extracted from the holistic image. The experimental results demonstrate a substantial performance improvement from concatenating the two features.
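
As a rough illustration of the quantity used as a penalty, the sketch below computes the cumulative coherence of a dictionary in its standard form (the Babel function): the largest sum of absolute inner products between one atom and any m other atoms. It assumes unit-normalised columns and is not taken from the paper; the abstract does not give the exact penalty, and the names `cumulative_coherence`, `D`, and `m` are hypothetical.

```python
import numpy as np


def cumulative_coherence(D, m):
    """Cumulative coherence (Babel function) of dictionary D for subset size m.

    D is a (signal_dim, n_atoms) array whose columns are the atoms
    (here standing in for pose bases). Assumption: the standard
    definition is used, which may differ from the paper's penalty.
    """
    D = np.asarray(D, dtype=float)
    n_atoms = D.shape[1]
    if not 1 <= m < n_atoms:
        raise ValueError("m must satisfy 1 <= m < number of atoms")

    # Normalise each atom to unit Euclidean norm.
    D = D / np.linalg.norm(D, axis=0, keepdims=True)

    # Absolute inner products between all pairs of atoms.
    G = np.abs(D.T @ D)
    # An atom is never compared with itself.
    np.fill_diagonal(G, 0.0)

    # For each atom, sum its m largest correlations with other atoms,
    # then take the worst case over all atoms.
    top_m = -np.sort(-G, axis=1)[:, :m]
    return float(top_m.sum(axis=1).max())
```

A value of zero means the atoms are mutually orthogonal. Larger values indicate atoms that overlap strongly, which tends to make a sparse code over an overcomplete dictionary less stable.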