Abstract

Vision-based scene recognition aims to find a semantic explanation of a scene image. It raises two challenges. The first is how to learn a sparse feature representation that maps each cluster of patches into a few dimensions. Sparse coding techniques with ℓ1-norm regularization are widely used to learn sparse features, but ℓ1-norm regularization may not achieve sufficient sparsity, and the initial value strongly influences the sparsity of the codes during iterative optimization. The second is how to learn a classifier in a supervised manner so as to improve generalization. This paper therefore proposes a scene recognition method with two main components. The first is the homotopy iterative hard thresholding (HIHT) algorithm, which encodes sparse representations of local patches by incorporating ℓ0-norm regularization. A homotopy continuation strategy improves the sparsity of the feature codes by adaptively tuning the regularization factor from large to small values and using the sparse solution of the previous iteration as the warm start for the next. The second is an extreme learning machine (ELM)-based classifier. Experimental results on the 15-class scene data set show that the HIHT algorithm outperforms other unsupervised sparse feature learning algorithms in terms of sparsity and entropy, and that the ELM-based scene recognition method outperforms other state-of-the-art methods in terms of recognition accuracy.
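
The homotopy idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the generic ℓ0-regularized least-squares objective 0.5·||Ax − b||² + λ·||x||₀, a geometric decay of λ, and a fixed number of inner iterations. The names `hard_threshold`, `iht`, `homotopy_iht`, `rho`, and `lam_target` are hypothetical and chosen only for illustration.

```python
import numpy as np

def hard_threshold(z, t):
    """Zero every entry of z whose magnitude is at most t."""
    out = z.copy()
    out[np.abs(z) <= t] = 0.0
    return out

def iht(A, b, lam, x0, L, n_iter=100):
    """Iterative hard thresholding for 0.5*||Ax - b||^2 + lam*||x||_0.

    L must be at least the squared spectral norm of A so that each
    step does not increase the objective.
    """
    x = x0.copy()
    t = np.sqrt(2.0 * lam / L)  # threshold implied by the l0 penalty
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)             # gradient of the data term
        x = hard_threshold(x - grad / L, t)  # gradient step, then threshold
    return x

def homotopy_iht(A, b, lam_target, rho=0.5, n_iter=100):
    """Assumed homotopy schedule: shrink lam geometrically, warm-starting
    each stage from the previous stage's solution."""
    if lam_target <= 0 or not (0 < rho < 1):
        raise ValueError("need lam_target > 0 and 0 < rho < 1")
    L = 1.01 * np.linalg.norm(A, 2) ** 2  # slightly above ||A||_2^2
    # Start at a value large enough that the first stage stays at zero.
    lam = max(np.max(np.abs(A.T @ b)) ** 2 / (2.0 * L), lam_target)
    x = np.zeros(A.shape[1])
    while True:
        x = iht(A, b, lam, x, L, n_iter)  # warm start from previous x
        if lam <= lam_target:
            return x
        lam = max(rho * lam, lam_target)  # move from large to small lam
```

For a dictionary `A` and a patch descriptor `b`, calling `homotopy_iht(A, b, lam_target=0.05)` returns a code vector whose nonzero entries can be counted with `np.count_nonzero`. The decay factor and stopping rule here are illustrative assumptions; the paper's adaptive tuning rule may differ.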