Abstract

Using contextual information for scene labeling has gained substantial attention in the fields of image processing and computer vision. In this paper, a fusion model based on a flexible segmentation graph (FSG) is presented to exploit multi-scale context for the scene labeling problem. Given a family of segmentations, the FSG representation is established from the spatial relationships among these segmentations. Within the FSG, the labeling inference process is formulated as a contextual fusion model built on discriminative classifiers. Compared to previous approaches, which typically employ Conditional Random Fields (CRFs) or hierarchical models to exploit contextual information, our FSG representation is flexible and efficient, free of hierarchical constraints, allowing us to capture a wide variety of visual context for scene labeling. Our approach yields state-of-the-art results on the MSRC dataset (21 classes) and the LHI dataset (15 classes), and near-record results on the SIFT Flow dataset (33 classes) and the PASCAL VOC segmentation dataset (20 classes), while producing a 320 × 240 scene labeling in less than a second. Notably, our approach also outperforms recent CNN-based methods.
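To make the multi-scale fusion idea concrete, the sketch below is a minimal, hypothetical illustration, not the paper's actual FSG inference: it assumes per-segment class probabilities have already been produced by discriminative classifiers for several segmentations of the same image, and simply averages those probabilities per pixel across scales before taking the most likely class. The function name `fuse_multiscale_labels` and all inputs are illustrative assumptions.

```python
import numpy as np

def fuse_multiscale_labels(segmentations, segment_probs, num_classes):
    """Fuse per-segment class probabilities from several segmentations
    of the same image into a single per-pixel labeling.

    segmentations : list of (H, W) int arrays, each assigning every
        pixel a segment id for one segmentation scale.
    segment_probs : list of (num_segments_k, num_classes) arrays, the
        classifier's class probabilities for each segment at scale k.
    """
    h, w = segmentations[0].shape
    # Accumulate per-pixel class scores across all scales.
    pixel_scores = np.zeros((h, w, num_classes))
    for seg_map, probs in zip(segmentations, segment_probs):
        # Fancy indexing broadcasts each segment's probability
        # vector to all of that segment's pixels.
        pixel_scores += probs[seg_map]
    # Average over scales and pick the most likely class per pixel.
    pixel_scores /= len(segmentations)
    return pixel_scores.argmax(axis=-1)

# Toy usage: two segmentations of a 4x4 image, 3 classes.
seg_a = np.zeros((4, 4), dtype=int)
seg_a[:, 2:] = 1                                  # coarse: 2 segments
seg_b = np.repeat(np.arange(4), 4).reshape(4, 4)  # finer: 4 row segments
probs_a = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1]])
probs_b = np.array([[0.6, 0.3, 0.1],
                    [0.5, 0.4, 0.1],
                    [0.2, 0.7, 0.1],
                    [0.1, 0.2, 0.7]])
labels = fuse_multiscale_labels([seg_a, seg_b], [probs_a, probs_b], 3)
print(labels)
```

This toy averaging stands in for the learned contextual fusion described in the abstract; the actual model additionally exploits the spatial relationships among segmentations encoded in the FSG.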