Abstract

This paper presents an effective and efficient approach to extracting scene text from images. The approach first extracts edge information using a local maximum difference filter (LMDF); in parallel, the input image is decomposed into a group of image layers by color clustering. Candidate text image layers are then identified by combining the edge map with the geometric-structure and spatial-distribution characteristics of scene text. At the character level, candidate text connected components are identified using a set of heuristic rules. Finally, a graph-cut computation is applied to identify and localize text lines of arbitrary orientation. In the proposed approach, the segmentation of text pixels is efficiently embedded as part of the text localization computation. Comprehensive experiments on four challenging datasets (ICDAR 2003, ICDAR 2011, MSRA-TD500, and Street View Text (SVT)) validate the approach. Comparisons with many state-of-the-art methods demonstrate that our approach effectively handles scene text in diverse fonts, sizes, colors, and languages, at arbitrary orientations, and that it is robust to illumination changes.