Adaptive Script-Independent Text Line Extraction

作者:Ziaratban Majid*; Faez Karim
来源:IEICE Transactions on Information and Systems, 2011, E94D(4): 866-877.
DOI:10.1587/transinf.E94.D.866

摘要

In this paper, an adaptive block-based text line extraction algorithm is proposed. Three global and two local parameters are defined to adapt the method to various handwritings in different languages. A document image is segmented into several overlapping blocks. The skew of each block is estimated. Text block is de-skewed by using the estimated skew angle. Text regions are detected in the de-skewed text block. A number of data points are extracted from the detected text regions in each block. These data points are used to estimate the paths of text lines. By thinning the background of the image including text line paths, text line boundaries or separators are estimated. Furthermore, an algorithm is proposed to assign to the extracted text lines the connected components which have intersections with the estimated separators. Extensive experiments on different standard datasets in various languages demonstrate that the proposed algorithm outperforms previous methods.

  • 出版日期2011-4