Automatic line and word segmentation applied to densely line-skewed historical handwritten document images

Sanchez A*; Mello C A B; Suarez P D; Lopes A

doi:10.3233/ICA-2011-0365

Abstract

There is considerable interest in digitizing handwritten historical documents in order to preserve the cultural heritage of nations. In general, these manuscript images pose segmentation difficulties that non-historical documents do not. The problems stem from features such as paper aging, faded ink, back-to-front ink superposition and variable line skew, among others. This paper presents a methodology for detecting and extracting the text lines in images of complex handwritten historical documents. The proposed line segmentation algorithm computes a binary transition map of the document and then extracts and refines the corresponding line regions through skeletonization. To improve the accuracy of line segmentation, a new graph-based splitting method is introduced to separate touching lines. Once the text lines have been segmented, an algorithm based on mathematical morphology operators and position heuristics extracts the component words of each text line. The robustness and accuracy of our approach were tested on digitized pages from two complex historical document datasets: the correspondence of Nabuco and the family papers of Graham Bell. We have also compared our algorithms favorably with other general line and word segmentation algorithms presented at the ICDAR 2007 Handwriting Segmentation Contest.
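The abstract gives no implementation details, so the following is only a minimal illustrative sketch of two of the ideas it names, not the authors' algorithm. It assumes a binarized image stored as a list of rows with 1 for ink and 0 for background. The function names `row_transitions` and `segment_words` and the `max_gap` parameter are hypothetical. The first function counts background-to-ink transitions per pixel row, a crude stand-in for a transition map. The second merges ink columns of a single text line whose gaps are no wider than `max_gap`, which behaves like a 1-D morphological closing of the column profile followed by a simple position heuristic.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumption: 1 = ink, 0 = background


def row_transitions(image: Image) -> List[int]:
    """Count background-to-ink transitions along each pixel row."""
    counts = []
    for row in image:
        count = 0
        previous = 0  # treat the area left of the image as background
        for pixel in row:
            if pixel == 1 and previous == 0:
                count += 1
            previous = pixel
        counts.append(count)
    return counts


def segment_words(line_image: Image, max_gap: int) -> List[Tuple[int, int]]:
    """Return (start, end) column spans of candidate words, end exclusive.

    Ink columns separated by at most `max_gap` empty columns are merged
    into one span; wider gaps are treated as word boundaries.
    """
    width = len(line_image[0]) if line_image else 0
    # Column profile: True where any row has ink in that column.
    has_ink = [any(row[x] for row in line_image) for x in range(width)]

    spans = []
    start = None
    last_ink = None
    for x, ink in enumerate(has_ink):
        if not ink:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 > max_gap:
            # The gap is too wide to belong to one word: close the span.
            spans.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        spans.append((start, last_ink + 1))
    return spans


if __name__ == "__main__":
    line = [
        [1, 1, 0, 1, 0, 0, 0, 0, 1, 1],
        [0, 1, 0, 1, 0, 0, 0, 0, 0, 1],
    ]
    print(row_transitions(line))
    print(segment_words(line, max_gap=2))
```

In practice the gap threshold would have to be tuned per document or per writer, and skewed or touching lines would need to be separated first, which is the part of the problem the paper's line segmentation and graph-based splitting address.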

Publication date: 2011
