A knowledge-based recognition system for historical Mongolian documents

Su, Xiangdong; Gao, Guanglai*; Wei, Hongxi; Bao, Feilong

doi:10.1007/s10032-016-0267-1

Abstract

This paper proposes a knowledge-based system to recognize historical Mongolian documents in which the words exhibit remarkable variation and character overlapping. According to the characteristics of Mongolian word formation, the system combines a holistic scheme and a segmentation-based scheme for word recognition. Several types of words and isolated suffixes that cannot be segmented into glyph-units or do not require segmentation are recognized using the holistic scheme. The remaining words are recognized using the segmentation-based scheme, which is the focus of this paper. We exploit the knowledge of the glyph characteristics to segment words into glyph-units in the segmentation-based scheme. Convolutional neural networks are employed not only for word recognition in the holistic scheme, but also for glyph-unit recognition in the segmentation-based scheme. Based on the analysis of recognition errors in the segmentation-based scheme, the system is enhanced by integrating three strategies into glyph-unit recognition. These strategies involve incorporating baseline information, glyph-unit grouping, and recognizing under-segmented and over-segmented fragments. The proposed system achieves 80.86% word accuracy on the Mongolian Kanjur test samples.

Publication date: September 2016
Affiliation: Inner Mongolia University
