A categorization system for handwritten documents

作者:Paquet Thierry; Heutte Laurent; Koch Guillaume; Chatelain Clement*
来源:International Journal on Document Analysis and Recognition, 2012, 15(4): 315-330.
DOI:10.1007/s10032-011-0173-5

摘要

This paper presents a complete system able to categorize handwritten documents, i.e. to classify documents according to their topic. The categorization approach is based on the detection of some discriminative keywords prior to the use of the well-known tf-idf representation for document categorization. Two keyword extraction strategies are explored. The first one proceeds to the recognition of the whole However, the performance of this strategy strongly decreases when the lexicon size increases. The second strategy only extracts the discriminative keywords in the handwritten documents. This information extraction strategy relies on the integration of a rejection model (or anti-lexicon model) in the recognition system. Experiments have been carried out on an unconstrained handwritten document database coming from an industrial application concerning the processing of incoming mails. Results show that the discriminative keyword extraction system leads to better recall/precision tradeoffs than the full recognition strategy. The keyword extraction strategy also outperforms the full recognition strategy for the categorization task.

  • 出版日期2012-12