Abstract

In this paper, we present a method for segmenting document page flows, applied to heterogeneous real bank documents. The approach is based on image content and also incorporates font-based features from within the documents. Our method involves a bag of visual words (BoVW) model built on the designed image-based feature descriptors and a novel approach that combines consecutive pages of a document into a single feature vector representing the transition between these pages. Each transition belongs to one of two classes: continuation of the same document or beginning of a new document. Using the transition feature vectors, we apply three different binary classifiers to predict the relationship between consecutive pages. Our initial results indicate that the proposed method shows promising performance for document flow segmentation.
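
The pipeline described above can be illustrated with a minimal sketch. The abstract does not specify how consecutive pages are combined, so the combination used here (both page histograms followed by their element-wise absolute difference) is an assumption, and the function names `bovw_histogram`, `transition_vector`, and `split_flow` are hypothetical. The classifier itself is left out; the sketch only assumes that some binary classifier yields one label per page transition.

```python
from collections import Counter
from typing import List, Sequence


def bovw_histogram(word_ids: Sequence[int], vocab_size: int) -> List[float]:
    """Return the L1-normalized histogram of visual-word ids for one page."""
    counts = Counter(word_ids)
    total = len(word_ids) or 1  # avoid division by zero on an empty page
    return [counts.get(i, 0) / total for i in range(vocab_size)]


def transition_vector(prev_page: Sequence[float],
                      next_page: Sequence[float]) -> List[float]:
    """Combine two consecutive page histograms into one feature vector.

    Assumed scheme (not taken from the paper): both histograms
    followed by their element-wise absolute difference.
    """
    if len(prev_page) != len(next_page):
        raise ValueError("page histograms must have the same length")
    diff = [abs(a - b) for a, b in zip(prev_page, next_page)]
    return list(prev_page) + list(next_page) + diff


def split_flow(num_pages: int,
               is_new_document: Sequence[bool]) -> List[List[int]]:
    """Group page indices into documents from per-transition labels.

    is_new_document[i] is the predicted class for the transition
    between page i and page i + 1 (True = a new document begins).
    """
    if num_pages == 0:
        return []
    if len(is_new_document) != num_pages - 1:
        raise ValueError("expected one label per consecutive page pair")
    documents = [[0]]
    for page, new_doc in enumerate(is_new_document, start=1):
        if new_doc:
            documents.append([page])
        else:
            documents[-1].append(page)
    return documents
```

In this sketch, a four-page flow with predicted labels `[False, True, False]` is split into two documents of two pages each.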

  • Publication date: 2015