Document segmentation and classification into musical scores and text

作者:Pedersoli Fabrizio*; Tzanetakis George
来源:International Journal on Document Analysis and Recognition, 2016, 19(4): 289-304.
DOI:10.1007/s10032-016-0271-5

摘要

A new algorithm for segmenting documents into regions containing musical scores and text is proposed. Such segmentation is a required step prior to applying optical character recognition and optical music recognition on scanned pages that contain both music notation and text. Our segmentation technique is based on the bag-of-visual-words representation followed by random block voting (RBV) in order to detect the bounding boxes containing the musical score and text within a document image. The RBV procedure consists of extracting a fixed number of blocks whose position and size are sampled from a discrete uniform distribution that "over"-covers the input image. Each block is automatically classified as either coming from musical score or text and votes with a particular posterior probability of classification in its spatial domain. An initial coarse segmentation is obtained by summarizing all the votes in a single image. Subsequently, the final segmentation is obtained by subdividing the image in microblocks and classifying them using a N-nearest neighbor classifier which is trained using the coarse segmentation. We demonstrate the potential of the proposed method by experiments on two different datasets. One is on a challenging dataset of images collected and artificially combined and manipulated for this project. The other is a music dataset obtained by the scanning of two music books. The results are reported using precision/recall metrics of the overlapping area with respect to the ground truth. The proposed system achieves an overall averaged F-measure of 85 %. The complete source code package and associated data are available at under the FreeBSD license to support reproducibility.

  • 出版日期2016-12