Document segmentation and classification into musical scores and text

Pedersoli Fabrizio<sup>*</sup>; Tzanetakis George

doi:10.1007/s10032-016-0271-5

摘要

A new algorithm for segmenting documents into regions containing musical scores and text is proposed. Such segmentation is a required step prior to applying optical character recognition and optical music recognition on scanned pages that contain both music notation and text. Our segmentation technique is based on the bag-of-visual-words representation followed by random block voting (RBV) in order to detect the bounding boxes containing the musical score and text within a document image. The RBV procedure consists of extracting a fixed number of blocks whose position and size are sampled from a discrete uniform distribution that "over"-covers the input image. Each block is automatically classified as either coming from musical score or text and votes with a particular posterior probability of classification in its spatial domain. An initial coarse segmentation is obtained by summarizing all the votes in a single image. Subsequently, the final segmentation is obtained by subdividing the image in microblocks and classifying them using a N-nearest neighbor classifier which is trained using the coarse segmentation. We demonstrate the potential of the proposed method by experiments on two different datasets. One is on a challenging dataset of images collected and artificially combined and manipulated for this project. The other is a music dataset obtained by the scanning of two music books. The results are reported using precision/recall metrics of the overlapping area with respect to the ground truth. The proposed system achieves an overall averaged F-measure of 85 %. The complete source code package and associated data are available at under the FreeBSD license to support reproducibility.

出版日期2016-12

全文

访问全文

收藏分享被引(6) 浏览

更新时间：2024-04-05 13:25

Document segmentation and classification into musical scores and text

摘要

全文

产品服务

站内浏览

服务支持

联系方式

科研之友