An identification framework for print-scan books in a large database

Lee Sang Hoon; Kim Jongyoo; Lee Sanghoon

doi:10.1016/j.ins.2017.02.001

Abstract

In this paper, we propose an identification framework for detecting copyright infringement in the form of illegally distributed print-scan books in a large database. The framework consists of the following main stages: image pre-processing, feature vector extraction, clustering and indexing, and hierarchical search. The image pre-processing stage provides methods for alleviating the distortions induced by a scanner or digital camera. From the pre-processed image, we generate feature vectors that are robust against distortion. To enhance clustering performance in a large database, we use a clustering method based on parallel distributed computing with Hadoop MapReduce. In addition, to store the clustered feature vectors efficiently and minimize search time, we investigate an inverted index for feature vectors. Finally, we implement a two-step hierarchical search to achieve fast and accurate online identification. In simulations, the proposed identification framework is accurate and robust in the presence of print-scan distortions. The processing time analysis in a parallel computing environment demonstrates the extensibility of the proposed framework to massive data. In the matching performance analysis, we find empirically and theoretically that, in terms of query time, the optimal number of clusters scales as O(√N) for N print-scan books.
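
The O(√N) scaling can be illustrated with a simplified cost model. The sketch below is an assumption for illustration, not the authors' derivation: it supposes that a two-step search first compares the query against all K cluster centroids and then against every member of one average-sized cluster (N/K books), so the cost is K + N/K comparisons. The function names `query_cost` and `best_cluster_count` are hypothetical.

```python
def query_cost(n_books: int, n_clusters: int) -> float:
    """Assumed cost model: K centroid comparisons plus N/K in-cluster comparisons."""
    return n_clusters + n_books / n_clusters


def best_cluster_count(n_books: int) -> int:
    """Brute-force the cluster count K that minimizes the assumed query cost."""
    return min(range(1, n_books + 1), key=lambda k: query_cost(n_books, k))


if __name__ == "__main__":
    for n in (400, 10_000, 250_000):
        k = best_cluster_count(n)
        print(f"N={n}: best K={k}, cost={query_cost(n, k):.0f}")
```

Under this model the minimum falls at K = √N, because the derivative of K + N/K with respect to K vanishes when K² = N. A real system would weight the two steps differently and clusters would not be equally sized, so the constant factor may differ even where the square-root trend holds.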

Publication date: 2017-08

