Abstract

Reconstruction of Cross-Cut Shredded Text Documents (RCCSTD) plays an important role in both forensics and archeology. It is a special case of the square jigsaw puzzle problem and has attracted the attention of many researchers. Because existing RCCSTD solutions have low accuracy, especially in row splicing, this paper proposes a high-accuracy splicing solution that uses both a combination strategy and a divide-and-conquer strategy. In approaches based on swarm intelligence algorithms, the results and splicing accuracy depend heavily on the defined cost function and the number of fragments. In contrast, our approach uses a clustering algorithm to transform a single RCCSTD problem into several Reconstruction of Strip Shredded Text Documents (RSSTD) problems. The combination and divide-and-conquer strategies are designed to improve splicing accuracy within a row and to keep the algorithm stable as the number of fragments in a row increases. Experiments were carried out on 10 text documents (5 Chinese and 5 English), each shredded according to 10 patterns. Accuracy was over 0.95 for the Chinese documents and over 0.85 for the English ones, across all patterns. Compared with another recently proposed solution, our approach gives both higher splicing accuracy and greater stability regardless of the number of fragments in a row.
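As a rough illustration of the divide step, the sketch below groups cross-cut fragments into rows so that each row can then be treated as a separate strip-reconstruction (RSSTD) problem. The abstract does not specify the clustering algorithm or the features it uses, so everything here is an assumption: each fragment is summarized by a single hypothetical scalar (for example, the vertical offset of its first text line), and a simple gap-based one-dimensional grouping stands in for the clustering algorithm. The names `cluster_into_rows`, `features`, and `tol` are illustrative only.

```python
from typing import Dict, List


def cluster_into_rows(features: Dict[str, float], tol: float) -> List[List[str]]:
    """Group fragment ids into rows by a scalar feature.

    Fragments are sorted by feature value; a new row starts whenever the
    gap to the previous fragment exceeds `tol` (single-linkage grouping).
    This is a stand-in, not the clustering algorithm used in the paper.
    """
    ordered = sorted(features, key=lambda fid: features[fid])
    rows: List[List[str]] = []
    for fid in ordered:
        if rows and features[fid] - features[rows[-1][-1]] <= tol:
            rows[-1].append(fid)  # close enough to the previous fragment
        else:
            rows.append([fid])    # gap too large: start a new row
    return rows


# Hypothetical feature values (e.g. text-line offset in pixels) for six fragments.
features = {"f0": 3.0, "f1": 3.4, "f2": 11.8, "f3": 12.1, "f4": 2.9, "f5": 12.0}
rows = cluster_into_rows(features, tol=1.0)
print(rows)
```

Each resulting group contains fragments presumed to come from the same row of the original page; ordering the fragments within a group is the separate splicing step that the combination strategy addresses.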