Abstract

Building on image processing techniques and the visual characteristics of web pages, this paper proposes a new web page segmentation method called iterated shrinking and dividing. It introduces dividing conditions and the concept of a dividing zone, and uses them to split a web page image into visually consistent sub-images by iteratively shrinking and dividing. First, the web page is saved as an image and preprocessed with an edge detection algorithm such as Canny. Dividing zones are then detected and the image is segmented repeatedly until every block is indivisible. The method can support web page analysis tasks such as detecting similar visual layouts. Experiments show that the algorithm is suitable for web page segmentation, that it is extensible, and that it performs well.
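The shrink-and-divide loop can be illustrated with a minimal sketch. The abstract does not spell out the dividing conditions, so the sketch assumes a simple one: a dividing zone is a run of at least `min_gap` consecutive rows or columns that contain no edge pixels. The names `shrink`, `widest_gap`, `segment`, and `min_gap` are hypothetical, and the input is assumed to be a binary edge map (for example, the output of a Canny detector) given as a list of rows.

```python
from typing import List, Optional, Tuple

# A box is (top, left, bottom, right); bottom and right are exclusive.
Box = Tuple[int, int, int, int]


def shrink(edges: List[List[int]], box: Box) -> Optional[Box]:
    """Shrink a box to the tightest rectangle that still contains edge pixels."""
    top, left, bottom, right = box
    rows = [r for r in range(top, bottom) if any(edges[r][left:right])]
    cols = [c for c in range(left, right)
            if any(edges[r][c] for r in range(top, bottom))]
    if not rows:
        return None  # the box holds no edge pixels at all
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


def widest_gap(occupied: List[bool], min_gap: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the widest empty run of length >= min_gap."""
    best = None
    start = None
    # A trailing True closes any run that reaches the end of the list.
    for i, occ in enumerate(occupied + [True]):
        if not occ and start is None:
            start = i
        elif occ and start is not None:
            if i - start >= min_gap and (best is None or i - start > best[1] - best[0]):
                best = (start, i)
            start = None
    return best


def segment(edges: List[List[int]], box: Optional[Box] = None,
            min_gap: int = 2) -> List[Box]:
    """Shrink, then divide at the widest dividing zone, until blocks are indivisible."""
    if box is None:
        box = (0, 0, len(edges), len(edges[0]))
    box = shrink(edges, box)
    if box is None:
        return []
    top, left, bottom, right = box
    row_occ = [any(edges[r][left:right]) for r in range(top, bottom)]
    col_occ = [any(edges[r][c] for r in range(top, bottom))
               for c in range(left, right)]
    row_gap = widest_gap(row_occ, min_gap)
    col_gap = widest_gap(col_occ, min_gap)
    if row_gap is None and col_gap is None:
        return [box]  # assumed dividing condition fails: block is indivisible
    # Prefer the wider zone; on a tie, split horizontally (an arbitrary choice).
    if col_gap is None or (row_gap is not None
                           and row_gap[1] - row_gap[0] >= col_gap[1] - col_gap[0]):
        a, b = row_gap
        first, second = (top, left, top + a, right), (top + b, left, bottom, right)
    else:
        a, b = col_gap
        first, second = (top, left, bottom, left + a), (top, left + b, bottom, right)
    return segment(edges, first, min_gap) + segment(edges, second, min_gap)
```

Calling `segment(edge_map)` returns the list of indivisible blocks as boxes. Because each box is shrunk before it is tested, any dividing zone found lies strictly inside the box, so both halves are non-empty and the recursion terminates.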

Full text