AN EFFECTIVE FUZZY CLUSTERING ALGORITHM FOR WEB DOCUMENT CLASSIFICATION: A CASE STUDY IN CULTURAL CONTENT MINING

Tsekouras George E*; Gavalas Damianos

doi:10.1142/S021819401350023X

Abstract

This article presents a novel crawling and clustering method for extracting and processing cultural data from the web in a fully automated fashion. Our architecture relies on a focused web crawler to download web documents relevant to culture. A focused crawler is a web crawler that searches and processes only those web pages that are relevant to a particular topic. After downloading the pages, we extract from each document a number of words for each thematic cultural area and filter out the documents with non-cultural content; we then create multidimensional document vectors comprising the most frequent cultural term occurrences. We calculate the dissimilarity between the culture-related document vectors and, for each cultural theme, use cluster analysis to partition the documents into a number of clusters. Our approach is validated via a proof-of-concept application that analyzes hundreds of web pages spanning different cultural thematic areas.
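The pipeline summarized in the abstract (term extraction, filtering, document vectors, dissimilarity, clustering) can be illustrated with a minimal sketch. The abstract does not give the vocabulary, the dissimilarity measure, or the exact clustering algorithm, so everything below is an assumption made for illustration: the `VOCAB` term list is hypothetical, the vectors are length-normalized term counts, and the clustering step is the textbook fuzzy c-means with squared Euclidean distance, not necessarily the algorithm proposed in the article.

```python
import math
import re
from collections import Counter

# Hypothetical vocabulary for one cultural theme (assumption, not from the article).
VOCAB = ["museum", "painting", "sculpture", "theatre", "opera"]

def doc_vector(text, vocab=VOCAB):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[t] for t in vocab]

def normalize(v):
    """Scale a non-zero vector to unit Euclidean length."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def memberships(vectors, centers, m=2.0, eps=1e-12):
    """Fuzzy c-means membership update; returns U[k][i] for vector k, cluster i."""
    power = 1.0 / (m - 1.0)
    U = []
    for x in vectors:
        # Squared Euclidean distance to every center (eps avoids division by zero).
        d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) + eps for c in centers]
        inv = [(1.0 / d) ** power for d in d2]
        total = sum(inv)
        U.append([w / total for w in inv])
    return U

def update_centers(vectors, U, m=2.0):
    """Each center is the mean of all vectors, weighted by membership ** m."""
    dim = len(vectors[0])
    centers = []
    for i in range(len(U[0])):
        w = [row[i] ** m for row in U]
        total = sum(w)
        centers.append(
            [sum(wk * x[j] for wk, x in zip(w, vectors)) / total for j in range(dim)]
        )
    return centers

def fuzzy_c_means(vectors, centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    for _ in range(iters):
        U = memberships(vectors, centers, m)
        centers = update_centers(vectors, U, m)
    return memberships(vectors, centers, m), centers

# Toy pages standing in for crawled documents (made-up text, not real data).
pages = [
    "The museum shows a painting and a sculpture",
    "A new painting arrives at the museum",
    "The opera season opens at the theatre",
    "Theatre and opera tickets on sale",
    "Stock prices fell sharply today",  # contains no cultural terms
]
raw = [doc_vector(p) for p in pages]
# Filter out pages without any cultural term, then normalize the rest.
cultural = [normalize(v) for v in raw if any(v)]
# Initial centers are picked by hand here to keep the sketch deterministic.
U, centers = fuzzy_c_means(cultural, centers=[cultural[0], cultural[2]])
```

Each row of `U` holds one page's degrees of membership in the two clusters and sums to 1, which is what distinguishes fuzzy clustering from a hard partition. In practice the initial centers and the number of clusters would have to be chosen by some other means than the hand-picked values used here.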

Publication date: 2013-08


