Abstract

In this paper, we present a novel genetic algorithm, the Multiple Search Genetic Algorithm (MSGA), for clustering the web pages returned by a search engine and presenting the user with a taxonomy of those pages. MSGA uses two kinds of chromosomes, conservative and explorer, to improve its search capability and enhance the clustering result. The conservative chromosomes retain the better solutions found in each generation, while the explorer chromosomes widen the search directions so that the search avoids becoming trapped in local minima. Through this multiple search strategy, the proposed method can find optimal solutions quickly. Our simulation results show that the proposed algorithm outperforms the algorithms it is compared against. We also present a clustering search engine system, the Document Clustering Search Engine (DCSE). DCSE is responsible for spawning agents for collecting web pages from the meta-search engine and computing the similarity between those pages. The user of the system receives information that has already been computed and sorted, along with web links ranked by relevance. As a result, the time required to filter out irrelevant information is greatly reduced.
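To make the division of labor between conservative and explorer chromosomes concrete, the following is a minimal sketch and not the MSGA procedure itself. Everything specific in it is an assumption made for illustration: the centroid-list encoding, the sum-of-squared-error fitness, the Gaussian mutation, the population sizes, and the function names `sse`, `mutate`, and `two_role_ga`. It uses toy 2-D points in place of web-page feature vectors.

```python
import random

def sse(centroids, points):
    """Sum of squared distances from each point to its nearest centroid (lower is better)."""
    return sum(min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centroids)
               for px, py in points)

def mutate(centroids, sigma, rng):
    """Return a copy of the chromosome with Gaussian noise added to every coordinate."""
    return [(cx + rng.gauss(0, sigma), cy + rng.gauss(0, sigma)) for cx, cy in centroids]

def two_role_ga(points, k=2, pop_size=20, n_conservative=5, generations=60, seed=0):
    """Hypothetical two-role GA: elites are kept, the rest of the population explores."""
    rng = random.Random(seed)
    # Assumed encoding: a chromosome is a list of k candidate centroids.
    population = [rng.sample(points, k) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        population.sort(key=lambda c: sse(c, points))
        history.append(sse(population[0], points))
        # Conservative chromosomes: the best solutions are carried over unchanged.
        conservative = population[:n_conservative]
        # Explorer chromosomes: even-indexed ones perturb a randomly chosen
        # conservative parent; odd-indexed ones are fresh random samples of the data.
        explorers = []
        for i in range(pop_size - n_conservative):
            if i % 2 == 0:
                explorers.append(mutate(rng.choice(conservative), 0.3, rng))
            else:
                explorers.append(rng.sample(points, k))
        population = conservative + explorers
    population.sort(key=lambda c: sse(c, points))
    history.append(sse(population[0], points))
    return population[0], history

# Toy data: two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
best, history = two_role_ga(points)
print(best, history[-1])
```

Because the conservative chromosomes are copied forward unchanged, the best fitness in this sketch can never get worse from one generation to the next. The explorers supply the diversity that a purely elitist population would lack.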