Abstract

To counter the adverse effects of the many kinds of useless webpages, a method based on the Bayesian classification algorithm and a domain ontology is proposed for filtering unwanted Chinese webpages. The method first calculates the weights of domain feature words from positive and negative domain webpages, builds a domain feature lexicon, constructs the domain ontology, and derives a weight library for the ontology elements from the positive domain webpages. It then selects candidate webpages with the Bayesian classification algorithm. Finally, it semantically analyzes the candidates and filters them according to the domain ontology. The method can distinguish positive from negative webpages within the same field, and it is fast enough for real-time webpage filtering. Experiments on a large number of game-related webpages show promising results: precision and recall both exceed 98%, and the semantic analysis of one game webpage takes 1 to 2 s on average, which has little effect on the user's browsing.
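The two-stage pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a multinomial naive Bayes classifier with Laplace smoothing for the candidate stage, and it stands in for the semantic analysis with a simple sum of ontology element weights compared against a threshold. The function names, the toy English tokens, the weights, and the threshold value are all hypothetical, and so is the assumption that a candidate is a page the classifier labels "positive".

```python
import math
from collections import Counter


def train_naive_bayes(docs):
    """Train on (tokens, label) pairs; return log priors and log likelihoods."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {
        label: math.log(n / len(docs)) for label, n in label_counts.items()
    }
    log_like = {}
    for label, counts in word_counts.items():
        # Laplace (add-one) smoothing over the shared vocabulary.
        denom = sum(counts.values()) + len(vocab)
        log_like[label] = {w: math.log((counts[w] + 1) / denom) for w in vocab}
    return log_prior, log_like


def classify(tokens, model):
    """Return the label with the highest posterior log score."""
    log_prior, log_like = model
    scores = {}
    for label, prior in log_prior.items():
        score = prior
        for w in tokens:
            if w in log_like[label]:  # out-of-vocabulary words are ignored
                score += log_like[label][w]
        scores[label] = score
    return max(scores, key=scores.get)


def ontology_score(tokens, element_weights):
    """Hypothetical semantic score: summed weights of the distinct
    ontology elements that occur in the page."""
    return sum(element_weights.get(w, 0.0) for w in set(tokens))


def should_filter(tokens, model, element_weights, threshold):
    """Stage 1: naive Bayes picks candidates. Stage 2: ontology check."""
    if classify(tokens, model) != "positive":
        return False
    return ontology_score(tokens, element_weights) >= threshold


# Toy training data (hypothetical tokens standing in for segmented page text).
TRAINING_DOCS = [
    (["game", "play", "level", "download"], "positive"),
    (["game", "online", "play", "score"], "positive"),
    (["game", "addiction", "study", "report"], "negative"),
    (["news", "report", "policy", "study"], "negative"),
]
MODEL = train_naive_bayes(TRAINING_DOCS)
ELEMENT_WEIGHTS = {"play": 0.5, "level": 0.3, "download": 0.2}  # assumed values
THRESHOLD = 0.6  # assumed value
```

Calling `should_filter(["game", "play", "level"], MODEL, ELEMENT_WEIGHTS, THRESHOLD)` runs both stages on one tokenized page. Keeping the cheaper classifier in front means the ontology check only runs on candidates, which is one plausible reading of how the method keeps its per-page analysis time low.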

Full text