A method of filtering Chinese webpage

Liu Jie<sup>*</sup>; Luo Li Ming; Wu Yu Hang; Ma Yi Fang; Cai Hong Mei

Abstract

To counter the adverse effects of useless webpages, a method based on the Bayesian classification algorithm and a domain ontology is proposed for filtering unwanted Chinese webpages. First, the method calculates the weights of domain feature words from positive and negative domain webpages, establishes a domain feature lexicon, constructs the domain ontology, and derives a weight library of ontology elements from the positive domain webpages. Next, it selects candidate webpages with the Bayesian classification algorithm. Finally, it semantically analyzes and filters the candidates according to the domain ontology. The method can distinguish positive from negative webpages within the same field while remaining fast enough for real-time webpage filtering. Experiments on a large number of game-related webpages show promising results: precision and recall both exceed 98%, and semantically analyzing one game webpage takes 1 to 2 s on average, which has little effect on users' browsing.
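The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weighting formula, the ontology structure, or the decision rule, so everything here is an assumption. Pages are assumed to be already segmented into word lists, stage 1 is a plain multinomial naive Bayes classifier with add-one smoothing, and stage 2 replaces the ontology-based semantic analysis with a simple average of hypothetical element weights compared against a hypothetical threshold. All function names, the toy data, and the threshold value are illustrative.

```python
import math
from collections import Counter


def train_naive_bayes(positive_docs, negative_docs):
    """Estimate class priors and word log-likelihoods (add-one smoothing)."""
    vocab = {w for doc in positive_docs + negative_docs for w in doc}
    total_docs = len(positive_docs) + len(negative_docs)
    model = {}
    for label, docs in (("positive", positive_docs), ("negative", negative_docs)):
        counts = Counter(w for doc in docs for w in doc)
        total_words = sum(counts.values())
        model[label] = {
            "prior": math.log(len(docs) / total_docs),
            "word": {
                w: math.log((counts[w] + 1) / (total_words + len(vocab)))
                for w in vocab
            },
        }
    return model


def is_candidate(model, doc):
    """Stage 1 (assumed rule): keep the page if 'positive' outscores 'negative'."""
    scores = {}
    for label, params in model.items():
        # Words never seen in training are skipped.
        scores[label] = params["prior"] + sum(
            params["word"][w] for w in doc if w in params["word"]
        )
    return scores["positive"] > scores["negative"]


def ontology_score(doc, element_weights):
    """Stage 2 stand-in: summed weight of matched elements per word of the page."""
    if not doc:
        return 0.0
    return sum(element_weights.get(w, 0.0) for w in doc) / len(doc)


def classify_page(model, element_weights, doc, threshold=0.25):
    """Run both stages; the threshold is a hypothetical tuning parameter."""
    if not is_candidate(model, doc):
        return "negative"
    if ontology_score(doc, element_weights) >= threshold:
        return "positive"
    return "negative"


# Toy, pre-segmented training pages (hypothetical data).
positive_pages = [["game", "level", "play"], ["game", "play", "score"]]
negative_pages = [["news", "weather", "report"], ["stock", "news", "market"]]
model = train_naive_bayes(positive_pages, negative_pages)

# Hypothetical weight library of ontology elements.
weights = {"play": 0.9, "level": 0.6, "score": 0.5}

print(classify_page(model, weights, ["game", "play", "news"]))
print(classify_page(model, weights, ["game", "game", "news"]))
```

In this sketch the second page passes stage 1 because it mentions the domain heavily, but is rejected in stage 2 because it matches none of the weighted elements. That mirrors the abstract's point that the ontology step separates positive from negative pages within the same field.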

Publication date: 2014
Affiliation: Capital Normal University
