A semantic based Web page classification strategy using multi-layered domain ontology

作者:Saleh Ahmed I; Al Rahmawy Mohammed F; Abulwafa Arwa E*
来源:World Wide Web-internet and Web Information Systems, 2017, 20(5): 939-993.
DOI:10.1007/s11280-016-0415-z

摘要

World Wide Web is a continuously growing giant, and within the next few years, Web contents will surely increase tremendously. Hence, there is a great requirement to have algorithms that could accurately classify Web pages. Automatic Web page classification is significantly different from traditional text classification because of the presence of additional information, provided by the HTML structure. Recently, several techniques have been arisen from combinations of artificial intelligence and statistical approaches. However, it is not a simple matter to find an optimal classification technique for Web pages. This paper introduces a novel strategy for vertical Web page classification, which is called Classification using Multi-layered Domain Ontology (CMDO). It employs several Web mining techniques, and depends mainly on proposed multi-layered domain ontology. In order to promote the classification accuracy, CMDO implies a distiller to reject pages related to other domains. CMDO also employs a novel classification technique, which is called Graph Based Classification (GBC). The proposed GBC has pioneering features that other techniques do not have, such as outlier rejection and pruning. Experimental results have shown that CMDO outperforms recent techniques as it introduces better precision, recall, and classification accuracy.

  • 出版日期2017-9