A Generic Web News Extraction Approach

Dong Yongquan<sup>*</sup>; Li Qingzhong; Yan Zhongmin; Ding Yanhui

Abstract

With the development of the Internet, the Web is becoming the largest data repository in human history. Considerable effort has gone into providing efficient access to relevant information within web pages. Most previous work relies on site-specific templates. When information such as news must be extracted from many different sites, a template has to be created for each site, which costs considerable time and effort. In this paper, we present a generic news extraction method that identifies news content using a set of combined heuristics and extracts each part of a news item according to a predefined schema. Experimental results indicate that our approach is effective in extracting news across websites.

Publication date: 2008
Affiliation: Shandong University


Last updated: 2018-08-02 19:48
