A focused crawler combinatory link and content model based on T-Graph principles

Seyfi Ali*; Patel Ahmed

doi:10.1016/j.csi.2015.07.001

Source: Computer Standards & Interfaces, 2016, 43: 1-11.

Abstract

The two significant tasks of a focused Web crawler are finding relevant documents and prioritizing them for effective download. For the first task, we propose an algorithm to fetch and analyze the most effective HTML elements of the page to predict and elicit the topical focus of each unvisited page with high accuracy. For the second task, we propose a scoring function of the relevant URLs through the use of T-Graph to prioritize each unvisited link. Thus, our novel method uniquely combines these approaches, giving precision and recall values close to 50%, which indicate the significance of the proposed architecture.
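
The two tasks named in the abstract, scoring unvisited links and ordering them for download, can be illustrated with a minimal Python sketch. It is not the authors' T-Graph scoring function, which the abstract does not specify. The element weights in `ELEMENT_WEIGHTS`, the term-overlap measure in `relevance_score`, and the `Frontier` class are all hypothetical stand-ins chosen for illustration. The `precision_recall` helper uses the standard definitions of the two metrics the abstract reports.

```python
import heapq
import itertools

# Hypothetical weights per HTML element; the abstract gives no such values.
ELEMENT_WEIGHTS = {"anchor_text": 0.5, "title": 0.3, "surrounding_text": 0.2}


def relevance_score(link_features, topic_terms):
    """Weighted fraction of topic terms found in each element's text.

    This is an assumed placeholder, not the paper's T-Graph-based score.
    """
    if not topic_terms:
        return 0.0
    score = 0.0
    for element, weight in ELEMENT_WEIGHTS.items():
        words = set(link_features.get(element, "").lower().split())
        score += weight * len(words & topic_terms) / len(topic_terms)
    return score


class Frontier:
    """Priority queue of unvisited URLs; the highest score is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()  # tie-breaker: insertion order

    def push(self, url, score):
        if url in self._seen:  # never enqueue the same URL twice
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, next(self._counter), url))

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


def precision_recall(retrieved, relevant):
    """Standard precision and recall over sets of document identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

In use, a crawler would call `relevance_score` on the features extracted around each outgoing link, `push` the link onto the frontier, and `pop` the best candidate for the next download. Keeping a seen-set inside the frontier is one simple design choice for avoiding repeat visits; a real crawler would also need URL normalization and politeness controls, which are outside the scope of this sketch.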

Publication date: 2016-01
