A simple probabilistic explanation of term frequency-inverse document frequency (tf-idf) heuristic (and variations motivated by this explanation)

Havrlant Lukas; Kreinovich Vladik<sup>*</sup>

doi:10.1080/03081079.2017.1291635

Source: International Journal of General Systems, 2017, 46(1): 27-36.

Abstract

In document analysis, an important task is to automatically find keywords which best describe the subject of the document. One of the most widely used techniques for keyword detection is a technique based on the term frequency-inverse document frequency (tf-idf) heuristic. This technique has some explanations, but these explanations are somewhat too complex to be fully convincing. In this paper, we provide a simple probabilistic explanation for the tf-idf heuristic. We also show that the ideas behind this explanation can help us come up with more complex formulas which will hopefully lead to a more adequate detection of keywords.

Publication date: 2017
