A Novel TF-IDF Weighting Scheme for Effective Ranking

Authors
Paik, Jiaul H. [1]
Affiliations
[1] Indian Stat Inst, Kolkata, India
Source
SIGIR'13: THE PROCEEDINGS OF THE 36TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH & DEVELOPMENT IN INFORMATION RETRIEVAL | 2013
Keywords
Document ranking; Retrieval model; Term weighting; INFORMATION-RETRIEVAL; MODEL
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Term weighting schemes are central to the study of information retrieval systems. This article proposes a novel TF-IDF term weighting scheme that employs two different within-document term frequency normalizations to capture two different aspects of term saliency. One component of the term frequency is effective for short queries, while the other performs better on long queries. The final weight is a weighted combination of these two components, with the mixing weight determined by the length of the corresponding query. Experiments conducted on a large number of TREC news and web collections demonstrate that the proposed scheme almost always outperforms five state-of-the-art retrieval models with remarkable significance and consistency. The experimental results also show that the proposed model achieves significantly better precision than the existing models.
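The combination described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the two normalizations (one relative to the document's average term frequency, one relative to document length), the `saturate` squashing function, the shape of `query_length_weight`, the choice of which component is favored for short queries, and all function and parameter names. It should not be read as the paper's actual definitions.

```python
import math


def tf_relative_to_avg(tf, avg_tf):
    """Assumed normalization 1: term frequency relative to the
    average term frequency of the same document."""
    return math.log2(1 + tf) / math.log2(1 + avg_tf)


def tf_length_regularized(tf, doc_len, avg_doc_len):
    """Assumed normalization 2: term frequency damped for documents
    longer than the collection average."""
    return tf * math.log2(1 + avg_doc_len / doc_len)


def saturate(x):
    """Map a non-negative value into [0, 1) so both components
    live on a comparable scale before mixing."""
    return x / (1 + x)


def query_length_weight(query_len):
    """Hypothetical mixing weight: 1.0 for a one-term query,
    decreasing as the query gets longer."""
    return 2 / (1 + math.log2(1 + query_len))


def combined_tf(tf, avg_tf, doc_len, avg_doc_len, query_len):
    """Query-length-dependent mix of the two TF components."""
    w = query_length_weight(query_len)
    short_query_part = saturate(tf_relative_to_avg(tf, avg_tf))
    long_query_part = saturate(tf_length_regularized(tf, doc_len, avg_doc_len))
    return w * short_query_part + (1 - w) * long_query_part


def idf(num_docs, doc_freq):
    """A common smoothed IDF form (assumed, not taken from the paper)."""
    return math.log((num_docs + 1) / doc_freq)


def term_score(tf, avg_tf, doc_len, avg_doc_len, query_len, num_docs, doc_freq):
    """TF-IDF style score for one query term in one document."""
    return combined_tf(tf, avg_tf, doc_len, avg_doc_len, query_len) * idf(num_docs, doc_freq)
```

With these assumed forms, a single-term query uses only the first component (the weight is exactly 1), and the second component gains influence as the query grows, which mirrors the short-query versus long-query behavior the abstract describes.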
Pages: 343-352
Page count: 10