Analysis of TF-IDF Model and its Variant for Document Retrieval

Cited by: 20
Authors
Mishra, Apra [1]
Vishwakarma, Santosh [1]
Affiliation
[1] GGITS, Dept CSE, Jabalpur, India
Source
2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN) | 2015
Keywords
Information Retrieval; IR Models; Weighting Schemes; TF-IDF; Retrieval Effectiveness; Precision
DOI
10.1109/CICN.2015.157
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
An Information Retrieval System is a system capable of storing, retrieving, and maintaining information. In this context, information can consist of text (including numeric and date data), images, audio, video, and other multimedia objects. The TF-IDF weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. Various models exist for weighting the terms of corpus documents and of queries. This work analyzes and evaluates the retrieval effectiveness of the vector space model on the new FIRE 2011 data set. The experiments were performed with TF-IDF and its variants. The open-source search engine Terrier 3.5 was used for all experiments and evaluation. Our results show that the TF-IDF model gives the highest precision values on the new corpus.
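To make the TF-IDF weight mentioned in the abstract concrete, the following is a minimal sketch of one common textbook form: term frequency normalized by document length, multiplied by the logarithm of the inverse document frequency. It is an illustration only and is not claimed to be the formula or the variants evaluated in the paper or implemented in Terrier 3.5. The function name `tf_idf_vectors` and the three toy documents are assumptions introduced for this example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting (one textbook variant, not the paper's):
        weight = (count / document length) * ln(N / df)
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy corpus, already tokenized on whitespace.
docs = [
    "information retrieval system".split(),
    "term weighting for retrieval".split(),
    "image and audio data".split(),
]
weights = tf_idf_vectors(docs)
```

In this toy corpus, "retrieval" occurs in two of the three documents and so receives a lower weight in the first document than "information", which occurs in only one. A term that appears in every document receives a weight of zero under this variant.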
Pages: 772-776
Page count: 5