The importance of Term Weighting in semantic understanding of text: A review of techniques

Citations: 8
Authors
Rathi, R. N. [1]
Mustafi, A. [1]
Affiliations
[1] Birla Inst Technol, Mesra, India
Keywords
Term weighting; Word embedding; Term weighting techniques; LANGUAGE; FREQUENCY; CLASSIFICATION; EXTRACTION; SCHEMES; MODEL; LAW;
DOI
10.1007/s11042-022-12538-3
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper we review a wide spectrum of techniques that have been proposed in the literature to enable acceptable recognition of language and text by machines. We discuss many techniques proposed by researchers in the field of term weighting and explore the mathematical foundations of these methods. Term weighting schemes have broadly been classified as supervised and statistical methods, and we present numerous examples from both categories to highlight the difference in approach between them. We pay particular attention to the Vector Space Model and its variants, which form the basis of many of the other methods discussed in the paper.
Pages: 9761-9783
Page count: 23