The importance of Term Weighting in semantic understanding of text: A review of techniques

Cited by: 8
Authors
Rathi, R. N. [1]
Mustafi, A. [1]
Affiliation
[1] Birla Institute of Technology, Mesra, India
Keywords
Term weighting; Word embedding; Term weighting techniques; LANGUAGE; FREQUENCY; CLASSIFICATION; EXTRACTION; SCHEMES; MODEL; LAW;
DOI
10.1007/s11042-022-12538-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper reviews a wide spectrum of techniques proposed in the literature to enable machines to recognize language and text acceptably. We discuss many techniques proposed by researchers in the field of term weighting and explore their mathematical foundations. Term weighting schemes are broadly classified as supervised or statistical methods, and we present numerous examples from both categories to highlight the differences in approach between them. We pay particular attention to the Vector Space Model and its variants, which form the basis of many of the other methods discussed in the paper.
Pages: 9761-9783
Number of pages: 23