Optimized TF-IDF Algorithm with the Adaptive Weight of Position of Word

Cited by: 0
Authors
Chen, Jie [1]
Chen, Cai [1]
Liang, Yi [1]
Affiliation
[1] Beijing Univ Technol Beijing, Fac Informat Technol, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INDUSTRIAL ENGINEERING (AIIE 2016) | 2016 / Vol. 133
Keywords
text feature extraction; adaptive weight; weight of position; Term Frequency-Inverse Document Frequency (TF-IDF)
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The classical TF-IDF algorithm weights a term only by its term frequency and inverse document frequency, without considering other features of the word. Based on an analysis of Chinese expression habits, this paper proposes a TF-IDF-based algorithm with an adaptive word-position weight, called TF-IDF-AP. TF-IDF-AP dynamically determines the position weight of a word according to where the word occurs. The paper uses the vector space model (VSM) and runs a comparative experiment on Chinese document clustering. The results show that the F-measure of TF-IDF-AP improves by 12.9% over the classical TF-IDF algorithm.
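The abstract does not give the formula for the adaptive position weight, so the following is only a minimal sketch of the general idea: classical TF-IDF multiplied by a position-dependent factor. The function `position_weight`, its parameters `boost` and `edge`, and the rule of favouring words near the start or end of a document are all hypothetical placeholders, not the TF-IDF-AP rule from the paper. Documents are assumed to be already segmented into tokens, which for Chinese text would require a word segmenter first.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classical TF-IDF: tf(t, d) * log(N / df(t)) for tokenized documents."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def position_weight(index, length, boost=2.0, edge=0.2):
    """Hypothetical position factor (an assumption, not the paper's formula).

    Tokens in the first or last `edge` fraction of the document get `boost`,
    all other tokens get 1.0.
    """
    if length == 0:
        return 1.0
    rel = index / length
    return boost if rel < edge or rel >= 1 - edge else 1.0


def tf_idf_with_position(docs):
    """Scale each TF-IDF weight by the term's best position factor in the document."""
    base = tf_idf(docs)
    result = []
    for doc, weights in zip(docs, base):
        best = {}
        for i, term in enumerate(doc):
            best[term] = max(best.get(term, 0.0), position_weight(i, len(doc)))
        result.append({t: w * best[t] for t, w in weights.items()})
    return result
```

For example, `tf_idf_with_position([["sat", "cat"], ["cat", "ran"]])` leaves "cat" at weight 0 in both documents, because it appears everywhere, and doubles the weight of "sat" because it opens its document. The resulting per-document dictionaries can be used as sparse VSM vectors for clustering.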
Pages: 114-117
Number of pages: 4