Research of Text Classification Based on Improved TF-IDF Algorithm

Cited by: 0
Authors
Liu, Cai-zhi [1]
Sheng, Yan-xiu [1]
Wei, Zhi-qiang [1]
Yang, Yong-Quan [1]
Affiliations
[1] Ocean Univ China, Coll Informat Sci & Engn, Qingdao, Peoples R China
Source
2018 IEEE INTERNATIONAL CONFERENCE OF INTELLIGENT ROBOTICS AND CONTROL ENGINEERING (IRCE) | 2018
Keywords
text classification; text representation; TF-IDF; Word2vec model
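DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract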
In recent years, with the rapid development of Internet technology, the volume of text data has grown rapidly, and users must filter the information they need out of large amounts of text. Automatic text classification can help users find that information. Traditional text classification techniques ignore the contextual semantic links between words and the differing importance of individual words. To address these problems, this paper represents feature words as vectors using the deep learning tool Word2vec and computes the weight of each feature word with an improved TF-IDF algorithm. Multiplying each word vector by the weight of its word yields a weighted vector representation of that word. Finally, each text is represented by accumulating all of its weighted word vectors, and text classification is carried out on this representation.
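The representation described in the abstract can be illustrated with a minimal sketch. This record does not give the formula of the improved TF-IDF algorithm, so the sketch below assumes the standard TF-IDF weight as a stand-in. The small hand-written vectors are assumptions that take the place of embeddings from a trained Word2vec model, and the function names `tf_idf_weights` and `text_vector` are hypothetical. It also assumes each distinct word contributes once per text, since term frequency already accounts for repeats.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {word: weight} dict per tokenized document.

    Assumption: standard TF-IDF (tf * ln(N / df)), not the paper's
    improved variant, whose formula is not given in this record.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


def text_vector(word_weights, word_vectors, dim):
    """Accumulate weight * word vector over the distinct words of one text."""
    vec = [0.0] * dim
    for word, weight in word_weights.items():
        if word not in word_vectors:
            continue  # skip words without an embedding
        for i, component in enumerate(word_vectors[word]):
            vec[i] += weight * component
    return vec


# Toy corpus and toy 2-dimensional vectors (assumed values for illustration;
# a real pipeline would load vectors from a trained Word2vec model).
docs = [["ocean", "fish", "ocean"], ["robot", "control"], ["fish", "robot"]]
word_vectors = {
    "ocean": [1.0, 0.0],
    "fish": [0.0, 1.0],
    "robot": [1.0, 1.0],
    "control": [0.5, 0.5],
}

weights = tf_idf_weights(docs)
text_vectors = [text_vector(w, word_vectors, dim=2) for w in weights]
```

Each entry of `text_vectors` is a fixed-length vector that could then be passed to any standard classifier. Which classifier the paper uses is not stated in this record.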
Pages: 218-222
Number of pages: 5