An Online Word Vector Generation Method Based on Incremental Huffman Tree Merging

Cited by: 2
Authors
Qian, Kui [1]
Tian, Lei [1]
Wen, Xiulan [1]
Song, Zhenzhong [2]
Affiliations
[1] Nanjing Inst Technol, Sch Automat, 1 Hongjing Ave, Nanjing, Jiangsu, Peoples R China
[2] Shanghai Electromech Engn Inst, 3888 Yuanjiang Rd, Shanghai, Peoples R China
Source
TEHNICKI VJESNIK-TECHNICAL GAZETTE | 2021 / Vol. 28 / No. 01
Funding
National Natural Science Foundation of China
Keywords
Huffman tree; hierarchical softmax; incremental learning; neural network; online word vector;
DOI
10.17559/TV-20190506102016
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
To meet the real-time processing requirements of large volumes of online text data in natural language processing applications, an online word vector model generation method based on incremental Huffman tree merging is proposed. The Huffman tree of inherited words in the existing word vector model is kept unchanged, and a new Huffman tree is constructed for the incoming words such that it shares no leaf node with the inherited tree. The Huffman tree is then updated by node merging, so that, building on the existing word vector model, each word still has a unique encoding for the hierarchical softmax computation. Finally, the incremental word vector model is generated by a neural network on top of the hierarchical softmax model. The experimental results show that the method can generate the word vector model online through incremental learning, in less time and with better performance.
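
The tree-merging idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the node-merging rule, so the sketch assumes the simplest one: the inherited tree and the tree built from previously unseen words become the two children of a new root. All names (`Node`, `build_huffman`, `merge_trees`, `incremental_update`, `codes`) are hypothetical and are not taken from the paper.

```python
import heapq
import itertools


class Node:
    """Huffman tree node; leaves carry a word, internal nodes do not."""

    def __init__(self, count, word=None, left=None, right=None):
        self.count = count
        self.word = word
        self.left = left
        self.right = right


def build_huffman(counts):
    """Build a Huffman tree from a non-empty {word: count} mapping."""
    if not counts:
        raise ValueError("counts must not be empty")
    tie = itertools.count()  # unique tie-breaker so Node objects are never compared
    heap = [(c, next(tie), Node(c, word=w)) for w, c in sorted(counts.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, a = heapq.heappop(heap)
        c2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, next(tie), Node(c1 + c2, left=a, right=b)))
    return heap[0][2]


def merge_trees(inherited_root, new_root):
    """Assumed merge rule: join both trees under a fresh root node."""
    return Node(inherited_root.count + new_root.count,
                left=inherited_root, right=new_root)


def incremental_update(inherited_root, inherited_vocab, incoming_counts):
    """Build a tree from unseen words only, then merge it with the inherited tree."""
    fresh = {w: c for w, c in incoming_counts.items() if w not in inherited_vocab}
    if not fresh:
        return inherited_root  # nothing new: the inherited tree is returned as-is
    return merge_trees(inherited_root, build_huffman(fresh))


def codes(root, prefix=""):
    """Return {word: bit string} for every leaf below root."""
    if root.word is not None:
        return {root.word: prefix or "0"}
    out = codes(root.left, prefix + "0")
    out.update(codes(root.right, prefix + "1"))
    return out


old_counts = {"the": 5, "cat": 2, "sat": 1}
inherited = build_huffman(old_counts)
merged = incremental_update(inherited, set(old_counts), {"cat": 3, "dog": 4, "ran": 1})
print(codes(merged))
```

Under this assumed rule the inherited subtree is left untouched, every word keeps a unique prefix-free code, and each inherited word's code gains one leading bit. Whether the paper's merging preserves the inherited codes exactly is not stated in the abstract.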
Pages: 52-57
Page count: 6