Attention-Enabled Multi-layer Subword Joint Learning for Chinese Word Embedding

Cited: 0
Authors
Xue, Pengpeng [1 ]
Xiong, Jing [2 ]
Tan, Liang [1 ,3 ]
Liu, Zhongzhu [4 ]
Liu, Kanglong [5 ]
Affiliations
[1] Sichuan Normal Univ, Sch Comp Sci, Chengdu 610101, Sichuan, Peoples R China
[2] Chongqing Coll Mobile Commun, Chongqing Key Lab Publ Big Data Secur Technol, Chongqing 401420, Peoples R China
[3] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
[4] Huizhou Univ, Sch Math & Stat, Huizhou 516007, Guangdong, Peoples R China
[5] Hong Kong Polytech Univ, Dept Chinese & Bilingual Studies, Hong Kong 999077, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Chinese word embedding; Semantic analysis; Attention mechanism; Feature substring; Morphological information; Pronunciation information;
DOI
10.1007/s12559-025-10431-3
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, Chinese word embeddings have attracted significant attention in natural language processing (NLP). The complex structure and diverse influences of Chinese characters present distinct challenges for semantic representation, so Chinese word embeddings are primarily studied in conjunction with characters and their subcomponents. Previous research has shown that word vectors frequently fail to capture the subtle semantics embedded within the complex structure of Chinese characters, and that they often neglect the varying contributions of subword information at different levels. To address these challenges, we present a weight-based word vector model that accounts for the internal structure of Chinese words at multiple levels. The model divides this internal structure into six layers of subword information: words, characters, components, pinyin, strokes, and structures, and derives word semantics by integrating the subword information from all layers. Because each subword layer contributes differently to word meaning, the model uses an attention mechanism to determine the weights both between and within the subword layers, enabling more comprehensive extraction of word semantics. The word-level subword serves as the attention query for the subwords of the other layers to learn semantic bias. Experimental results show that the proposed model achieves improvements on various evaluation tasks, including word similarity, word analogy, text classification, and case studies.
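To make the attention-based fusion described in the abstract more concrete, the following is a minimal sketch in PyTorch. It is not the authors' implementation: the function name fuse_subword_layers, the dot-product scoring, the mean-free pooling, and the additive combination with the word vector are assumptions introduced here only to illustrate how a word-level query can first weight the subwords inside each layer and then weight the layers themselves.

```python
# Minimal sketch (not the paper's released code) of attention-weighted fusion of
# multi-layer subword embeddings. Assumption: the word-level vector is the query,
# and each of the other five layers (characters, components, pinyin, strokes,
# structures) is pooled into one vector before a softmax over layers.
import torch
import torch.nn.functional as F

def fuse_subword_layers(word_vec, layer_vecs):
    """word_vec: (d,) word-level embedding used as the attention query.
    layer_vecs: list of (n_i, d) tensors, one tensor per subword layer."""
    pooled = []
    for vecs in layer_vecs:
        # Intra-layer attention: weight each subword by its similarity to the query.
        scores = F.softmax(vecs @ word_vec, dim=0)      # (n_i,)
        pooled.append(scores @ vecs)                    # (d,) pooled layer vector
    pooled = torch.stack(pooled)                        # (n_layers, d)

    # Inter-layer attention: weight each pooled layer vector against the query,
    # then combine the weighted layers with the word vector itself.
    layer_scores = F.softmax(pooled @ word_vec, dim=0)  # (n_layers,)
    return word_vec + layer_scores @ pooled             # fused representation, (d,)

# Toy usage with random vectors: dimension 50; 3 characters, 4 components,
# 3 pinyin units, 12 strokes, 1 structure tag (all counts are illustrative).
d = 50
word = torch.randn(d)
layers = [torch.randn(n, d) for n in (3, 4, 3, 12, 1)]
print(fuse_subword_layers(word, layers).shape)          # torch.Size([50])
```

In the actual model, the scoring function, the pooling within each layer, and how the fused vector enters the training objective follow the paper's formulation rather than the simplified choices shown here.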
Pages: 16