A gating context-aware text classification model with BERT and graph convolutional networks

Cited by: 31
Authors
Gao, Weiqi [1 ]
Huang, Hao [1 ,2 ]
Affiliations
[1] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi, Peoples R China
[2] Xinjiang Prov Key Lab Multilingual Informat Techn, Urumqi, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Text classification; graph convolutional network; BERT; gating mechanism; Euclidean distance;
DOI
10.3233/JIFS-201051
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Graph convolutional networks (GCNs), which can effectively process graph-structured data, have been successfully applied to text classification tasks. Existing GCN-based text classification models largely rely on word co-occurrence and Term Frequency-Inverse Document Frequency (TF-IDF) information for graph construction, which to some extent ignores the context information of the texts. To solve this problem, we propose a gating context-aware text classification model with Bidirectional Encoder Representations from Transformers (BERT) and a graph convolutional network, named Gating Context GCN (GC-GCN). More specifically, we integrate the graph embedding with the BERT embedding through a GCN with a gating mechanism, enabling the model to acquire contextual encoding. We carry out text classification experiments to show the effectiveness of the proposed model. Experimental results show that our model obtains improvements of 0.19%, 0.57%, 1.05%, and 1.17% over the Text-GCN baseline on the 20NG, R8, R52, and Ohsumed benchmark datasets, respectively. Furthermore, to overcome the problem that word co-occurrence and TF-IDF alone are poorly suited to graph construction for short texts, we combine Euclidean distance with the word co-occurrence and TF-IDF information, obtaining a 1.38% improvement over the Text-GCN baseline on the MR dataset.
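The abstract does not give the exact gating equations, so the following is a minimal PyTorch sketch of one plausible reading: a sigmoid gate interpolates, per dimension, between a GCN-propagated graph embedding and the BERT context embedding, and a Euclidean-distance term rescales the co-occurrence/TF-IDF edge weight for short texts. All names here (GatedContextLayer, edge_weight, gcn_proj, gate) are hypothetical illustrations, not the authors' code.

import torch
import torch.nn as nn

class GatedContextLayer(nn.Module):
    # Hypothetical fusion layer: combines a graph (GCN) view with a
    # BERT context view of the same nodes via a learned gate.
    def __init__(self, dim: int):
        super().__init__()
        self.gcn_proj = nn.Linear(dim, dim)   # graph-side projection (assumed)
        self.gate = nn.Linear(2 * dim, dim)   # gate computed from both views

    def forward(self, adj: torch.Tensor, x_graph: torch.Tensor,
                x_bert: torch.Tensor) -> torch.Tensor:
        # One GCN propagation step over the (normalized) adjacency matrix.
        h_graph = torch.relu(self.gcn_proj(adj @ x_graph))
        # Sigmoid gate in [0, 1] decides how much of each view to keep.
        g = torch.sigmoid(self.gate(torch.cat([h_graph, x_bert], dim=-1)))
        return g * h_graph + (1.0 - g) * x_bert

def edge_weight(w_cooc_tfidf: float, e_i: torch.Tensor,
                e_j: torch.Tensor) -> float:
    # Assumed combination for short texts: scale the co-occurrence/TF-IDF
    # weight so that words with closer embeddings get stronger edges.
    dist = torch.dist(e_i, e_j, p=2).item()   # Euclidean distance
    return w_cooc_tfidf / (1.0 + dist)

Under this reading, the gate lets each dimension of the fused representation lean on the graph structure where co-occurrence statistics are informative and fall back on BERT's contextual encoding where they are not; the exact formulation in the paper may differ.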
Pages: 4331-4343
Page count: 13