Chinese long text similarity calculation of semantic progressive fusion based on Bert

Cited by: 0
Authors
Li, Xiao [1 ,2 ]
Hu, Lanlan [1 ]
Affiliations
[1] Anyang Normal Univ, Sch Comp & Informat Engn, Anyang, Henan, Peoples R China
[2] Anyang Normal Univ, Key Lab Oracle Bone Inscript Informat Proc, Minist Educ, Anyang, Henan, Peoples R China
Keywords
Natural language processing; long text similarity; Bert model; transformer;
DOI
10.3233/JCM-247245
Chinese Library Classification
T [Industrial Technology];
Subject Classification
08;
Abstract
Text similarity is an important index that measures the degree of resemblance between two or more texts, and it is widely used across natural language processing tasks. With the maturity of deep learning, many neural network models have been applied to text similarity calculation and have achieved good results on sentence-level and short-text tasks. Among them, the Bert model has become a research hotspot in this field due to its excellent performance. However, existing similarity algorithms perform poorly on long texts: they cannot fully extract the richer semantic information hidden in the structure of long documents. Taking Chinese long text as the research object, this paper proposes a long text similarity calculation method that uses sentence sequences instead of word-level sequences, and constructs a long text semantic representation model based on semantic progressive fusion. The method addresses practical problems faced by applications and natural language processing tasks that depend on long-text semantics, aiming to break through the bottleneck of long text similarity calculation.
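The abstract describes representing a long document as a sequence of sentence embeddings and fusing them progressively into one document vector, which is then compared by similarity. The paper's actual fusion formula is not given here, so the sketch below is purely illustrative: it assumes sentence embeddings are already available (e.g., from a Bert encoder, stubbed here as plain lists of floats) and uses a simple running weighted blend with renormalization as a stand-in for the progressive-fusion step. The names `fuse`, `progressive_fusion`, and the mixing weight `alpha` are hypothetical, not from the paper.

```python
import math

def fuse(doc_vec, sent_vec, alpha=0.5):
    """Blend the running document vector with the next sentence vector,
    then renormalize to unit length (alpha is an assumed mixing weight)."""
    mixed = [alpha * d + (1 - alpha) * s for d, s in zip(doc_vec, sent_vec)]
    norm = math.sqrt(sum(x * x for x in mixed)) or 1.0
    return [x / norm for x in mixed]

def progressive_fusion(sentence_embeddings, alpha=0.5):
    """Fold sentence embeddings one by one into a single document vector."""
    doc = sentence_embeddings[0]
    for emb in sentence_embeddings[1:]:
        doc = fuse(doc, emb, alpha)
    return doc

def cosine(a, b):
    """Cosine similarity between two document vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)
```

One motivation for sentence-level units, consistent with the abstract, is that standard Bert encoders cap their input length (512 tokens in the original model), so encoding sentences individually and fusing the results lets the document representation cover arbitrarily long texts.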
Pages: 2213-2225
Number of pages: 13