Named Entity Recognition in Government Audit Texts Based on ChineseBERT and Character-Word Fusion

Cited by: 2
Authors
Huang, Baohua [1]
Lin, Yunjie [1]
Pang, Si [1]
Fu, Long [1]
Affiliation
[1] Guangxi Univ, Sch Comp Elect & Informat, Nanning 530004, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 04
Funding
National Natural Science Foundation of China
Keywords
smart audit; named entity recognition; character-word fusion; GHM loss function; ChineseBERT
DOI
10.3390/app14041425
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Named entity recognition (NER) in government audit texts is a key task in intelligent auditing. Three problems motivate this work: corpora in the government audit domain are scarce, traditional character vectors make insufficient use of word-level information, and the features of audit entities are captured inadequately. To address them, this study builds its own audit-domain dataset and proposes CW-CBGC, a model for recognizing named entities in government audit texts based on ChineseBERT and character-word fusion. First, the ChineseBERT pre-trained model extracts character vectors that integrate glyph and pinyin features, and these are combined with word vectors dynamically constructed by the BERT pre-trained model. The resulting sequence of character-word fusion vectors is then fed into a bidirectional gated recurrent unit (BiGRU) network to learn textual features. Finally, a conditional random field (CRF) generates the globally optimal label sequence, and the GHM classification loss function is used during training to address the problem of error evaluation under noisy entities and an unbalanced number of entities. On the audit dataset, the model achieves an F1 score of 97.23%, which is 3.64% higher than that of the baseline model; on the public Resume dataset, it achieves an F1 score of 96.26%, which is 0.73-2.78% higher than that of mainstream models. The experimental results show that the proposed model can effectively recognize entities in government audit texts and has a certain degree of generalization ability.
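The abstract names the GHM classification loss but does not give its form. As a rough illustration only, the sketch below follows the commonly described binned approximation of GHM-C for binary labels: each example is weighted by the inverse of the gradient density in its bin, so that the many easy examples and the few extreme outliers each contribute less per example. The function names, the bin count of 10, and the binary setting are assumptions for illustration, not details taken from the paper, whose model labels tokens with multiple classes.

```python
import math


def ghm_weights(probs, targets, bins=10):
    """Per-example weights N / (count_in_bin * bins) (hypothetical helper)."""
    n = len(probs)
    # Gradient norm of sigmoid cross-entropy w.r.t. the logit: g = |p - y|.
    grads = [abs(p - y) for p, y in zip(probs, targets)]
    # Assign each example to one of `bins` equal-width regions over [0, 1].
    idx = [min(int(g * bins), bins - 1) for g in grads]
    counts = [0] * bins
    for i in idx:
        counts[i] += 1
    # Approximate gradient density of a bin: count / bin_width = count * bins.
    return [n / (counts[i] * bins) for i in idx]


def ghm_c_loss(probs, targets, bins=10, eps=1e-12):
    """Mean cross-entropy with each term scaled by its GHM weight."""
    weights = ghm_weights(probs, targets, bins)
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        ce = -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        total += w * ce
    return total / len(probs)


if __name__ == "__main__":
    # Three easy negatives share a bin; one hard negative sits alone.
    probs = [0.05, 0.08, 0.02, 0.95]
    targets = [0, 0, 0, 0]
    print(ghm_weights(probs, targets))
    print(ghm_c_loss(probs, targets))
```

In this toy batch the three easy negatives fall into the same low-gradient bin and so share a smaller weight than the single hard example, which is the rebalancing effect the abstract attributes to the GHM loss under unbalanced entity counts.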
Pages: 15
Related papers
50 in total
  • [21] A Named Entity Recognition Model for Chinese Electricity Violation Descriptions Based on Word-Character Fusion and Multi-Head Attention Mechanisms
    Meng, Lingwen
    Wang, Yulin
    Huang, Yuanjun
    Ma, Dingli
    Zhu, Xinshan
    Zhang, Shumei
    ENERGIES, 2025, 18 (02)
  • [22] A Novel Method for Chinese Named Entity Recognition Based on Character Vector
    Lu, Jing
    Ye, Mao
    Tang, Zhi
    Huang, Xiao-Jun
    Ma, Jia-Le
    COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS, AND WORKSHARING, COLLABORATECOM 2015, 2016, 163 : 141 - 150
  • [23] Named Entity Recognition From Biomedical Texts Using a Fusion Attention-Based BiLSTM-CRF
    Wei, Hao
    Gao, Mingyuan
    Zhou, Ai
    Chen, Fei
    Qu, Wen
    Wang, Chunli
    Lu, Mingyu
    IEEE ACCESS, 2019, 7 : 73627 - 73636
  • [24] Thai Named Entity Recognition Using Bi-LSTM-CRF with Word and Character Representation
    Thattinaphanich, Suphanut
    Prom-on, Santitham
    PROCEEDINGS OF THE 2019 4TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY (INCIT): ENCOMPASSING INTELLIGENT TECHNOLOGY AND INNOVATION TOWARDS THE NEW ERA OF HUMAN LIFE, 2019 : 149 - 154
  • [25] A Named Entity Recognition Method based on Decomposition and Concatenation of Word Chunks
    Iwakura, Tomoya
    Takamura, Hiroya
    Okumura, Manabu
    IJCNLP 2011 - Proceedings of the 5th International Joint Conference on Natural Language Processing, 2011 : 828 - 836
  • [26] A named entity recognition method based on decomposition and concatenation of word chunks
    Iwakura, Tomoya
    Takamura, Hiroya
    Okumura, Manabu
    ACM Transactions on Asian Language Information Processing, 2013, 12 (03)
  • [27] Named-entity recognition in Turkish legal texts
    Cetindag, Can
    Yazicioglu, Berkay
    Koc, Aykut
    NATURAL LANGUAGE ENGINEERING, 2023, 29 (03) : 615 - 642
  • [28] ENRICHING PORTUGUESE MEDIEVAL TEXTS WITH NAMED ENTITY RECOGNITION
    Bico, Maria Ines
    Baptista, Jorge
    Batista, Fernando
    Cardeira, Esperanca
    INTERNATIONAL JOURNAL OF HUMANITIES AND ARTS COMPUTING-A JOURNAL OF DIGITAL HUMANITIES, 2024, 18 (01) : 109 - 124
  • [29] Named Entity Recognition to Detect Criminal Texts on the Web
    Skorzewski, Pawel
    Pieniowski, Mikolaj
    Demenko, Grazyna
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022 : 6223 - 6231
  • [30] Enhanced character embedding for Chinese named entity recognition
    Jia, Bingjing
    Wu, Zhongli
    Wu, Bin
    Liu, Yutong
    Zhou, Pengpeng
    MEASUREMENT & CONTROL, 2020, 53 (9-10) : 1669 - 1681