A Chinese text classification based on active

Cited by: 4
Authors
Deng, Song [1]
Li, Qianliang [1]
Dai, Renjie [2]
Wei, Siming [2]
Wu, Di [3]
He, Yi [4]
Wu, Xindong [5]
Affiliations
[1] Nanjing Univ Post & Telecommun, Inst Adv Technol, Nanjing 210003, Peoples R China
[2] State Grid Shanghai Municipal Elect Power Co, Shanghai 200122, Peoples R China
[3] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
[4] Old Dominion Univ, Norfolk, VA 23462 USA
[5] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ China, Hefei 230009, Peoples R China
Keywords
Natural language processing; Deep active learning; Hierarchical confidence; Power text; Knowledge graph; ALGORITHM;
DOI
10.1016/j.asoc.2023.111067
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Constructing a knowledge graph benefits grid production, electrical safety protection, fault diagnosis, and traceability in an observable and controllable way. A high-precision text classification algorithm is crucial for building a professional knowledge graph for the power system. Unfortunately, the power business system contains a large number of poorly described and specialized texts, and only a small share of these texts carry valid labels. This makes it challenging to improve the precision of text classification models. To bridge this gap, we propose a classification algorithm for Chinese text in the power system based on deep active learning (CCTP-DAL). Our core idea is to apply a hierarchical confidence strategy to a deep active learning model to balance the trade-off between the amount of training data and the accuracy of text classification. CCTP-DAL (1) trains the BERT model on a small amount of labeled data to calculate the confidence level of each short text, (2) selects high-confidence text data with optimal model generalization capability based on the hierarchical confidence level, and (3) fuses deep learning models and active learning strategies to ensure high text classification accuracy with less labeled training data. We benchmark our model on real data crawled from the web in extensive experiments. The experimental results demonstrate that the proposed model achieves higher text classification accuracy with less labeled training data than other deep learning models.
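The confidence-based selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how confidence is computed or how the hierarchy is defined, so everything here is an assumption for illustration: confidence is taken as the maximum softmax probability over a classifier's logits, and the names `split_by_confidence`, `high`, and `low`, as well as the threshold values, are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits):
    """Assumed confidence measure: the maximum softmax probability."""
    return max(softmax(logits))


def split_by_confidence(pool_logits, high=0.9, low=0.6):
    """Group unlabeled samples into confidence tiers.

    The two thresholds are hypothetical values chosen for this sketch;
    they are not the hierarchical confidence levels used by CCTP-DAL.
    """
    tiers = {"high": [], "medium": [], "low": []}
    for idx, logits in enumerate(pool_logits):
        c = confidence(logits)
        if c >= high:
            tiers["high"].append(idx)
        elif c >= low:
            tiers["medium"].append(idx)
        else:
            tiers["low"].append(idx)
    return tiers


# Toy logits for three unlabeled short texts over three classes.
pool_logits = [
    [4.0, 0.0, 0.0],  # one class clearly dominates
    [1.0, 0.8, 0.9],  # nearly uniform scores
    [2.0, 0.5, 0.0],  # moderately peaked
]
tiers = split_by_confidence(pool_logits)
print(tiers)
```

In an active learning loop built on this sketch, the high tier could supply samples to add to the training set, while the low tier could be routed to a human annotator. That routing is likewise an assumption, not a description of the published method.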
Pages: 11