A Chinese text classification based on active

Cited by: 4
Authors
Deng, Song [1]
Li, Qianliang [1]
Dai, Renjie [2]
Wei, Siming [2]
Wu, Di [3]
He, Yi [4]
Wu, Xindong [5]
Affiliations
[1] Nanjing Univ Post & Telecommun, Inst Adv Technol, Nanjing 210003, Peoples R China
[2] State Grid Shanghai Municipal Elect Power Co, Shanghai 200122, Peoples R China
[3] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
[4] Old Dominion Univ, Norfolk, VA 23462 USA
[5] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ China, Hefei 230009, Peoples R China
Keywords
Natural language processing; Deep active learning; Hierarchical confidence; Power text; Knowledge graph; ALGORITHM
DOI
10.1016/j.asoc.2023.111067
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Constructing a knowledge graph benefits grid production, electrical safety protection, and fault diagnosis and traceability by making them observable and controllable. A high-precision text classification algorithm is crucial for building a professional knowledge graph for the power system. Unfortunately, the power business system contains a large number of poorly described, specialized texts, and only a small share of them carry valid labels. This makes it difficult to improve the precision of text classification models. To close this gap, we propose a classification algorithm for Chinese text in the power system based on deep active learning (CCTP-DAL). Our core idea is to apply a hierarchical confidence strategy to a deep active learning model in order to balance the trade-off between the amount of training data and the accuracy of text classification. CCTP-DAL (1) trains a BERT model on a small amount of labeled data to calculate the confidence level of each short text, (2) uses the hierarchical confidence levels to select the high-confidence text data that gives the best model generalization, and (3) fuses deep learning models with active learning strategies to ensure high text classification accuracy with less labeled training data. We benchmark our model in extensive experiments on a real-world dataset crawled from the web. The experimental results show that the proposed model achieves higher text classification accuracy with less labeled training data than other deep learning models.
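The three steps in the abstract can be illustrated with a minimal sketch of one confidence-tiered selection round. This is not the authors' CCTP-DAL implementation: the abstract does not specify how the hierarchical confidence levels are defined or used, so the tier thresholds (0.9 and 0.6), the use of the maximum class probability as the confidence score, the pseudo-labeling of the high tier, and all function names below are assumptions made for illustration. The class probabilities would come from a classifier such as a fine-tuned BERT model; here they are passed in as plain lists.

```python
from typing import Dict, List, Sequence, Tuple


def confidence(probs: Sequence[float]) -> float:
    """Confidence of one prediction, taken here as the largest class probability (assumption)."""
    return max(probs)


def bucket_by_confidence(
    all_probs: Sequence[Sequence[float]],
    high_threshold: float = 0.9,    # hypothetical threshold, not from the paper
    medium_threshold: float = 0.6,  # hypothetical threshold, not from the paper
) -> Dict[str, List[int]]:
    """Split sample indices into high / medium / low confidence tiers."""
    tiers: Dict[str, List[int]] = {"high": [], "medium": [], "low": []}
    for idx, probs in enumerate(all_probs):
        score = confidence(probs)
        if score >= high_threshold:
            tiers["high"].append(idx)
        elif score >= medium_threshold:
            tiers["medium"].append(idx)
        else:
            tiers["low"].append(idx)
    return tiers


def active_learning_round(
    all_probs: Sequence[Sequence[float]], budget: int
) -> Tuple[Dict[int, int], List[int]]:
    """Run one selection round over the unlabeled pool.

    Returns (pseudo_labels, to_annotate):
    - pseudo_labels maps each high-tier index to its predicted class, so it can
      join the training set without manual labeling (assumed design choice).
    - to_annotate lists up to `budget` of the least confident remaining samples,
      which would be sent to a human annotator.
    """
    tiers = bucket_by_confidence(all_probs)
    pseudo_labels = {
        idx: list(all_probs[idx]).index(max(all_probs[idx])) for idx in tiers["high"]
    }
    remaining = sorted(
        tiers["medium"] + tiers["low"], key=lambda idx: confidence(all_probs[idx])
    )
    return pseudo_labels, remaining[:budget]


if __name__ == "__main__":
    # Toy class probabilities for four unlabeled short texts (two classes).
    pool = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30], [0.10, 0.90]]
    pseudo, queries = active_learning_round(pool, budget=1)
    print("pseudo-labeled:", pseudo)
    print("to annotate:", queries)
```

In a full loop, the model would be retrained on the enlarged labeled set after each round and the pool re-scored, repeating until the labeling budget is spent or accuracy stops improving.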
Pages: 11