Long text feature extraction network with data augmentation

Cited by: 10
Authors
Tang, Changhao [1]
Ma, Kun [1]
Cui, Benkuan [1]
Ji, Ke [1]
Abraham, Ajith [2]
Affiliations
[1] Univ Jinan, Shandong Prov Key Lab Network Based Intelligent C, Jinan 250022, Peoples R China
[2] Sci Network Innovat & Res Excellence, Machine Intelligence Res Labs, Auburn, WA USA
Funding
National Natural Science Foundation of China
Keywords
COVID-19; Fake news; Social media; Data augmentation; Long text;
DOI
10.1007/s10489-022-03185-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The spread of COVID-19 has seriously affected both the work and the daily lives of people. With the decrease in physical social contact and rising anxiety about the pandemic, social media has become the primary channel through which people access information related to COVID-19. Social media is rife with rumors and fake news, which cause great damage to society. Because they suffer from scarcity, imbalance, and noise, current Chinese datasets related to the epidemic have been of little help in detecting fake news. In addition, classification accuracy is affected by the easy loss of edge characteristics in long text data. In this paper, a long text feature extraction network with data augmentation (LTFE) is proposed, which improves the learning performance of the classifier by optimizing the data feature structure. In the encoding stage, Twice-Masked Language Modeling for Fine-tuning (TMLM-F) and Data Alignment that Preserves Edge Characteristics (DA-PEC) are proposed to extract the classification features of the Chinese dataset. Between the TMLM-F and DA-PEC processes, attention is used to capture the dependencies between words and generate the corresponding vector representations. The experimental results indicate that this method is effective for detecting Chinese fake news pertinent to the pandemic.
Pages: 17652 - 17667
Page count: 16
Related papers
50 in total
  • [1] Long text feature extraction network with data augmentation
    Changhao Tang
    Kun Ma
    Benkuan Cui
    Ke Ji
    Ajith Abraham
    Applied Intelligence, 2022, 52 : 17652 - 17667
  • [2] Microbial data augmentation combining feature extraction and transformer network
    Wen, Liu-Ying
    Chen, Zhu
    Xie, Xiao-Nan
    Min, Fan
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (06) : 2539 - 2550
  • [3] A network-based feature extraction model for imbalanced text data
    Li, Keping
    Yan, Dongyang
    Liu, Yanyan
    Zhu, Qiaozhen
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 195
  • [4] EEG Feature Extraction and Data Augmentation in Emotion Recognition
    Kalashami, Mahsa Pourhosein
    Pedram, Mir Mohsen
    Sadr, Hossein
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [5] A method to monitor a BBS using feature extraction of text data
    Ichifuji, Y
    Konno, S
    Sone, H
    WEB AND COMMUNICATION TECHNOLOGIES AND INTERNET -RELATED SOCIAL ISSUES - HSI 2005, 2005, 3597 : 349 - 352
  • [6] Data Augmentation and Feature Extraction using Variational Autoencoder for Acoustic Modeling
    Nishizaki, Hiromitsu
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1263 - 1268
  • [7] Few-Shot Learning With Enhancements to Data Augmentation and Feature Extraction
    Zhang, Yourun
    Gong, Maoguo
    Li, Jianzhao
    Feng, Kaiyuan
    Zhang, Mingyang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, : 1 - 14
  • [8] Text Feature Extraction and Classification Based on Convolutional Neural Network (CNN)
    Zhang, Taohong
    Li, Cunfang
    Cao, Nuan
    Ma, Rui
    Zhang, ShaoHua
    Ma, Nan
    DATA SCIENCE, PT 1, 2017, 727 : 472 - 485
  • [9] A RBF network for Chinese text classification based on concept feature extraction
    Jiang, Minghu
    Wang, Lin
    Lu, Yinghua
    Liao, Shasha
    NEURAL INFORMATION PROCESSING, PT 3, PROCEEDINGS, 2006, 4234 : 285 - 294
  • [10] Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition
    Luo, Canjie
    Zhu, Yuanzhi
    Jin, Lianwen
    Wang, Yongpan
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 13743 - 13752