A Novel Model Based on Big Data Environment for Text Content Security Recognition

Cited by: 1
Authors
Su, Peng [1]
Zhao, Hui [2]
Wang, Ying [3]
Affiliations
[1] Henan Univ, Henan Prov Engn Res Ctr Intelligent Data Proc, Kaifeng 475004, Peoples R China
[2] Henan Univ, Educ Informat Technol Lab, Kaifeng 475000, Peoples R China
[3] Henan Univ, Henan Int Joint Lab Theories & Key Technol Intelli, Kaifeng, Peoples R China
Source
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2024 / Vol. 96 / Issue 02
Keywords
Big data; Text recognition; Text vector extraction; Improved TF-IDF algorithm; TF-IDF;
DOI
10.1007/s11265-023-01860-0
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In the big data environment, text content security recognition is one of the main ways to intelligently manage the Internet and maintain privacy. However, traditional text content security recognition methods lack semantic understanding and ignore scenarios where keywords are evenly distributed, resulting in a high false positive rate and low accuracy. To address this problem, we propose a novel model for text content security recognition in the big data environment. For the scenario where keywords are evenly distributed, we design the TFC-BPLW-AM algorithm to extract text vectors. The TFC-BPLW-AM algorithm addresses three problems: the uniform distribution of keywords, the calculation of weights in a single form, and the long running time caused by an overly large weight matrix. As a result, weight integrity is enhanced, recognition accuracy is improved, and running time is shortened. On the 20 Newsgroups and Fudan University Chinese text datasets, we compare our model experimentally with existing models. The results show that our model achieves a 96.7% F1 score, with a maximum increase of 30.7% and a minimum increase of 2.7%.
Pages: 99-112
Page count: 14
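The abstract notes that traditional methods ignore keywords that are evenly distributed across documents. The following minimal sketch shows how that happens with classic TF-IDF: a term that occurs in every document gets an inverse document frequency of log(N/N) = 0 and so drops out of the text vector. This is the textbook baseline only, not the TFC-BPLW-AM algorithm, whose details the record does not give. The function name `tf_idf` and the toy documents are assumptions made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF (illustrative baseline, not the paper's algorithm).

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical toy corpus: "attack" appears once in every document.
docs = [
    ["attack", "password", "leak"],
    ["attack", "football", "score"],
    ["attack", "recipe", "soup"],
]
weights = tf_idf(docs)

# The evenly distributed keyword receives zero weight in every document,
# while a term unique to one document receives a positive weight.
print(weights[0]["attack"])
print(weights[0]["password"])
```

Because the evenly distributed keyword contributes nothing to any vector, a classifier built on these weights cannot use it, which is one plausible reading of the weakness the abstract describes.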