Deep Learning-Based Algorithm for Classification of News Text

Cited by: 1
Authors
Yu Li, Xiao [1]
Han, Ling Bo [1]
Feng Jiang, Zheng [2]
Affiliations
[1] Guangdong Ocean Univ, Coll Elect & Informat Engn, Zhanjiang 524088, Peoples R China
[2] Guangxi Minzu Normal Univ, Coll Math & Comp Sci, Chongzuo 532200, Peoples R China
Keywords
Text categorization; Feature extraction; Bayes methods; Convolutional neural networks; Vectors; Classification algorithms; Accuracy; Long short term memory; Deep learning; Machine learning algorithms; Convolutional neural network (CNN); naive Bayes; news text classification; TF-IDF; SVM;
DOI
10.1109/ACCESS.2024.3487311
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As the volume of online news grows exponentially, hotspot classification is becoming increasingly important. Traditional machine learning-based text classification methods, such as naive Bayes, support vector machines (SVMs), and classification trees, offer a degree of interpretability, but they often cannot handle complex semantic relations, which frequently results in poor classification accuracy. To address this issue, we introduce a deep learning-based text classification (TC) method built on a convolutional neural network (CNN), long short-term memory (LSTM), and an attention mechanism. The method predicts news popularity by combining the feature extraction ability of the CNN, the sequence modeling ability of the LSTM, and the weighted summation ability of the attention mechanism. The experimental results show that, compared with other deep learning models, the proposed method achieved higher accuracy and accounted for the context of the text data more effectively. Alongside model selection, feature engineering was also key to improving accuracy. Accordingly, we developed a naive Bayes TC model based on feature extraction, using word embeddings to convert text into richer vector representations. We then combined the model with different naive Bayes distributions and found that multinomial naive Bayes was the most suitable for TC. Finally, we added a feature word classification expressiveness index to improve term frequency-inverse document frequency (TF-IDF) feature extraction, which produced a classification accuracy of 96%. This indicates that the improved model is better at understanding and classifying text.
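The TF-IDF and multinomial naive Bayes pipeline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses standard TF-IDF with a smoothed IDF variant chosen here as an assumption, it omits the feature word classification expressiveness index (which the abstract does not define), and the function names, class name, and toy headlines are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document, plus the IDF table."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return vectors, idf


def vectorize(tokens, idf):
    """Weight an unseen document with the training IDF, dropping unknown terms."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values())
    return {t: (c / total) * idf[t] for t, c in counts.items()}


class MultinomialNB:
    """Multinomial naive Bayes over weighted term vectors (hypothetical helper class)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        self.vocab = sorted({t for v in vectors for t in v})
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels))
                          for c in self.classes}
        # Accumulate the total TF-IDF weight of each term within each class.
        weight = {c: defaultdict(float) for c in self.classes}
        for v, y in zip(vectors, labels):
            for t, w in v.items():
                weight[y][t] += w
        self.log_lik = {}
        for c in self.classes:
            total = sum(weight[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((weight[c][t] + self.alpha) / total)
                               for t in self.vocab}
        return self

    def predict(self, vector):
        def score(c):
            # Log prior plus weighted sum of per-term log likelihoods.
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t]
                for t, w in vector.items() if t in self.log_lik[c])
        return max(self.classes, key=score)


# Toy headlines, invented purely for illustration.
train_docs = [
    "team wins the football match".split(),
    "player scores a late goal".split(),
    "stock market falls sharply".split(),
    "bank raises interest rates".split(),
]
train_labels = ["sports", "sports", "finance", "finance"]

vectors, idf = tfidf_vectors(train_docs)
model = MultinomialNB().fit(vectors, train_labels)
prediction = model.predict(vectorize("football player scores".split(), idf))
print(prediction)
```

Feeding TF-IDF weights in place of raw counts is a common practical choice for multinomial naive Bayes; a refinement such as the index described in the abstract would presumably change how the term weights are computed before the classifier sees them.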
Pages: 159086-159098
Number of pages: 13