BiGRU attention capsule neural network for Persian text classification

Cited: 0
Authors
Amir Kenarang
Mehrdad Farahani
Mohammad Manthouri
Institutions
[1] Islamic Azad University North Tehran Branch, Computer Engineering
[2] Shahed University, Electrical and Electronic Engineering
Source
Journal of Ambient Intelligence and Humanized Computing | 2022 / Volume 13
Keywords
Deep learning; Text classification; Persian NLP; Neural networks
DOI
Not available
Abstract
Text classification is a significant part of the business world. In news classification, detecting the subject of an article is an important task that can reveal news trends and identify junk news. Various deep learning algorithms exist for text classification. In this paper, several such algorithms are implemented and compared for identifying the subject of texts in a Persian news corpus. The best results belong to the BiGRU with attention mechanism and CapsNet (BiGRUACaps) method. The GRU network outperforms the LSTM because it has fewer gates and therefore fewer parameters; it controls information flow without a separate memory unit and has shown better performance when less data is available. Moreover, since news texts contain long sentences, the attention mechanism gives important words more weight and mitigates the difficulty of learning from long sequences. The most significant problem in classifying Persian texts was the lack of a suitable dataset; one contribution of this work is a scraped dataset of 20,726 records collected from Persian news websites, which constitutes the best categorized Persian news dataset. The lack of appropriate pre-trained Persian models, the combination of various neural networks with such models, and the determination of an optimal model for identifying the subject of Persian texts were further challenges of this research. The use of CapsNet on Persian data is also investigated, with encouraging results. The comparison shows improved classification performance on Persian texts, with the best result obtained by the BiGRUACaps combination at an F-measure of 0.8608.
Pages: 3923-3933
Number of pages: 10
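
The abstract above describes an encoder that passes word embeddings through a bidirectional GRU, weights the hidden states with an attention layer, and feeds the pooled representation to a capsule-style classification head. A minimal sketch of such an architecture in PyTorch is given below; the embedding and hidden sizes, the number of classes, and the squash-based capsule head are illustrative assumptions, not the authors' reported configuration.

# Hypothetical sketch of a BiGRU + attention + capsule-style classifier.
# Layer sizes and the squash-based class-capsule head are assumptions,
# not the configuration reported in the paper.
import torch
import torch.nn as nn


def squash(x, dim=-1, eps=1e-8):
    # Capsule squashing non-linearity: shrinks short vectors toward zero
    # and long vectors toward unit length.
    norm_sq = (x ** 2).sum(dim=dim, keepdim=True)
    scale = norm_sq / (1.0 + norm_sq)
    return scale * x / torch.sqrt(norm_sq + eps)


class BiGRUAttentionCaps(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128,
                 num_classes=7, caps_dim=16):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Bidirectional GRU encodes the token sequence in both directions.
        self.bigru = nn.GRU(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Additive attention scores each time step so that informative words
        # in long news sentences receive more weight.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        # One capsule (a caps_dim-dimensional vector) per class; the vector
        # length serves as the class score.
        self.class_caps = nn.Linear(2 * hidden_dim, num_classes * caps_dim)
        self.num_classes = num_classes
        self.caps_dim = caps_dim

    def forward(self, token_ids):
        x = self.embedding(token_ids)                 # (B, T, E)
        h, _ = self.bigru(x)                          # (B, T, 2H)
        weights = torch.softmax(self.attn(h), dim=1)  # (B, T, 1)
        context = (weights * h).sum(dim=1)            # (B, 2H)
        caps = self.class_caps(context)               # (B, C*D)
        caps = caps.view(-1, self.num_classes, self.caps_dim)
        caps = squash(caps)                           # (B, C, D)
        return caps.norm(dim=-1)                      # class scores (B, C)


# Usage: class scores for a dummy batch of padded token-id sequences.
model = BiGRUAttentionCaps(vocab_size=50000)
dummy = torch.randint(1, 50000, (4, 60))
print(model(dummy).shape)  # torch.Size([4, 7])

In a full training setup these capsule lengths would be fed to a margin or cross-entropy loss; here they simply illustrate how the attention-pooled BiGRU output can be mapped to per-class capsule vectors.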