Punjabi news multi-classification using language generation-based optimized long short-term memory networks

Cited by: 0
Authors
Varun Gupta
Ekta Gupta
Affiliations
[1] Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology
Source
Evolving Systems | 2023 / Volume 14
Keywords
Text classification; News classification; Asian languages; Punjabi language; Deep neural networks; Recurrent neural networks; LSTM; Averaged SGD weight-dropped long short-term memory networks;
DOI
Not available
Abstract
Text classification is the task of assigning a specific category to each piece of written information. It is one of the fundamental tasks in natural language processing, with a wide range of applications such as spam detection and sentiment analysis. News classification is one type of text classification that helps readers focus on the news of their choice. In this paper, we propose a novel method for the multi-class classification of Punjabi news articles using an optimized and regularized long short-term memory (LSTM) model built on a pretrained language generation model. The proposed method employs the Averaged Stochastic Gradient Descent Weight-Dropped LSTM (AWD-LSTM) model, which applies a recurrent regularization technique known as DropConnect to the hidden-to-hidden weights, and uses a variant of averaged stochastic gradient descent in which the averaging trigger is determined by a non-monotonic condition instead of being tuned by the user. The proposed news classification method works in three stages. In the first stage, we train a language model on Punjabi text acquired from Wikipedia; in the second stage, we fine-tune the language model on the Punjabi news dataset. Finally, we train a classifier on top of the pretrained encoder part of the language model. The pretrained encoder gives the classifier a linguistic understanding of the text, which leads to better classification performance. The results obtained indicate that the proposed method outperforms direct news classification methods that do not use pretrained language generation models.
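To make the two optimizations named in the abstract concrete, the sketch below illustrates, in plain PyTorch and under our own naming, DropConnect applied to the recurrent (hidden-to-hidden) weights of an LSTM cell, and the non-monotonic trigger that switches plain SGD to averaged SGD (NT-ASGD). This is a minimal illustration of the general technique, not the authors' implementation; `WeightDropLSTMCell`, `nt_asgd_should_switch`, and the hyperparameter values are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the AWD-LSTM ingredients named in
# the abstract: DropConnect on hidden-to-hidden weights and the NT-ASGD trigger.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightDropLSTMCell(nn.Module):
    """LSTM cell with DropConnect on the recurrent weight matrix: individual
    hidden-to-hidden weights (not activations) are zeroed independently on
    every training step."""

    def __init__(self, input_size: int, hidden_size: int, weight_drop: float = 0.5):
        super().__init__()
        self.w_ih = nn.Parameter(torch.empty(4 * hidden_size, input_size))
        self.w_hh = nn.Parameter(torch.empty(4 * hidden_size, hidden_size))
        self.bias = nn.Parameter(torch.zeros(4 * hidden_size))
        nn.init.xavier_uniform_(self.w_ih)
        nn.init.orthogonal_(self.w_hh)
        self.weight_drop = weight_drop

    def forward(self, x_t, state):
        h, c = state
        # DropConnect: mask the recurrent weights themselves, not the outputs.
        w_hh = F.dropout(self.w_hh, p=self.weight_drop, training=self.training)
        gates = x_t @ self.w_ih.T + h @ w_hh.T + self.bias
        i, f, g, o = gates.chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


def nt_asgd_should_switch(val_losses, n: int = 5) -> bool:
    """Non-monotonic trigger: switch from SGD to averaged SGD once the latest
    validation loss fails to improve on the best loss seen more than n
    evaluations ago, so the averaging start point needs no manual tuning."""
    return len(val_losses) > n and val_losses[-1] > min(val_losses[:-n])
```

In a training loop one would call `nt_asgd_should_switch` after each validation pass and, once it fires, replace the `torch.optim.SGD` optimizer with `torch.optim.ASGD` so that subsequent iterates are averaged.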
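The three-stage transfer recipe (pretrain a language model, fine-tune it on the target corpus, then reuse its encoder in a classifier) follows the ULMFiT pattern, which the fastai library implements around an AWD-LSTM backbone. The sketch below shows how such a pipeline could look; the dataframes, column names, epoch counts, and learning rates are assumptions, not the paper's actual setup.

```python
# Hedged sketch of a three-stage ULMFiT-style pipeline in fastai.
# df_wiki / df_news, column names, and training schedules are assumptions.
from fastai.text.all import *

# Stage 1: train a Punjabi language model on Wikipedia text from scratch
# (fastai's bundled AWD_LSTM weights are for English, hence pretrained=False).
dls_lm = TextDataLoaders.from_df(df_wiki, text_col="text", is_lm=True)
learn_lm = language_model_learner(dls_lm, AWD_LSTM, pretrained=False)
learn_lm.fit_one_cycle(10, 2e-2)

# Stage 2: fine-tune the language model on the Punjabi news corpus,
# then keep only its encoder (the stacked LSTM layers).
dls_news_lm = TextDataLoaders.from_df(df_news, text_col="text", is_lm=True,
                                      text_vocab=dls_lm.vocab)
learn_lm.dls = dls_news_lm
learn_lm.fit_one_cycle(5, 2e-3)
learn_lm.save_encoder("punjabi_news_enc")

# Stage 3: train a news classifier on top of the pretrained encoder.
dls_clas = TextDataLoaders.from_df(df_news, text_col="text", label_col="category",
                                   text_vocab=dls_lm.vocab)
learn_clas = text_classifier_learner(dls_clas, AWD_LSTM, pretrained=False)
learn_clas.load_encoder("punjabi_news_enc")
learn_clas.fit_one_cycle(4, 1e-2)
```

Gradual unfreezing and discriminative learning rates, also part of the ULMFiT recipe, could be layered on via `learn_clas.freeze()` / `learn_clas.unfreeze()`, but are omitted here for brevity.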
Pages: 49-58 (9 pages)