Multi-label Log-Loss function using L-BFGS for document categorization

Cited by: 20
Author
Borhani, Mostafa [1]
Affiliation
[1] Shahid Beheshti Univ, Quran Miracle Res Inst, Tehran 1983963113, Iran
Keywords
Multi-label classification; Text mining; Quasi-Newton method; Holy Quran; Corpus analysis; BFGS; Scikit-learn; Artificial neural networks; Text classification
DOI
10.1016/j.engappai.2020.103623
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Text mining, which fundamentally applies quantitative techniques to analyze textual data, can be used to discover knowledge and to achieve scholarly research goals. For large-scale data such as corpus text, intelligent learning methods have proven effective. In this paper, an artificial neural network with a quasi-Newton updating procedure is presented for multi-label, multi-class text classification. This numerical unconstrained training technique, a Multi-Label extension of the Log-Loss function used in the Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm (ML4BFGS), provides a noteworthy opportunity for text mining and leads to a significant improvement in text classification performance. The ML4BFGS training approach assigns one or more of the available labels to each sentence. We evaluate this method on English translations of the Holy Quran. These religious texts were chosen for the experiments in this manuscript because each verse (sentence) usually has multiple labels (topics), and different translations of the same verse should share the same labels. Experimental results show that ML4BFGS performs well for multi-label, multi-class classification on the Quranic corpus. Evaluation criteria for several advanced updating methods such as ITCG, BFGS, L-BFGS-B, and L3BFGS, as well as other multi-label approaches such as ML-k-NN and the well-known SVM, are compared with the proposed ML4BFGS, and the outcomes are fully described in this study. The performance measures, including Hamming loss, recall, precision, and F1 score, show that ML4BFGS achieves the best results in extracting the related classes for each verse, while the proposed network requires the fewest epochs among the compared training approaches to complete the training phase. At the same time, the elapsed time of ML4BFGS is only 78% (in seconds) of the best competing result. Compared with the applicability of some state-of-the-art algorithms, ML4BFGS has lower computational cost, a faster convergence rate, and higher accuracy in corpus analysis.
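The sketch below is not the authors' implementation; it is a minimal illustration of the general pipeline the abstract describes, built with Scikit-learn (which the paper lists among its tools): TF-IDF features, a small feed-forward network trained with the limited-memory BFGS solver on a multi-label indicator matrix, and evaluation with Hamming loss, precision, recall, and F1. The toy corpus, label set, and hyperparameters are illustrative placeholders, not values from the paper.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import hamming_loss, precision_score, recall_score, f1_score
import numpy as np

# Toy multi-label data: each "verse" may carry several topic labels at once.
texts = [
    "verse about patience and prayer",
    "verse about charity and patience",
    "verse about prayer",
    "verse about charity",
]
# Binary indicator matrix; columns = hypothetical topics [patience, prayer, charity].
y = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
])

# TF-IDF bag-of-words features for the sentences.
X = TfidfVectorizer().fit_transform(texts)

# A small feed-forward network trained with the L-BFGS solver; with an
# indicator-matrix target, Scikit-learn minimizes a per-label log-loss.
clf = MLPClassifier(hidden_layer_sizes=(16,), solver="lbfgs",
                    max_iter=500, random_state=0)
clf.fit(X, y)

# Multi-label evaluation with the measures reported in the paper.
y_pred = clf.predict(X)
print("Hamming loss:", hamming_loss(y, y_pred))
print("Precision   :", precision_score(y, y_pred, average="micro", zero_division=0))
print("Recall      :", recall_score(y, y_pred, average="micro", zero_division=0))
print("F1 score    :", f1_score(y, y_pred, average="micro", zero_division=0))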
Pages: 7