Study on suitability and importance of multilayer extreme learning machine for classification of text data

Cited by: 25
Authors
Roul, Rajendra Kumar [1, 2]
Asthana, Shubham Rohan [1]
Kumar, Gaurav [1]
Affiliations
[1] BITS, Pilani KK Birla Goa Campus, Pilani, Goa, India
[2] BITS, Dept Comp Sci, Pilani KK Birla Goa Campus, Pilani, Goa, India
Keywords
Connected component; Deep learning; Extreme learning machine; Feature selection; Multilayer extreme learning machine; ALGORITHM
DOI
10.1007/s00500-016-2189-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The dynamic Web, which contains a huge number of digital documents, is expanding day by day, so searching for a particular document in such a large collection has become a tough challenge. Text classification can speed up search and retrieval tasks and is therefore the need of the hour. To this end, this study proposes an efficient technique that uses the concept of connected components (CC) of a graph and WordNet along with four established feature selection techniques [TF-IDF, Chi-square, Bi-Normal Separation (BNS) and Information Gain (IG)] to select the best features from a given input dataset and prepare an efficient training feature vector. Next, multilayer extreme learning machine (ML-ELM), which is based on the architecture of deep learning, and other state-of-the-art classifiers are trained on this feature vector to classify text data. The experimental work has been carried out on the DMOZ and 20-Newsgroups datasets. We studied the behavior of the different classifiers and compared their results under these four feature selection techniques. ML-ELM achieved the maximum overall F-measure among the compared classifiers: 72.28% on the DMOZ dataset with TF-IDF as the feature selection technique and 81.53% on the 20-Newsgroups dataset with BNS, which signifies the usefulness of the deep learning used by ML-ELM for classifying text data. Experimental results on these benchmark datasets show the stability and effectiveness of our approach over other competing approaches.
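As an illustration of one of the four feature selection techniques named in the abstract, the following is a minimal sketch of Bi-Normal Separation using the commonly cited definition, |F⁻¹(tpr) − F⁻¹(fpr)|, where F⁻¹ is the inverse standard normal CDF. The abstract does not give the paper's exact formulation, so the function names (`bns_score`, `top_k_terms`), the count layout, and the clipping value `eps` are assumptions made for this example, and it is not the authors' implementation.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def bns_score(tp, fn, fp, tn, eps=0.0005):
    """Bi-Normal Separation score for one term and one target class.

    tp / fn: positive-class documents with / without the term.
    fp / tn: negative-class documents with / without the term.
    Assumes each class has at least one document.
    eps is an assumed clipping bound, not a value from the paper.
    """
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    # Clip rates away from 0 and 1, where the inverse normal CDF is undefined.
    tpr = min(max(tpr, eps), 1 - eps)
    fpr = min(max(fpr, eps), 1 - eps)
    return abs(_STD_NORMAL.inv_cdf(tpr) - _STD_NORMAL.inv_cdf(fpr))


def top_k_terms(counts, k):
    """counts maps term -> (tp, fn, fp, tn); returns the k highest-scoring terms."""
    ranked = sorted(counts, key=lambda term: bns_score(*counts[term]), reverse=True)
    return ranked[:k]
```

A term that appears equally often in both classes scores 0, while a term concentrated in one class scores higher, so `top_k_terms` keeps the most class-discriminative terms for the training feature vector.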
Pages: 4239-4256
Page count: 18