A Scalable Hybrid Ensemble Model for Text Classification

Cited by: 0
Authors
Singh, Bharat [1]
Kushwaha, Nidhi [1]
Vyas, Om Prakash [2]
Affiliations
[1] Indian Inst Informat Technol, Allahabad 211012, Uttar Pradesh, India
[2] Int Inst Informat Technol, Allahabad 211012, Uttar Pradesh, India
Source
PROCEEDINGS OF THE 2016 IEEE REGION 10 CONFERENCE (TENCON) | 2016
Keywords
Ensemble Learning; Classification; Bagging; Boosting; Algorithm;
DOI
Not available
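Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject classification code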
0808 ; 0809 ;
Abstract
Text classification is a major problem and an actively researched area for which many algorithms have already been proposed. With recent advances in ensemble learning, new techniques are consistently emerging that are worth applying to text classification in search of a better classifier. This paper uses a machine learning model called BagBoo, a name that combines Bag and Boo, which stand for Bagging and Boosting respectively. In essence, BagBoo gains its performance from a bagged ensemble of boosted trees. We use this model to build our own text classifier from existing and self-modified Bagging and Boosting techniques. We run evaluations on the 10 most frequent classes of the Reuters 21578 data set, the SMS Spam Collection [1] data set, and the DBWorld e-mails data set, and show through the results whether our implementation of BagBoo gives better performance for text classification. Results show that the proposed method can achieve an accuracy of 78.94% and 94.21% with the SVM and J48 classifiers on the DBWorld data set.
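The idea of a bagged ensemble of boosted learners can be illustrated with a minimal sketch. This is not the authors' implementation: AdaBoost over one-word decision stumps stands in for the boosted trees, and the toy messages, the vocabulary, and all function names (`featurize`, `train_boosted_stumps`, `train_bagboo`, `predict_bagboo`) are assumptions made for illustration only.

```python
import math
import random
from collections import Counter

def featurize(text, vocab):
    """Binary bag-of-words vector over a fixed vocabulary."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocab]

def stump_predict(x, j, polarity):
    """Predict `polarity` if word j is present, otherwise the opposite label."""
    return polarity if x[j] else -polarity

def train_boosted_stumps(X, y, rounds=5):
    """AdaBoost over one-feature stumps (a stand-in for boosted trees); labels are +1/-1."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []  # list of (alpha, feature index, polarity)
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            for polarity in (1, -1):
                # Weighted error of this stump on the current sample weights.
                err = sum(w for xi, yi, w in zip(X, y, weights)
                          if stump_predict(xi, j, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, polarity))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * yi * stump_predict(xi, j, polarity))
                   for xi, yi, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict_boosted(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, j, polarity) for alpha, j, polarity in model)
    return 1 if score >= 0 else -1

def train_bagboo(X, y, n_bags=7, rounds=5, seed=0):
    """Bagging step: train one boosted model per bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ensemble.append(train_boosted_stumps([X[i] for i in idx],
                                             [y[i] for i in idx], rounds))
    return ensemble

def predict_bagboo(ensemble, x):
    """Majority vote over the boosted models."""
    votes = Counter(predict_boosted(model, x) for model in ensemble)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: +1 = spam, -1 = ham.
vocab = ["free", "prize", "win", "meeting", "lunch"]
texts = ["free prize win now", "win free cash prize", "free entry claim prize",
         "lunch meeting tomorrow", "project meeting at noon", "see you at lunch"]
y = [1, 1, 1, -1, -1, -1]
X = [featurize(t, vocab) for t in texts]

ensemble = train_bagboo(X, y)
print(predict_bagboo(ensemble, featurize("claim your free prize", vocab)))
```

Each bootstrap sample yields one boosted model, and the final label is a majority vote over those models; because the bags are trained independently, this outer loop is the part that could be run in parallel.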
Pages: 3148-3152
Number of pages: 5
Related papers
12 in total
[1] [Anonymous], 2005, Data Mining: Concepts and Techniques
[2] Breiman, L. Random forests [J]. Machine Learning, 2001, 45 (01): 5-32
[3] Chang, Chih-Chung; Lin, Chih-Jen. LIBSVM: A Library for Support Vector Machines [J]. ACM Transactions on Intelligent Systems and Technology, 2011, 2 (03)
[4] Cortes C, 1995, MACH LEARN, V20, P273, DOI 10.1023/A:1022627411411
[5] Friedman, JH. Stochastic gradient boosting [J]. Computational Statistics & Data Analysis, 2002, 38 (04): 367-378
[6] Gomez Hidalgo J. M., 2006, P 2006 ACM S DOC ENG, P107
[7] Iyyer M, 2015, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Vol 1, P1681
[8] Pavlov D.Y., 2010, Proceedings of the 19th ACM international conference on Information and knowledge management, CIKM '10, P1897
[9] Pilannino M., 2011, DBWORLD E MAIL CLASS
[10] Walker A. J., 1977, ACM Transactions on Mathematical Software, V3, P253, DOI 10.1145/355744.355749