Hybrid supervised clustering based ensemble scheme for text classification

Cited by: 93
Author
Onan, Aytug [1]
Affiliation
[1] Celal Bayar Univ, Dept Software Engn, Manisa, Turkey
Keywords
Diversity; Text classification; Classifier ensemble; Supervised clustering; CLASSIFIERS; FOREST
DOI
10.1108/K-10-2016-0300
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Purpose - The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification is an essential task for many purposes in information retrieval, such as document organization, text filtering and sentiment analysis. Ensemble learning has been extensively studied as a way to build efficient text classification schemes with higher predictive performance and generalization ability. The purpose of this paper is to provide diversity among the classification algorithms of the ensemble, which is a key issue in ensemble design.

Design/methodology/approach - An ensemble scheme based on hybrid supervised clustering is presented for text classification. In the presented scheme, hybrid supervised clustering, which combines the cuckoo search algorithm with k-means, is introduced to partition the data samples of each class into clusters so that training subsets with higher diversity can be obtained. Each classifier is trained on the diversified training subsets, and the predictions of the individual classifiers are combined by the majority voting rule. The predictive performance of the proposed classifier ensemble is compared with conventional classification algorithms (such as Naive Bayes, logistic regression, support vector machines and the C4.5 algorithm) and ensemble learning methods (such as AdaBoost, bagging and random subspace) on 11 text benchmarks.

Findings - The experimental results indicate that the presented classifier ensemble outperforms the conventional classification algorithms and ensemble learning methods for text classification.

Originality/value - The presented ensemble scheme is the first to use supervised clustering to obtain a diverse ensemble for text classification.
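The pipeline summarized in the abstract (cluster the samples of each class, derive diversified training subsets, train one member per subset, combine by majority vote) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's method: plain Lloyd's k-means stands in for the cuckoo search + k-means hybrid (the cuckoo search step is omitted); the leave-one-cluster-out rule for forming subsets and the nearest-centroid base learner are hypothetical choices made for brevity; and documents are assumed to be already vectorized into numeric feature vectors.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; a stand-in (assumption) for the paper's
    cuckoo search + k-means hybrid. Assumes len(points) >= k."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [mean(cl) if cl else centroids[i] for i, cl in enumerate(clusters)]
    return clusters


def supervised_partition(X, y, k, seed=0):
    """Supervised step: cluster the samples of each class separately."""
    parts = {}
    for label in sorted(set(y)):
        class_points = [x for x, t in zip(X, y) if t == label]
        parts[label] = kmeans(class_points, k, seed=seed)
    return parts


def train_nearest_centroid(samples):
    """Hypothetical base learner: one centroid per class present in the subset."""
    by_class = {}
    for x, label in samples:
        by_class.setdefault(label, []).append(x)
    return {label: mean(pts) for label, pts in by_class.items()}


def predict_nearest_centroid(model, x):
    return min(model, key=lambda label: squared_distance(x, model[label]))


def build_ensemble(X, y, k=3, seed=0):
    """Member j is trained on every cluster except cluster j of each class
    (a hypothetical leave-one-cluster-out rule for diversifying subsets)."""
    parts = supervised_partition(X, y, k, seed)
    models = []
    for j in range(k):
        subset = [(p, label)
                  for label, clusters in parts.items()
                  for i, cluster in enumerate(clusters) if i != j
                  for p in cluster]
        if subset:  # skip a member whose subset would be empty
            models.append(train_nearest_centroid(subset))
    return models


def majority_vote(models, x):
    """Combine member predictions with the majority voting rule."""
    votes = Counter(predict_nearest_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
         (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (10.2, 10.8)]
    y = ["a"] * 6 + ["b"] * 6
    models = build_ensemble(X, y, k=3)
    print(majority_vote(models, (0.3, 0.3)))    # a
    print(majority_vote(models, (10.7, 10.1)))  # b
```

Leaving one cluster out per member means each member still sees most of every class while the members' training sets differ systematically, which is one simple way to obtain the kind of diversity the abstract targets. Swapping the nearest-centroid learner for Naive Bayes, logistic regression, an SVM or a decision tree would only change `train_nearest_centroid` and `predict_nearest_centroid`.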
Pages: 330-348
Number of pages: 19