Ensemble Learning Based Feature Selection with an Application to Text Classification

Cited by: 0
Authors
Onan, Aytug [1]
Affiliations
[1] Manisa Celal Bayar Univ, Yazilim Muhendisligi Bolumu, Manisa, Turkey
Source
2018 26TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU) | 2018
Keywords
feature selection; text classification; ensemble learning; SCHEME;
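DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes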
0808 ; 0809 ;
Abstract
High dimensionality is a major problem in text classification. The performance of a feature selection method varies with the characteristics of the dataset. This study develops a feature selection method that integrates several filter-based feature selection methods through an ensemble learning approach. In the presented method, the feature rankings obtained by five filter-based feature selection methods (mutual information, chi-square statistic, odds ratio, information gain, and weighted log-likelihood ratio) are aggregated by enhanced Borda count rank aggregation. In the experimental analysis, the method is evaluated on the Reuters-21578 and 20 Newsgroups datasets with support vector machine and C4.5 classifiers. The experimental results indicate that the presented method outperforms conventional filter-based feature selection schemes.
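The rank aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say what the "enhanced" variant changes, so the code below uses a plain Borda count as a stand-in rather than the author's method. The function name `borda_aggregate`, the example terms, and the rankings are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def borda_aggregate(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Merge several best-first feature rankings with a plain Borda count.

    Assumption: this is the textbook Borda count, not the enhanced variant
    used in the paper, whose details the abstract does not give.
    """
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            # The top-ranked feature earns n - 1 points, the last earns 0.
            scores[feature] += n - 1 - position
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical rankings, one per filter method (illustrative terms only).
rankings = [
    ["oil", "trade", "bank", "wheat"],  # e.g. mutual information
    ["trade", "oil", "wheat", "bank"],  # e.g. chi-square statistic
    ["oil", "bank", "trade", "wheat"],  # e.g. odds ratio
    ["oil", "trade", "wheat", "bank"],  # e.g. information gain
    ["trade", "oil", "bank", "wheat"],  # e.g. weighted log-likelihood ratio
]

aggregated = borda_aggregate(rankings)
top_k = aggregated[:2]  # keep the two highest-scoring features
print(top_k)
```

In a pipeline of the kind the abstract describes, the top-ranked features from the aggregated list would presumably form the reduced feature set passed to the classifier.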
Pages: 4