Mrmr plus and Cfs plus feature selection algorithms for high-dimensional data

Cited by: 13
Authors
Angulo, Adrian Pino [1]
Shin, Kilho [1]
Affiliation
[1] Univ Hyogo, Grad Sch Appl Informat, Kobe, Hyogo 6512197, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
Feature selection; Minimum redundancy maximum relevance; High-dimensional data; Machine learning;
DOI
10.1007/s10489-018-1381-1
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection is a central issue in machine learning and applied mathematics. Filter feature selection algorithms aim to solve the optimization problem of selecting a set of features that maximizes the feature-class correlation and minimizes the feature-feature correlation. MRMR (Minimum Redundancy Maximum Relevance) and CFs (Correlation-based Feature Selection) are among the best-known algorithms that find an approximate solution to this optimization problem. However, the amount of available data keeps growing over time, which makes the feature selection process more challenging. In this paper, we propose two new versions of MRMR and CFs that output the same feature set as the original algorithms but are considerably faster. Our novel algorithms are based on solving the duplication and redundancy problems intrinsic to the original algorithms. We applied our proposals to thirty datasets related to microarray and cancer analysis. Experiments revealed that the proposed algorithms, MRMR+ and CFs+, are on average fourteen and three times faster than the original algorithms, respectively.
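The relevance/redundancy trade-off described in the abstract can be illustrated with a minimal greedy sketch. This is not the authors' MRMR+ or CFs+ algorithm, whose details are not given in this record. It assumes absolute Pearson correlation as the dependence measure (the original MRMR is usually stated with mutual information), and the names `abs_corr` and `greedy_mrmr` are hypothetical. The running redundancy sum only shows one common way to avoid recomputing feature-feature correlations.

```python
import numpy as np


def abs_corr(a, b):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return 0.0 if denom == 0 else abs(float(a @ b) / denom)


def greedy_mrmr(X, y, k):
    """Greedily pick up to k columns of X, scoring each candidate as
    (relevance to y) minus (mean correlation with already selected columns).

    Assumption: correlation stands in for the dependence measure.
    """
    n_features = X.shape[1]
    relevance = [abs_corr(X[:, j], y) for j in range(n_features)]
    # Running sum of correlations with the selected features, so each
    # feature-feature pair is evaluated at most once.
    redundancy_sum = [0.0] * n_features
    selected = []
    remaining = set(range(n_features))

    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            return relevance[j] - redundancy_sum[j] / len(selected)

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
        for j in remaining:
            redundancy_sum[j] += abs_corr(X[:, j], X[:, best])
    return selected


# Toy data: column 1 duplicates column 0; column 2 is uncorrelated with column 0.
a = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0])
X = np.column_stack([a, a, b])
y = 2 * a + b
print(greedy_mrmr(X, y, 2))  # [0, 2]
```

In the toy data the duplicate column is as relevant as the first one, but it is fully redundant with it, so the less relevant yet independent column is chosen second.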
Pages: 1954-1967
Number of pages: 14