A methodology for evaluating multi-objective evolutionary feature selection for classification in the context of virtual screening

Cited by: 8
Authors
Jimenez, Fernando [1]
Perez-Sanchez, Horacio [2]
Palma, Jose [1]
Sanchez, Gracia [1]
Martinez, Carlos [3]
Affiliations
[1] Univ Murcia, Fac Informat, Dept Informat & Commun Engn, E-30100 Murcia, Spain
[2] Catholic Univ San Antonio Murcia UCAM, Comp Engn Dept, Bioinformat & High Performance Comp Res Grp BIOHP, Murcia 30107, Spain
[3] Univ Murcia, Int Doctorate Sch, E-30100 Murcia, Spain
Keywords
Feature selection; Multi-objective evolutionary algorithms; Classification; Decision trees; Virtual screening; Drug discovery; FEATURE SUBSET-SELECTION; DRUG DISCOVERY; DIFFERENTIAL EVOLUTION; GENETIC ALGORITHM; SCORING FUNCTIONS; DOCKING; OPTIMIZATION; DESIGN; MODELS
DOI
10.1007/s00500-018-3479-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Virtual screening (VS) methods have been shown to increase success rates in many drug discovery campaigns when they complement experimental approaches such as high-throughput screening or classical medicinal chemistry. Nevertheless, the predictive capability of VS is not yet optimal, mainly due to limitations in the underlying physical principles describing drug binding phenomena. One way to improve VS methods is to combine them with machine learning. When enough experimental data are available to train such methods, predictive capability can increase considerably. In this work we show how a multi-objective evolutionary search strategy for feature selection, which yields small and accurate decision trees that chemists can easily understand, can drastically increase the applicability and predictive ability of these techniques and therefore aid considerably in drug discovery. With the proposed methodology, we find classification models with accuracy between 0.9934 and 1.00 and area under the ROC curve between 0.96 and 1.00 when evaluated on the full training sets, and accuracy between 0.9849 and 0.9940 and area under the ROC curve between 0.89 and 0.93 when evaluated with tenfold cross-validation over 30 iterations, while substantially reducing model size.
Pages: 8775-8800
Page count: 26
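The abstract describes a search that trades classification accuracy against model size. As a rough illustration only, the Python sketch below shows the Pareto-dominance filter that such a two-objective search relies on. It is not the authors' algorithm: the `Candidate` type, the feature names, and the accuracy values are made-up placeholders, and subset size stands in for model size as an assumption.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """A hypothetical feature subset with an (assumed) estimated accuracy."""
    features: Tuple[str, ...]
    accuracy: float


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and better on one.

    Objectives (assumed here): maximize accuracy, minimize subset size.
    """
    no_worse = a.accuracy >= b.accuracy and len(a.features) <= len(b.features)
    better = a.accuracy > b.accuracy or len(a.features) < len(b.features)
    return no_worse and better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Toy values for illustration only; they are not results from the paper.
candidates = [
    Candidate(("f1", "f2", "f3", "f4"), 0.95),
    Candidate(("f1", "f2"), 0.94),
    Candidate(("f1",), 0.90),
    Candidate(("f2", "f3", "f4"), 0.93),  # dominated by ("f1", "f2")
    Candidate(("f3",), 0.85),             # dominated by ("f1",)
]
front = pareto_front(candidates)
for c in front:
    print(len(c.features), c.accuracy, c.features)
```

In a full pipeline, the accuracy of each subset would presumably come from training a decision tree on the selected features and scoring it with cross-validation; that step is left out here to keep the sketch dependency-free.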