Screening model of candidate drugs for breast cancer based on ensemble learning algorithm and molecular descriptor

Cited by: 11
Authors
Shi, Lihua
Yan, Fang
Liu, Haihong [1]
Affiliation
[1] Yunnan Normal Univ, Key Lab Complex Syst Modeling & Applicat Univ Yun, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Molecular descriptor; Feature selection; Classification; Ensemble learning; Drug screening; Prediction; Expression
DOI
10.1016/j.eswa.2022.119185
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Breast cancer is one of the leading killers of women around the world. Finding compounds with good bioactivity, pharmacokinetic properties and safety, that is, favorable Absorption, Distribution, Metabolism, Excretion and Toxicity (ADMET) properties, is a long and challenging task in breast cancer therapy. In this paper, molecular descriptor data of compounds were analyzed with an ensemble learning algorithm, and important features were selected for developing and validating ADMET classification models. The overall process comprises data cleaning, splitting the data into training and testing sets, feature selection and classification model evaluation. A Two-Level Stacking Algorithm (TLSA) based on ensemble learning is proposed for ADMET classification. Performance measures including classification accuracy, precision, recall, confusion matrix, F1-score, Receiver Operating Characteristic (ROC) curves and the Area Under the ROC Curve (AUC) are reported to show the superiority of the proposed method over other classifiers. The experimental results show that TLSA with Logistic Regression as the second-level algorithm outperforms the other classifiers on Absorption, Distribution and Excretion, with accuracies of 94.6037%, 94.9410% and 88.1956%, respectively. For Metabolism and Toxicity, TLSA with a Support Vector Machine as the second-level algorithm achieves the best classification performance, with accuracies of 88.8702% and 96.7960%, respectively. The results show that the proposed approach classifies compound properties well and can be a good alternative to well-known machine learning methods.
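The scalar measures named in the abstract (accuracy, precision, recall and F1-score) all follow from the four cells of a binary confusion matrix. The sketch below shows that arithmetic on a small made-up label vector. It is illustrative only and is not the authors' evaluation code; the names `BinaryConfusion`, `confusion_counts` and `classification_metrics` are hypothetical, and the labels are invented rather than taken from the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryConfusion:
    """Cells of a 2x2 confusion matrix (hypothetical helper type)."""
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    fn: int  # predicted negative, actually positive
    tn: int  # predicted negative, actually negative


def confusion_counts(y_true, y_pred, positive=1):
    """Count the four confusion-matrix cells for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return BinaryConfusion(tp, fp, fn, tn)


def classification_metrics(c):
    """Derive accuracy, precision, recall and F1 from the counts.

    Undefined ratios (zero denominator) are reported as 0.0, which is
    one common convention and an assumption of this sketch.
    """
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented labels for illustration (1 = compound has the property).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
print(classification_metrics(counts))
```

ROC curves and AUC are not covered here because they require predicted scores rather than hard labels.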
Pages: 9