Machine Learning Enables Accurate and Rapid Prediction of Active Molecules Against Breast Cancer Cells

Cited by: 19
Authors
He, Shuyun [1, 2]
Zhao, Duancheng [1, 2]
Ling, Yanle [1, 2]
Cai, Hanxuan [1, 2]
Cai, Yike [3]
Zhang, Jiquan [4]
Wang, Ling [1, 2]
Affiliations
[1] South China Univ Technol, Sch Biol & Biol Engn, Guangdong Prov Engn & Technol Res Ctr Biopharmace, Guangdong Prov Key Lab Fermentat & Enzyme Engn, Guangzhou, Peoples R China
[2] South China Univ Technol, Sch Biol & Biol Engn, Guangdong Prov Engn & Technol Res Ctr Biopharmace, Joint Int Res Lab Synthet Biol & Med, Guangzhou, Peoples R China
[3] Guangdong Drug Adm, Ctr Certificat & Evaluat, Guangzhou, Peoples R China
[4] Guizhou Med Univ, Coll Pharm, Guizhou Prov Engn Technol Res Ctr Chem Drug R, State Key Lab Funct & Applicat Med Plants, Guiyang, Peoples R China
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
breast cancer; machine learning; graph neural networks; molecular fingerprints; structural fragments; SUPPORT VECTOR MACHINES; DRUG DISCOVERY; BIOLOGICAL EVALUATION; INHIBITORS; TRASTUZUMAB; DESIGN; FUTURE;
DOI
10.3389/fphar.2021.796534
Chinese Library Classification
R9 [Pharmacy]
Discipline classification code
1007
Abstract
Breast cancer (BC) has surpassed lung cancer as the most frequently occurring cancer, and it is the leading cause of cancer-related death in women. Therefore, there is an urgent need to discover or design new drug candidates for BC treatment. In this study, we first collected a series of structurally diverse datasets consisting of 33,757 active and 21,152 inactive compounds for 13 breast cancer cell lines and one normal breast cell line commonly used in in vitro antiproliferative assays. Predictive models were then developed using five conventional machine learning algorithms (naive Bayesian, support vector machine, k-nearest neighbors, random forest, and extreme gradient boosting) and five deep learning algorithms (deep neural networks, graph convolutional networks, graph attention networks, message passing neural networks, and Attentive FP). A total of 476 single models and 112 fusion models were constructed based on three types of molecular representations: molecular descriptors, fingerprints, and graphs. The evaluation results demonstrate that the best model for each BC cell subtype achieves high predictive accuracy on the test sets, with AUC values of 0.689-0.993. Moreover, important structural fragments related to BC cell inhibition were identified and interpreted. To facilitate the use of the models, an online webserver called ChemBC and a local version of the software were developed to predict whether compounds have potential inhibitory activity against BC cells.
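The abstract reports test-set AUC values for single and fusion models. As a minimal sketch of what those two terms mean, the snippet below computes AUC as the probability that a randomly chosen active compound is scored above a randomly chosen inactive one, and fuses two models by averaging their scores. The averaging rule, the function names `roc_auc` and `fuse_by_mean`, and the toy scores are all assumptions made for illustration; the record does not state how the authors combined models.

```python
from itertools import product


def roc_auc(labels, scores):
    """AUC as the fraction of (active, inactive) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def fuse_by_mean(*model_scores):
    """Hypothetical fusion rule (an assumption): average per-compound scores across models."""
    return [sum(col) / len(col) for col in zip(*model_scores)]


# Toy data, invented for illustration: 1 = active, 0 = inactive.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]    # misranks one active/inactive pair
model_b = [0.6, 0.8, 0.7, 0.2, 0.65, 0.1]   # misranks a different pair
fused = fuse_by_mean(model_a, model_b)

auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
auc_fused = roc_auc(labels, fused)
print(auc_a, auc_b, auc_fused)
```

In this toy case each single model misranks one pair, and averaging cancels the two errors. That outcome is a property of the invented scores, not a claim about the published models.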
Pages: 19