Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning

Cited by: 7
Authors
Gu, Yaowen [1, 2]
Li, Jiao [1]
Kang, Hongyu [1, 3]
Zhang, Bowen [4]
Zheng, Si [1, 5]
Affiliations
[1] Chinese Acad Med Sci & Peking Union Med Coll CAMS, Inst Med Informat IMI, Beijing 100020, Peoples R China
[2] NYU, Dept Chem, New York, NY 10027 USA
[3] Beijing Inst Technol, Sch Life Sci, Dept Biomed Engn, Beijing 100081, Peoples R China
[4] Beijing StoneWise Technol Co Ltd, Beijing, Peoples R China
[5] Tsinghua Univ, Inst Artificial Intelligence, Dept Comp Sci & Technol, BNRist, Beijing 100084, Peoples R China
Source
MOLECULES | 2023 / Volume 28 / Issue 16
Keywords
virtual screening; bioactivity prediction; equivariant graph neural network; multiple instance learning; molecular conformation; benchmark dataset; DRUG DISCOVERY; GENERATION; SEARCH;
DOI
10.3390/molecules28165982
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Ligand-based virtual screening (LBVS) is a promising approach for rapid, low-cost screening of potentially bioactive molecules in the early stages of drug discovery. Compared with traditional similarity-based machine learning methods, deep learning frameworks for LBVS can more effectively extract high-order molecular structure representations from molecular fingerprints or structures. However, the 3D conformation of a molecule, which strongly influences its bioactivity and physical properties, has rarely been considered in previous deep learning-based LBVS methods. Moreover, a relevant bioactivity benchmark dataset is still lacking. To address these issues, we introduce a novel end-to-end deep learning architecture trained on molecular conformers for LBVS. We first extracted molecular conformers from multiple public molecular bioactivity data sources and consolidated them into a large-scale bioactivity benchmark dataset, which in total includes millions of endpoints and molecules corresponding to 954 targets. We then devised a deep learning-based LBVS method called EquiVS to learn molecular representations from conformers for bioactivity prediction. Specifically, a graph convolutional network (GCN) and an equivariant graph neural network (EGNN) are sequentially stacked to learn high-order molecule-level and conformer-level representations, followed by attention-based deep multiple-instance learning (MIL) to aggregate these representations and predict the potential bioactivity of the query molecule on a given target. We conducted various experiments to validate the data quality of our benchmark dataset and confirmed that EquiVS achieved better performance than 10 traditional machine learning or deep learning-based LBVS methods. Further ablation studies demonstrate the significant contribution of molecular conformation to bioactivity prediction, as well as the soundness and non-redundancy of the deep learning architecture in EquiVS. Finally, a model interpretation case study on CDK2 shows the potential of EquiVS in optimal conformer discovery. Overall, the study shows that our proposed benchmark dataset and the EquiVS method have promising prospects in virtual screening applications.
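The attention-based MIL aggregation mentioned in the abstract can be illustrated with a minimal sketch. It is not the EquiVS implementation: it assumes the common attention-pooling form in which each conformer embedding h_k receives a score w^T tanh(V h_k), the scores are normalized with a softmax, and the molecule-level (bag) representation is the weighted sum of the conformer embeddings. The function name `attention_mil_pool`, the parameters `V` and `w`, and the toy dimensions are all hypothetical and chosen only for illustration.

```python
import math
import random


def attention_mil_pool(instances, V, w):
    """Aggregate K instance (conformer) embeddings into one bag embedding.

    instances: list of K vectors, each of length d (hypothetical conformer embeddings)
    V:         L x d projection matrix (assumed learnable parameter)
    w:         length-L scoring vector (assumed learnable parameter)
    Returns (bag_embedding, attention_weights).
    """
    scores = []
    for h in instances:
        # Hidden projection: tanh(V h)
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        # Scalar attention score: w^T tanh(V h)
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))

    # Numerically stable softmax over the K instances
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Bag embedding: attention-weighted sum of the instance embeddings
    d = len(instances[0])
    bag = [sum(a * h[j] for a, h in zip(weights, instances)) for j in range(d)]
    return bag, weights


# Toy usage: 5 conformers with 4-dimensional embeddings, random parameters
random.seed(0)
d, L, K = 4, 3, 5
conformers = [[random.gauss(0, 1) for _ in range(d)] for _ in range(K)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(L)]
w = [random.gauss(0, 1) for _ in range(L)]
bag, weights = attention_mil_pool(conformers, V, w)
```

In a sketch like this, the attention weights give one value per conformer, so they can be inspected to see which conformers dominate the molecule-level prediction. This is the kind of signal a conformer-level interpretation study could draw on, although the abstract does not specify how EquiVS performs that analysis.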
Pages: 20