DrugCLIP: Contrastive Protein-Molecule Representation Learning for Virtual Screening

Times cited: 0
Authors
Gao, Bowen [1]
Qiang, Bo [2]
Tan, Haichuan [1]
Ren, Minsi [3]
Jia, Yinjun [4]
Lu, Minsi [5]
Liu, Jingjing [1]
Ma, Wei-Ying [1]
Lan, Yanyan [1, 6]
Affiliations
[1] Tsinghua Univ, Inst AI Ind Res AIR, Beijing, Peoples R China
[2] Peking Univ, Dept Pharmaceut Sci, Beijing, Peoples R China
[3] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[4] Tsinghua Univ, Sch Life Sci, Beijing, Peoples R China
[5] Tsinghua Univ, Dept Pharmaceut Sci, Beijing, Peoples R China
[6] Beijing Acad Artificial Intelligence, Beijing, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Funding
National Key Research and Development Program of China;
Keywords
NEURAL-NETWORK; DATABASE; DOCKING;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Virtual screening, which identifies potential drugs from vast compound databases to bind with a particular protein pocket, is a critical step in AI-assisted drug discovery. Traditional docking methods are highly time-consuming, and can only work with a restricted search library in real-life applications. Recent supervised learning approaches using scoring functions for binding-affinity prediction, although promising, have not yet surpassed docking methods due to their strong dependency on limited data with reliable binding-affinity labels. In this paper, we propose a novel contrastive learning framework, DrugCLIP, by reformulating virtual screening as a dense retrieval task and employing contrastive learning to align representations of binding protein pockets and molecules from a large quantity of pairwise data without explicit binding-affinity scores. We also introduce a biological-knowledge inspired data augmentation strategy to learn better protein-molecule representations. Extensive experiments show that DrugCLIP significantly outperforms traditional docking and supervised learning methods on diverse virtual screening benchmarks with highly reduced computation time, especially in the zero-shot setting. The code for DrugCLIP is available at https://github.com/bowen-gao/DrugCLIP.
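The abstract's two core ideas, contrastive alignment of pocket and molecule embeddings and virtual screening as dense retrieval, can be illustrated with a minimal sketch. It assumes that some pocket encoder and some molecule encoder (not shown, hypothetical here) already map each input to a fixed-length vector, and that pocket i and molecule i in a batch form a known binding pair while every other pairing in the batch serves as a negative. The loss below is the generic CLIP-style symmetric InfoNCE objective; the temperature of 0.07 and the function names `clip_loss` and `screen` are illustrative assumptions, not details taken from the DrugCLIP paper or its repository.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_matrix(pockets: Sequence[Vector], molecules: Sequence[Vector]) -> List[List[float]]:
    """sim[i][j] = cosine similarity between pocket i and molecule j."""
    P = [l2_normalize(p) for p in pockets]
    M = [l2_normalize(m) for m in molecules]
    return [[sum(a * b for a, b in zip(p, m)) for m in M] for p in P]


def log_softmax_at(logits: Sequence[float], idx: int) -> float:
    """Numerically stable log(softmax(logits)[idx])."""
    mx = max(logits)
    log_sum_exp = mx + math.log(sum(math.exp(x - mx) for x in logits))
    return logits[idx] - log_sum_exp


def clip_loss(pockets: Sequence[Vector], molecules: Sequence[Vector],
              temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of (pocket, molecule) pairs.

    Assumption (hypothetical setup): pockets[i] binds molecules[i];
    all off-diagonal pairs in the batch are treated as negatives.
    """
    n = len(pockets)
    if n != len(molecules):
        raise ValueError("batch must contain matched pocket/molecule pairs")
    logits = [[s / temperature for s in row] for row in cosine_matrix(pockets, molecules)]
    # Pocket -> molecule: each row should concentrate probability on the diagonal.
    p2m = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Molecule -> pocket: each column should concentrate probability on the diagonal.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    m2p = -sum(log_softmax_at(columns[j], j) for j in range(n)) / n
    return (p2m + m2p) / 2


def screen(pocket: Vector, library: Sequence[Vector], top_k: int = 3) -> List[int]:
    """Dense retrieval: return library indices ranked by cosine similarity to the pocket."""
    query = l2_normalize(pocket)
    scores = [sum(a * b for a, b in zip(query, l2_normalize(m))) for m in library]
    ranked = sorted(range(len(library)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    pockets = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]  # each molecule points toward its own pocket
    swapped = [[0.1, 0.9], [0.9, 0.1]]  # each molecule points toward the other pocket
    print(clip_loss(pockets, aligned) < clip_loss(pockets, swapped))  # True
    print(screen([1.0, 0.0], [[0.0, 1.0], [0.8, 0.2], [0.99, 0.01]], top_k=2))  # [2, 1]
```

In a dense-retrieval setup the library embeddings do not depend on the query pocket, so in general they can be computed once and reused across targets, reducing screening to a similarity search. This is a general property of dense retrieval, offered here only as a plausible reading of the abstract's reduced-computation-time claim.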
Pages: 20