Generating Novel Compounds Targeting SARS-CoV-2 Main Protease Based On Imbalanced Dataset

Cited by: 6
Authors
Hu, Fan [1]
Wang, Dongqi [1, 2]
Hu, Yishen [1, 2]
Jiang, Jiaxin [1]
Yin, Peng [1]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Guangdong Hong Kong Macao Joint Lab Human Machine, Shenzhen 518055, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Source
2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE | 2020
Funding
National Natural Science Foundation of China;
Keywords
de novo drug design; SARS-CoV-2; 3C-like protease; imbalanced dataset; transformer;
DOI
10.1109/BIBM49941.2020.9313317
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
De novo drug design plays an important role in drug discovery. Recently, deep learning based methods have become popular as a promising approach to designing novel drugs with desirable properties. However, conventional target-specific generative models concentrate mainly on known inhibitors and thus produce similar molecules, and such derivatives of known inhibitors may well be inactive against the same target. Given the cost of chemical synthesis and experimental validation, a low false-positive rate among the generated molecules is very important. In this paper, we propose an efficient pipeline to generate novel SARS-CoV-2 3C-like protease (3CL) inhibitors. Using a GPT2 generator and a well-performing multi-task predictor that achieves high precision on the highly imbalanced 3CL in vitro screening dataset (650 positives out of 297,467 molecules), we obtained a number of novel 3CL-targeting compounds and analyzed their molecular properties. Moreover, we applied randomized SMILES for data augmentation of the positive molecules to create a larger chemical space for the generator. Finally, we present the selected positive compounds with desirable properties, together with their nearest neighbors among 3CL inhibitors that have already been verified in vitro.
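To illustrate why the abstract stresses precision on this dataset, the following minimal sketch works through the class-imbalance arithmetic using only the two counts reported above (650 positives, 297,467 molecules). The function names and the recall and false-positive rates passed to them are hypothetical values chosen for illustration; they are not results from the paper.

```python
# Counts taken from the abstract.
N_TOTAL = 297_467
N_POSITIVE = 650
N_NEGATIVE = N_TOTAL - N_POSITIVE


def positive_rate(n_pos: int, n_total: int) -> float:
    """Fraction of molecules in the screening set that are positive."""
    return n_pos / n_total


def expected_precision(recall: float, false_positive_rate: float,
                       n_pos: int = N_POSITIVE,
                       n_neg: int = N_NEGATIVE) -> float:
    """Precision implied by a given recall and false-positive rate.

    Both rates are hypothetical inputs, not figures from the paper.
    """
    true_positives = recall * n_pos
    false_positives = false_positive_rate * n_neg
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    print(f"positive rate: {positive_rate(N_POSITIVE, N_TOTAL):.4%}")
    # Hypothetical predictors: same recall, different false-positive rates.
    for fpr in (0.01, 0.001, 0.0001):
        p = expected_precision(recall=0.9, false_positive_rate=fpr)
        print(f"recall=0.90, FPR={fpr:.4f} -> precision={p:.3f}")
```

Because negatives outnumber positives by roughly 450 to 1, even a 1% false-positive rate would leave most predicted actives wrong, which is consistent with the abstract's emphasis on a high-precision predictor before committing to synthesis.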
Pages: 432-436
Page count: 5