MolGPT: Molecular Generation Using a Transformer-Decoder Model

Cited by: 245
Authors
Bagal, Viraj [1, 2]
Aggarwal, Rishal [1 ]
Vinod, P. K. [1 ]
Priyakumar, U. Deva [1 ]
Affiliations
[1] Int Inst Informat Technol, Hyderabad 500032, India
[2] Indian Inst Sci Educ & Res, Pune 411008, Maharashtra, India
Keywords
PREDICTION; DATABASE;
DOI
10.1021/acs.jcim.1c00600
Chinese Library Classification number
R914 [Medicinal chemistry];
Discipline classification code
100701;
Abstract
The application of deep learning techniques to de novo generation of molecules, termed inverse molecular design, has been gaining enormous traction in drug design. Representing molecules in SMILES notation as strings of characters enables the use of state-of-the-art natural language processing models, such as Transformers, for molecular design in general. Inspired by generative pre-training (GPT) models, which have been shown to be successful in generating meaningful text, in this study we train a transformer-decoder on the next-token prediction task using masked self-attention to generate druglike molecules. We show that our model, MolGPT, performs on par with other previously proposed modern machine learning frameworks for molecular generation in terms of generating valid, unique, and novel molecules. Furthermore, we demonstrate that the model can be trained conditionally to control multiple properties of the generated molecules. We also show that the model can generate molecules with desired scaffolds as well as desired molecular properties by conditioning the generation on the SMILES strings of the desired scaffolds and on property values. Using saliency maps, we highlight the interpretability of the generative process of the model.
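The next-token prediction setup with masked self-attention mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the character-level tokenization, the `<` and `>` start/end markers, and the function names `next_token_pairs` and `causal_mask` are all assumptions made for illustration. Character-level splitting would also mis-tokenize multi-character atoms such as `Cl` or `Br`.

```python
# Illustrative sketch only; names and tokenization are hypothetical,
# not taken from the MolGPT paper or code.

def next_token_pairs(smiles, start="<", end=">"):
    """Build (input, target) token sequences for next-token prediction.

    Assumption: one character per token, with hypothetical start/end markers.
    The target sequence is the input sequence shifted left by one position.
    """
    tokens = [start] + list(smiles) + [end]
    return tokens[:-1], tokens[1:]


def causal_mask(n):
    """Return an n x n mask where position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Ethanol as a toy example.
inputs, targets = next_token_pairs("CCO")
mask = causal_mask(len(inputs))

for tok, tgt, row in zip(inputs, targets, mask):
    visible = [t for t, allowed in zip(inputs, row) if allowed]
    print(f"sees {visible} at token {tok!r}, predicts {tgt!r}")
```

Each row of the mask hides the tokens to the right of the current position, so the prediction at each step depends only on the prefix generated so far. This is what allows a decoder trained this way to sample a SMILES string one token at a time.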
Pages: 2064-2076
Number of pages: 13