Seq2seq Fingerprint: An Unsupervised Deep Molecular Embedding for Drug Discovery

Citations: 101
Authors
Xu, Zheng [1]
Wang, Sheng [1]
Zhu, Feiyun [1]
Huang, Junzhou [1]
Affiliations
[1] Univ Texas Arlington, 701 S Nedderman Dr, Arlington, TX 76019 USA
Source
ACM-BCB' 2017: PROCEEDINGS OF THE 8TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS | 2017
Funding
National Science Foundation (USA);
Keywords
Unsupervised Learning; Structured Prediction; Learning Representation; Sequence to Sequence Learning; Deep Learning; Drug Discovery; Virtual Screening; Molecular Representation; Imaging; Computational Biology; FORCE-FIELD; DESIGN;
DOI
10.1145/3107411.3107424
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Many of today's drug discoveries require expert knowledge and extremely expensive biological experiments to identify the chemical properties of molecules. Despite growing interest in using supervised machine learning algorithms to identify those properties automatically, performance and accuracy have advanced little because of the limited amount of training data. In this paper, we propose a novel unsupervised molecular embedding method that provides a continuous feature vector for each molecule for use in further tasks, e.g., solubility classification. In the proposed method, a multi-layered Gated Recurrent Unit (GRU) network maps the input molecule into a continuous feature vector of fixed dimensionality, and another deep GRU network then decodes the continuous vector back to the original molecule. As a result, the continuous encoding vector is expected to contain enough information to recover the original molecule and to predict its chemical properties. The proposed embedding method can use almost unlimited molecule data in the training phase. With sufficient information encoded in the vector, the proposed method is also robust and task-insensitive. The performance and robustness are confirmed and interpreted in our extensive experiments.
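The encoder half of the architecture described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: molecules are given as SMILES strings and read one character at a time; weights are random and untrained; bias terms are omitted; and the fingerprint is formed by concatenating the final hidden state of every layer. All names (`GRUCell`, `encode`, `one_hot`, the toy vocabulary) are hypothetical. The decoder and the reconstruction training loop are left out.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """A bias-free GRU cell with random (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        d = input_dim + hidden_dim  # each gate sees the concatenation [x, h]
        self.Wz = rand_matrix(hidden_dim, d, rng)  # update gate
        self.Wr = rand_matrix(hidden_dim, d, rng)  # reset gate
        self.Wh = rand_matrix(hidden_dim, d, rng)  # candidate state

    def step(self, x, h):
        z = [sigmoid(a) for a in matvec(self.Wz, x + h)]
        r = [sigmoid(a) for a in matvec(self.Wr, x + h)]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in matvec(self.Wh, x + gated_h)]
        # Blend the old state and the candidate using the update gate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def one_hot(ch, vocab):
    return [1.0 if c == ch else 0.0 for c in vocab]


def encode(smiles, vocab, layers, hidden_dim):
    """Map a variable-length string to a fixed-length vector."""
    states = [[0.0] * hidden_dim for _ in layers]
    for ch in smiles:
        x = one_hot(ch, vocab)
        for i, cell in enumerate(layers):
            states[i] = cell.step(x, states[i])
            x = states[i]  # output of one layer feeds the next
    # Assumption: the fingerprint concatenates every layer's final state.
    return [v for s in states for v in s]


rng = random.Random(0)
vocab = sorted(set("CCO" + "c1ccccc1"))  # toy character vocabulary
layers = [GRUCell(len(vocab), 8, rng), GRUCell(8, 8, rng)]

fp_ethanol = encode("CCO", vocab, layers, 8)
fp_benzene = encode("c1ccccc1", vocab, layers, 8)
print(len(fp_ethanol), len(fp_benzene))
```

Both inputs yield vectors of the same length despite having different string lengths, which is the fixed-dimensionality property the abstract refers to. With trained weights, such a vector would serve as the input to a downstream classifier.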
Pages: 285-294
Page count: 10