Research progress on drug representation learning

Cited by: 0
Authors
Chen X. [1]
Liu X. [1]
Wu J. [1]
Affiliations
[1] Department of Electronic Engineering, Tsinghua University, Beijing
Source
Qinghua Daxue Xuebao/Journal of Tsinghua University | 2020 / Vol. 60 / No. 02
Keywords
Drug; Molecular graph; Representation learning; Simplified molecular input line entry specification (SMILES)
DOI
10.16511/j.cnki.qhdxxb.2019.21.038
Abstract
The drug development process is characterized by high capital intensity, high risk, and long cycles; it therefore consumes large amounts of capital, manpower, and other resources. While traditional machine learning methods can aid drug development to some extent, they require molecular descriptors as inputs, and the choice of descriptors greatly affects model performance. Most traditional machine learning methods therefore depend on complex, time-consuming feature engineering. Emerging deep learning methods can instead learn features directly from raw representations of drugs, which bypasses feature engineering and shortens the drug development cycle. This paper divides drug representation learning methods into those based on the simplified molecular input line entry specification (SMILES) and those based on molecular graphs, then surveys the innovations and limitations of the various methods. Finally, it identifies the major challenges in current drug representation learning and presents possible solutions. © 2020, Tsinghua University Press. All rights reserved.
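The distinction the survey draws between the two raw drug representations can be illustrated with a toy sketch: a SMILES string is a linear sequence of atom symbols, while the molecular graph it encodes consists of atoms (nodes) and bonds (edges). The parser below is a deliberately minimal assumption-laden example — it handles only unbranched, ring-free SMILES with single bonds and a couple of two-letter atom symbols; real work would use a cheminformatics toolkit such as RDKit.

```python
def smiles_to_graph(smiles):
    """Parse a simple linear SMILES string into (atoms, bonds).

    Toy parser: supports only unbranched, ring-free SMILES with
    implicit single bonds; not a general SMILES implementation.
    """
    atoms, bonds = [], []
    i = 0
    while i < len(smiles):
        # Two-letter organic-subset symbols must be matched first.
        if smiles[i:i + 2] in ("Cl", "Br"):
            atoms.append(smiles[i:i + 2])
            i += 2
        else:
            atoms.append(smiles[i])
            i += 1
        if len(atoms) > 1:
            # Consecutive atoms in a linear SMILES are bonded.
            bonds.append((len(atoms) - 2, len(atoms) - 1))
    return atoms, bonds


atoms, bonds = smiles_to_graph("CCO")  # ethanol
print(atoms)  # ['C', 'C', 'O']
print(bonds)  # [(0, 1), (1, 2)]
```

Sequence-based methods (e.g., recurrent language models) consume the string `"CCO"` character by character, whereas graph-based methods (e.g., graph convolutional networks) operate on the `(atoms, bonds)` structure directly.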
Pages: 171-180
Page count: 9