TOP: A deep mixture representation learning method for boosting molecular toxicity prediction

Cited by: 19
Authors
Peng, Yuzhong [1 ,2 ,3 ]
Zhang, Ziqiao [1 ,2 ]
Jiang, Qizhi [1 ,2 ]
Guan, Jihong [4 ]
Zhou, Shuigeng [1 ,2 ]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
[2] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[3] Nanning Normal Univ, Key Lab Sci Comp & Intelligent Informat Proc Univ, Nanning 530001, Peoples R China
[4] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Drug screening; Toxicity prediction; Molecular representation; Deep learning; ACUTE ORAL TOXICITY; SMILES; DRUG; DISCOVERY; ALGORITHM; MACHINE; MODELS; DOMAIN;
DOI
10.1016/j.ymeth.2020.05.013
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
In the early stages of drug discovery, molecular toxicity prediction is crucial for excluding drug candidates that are likely to fail in clinical trials. In this paper, we present a novel molecular representation method and develop a corresponding deep learning-based framework called TOP (short for TOxicity Prediction). TOP integrates specifically designed data preprocessing methods, an RNN based on the bidirectional gated recurrent unit (BiGRU), and fully connected neural networks for end-to-end molecular representation learning and chemical toxicity prediction. TOP automatically learns a mixed molecular representation from both the SMILES contextual information that describes the molecular structure and physicochemical properties. TOP can therefore overcome the drawbacks of existing methods that use only one of the two, which greatly improves toxicity prediction accuracy. We conducted extensive experiments on 14 classic toxicity prediction tasks from three benchmark datasets, including balanced and imbalanced ones. The results show that, with the help of the novel molecular representation method, TOP significantly outperforms not only three baseline machine learning methods but also five state-of-the-art methods.
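The abstract does not spell out the preprocessing that turns a SMILES string into input for the recurrent encoder. As a rough illustration only, the sketch below shows one common way to do it: split the string into chemically meaningful tokens, map them to integer ids, and pad to a fixed length. The regular expression, the `<pad>`/`<unk>` symbols, and the function names `tokenize`, `build_vocab`, and `encode` are all assumptions made for this example, not the authors' method.

```python
import re

# Hypothetical token pattern (an assumption, not taken from the paper):
# bracket atoms such as [Na+], the two-letter halogens Br and Cl,
# two-digit ring closures such as %12, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

PAD, UNK = "<pad>", "<unk>"  # illustrative special symbols


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map every token seen in the corpus to an integer id (0 = pad, 1 = unknown)."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i for i, tok in enumerate([PAD, UNK] + tokens)}


def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a fixed-length list of token ids."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(smiles)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


vocab = build_vocab(["CC(=O)Cl"])
print(tokenize("CC(=O)Cl"))
print(encode("CCO", vocab, 5))
```

In a setup like the one the abstract describes, id sequences of this kind would typically pass through an embedding layer and a BiGRU, and the resulting vector would be combined with a vector of physicochemical descriptors before the fully connected layers. How TOP actually combines the two is not stated in this record.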
Pages: 55-64
Page count: 10