Traditional Machine and Deep Learning for Predicting Toxicity Endpoints

Cited by: 3
Authors
Norinder, Ulf [1]
Affiliations
[1] Stockholm Univ, Dept Comp & Syst Sci, S-16407 Kista, Sweden
Source
MOLECULES | 2023 / Volume 28 / Issue 01
Keywords
CATMoS dataset; CDDD; BERT; conformal prediction; random forest; RDKit; LANGUAGE;
DOI
10.3390/molecules28010217
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Molecular structure-property modeling is an increasingly important tool for predicting compounds with desired properties, because drug discovery and development is expensive and resource-intensive and suffers from toxicity-related attrition in late phases. Interest in applying deep learning techniques has lately increased considerably. This investigation compares traditional physico-chemical descriptor and machine learning-based approaches, through autoencoder-generated descriptors, to two different descriptor-free, Simplified Molecular Input Line Entry System (SMILES)-based deep learning architectures of the Bidirectional Encoder Representations from Transformers (BERT) type, using the Mondrian aggregated conformal prediction method as the overarching framework. For the binary CATMoS non-toxic and very-toxic datasets, the results show that all methods perform equally well on the former, almost equally balanced, dataset. On the latter dataset, with an 11-fold difference between the two classes, the MolBERT model based on a large pre-trained network performs somewhat better than the rest, with high efficiency for both classes (0.93-0.94) as well as high values for sensitivity, specificity and balanced accuracy (0.86-0.87). The descriptor-free, SMILES-based deep learning BERT architectures seem capable of producing well-balanced predictive models with defined applicability domains. This work also demonstrates that the class imbalance problem is handled gracefully by Mondrian conformal prediction, without over- and/or under-sampling, class weighting or cost-sensitive methods.
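The abstract names Mondrian aggregated conformal prediction without describing it. The following is a minimal sketch of the general technique, not the author's implementation. It assumes a nonconformity score of 1 minus the predicted class probability, median aggregation of p-values across models, and efficiency measured as the fraction of single-label prediction sets. All function names and the toy numbers are hypothetical.

```python
from statistics import median


def mondrian_p_values(cal_scores, cal_labels, test_scores):
    """Class-conditional (Mondrian) p-values for one test object.

    cal_scores[i] is the nonconformity of calibration example i for its
    true label; test_scores maps each candidate label to the test object's
    nonconformity under that label. Each p-value is computed only against
    calibration examples of the same class, so the validity guarantee
    holds per class, which is what lets imbalanced data be handled
    without resampling or class weights.
    """
    p_values = {}
    for label, alpha in test_scores.items():
        same_class = [s for s, y in zip(cal_scores, cal_labels) if y == label]
        n_at_least = sum(1 for s in same_class if s >= alpha)
        p_values[label] = (n_at_least + 1) / (len(same_class) + 1)
    return p_values


def aggregate(p_value_dicts):
    """Combine p-values from several models (assumption: per-class median)."""
    labels = p_value_dicts[0].keys()
    return {c: median(p[c] for p in p_value_dicts) for c in labels}


def prediction_set(p_values, significance):
    """Labels that cannot be rejected at the given significance level."""
    return {c for c, p in p_values.items() if p > significance}


def efficiency(pred_sets):
    """Fraction of prediction sets that contain exactly one label."""
    return sum(1 for s in pred_sets if len(s) == 1) / len(pred_sets)


# Toy imbalanced calibration set: 9 majority (0) and 3 minority (1) examples.
cal_scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.3, 0.5, 0.6, 0.7]
cal_labels = [0] * 9 + [1] * 3

# Hypothetical test object with predicted probability 0.45 for class 1.
p = mondrian_p_values(cal_scores, cal_labels, {0: 0.45, 1: 0.55})
labels_kept = prediction_set(p, significance=0.2)
```

In this toy case the minority-class score is compared only with the three minority calibration scores, so class 1 stays in the prediction set even though the underlying probability is below 0.5.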
Pages: 10