ADMET evaluation in drug discovery: 21. Application and industrial validation of machine learning algorithms for Caco-2 permeability prediction

Times cited: 1
Authors
Wang, Dong [1]
Jin, Jieyu [1]
Shi, Guqin [2]
Bao, Jingxiao [2]
Wang, Zheng [2]
Li, Shimeng [1]
Pan, Peichen [1]
Li, Dan [1]
Kang, Yu [1]
Hou, Tingjun [1]
Affiliations
[1] Zhejiang Univ, Innovat Inst Artif Intelligence Med Zhejiang, Coll Pharmaceut Sci, Hangzhou 310058, Peoples R China
[2] Shanghai Qilu Pharmaceut R&D Ctr, 576 Libing Rd, Shanghai 310115, Peoples R China
Source
JOURNAL OF CHEMINFORMATICS | 2025 / Vol. 17 / Issue 01
Funding
National Natural Science Foundation of China
Keywords
Caco-2; permeability; Machine learning; Matched molecular pair; MATCHED MOLECULAR PAIRS; CELL-PERMEABILITY; COMBINATION;
DOI
10.1186/s13321-025-00947-z
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
The Caco-2 cell model has been widely used to assess the intestinal permeability of drug candidates in vitro, owing to its morphological and functional similarity to human enterocytes. Although the Caco-2 cell assay is considered safe and cost-effective, it is also time-consuming. Computational models that predict Caco-2 permeability with high accuracy are therefore crucial for improving the efficiency of oral drug development. In this study, we conducted an in-depth analysis of the characteristics of an augmented Caco-2 permeability dataset and evaluated a diverse range of machine learning algorithms in combination with different molecular representations. The results indicated that XGBoost generally provided better predictions on the test sets than the other models evaluated. In addition, we investigated the transferability of machine learning models trained on publicly available data to internal pharmaceutical industry datasets. Our findings, based on Shanghai Qilu's in-house dataset, showed that the boosting models retained a degree of predictive efficacy when applied to industry data. Furthermore, a Y-randomization test and applicability domain analysis were employed to assess the robustness and generalizability of these models. Matched molecular pair analysis (MMPA) was used to extract chemical transformation rules. We believe that the model developed in this study could serve as a reliable tool for assessing Caco-2 permeability during early-stage drug discovery, and that the chemical transformation rules derived here could provide insights for optimizing Caco-2 permeability.

Scientific contribution
A comprehensive validation of various machine learning algorithms combined with diverse molecular representations was reported on a large dataset for predicting Caco-2 permeability. The transferability of machine learning models trained on publicly available data to internal pharmaceutical industry datasets was also investigated. Matched molecular pair analysis was carried out to offer researchers practical suggestions for improving the Caco-2 permeability of compounds.
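The abstract cites a Y-randomization test as a robustness check. The idea: refit the model many times on shuffled labels, and confirm that the real model clearly outperforms every shuffled-label model; otherwise its apparent accuracy may be chance correlation. Below is a minimal sketch, not the authors' protocol. Assumptions: an ordinary least-squares linear model is swapped in for XGBoost so the example needs only NumPy; the descriptors and labels are synthetic, hypothetical stand-ins for molecular representations and log Papp values; and the number of rounds and the use of training-set R² are illustrative choices that the abstract does not specify.

```python
import numpy as np


def fit_linear(X, y):
    """OLS with an intercept column (assumed stand-in for XGBoost)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fitted_r2(X, y):
    """Training-set R^2 of the linear stand-in model."""
    coef = fit_linear(X, y)
    A = np.column_stack([np.ones(len(X)), X])
    return r2_score(y, A @ coef)


def y_randomization(X, y, n_rounds=50, seed=0):
    """Return (R^2 on true labels, list of R^2 values after shuffling y)."""
    rng = np.random.default_rng(seed)
    true_r2 = fitted_r2(X, y)
    shuffled_r2 = []
    for _ in range(n_rounds):
        y_perm = rng.permutation(y)  # break the descriptor -> label link
        shuffled_r2.append(fitted_r2(X, y_perm))
    return true_r2, shuffled_r2


# Hypothetical synthetic data: 200 "compounds" x 5 descriptors.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
w = np.array([1.0, -0.5, 0.8, 0.0, 0.3])
y = X @ w + 0.1 * rng.normal(size=200)  # stand-in for log Papp values

true_r2, shuffled_r2 = y_randomization(X, y, n_rounds=50)
print(f"R2 (true labels):    {true_r2:.3f}")
print(f"R2 (shuffled, mean): {np.mean(shuffled_r2):.3f}")
print(f"R2 (shuffled, max):  {np.max(shuffled_r2):.3f}")
```

If the model has captured a genuine structure-permeability relationship, the true-label R² should sit well above every shuffled-label R²; comparable values would suggest the original fit arose by chance.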
Pages: 14
References: 47