The impact of training set data distributions for modelling of passive intestinal absorption

Cited by: 14
Authors
Ghafourian, Taravat [1 ,2 ,3 ]
Freitas, Alex A. [4 ]
Newby, Danielle [1 ,2 ]
Affiliations
[1] Univ Kent, Medway Sch Pharm, Chatham ME4 4TB, Kent, England
[2] Univ Greenwich, Medway Sch Pharm, Chatham ME4 4TB, Kent, England
[3] Tabriz Univ Med Sci, Drug Appl Res Ctr, Fac Pharm, Tabriz, Iran
[4] Univ Kent, Sch Comp, Canterbury CT2 7NZ, Kent, England
Keywords
Intestinal absorption; QSAR; Oral absorption; Training set; Regression; Classification; IN-SILICO PREDICTIONS; CACO-2; CELL-PERMEABILITY; DRUG DISCOVERY; ADME EVALUATION; ORAL BIOAVAILABILITY; MOLECULAR-PROPERTIES; FEATURE-SELECTION; SURFACE-AREA; QSAR; CLASSIFICATION;
DOI
10.1016/j.ijpharm.2012.07.041
Chinese Library Classification number
R9 [Pharmacy];
Discipline classification code
1007;
Abstract
This study presents regression and classification models that predict human intestinal absorption for 645 drug and drug-like compounds, using percentage human intestinal absorption values from the dataset published by Hou et al. (2007c). A problem with this dataset, and with other datasets in the literature, is that highly absorbed compounds outnumber poorly absorbed ones. Models developed from such datasets will be biased towards highly absorbed compounds and poorly suited to industrial use, where more compounds are now likely to be poorly absorbed. The study compared two training sets: TS1, a balanced (50:50) distribution of highly and poorly absorbed compounds created by under-sampling the majority class of highly absorbed compounds, and TS2, a randomly selected training set whose distribution is biased towards highly absorbed compounds. The regression results indicate that the best models were those developed using the balanced training set (TS1). For classification, TS1 also led to the most accurate models and the highest specificity (0.949), whereas TS2 led to the highest sensitivity (0.939). Thus, under-sampling the majority class of highly absorbed compounds yields a balanced training set (TS1) that can produce in silico regression and classification models more applicable to industrial use. © 2012 Elsevier B.V. All rights reserved.
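The under-sampling step and the two classification metrics quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the 50% absorption cut-off (`HIGH_THRESHOLD`), the toy compound values, and the function names are all hypothetical, and treating "highly absorbed" as the positive class is an assumption that merely fits the pattern reported for TS1 and TS2.

```python
import random

# Assumed cut-off between "high" and "low" absorption; the abstract does not state one.
HIGH_THRESHOLD = 50.0


def label(hia_percent):
    """Assign a class label from a percentage absorption value."""
    return "high" if hia_percent >= HIGH_THRESHOLD else "low"


def undersample_majority(compounds, seed=0):
    """Randomly drop majority-class compounds until both classes are equal in size."""
    rng = random.Random(seed)
    high = [c for c in compounds if label(c["hia"]) == "high"]
    low = [c for c in compounds if label(c["hia"]) == "low"]
    majority, minority = (high, low) if len(high) >= len(low) else (low, high)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def sensitivity_specificity(y_true, y_pred, positive="high"):
    """Return (sensitivity, specificity) with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Toy, made-up absorption values: eight highly absorbed and two poorly absorbed compounds.
toy = [{"name": f"cpd{i}", "hia": hia}
       for i, hia in enumerate([95, 90, 88, 80, 75, 70, 65, 60, 20, 5])]
ts1 = undersample_majority(toy)  # balanced 50:50 set, analogous in spirit to TS1
```

Under this convention, sensitivity is the fraction of highly absorbed compounds classified correctly and specificity is the fraction of poorly absorbed compounds classified correctly, so a training set dominated by highly absorbed compounds would be expected to favour the former.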
Pages: 711-720
Number of pages: 10