The impact of training set data distributions for modelling of passive intestinal absorption

Cited by: 14
Authors
Ghafourian, Taravat [1, 2, 3]
Freitas, Alex A. [4]
Newby, Danielle [1, 2]
Affiliations
[1] Univ Kent, Medway Sch Pharm, Chatham ME4 4TB, Kent, England
[2] Univ Greenwich, Medway Sch Pharm, Chatham ME4 4TB, Kent, England
[3] Tabriz Univ Med Sci, Drug Appl Res Ctr, Fac Pharm, Tabriz, Iran
[4] Univ Kent, Sch Comp, Canterbury CT2 7NZ, Kent, England
Keywords
Intestinal absorption; QSAR; Oral absorption; Training set; Regression; Classification; IN-SILICO PREDICTIONS; CACO-2; CELL-PERMEABILITY; DRUG DISCOVERY; ADME EVALUATION; ORAL BIOAVAILABILITY; MOLECULAR-PROPERTIES; FEATURE-SELECTION; SURFACE-AREA
DOI
10.1016/j.ijpharm.2012.07.041
Chinese Library Classification
R9 [Pharmacy]
Discipline classification code
1007
Abstract
This study presents regression and classification models to predict human intestinal absorption of 645 drug and drug-like compounds, using percentage human intestinal absorption values from the dataset published by Hou et al. (2007c). A problem with this dataset, and with other datasets in the literature, is that highly absorbed compounds outnumber poorly absorbed ones. Any model developed using these datasets will be biased towards highly absorbed compounds and will not be applicable in industry, where more compounds are now likely to be poorly absorbed. The study compared two training sets: TS1, a balanced (50:50) distribution of highly and poorly absorbed compounds created by under-sampling the majority class of highly absorbed compounds, and TS2, a randomly selected training set with a distribution biased towards highly absorbed compounds. The regression results indicate that the best models were those developed using the balanced dataset (TS1). For classification, TS1 also led to the most accurate models and the highest specificity, 0.949. In comparison, TS2 led to the highest sensitivity, 0.939. Thus, under-sampling the majority class of highly absorbed compounds gives a balanced training set (TS1) that yields in silico regression and classification models better suited to use in industry. © 2012 Elsevier B.V. All rights reserved.
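The under-sampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The 30% cut-off between "poorly" and "highly" absorbed, the function names, and the toy data are all assumptions made for illustration, as is the convention that "highly absorbed" is the positive class when computing sensitivity and specificity.

```python
import random

def undersample_balanced(records, threshold=30.0, seed=0):
    """Build a 50:50 training set by under-sampling the larger class.

    records: list of (compound_id, percent_absorbed) tuples.
    threshold: ASSUMED cut-off (in %) separating poorly from highly
               absorbed compounds; not taken from the abstract.
    """
    rng = random.Random(seed)
    high = [r for r in records if r[1] >= threshold]
    poor = [r for r in records if r[1] < threshold]
    n = min(len(high), len(poor))  # size of the minority class
    balanced = rng.sample(high, n) + rng.sample(poor, n)
    rng.shuffle(balanced)
    return balanced

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity for boolean labels.

    ASSUMPTION: True means "highly absorbed" (the positive class).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Toy, made-up data: 8 highly absorbed and 2 poorly absorbed compounds.
toy = [("high%d" % i, 90.0) for i in range(8)] + [("poor0", 5.0), ("poor1", 10.0)]
balanced = undersample_balanced(toy)
print(len(balanced))  # 4: two compounds from each class
```

Under this convention, a model trained on a set dominated by highly absorbed compounds would be expected to score well on sensitivity while misclassifying more of the poorly absorbed compounds, which lowers specificity. That is the pattern the abstract reports for TS2 relative to TS1.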
Pages: 711-720
Page count: 10