The impact of training set data distributions for modelling of passive intestinal absorption

Cited by: 14
Authors
Ghafourian, Taravat [1, 2, 3]
Freitas, Alex A. [4]
Newby, Danielle [1, 2]
Affiliations
[1] Univ Kent, Medway Sch Pharm, Chatham ME4 4TB, Kent, England
[2] Univ Greenwich, Medway Sch Pharm, Chatham ME4 4TB, Kent, England
[3] Tabriz Univ Med Sci, Drug Appl Res Ctr, Fac Pharm, Tabriz, Iran
[4] Univ Kent, Sch Comp, Canterbury CT2 7NZ, Kent, England
Keywords
Intestinal absorption; QSAR; Oral absorption; Training set; Regression; Classification; IN-SILICO PREDICTIONS; CACO-2; CELL-PERMEABILITY; DRUG DISCOVERY; ADME EVALUATION; ORAL BIOAVAILABILITY; MOLECULAR-PROPERTIES; FEATURE-SELECTION; SURFACE-AREA; QSAR; CLASSIFICATION
DOI
10.1016/j.ijpharm.2012.07.041
Chinese Library Classification number
R9 [Pharmacy]
Discipline classification code
1007
Abstract
This study presents regression and classification models to predict the human intestinal absorption of 645 drug and drug-like compounds, using percentage human intestinal absorption values from the dataset published by Hou et al. (2007c). The problem with this dataset, and with other datasets in the literature, is that highly absorbed compounds outnumber poorly absorbed ones. Models developed using these datasets will therefore be biased towards highly absorbed compounds and will not be well suited to industry, where more compounds are now likely to be poorly absorbed. The study compared two training sets: TS1, a balanced (50:50) distribution of highly and poorly absorbed compounds created by under-sampling the majority class of highly absorbed compounds, and TS2, a randomly selected training set with a distribution biased towards highly absorbed compounds. The regression results indicate that the best models were those developed using the balanced training set (TS1). For classification, TS1 also led to the most accurate models and the highest specificity, 0.949, whereas TS2 led to the highest sensitivity, 0.939. Thus, under-sampling the majority class of highly absorbed compounds yields a balanced training set (TS1) that can produce in silico regression and classification models more applicable to industrial use. © 2012 Elsevier B.V. All rights reserved.
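The under-sampling idea and the two classification metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function names, the (name, %HIA) tuple layout, the threshold of 50, and the toy values are all assumptions made for illustration, and "highly absorbed" is assumed to be the positive class, so that sensitivity measures correctly predicted highly absorbed compounds and specificity measures correctly predicted poorly absorbed ones.

```python
import random


def undersample_balanced(compounds, threshold, seed=0):
    """Build a 50:50 training set by randomly discarding majority-class items.

    `compounds` is a list of (name, percent_absorbed) tuples (hypothetical
    layout). Items at or above `threshold` count as highly absorbed.
    """
    rng = random.Random(seed)
    high = [c for c in compounds if c[1] >= threshold]
    low = [c for c in compounds if c[1] < threshold]
    n = min(len(high), len(low))  # size of the minority class
    balanced = rng.sample(high, n) + rng.sample(low, n)
    rng.shuffle(balanced)
    return balanced


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); truthy labels are the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data only (made-up values, not from the Hou et al. dataset):
# eight highly absorbed compounds and two poorly absorbed ones.
toy = [(f"cmpd{i}", 90.0) for i in range(8)] + [("cmpd8", 10.0), ("cmpd9", 20.0)]
ts1_like = undersample_balanced(toy, threshold=50.0)  # threshold is illustrative
```

With a skewed set like the toy one, random selection mostly returns highly absorbed compounds, while the under-sampled set keeps every poorly absorbed compound and an equal number of highly absorbed ones.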
Pages: 711-720
Page count: 10
Related papers
50 in total
  • [1] The Proportion for Splitting Data into Training and Test Set for the Bootstrap in Classification Problems
    Vrigazova, Borislava
    BUSINESS SYSTEMS RESEARCH JOURNAL, 2021, 12 (01): : 228 - 242
  • [2] Intestinal Absorption of Miltefosine: Contribution of Passive Paracellular Transport
    Cécile Ménez
    Marion Buyse
    Christophe Dugave
    Robert Farinotti
    Gillian Barratt
    Pharmaceutical Research, 2007, 24 : 546 - 554
  • [3] Intestinal absorption of miltefosine: Contribution of passive paracellular transport
    Menez, Cecile
    Buyse, Marion
    Dugave, Christophe
    Farinotti, Robert
    Barratt, Gillian
    PHARMACEUTICAL RESEARCH, 2007, 24 (03) : 546 - 554
  • [4] Modelling intestinal absorption of salbutamol sulphate in rats
    Valenzuela, B.
    Lopez-Pintor, E.
    Perez-Ruixo, J. J.
    Nacher, A.
    Martin-Villodre, A.
    Casabo, V. G.
    INTERNATIONAL JOURNAL OF PHARMACEUTICS, 2006, 314 (01) : 21 - 30
  • [5] Perspective on improving passive human intestinal absorption
    Yalkowsky, Samuel H.
    JOURNAL OF PHARMACEUTICAL SCIENCES, 2012, 101 (09) : 3047 - 3050
  • [6] A Comprehensive Analysis of the Impact of Selecting the Training Set Elements on the Correctness of Classification for Highly Variable Ecological Data
    Kiersztyn, Adam
    Lopucki, Rafal
    Kiersztyn, Krystyna
    Karczmarek, Pawel
    Powroznik, Pawel
    Czerwinski, Dariusz
    Pedrycz, Witold
    IEEE CIS INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS 2021 (FUZZ-IEEE), 2021,
  • [7] Separation of data on the training and test set for modelling: a case study for modelling of five colour properties of a white pigment
    Rajer-Kanduc, K
    Zupan, J
    Majcen, N
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2003, 65 (02) : 221 - 229
  • [8] Vitamin D intestinal absorption is not a simple passive diffusion: Evidences for involvement of cholesterol transporters
    Reboul, Emmanuelle
    Goncalves, Aurelie
    Comera, Christine
    Bott, Romain
    Nowicki, Marion
    Landrier, Jean-Francois
    Jourdheuil-Rahmani, Dominique
    Dufour, Claire
    Collet, Xavier
    Borel, Patrick
    MOLECULAR NUTRITION & FOOD RESEARCH, 2011, 55 (05) : 691 - 702
  • [9] Simulation modelling of human intestinal absorption using Caco-2 permeability and kinetic solubility data for early drug discovery
    Thomas, Simon
    Brightman, Frances
    Gill, Helen
    Lee, Sally
    Pufong, Boris
    JOURNAL OF PHARMACEUTICAL SCIENCES, 2008, 97 (10) : 4557 - 4574
  • [10] Dealing With Redundant Features and Inconsistent Training Data in Electronic Nose: A Rough Set Based Approach
    Bag, Anil Kumar
    Tudu, Bipan
    Bhattacharyya, Nabarun
    Bandyopadhyay, Rajib
    IEEE SENSORS JOURNAL, 2014, 14 (03) : 758 - 767