The impact of training set data distributions for modelling of passive intestinal absorption

Cited by: 14
Authors
Ghafourian, Taravat [1 ,2 ,3 ]
Freitas, Alex A. [4 ]
Newby, Danielle [1 ,2 ]
Affiliations
[1] Univ Kent, Medway Sch Pharm, Chatham ME4 4TB, Kent, England
[2] Univ Greenwich, Medway Sch Pharm, Chatham ME4 4TB, Kent, England
[3] Tabriz Univ Med Sci, Drug Appl Res Ctr, Fac Pharm, Tabriz, Iran
[4] Univ Kent, Sch Comp, Canterbury CT2 7NZ, Kent, England
Keywords
Intestinal absorption; QSAR; Oral absorption; Training set; Regression; Classification; IN-SILICO PREDICTIONS; CACO-2; CELL-PERMEABILITY; DRUG DISCOVERY; ADME EVALUATION; ORAL BIOAVAILABILITY; MOLECULAR-PROPERTIES; FEATURE-SELECTION; SURFACE-AREA
DOI
10.1016/j.ijpharm.2012.07.041
Chinese Library Classification number
R9 [Pharmacy]
Discipline classification code
1007
Abstract
This study presents regression and classification models to predict human intestinal absorption of 645 drug and drug-like compounds, using percentage human intestinal absorption values from the dataset published by Hou et al. (2007c). The problem with this dataset, and with other datasets in the literature, is that it contains more highly absorbed than poorly absorbed compounds. Any models developed using these datasets will be biased towards highly absorbed compounds and not applicable in industry, where compounds are now more likely to be poorly absorbed. The study compared two training sets: TS1, a balanced (50:50) distribution of highly and poorly absorbed compounds created by under-sampling the majority class of highly absorbed compounds, and TS2, a randomly selected training set with a distribution biased towards highly absorbed compounds. The regression results indicate that the best models were those developed using the balanced training set (TS1). For classification, TS1 also led to the most accurate models and the highest specificity, 0.949, whereas TS2 led to the highest sensitivity, 0.939. Thus, under-sampling the majority class of highly absorbed compounds yields a balanced training set (TS1) that can produce in silico regression and classification models more applicable to use in industry. © 2012 Elsevier B.V. All rights reserved.
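The under-sampling idea and the two classification metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the label encoding (1 = highly absorbed, 0 = poorly absorbed), and the choice of "highly absorbed" as the positive class are assumptions made here for illustration only; the abstract does not state the absorption threshold or the modelling software used.

```python
import random


def undersample_balanced(labels, seed=0):
    """Return sorted indices of a 50:50 subset of `labels`.

    Assumed encoding (hypothetical): 1 = highly absorbed, 0 = poorly absorbed.
    The larger class is randomly under-sampled to the size of the smaller one.
    """
    rng = random.Random(seed)
    high = [i for i, y in enumerate(labels) if y == 1]
    poor = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(high), len(poor))
    return sorted(rng.sample(high, n) + rng.sample(poor, n))


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes the positive class is 1 and that both classes occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: 8 highly absorbed and 2 poorly absorbed compounds (imbalanced).
toy_labels = [1] * 8 + [0] * 2
balanced_idx = undersample_balanced(toy_labels, seed=42)
balanced_labels = [toy_labels[i] for i in balanced_idx]
print(len(balanced_labels), sum(balanced_labels))
```

On the toy data, the balanced subset keeps both poorly absorbed compounds and a random two of the eight highly absorbed ones. A model trained on such a subset sees both classes equally often, which is consistent with the abstract's report of higher specificity for TS1, while a model trained on the unbalanced set tends to favour the majority class and so scores higher on sensitivity.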
Pages: 711-720
Number of pages: 10