Weighted similarity-based clustering of chemical structures and bioactivity data in early drug discovery

Times cited: 5
Authors
Perualila-Tan, Nolen Joy [1]
Shkedy, Ziv [1]
Talloen, Willem [2]
Gohlmann, Hinrich W. H. [2]
Van Moerbeke, Marijke [1]
Kasim, Adetayo [3]
Affiliations
[1] Hasselt Univ, Ctr Stat, Interuniv Inst Biostat & Stat Bioinformat I BioSt, Hasselt, Belgium
[2] Janssen Pharmaceut NV, B-2340 Beerse, Belgium
[3] Univ Durham, Wolfson Res Inst Hlth & Wellbeing, Durham DH1 3HP, England
Keywords
Bioactivity; chemical structure; clustering; transcriptomic; GENE-EXPRESSION; GROWTH; MODEL; ADENOCARCINOMA; PREDICTION;
DOI
10.1142/S0219720016500189
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
The modern process of discovering candidate molecules in the early drug discovery phase includes a wide range of approaches to extract vital information from the intersection of biology and chemistry. A typical strategy in compound selection involves clustering compounds based on chemical similarity to obtain representative, chemically diverse compounds (without incorporating potency information). In this paper, we propose an integrative clustering approach that makes use of both biological (compound efficacy) and chemical (structural features) data sources to discover a subset of compounds with aligned structural and biological properties. The datasets are integrated at the similarity level by assigning complementary weights to produce a weighted similarity matrix, which serves as a generic input to any clustering algorithm. This new analysis workflow is a semi-supervised method since, after the clusters are determined, a secondary analysis identifies differentially expressed genes associated with the derived integrated cluster(s) to further explain the compound-induced biological effects inside the cell. Datasets from two oncology drug development projects are used to illustrate the usefulness of the weighted similarity-based clustering approach for integrating multi-source, high-dimensional information to aid drug discovery. Compounds that are structurally and biologically similar to the reference compounds are discovered using the proposed integrative approach.
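The integration step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that "complementary weights" means a weight w on the chemical similarity and 1 - w on the bioactivity similarity, it uses Tanimoto similarity on binary vectors for both data sources, and it uses a simple average-linkage agglomerative clustering as the downstream algorithm. All function names and the toy data are hypothetical.

```python
# Minimal sketch of weighted similarity-based clustering (illustrative only).
# Assumptions: weights are w and 1 - w; both sources are binary vectors
# compared with Tanimoto similarity; the toy data below are made up.

def tanimoto(a, b):
    """Tanimoto similarity between two equal-length binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0

def similarity_matrix(rows, sim):
    """Pairwise similarity matrix for a list of feature vectors."""
    n = len(rows)
    return [[sim(rows[i], rows[j]) for j in range(n)] for i in range(n)]

def weighted_similarity(s_chem, s_bio, w):
    """Combine two similarity matrices as w * s_chem + (1 - w) * s_bio."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    n = len(s_chem)
    return [[w * s_chem[i][j] + (1 - w) * s_bio[i][j] for j in range(n)]
            for i in range(n)]

def average_linkage(sim, k):
    """Merge the most similar pair of clusters until k clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(sim[i][j] for i, j in pairs) / len(pairs)
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)

# Hypothetical toy data: four compounds, structural fingerprint bits and
# binarized activity calls across three assays.
fingerprints = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
bioactivity = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 1, 0]]

s_chem = similarity_matrix(fingerprints, tanimoto)
s_bio = similarity_matrix(bioactivity, tanimoto)

for w in (1.0, 0.5, 0.0):
    combined = weighted_similarity(s_chem, s_bio, w)
    print(w, average_linkage(combined, k=2))
```

In this toy setting, w = 1 reproduces a purely structure-based grouping and w = 0 a purely activity-based one; intermediate values trade the two sources off. How w should be chosen for real data is not specified in the abstract.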
Pages: 22