Utilizing random Forest QSAR models with optimized parameters for target identification and its application to target-fishing server

Times cited: 40

Authors:

Lee, Kyoungyeul [2]

Lee, Minho [1]

Kim, Dongsup [2]

Affiliations:

[1] Catholic Univ Korea, Coll Med, Catholic Precis Med Res Ctr, 222 Banpo Daero, Seoul 06591, South Korea

[2] Korea Adv Inst Sci & Technol, Dept Bio & Brain Engn, 291 Daehak Ro, Daejeon 34141, South Korea

Source:

BMC BIOINFORMATICS | 2017 / Vol. 18

Funding:

National Research Foundation of Singapore;

Keywords:

Virtual screening; Target identification; SAR modeling; Random forest; Extended connectivity fingerprint; Target fishing server; MOLECULES; POLYPHARMACOLOGY; PHARMACOLOGY; PREDICTION; PARADIGM; TOOL;

DOI:

10.1186/s12859-017-1960-x

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Subject classification codes:

071010 ; 081704 ;

Abstract:

Background: The identification of target molecules is important for understanding the mechanism of "target deconvolution" in phenotypic screening and the "polypharmacology" of drugs. Because conventional methods of identifying targets are time-consuming and costly, in-silico target identification has been considered an alternative solution. One of the well-known in-silico methods of identifying targets involves structure-activity relationships (SARs). SARs have advantages such as low computational cost and high feasibility; however, the data dependency of the SAR approach causes an imbalance of active data and ambiguity of inactive data across targets.

Results: We developed a ligand-based virtual screening model comprising 1121 target SAR models built using a random forest algorithm. The performance of each target model was tested using the ROC curve and the mean score under internal five-fold cross-validation. Moreover, recall rates for the top-k targets were calculated to assess the performance of target ranking. A benchmark model using an optimized sampling method and parameters was examined on an external validation set. The results show recall rates of 67.6% and 73.9% for the top-11 (1% of the total targets) and top-33 targets, respectively. We provide a publicly available website, http://rfqsar.kaist.ac.kr, where users can search the top-k targets for query ligands.

Conclusions: The target models that we built can be used both for predicting the activity of ligands toward each target and for ranking candidate targets for a query ligand using a unified scoring scheme. The scores are additionally fitted to probabilities so that users can estimate how likely a ligand-target interaction is to be active. The user interface of our website is user-friendly and intuitive, offering useful information and cross-references.
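The abstract evaluates target ranking by recall over the top-k targets, where top-11 corresponds to 1% of the 1121 target models (1121 // 100 = 11). The following is a minimal sketch of how such a recall could be computed, assuming recall is the fraction of known active ligand-target pairs whose target appears among a ligand's k highest-scoring targets. The function name `top_k_recall` and the toy scores are hypothetical illustrations, not code or data from the paper or the rfqsar server.

```python
from typing import Dict, Set


def top_k_recall(scores: Dict[str, Dict[str, float]],
                 known_targets: Dict[str, Set[str]],
                 k: int) -> float:
    """Hypothetical helper: fraction of known active (ligand, target) pairs
    whose target is ranked within that ligand's top-k predicted targets.

    Assumes every ligand in `known_targets` has a score for each candidate
    target in `scores`. Ties keep the insertion order of `scores[ligand]`
    because Python's sort is stable (also with reverse=True).
    """
    hits = 0
    total = 0
    for ligand, actives in known_targets.items():
        target_scores = scores[ligand]
        # Rank candidate targets from highest to lowest model score.
        ranked = sorted(target_scores, key=target_scores.get, reverse=True)
        top_k = set(ranked[:k])
        # Count how many of this ligand's known targets were recovered.
        hits += len(actives & top_k)
        total += len(actives)
    return hits / total if total else 0.0


# Toy example with three made-up targets (not data from the paper).
scores = {
    "lig1": {"T1": 0.9, "T2": 0.4, "T3": 0.7},
    "lig2": {"T1": 0.1, "T2": 0.8, "T3": 0.3},
}
known = {"lig1": {"T3"}, "lig2": {"T1", "T2"}}

print(top_k_recall(scores, known, k=1))  # 1 of 3 known pairs recovered
print(top_k_recall(scores, known, k=2))  # 2 of 3 known pairs recovered

# With 1121 target models, a 1% cutoff gives k = 1121 // 100 = 11.
print(1121 // 100)
```

Counting pairs rather than averaging per ligand weights ligands with many known targets more heavily; a per-ligand average is an equally plausible reading of "recall rate", so the paper itself should be checked for the exact definition used.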


Pages: 12

Number of references: 35
