StackHCV: a web-based integrative machine-learning framework for large-scale identification of hepatitis C virus NS5B inhibitors

Cited by: 22
Authors
Malik, Aijaz Ahmad [1]
Chotpatiwetchkul, Warot [2]
Phanus-umporn, Chuleeporn [1]
Nantasenamat, Chanin [1]
Charoenkwan, Phasit [3]
Shoombuatong, Watshara [1]
Affiliations
[1] Mahidol Univ, Fac Med Technol, Ctr Data Min & Biomed Informat, Bangkok 10700, Thailand
[2] King Mongkuts Inst Technol Ladkrabang, Sch Sci, Dept Chem, Appl Computat Chem Res Unit, Bangkok 10520, Thailand
[3] Chiang Mai Univ, Coll Arts Media & Technol, Modern Management & Informat Technol, Chiang Mai 50200, Thailand
Keywords
Flavivirus; Hepatitis C virus; NS5B; Cheminformatics; Machine learning; Stacking strategy; Feature representation learning; LIFE-CYCLE; POLYMERASE; SUBSTRUCTURES; SERVER;
DOI
10.1007/s10822-021-00418-1
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Fast and accurate identification of inhibitors with potency against HCV NS5B polymerase remains a challenging task. Although conventional experimental methods are the gold standard for the design and development of new HCV inhibitors, they often require a costly investment of time and resources. In this study, we developed a novel machine learning-based meta-predictor (termed StackHCV) for accurate and large-scale identification of HCV inhibitors. Unlike the existing method, which relies on a single-feature approach, we first constructed a pool of baseline models by combining a wide range of heterogeneous molecular fingerprints with five popular machine learning algorithms (k-nearest neighbor, multi-layer perceptron, partial least squares, random forest and support vector machine). Second, we integrated these baseline models into the final meta-model by means of the stacking strategy. Extensive benchmarking experiments showed that StackHCV achieved more accurate and stable performance than its constituent baseline models on the training dataset and also outperformed the existing predictor on the independent test dataset. To facilitate the high-throughput identification of HCV inhibitors, we built a freely accessible web server. It is expected that StackHCV could be a useful tool for fast and precise identification of potential drugs against HCV NS5B, particularly for liver cancer therapy and other clinical applications.
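The two-level stacking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic binary fingerprints, uses only k-nearest-neighbor baselines (one per fingerprint type) and substitutes a plain logistic regression as the meta-model. All function names, parameters and data here are hypothetical.

```python
import math
import random


def tanimoto(a, b):
    """Tanimoto similarity between two binary fingerprints (lists of 0/1)."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


def knn_prob(train_X, train_y, query, k=3):
    """Baseline model: share of actives among the k most similar compounds."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: tanimoto(pair[0], query), reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


def out_of_fold_probs(X, y, n_folds=5, k=3):
    """Cross-validated baseline outputs: each compound is scored by a model
    that was built without that compound's own label."""
    probs = [0.0] * len(X)
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in range(fold, len(X), n_folds):
            probs[i] = knn_prob(train_X, train_y, X[i], k)
    return probs


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_model(features, y, lr=0.5, epochs=300):
    """Meta-model: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(features[0]), 0.0, len(y)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for f, target in zip(features, y):
            err = sigmoid(b + sum(wi * fi for wi, fi in zip(w, f))) - target
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_fingerprint(rng, active, signal_bits, n_bits=16):
    """Synthetic fingerprint: signal bits are set more often for actives."""
    p_signal = 0.8 if active else 0.2
    return [int(rng.random() < (p_signal if i in signal_bits else 0.5))
            for i in range(n_bits)]


# Hypothetical toy data: 60 compounds, two fingerprint types per compound.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(60)]
fp_a = [make_fingerprint(rng, y, {0, 1, 2, 3}) for y in labels]
fp_b = [make_fingerprint(rng, y, {8, 9, 10, 11}) for y in labels]

# Level 0: one baseline model per fingerprint type, predicted out of fold.
meta_features = list(zip(out_of_fold_probs(fp_a, labels),
                         out_of_fold_probs(fp_b, labels)))
# Level 1: the meta-model learns how to weight the baseline outputs.
weights, bias = fit_meta_model(meta_features, labels)
stacked = [sigmoid(bias + sum(w * f for w, f in zip(weights, feats)))
           for feats in meta_features]
accuracy = sum((p >= 0.5) == bool(y) for p, y in zip(stacked, labels)) / len(labels)
print(f"weights={weights}, bias={bias:.3f}, training accuracy={accuracy:.2f}")
```

The baseline probabilities are generated out of fold so that the meta-model is trained on outputs comparable to those it would see for unseen compounds. A fuller version would add the other four algorithm families and further fingerprint types as extra columns of `meta_features`.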
Pages: 1037-1053
Number of pages: 17