A comparison of machine learning algorithms for the prediction of Hepatitis C NS3 protease cleavage sites

Cited by: 4
Authors
Chown, Harry [1]
Affiliations
[1] Univ Exeter, Dept Biosci, Stocker Rd, Exeter EX4 4QD, Devon, England
Keywords
Hepatitis C; NS3; protease; peptide cleavage; machine learning; VIRUS; INHIBITORS; DISCOVERY; SCH-503034
DOI
10.2478/ebtj-2019-0020
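Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract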
Hepatitis is a global disease that is on the rise and currently causes more deaths each year than the human immunodeficiency virus. As a result, there is an increasing need for antivirals. Effective antivirals have previously been found in the form of substrate-mimetic protease inhibitors. Machine learning has been used to predict the cleavage patterns of viral proteases and so inform future drug design. This study applied and compared several machine learning algorithms on hepatitis C virus NS3 serine protease cleavage data. The results show that differences in sequence-extraction methods can outweigh differences in algorithm choice. Models built from pseudo-coded datasets all achieved high accuracy and outperformed models built from orthogonal-coded datasets. However, no single pseudo-coded model performed significantly better than any other. Evaluation of the performance measures also shows that choosing the right model scoring system is essential for unbiased model assessment.
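Two ideas from the abstract can be illustrated with a minimal sketch: orthogonal coding of a peptide window, and why the choice of scoring system matters on imbalanced cleavage data. The sketch assumes orthogonal coding means one-hot encoding over the 20 standard amino acids. The example peptide, the window length, the labels, and all function names are hypothetical and are not taken from the study. The Matthews correlation coefficient (MCC) stands in here as one possible alternative to plain accuracy.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def orthogonal_encode(peptide):
    """One-hot encode a peptide: each residue becomes a 20-element 0/1 vector."""
    encoded = []
    for residue in peptide:
        one_hot = [0] * len(AMINO_ACIDS)
        one_hot[AMINO_ACIDS.index(residue)] = 1
        encoded.extend(one_hot)
    return encoded


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = cleaved."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical 8-residue window (illustrative only, not from the dataset).
features = orthogonal_encode("DEMEECSQ")

# Hypothetical imbalanced labels: 1 cleaved site among 10 peptides,
# scored against a trivial model that always predicts "not cleaved".
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [0] * 10

print(len(features), accuracy(y_true, y_pred), mcc(y_true, y_pred))
```

In this toy case the trivial model scores 0.9 on accuracy but 0.0 on MCC, which is the kind of gap that makes the scoring system matter when one class dominates.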
Pages: 167-174
Page count: 8