Machine Learning Model for Identifying Antioxidant Proteins Using Features Calculated from Primary Sequences

Cited by: 59
Authors
Ho Thanh Lam, Luu [1,2]
Le, Ngoc Hoang [3]
Van Tuan, Le [4]
Tran Ban, Ho [5]
Nguyen Khanh Hung, Truong [1,4]
Nguyen, Ngan Thi Kim [6]
Huu Dang, Luong [7]
Le, Nguyen Quoc Khanh [1,8,9]
Affiliations
[1] Taipei Med Univ, Coll Med, Med, Taipei 110, Taiwan
[2] Childrens Hosp 2, Ho Chi Minh City 70000, Vietnam
[3] Taipei Med Univ, Coll Biomed Engn, Grad Inst Biomed Mat & Tissue Engn, Taipei 110, Taiwan
[4] Cho Ray Hosp, Orthoped & Trauma Dept, Ho Chi Minh City 70000, Vietnam
[5] Univ Med & Pharm, Dept Pediat Surg, Ho Chi Minh City 70000, Vietnam
[6] Taipei Med Univ, Sch Nutr & Hlth Sci, Taipei 110, Taiwan
[7] Univ Med & Pharm, Dept Otolaryngol, Ho Chi Minh City 70000, Vietnam
[8] Taipei Med Univ, Coll Med, Profess Master Program Artificial Intelligence Me, Taipei 106, Taiwan
[9] Taipei Med Univ, Res Ctr Artificial Intelligence Med, Taipei 106, Taiwan
Source
BIOLOGY-BASEL | 2020 / Vol. 9 / Issue 10
Keywords
antioxidant proteins; machine learning; Random Forest; protein sequencing; feature selection; computational modeling; OXIDATIVE STRESS; FREE-RADICALS; PREDICTION; CLASSIFICATION; INFLAMMATION; CANCER;
DOI
10.3390/biology9100325
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Simple Summary: Antioxidant compounds protect the human body from many kinds of disease as well as from age-related degeneration. Several micronutrients discovered in the last century, such as vitamins A, C, and E, have become part of everyday life. Scientists are searching for more antioxidant compounds, both through laboratory experiments and with the help of computers. Our research used a computational method for the fast and economical identification of antioxidant compounds. The resulting predictor detected antioxidants with an accuracy of 84.6%, which makes it a promising tool for discovering new antioxidant compounds.

Antioxidant proteins play important roles in many aspects of cellular life. They protect the cell and DNA from oxidative substances (such as peroxide, nitric oxide, and oxygen free radicals), which are known as reactive oxygen species (ROS). Free radical generation and antioxidant defenses are opposing factors in the human body, and a balance between them is necessary to stay healthy. An unhealthy routine or age-related degeneration can upset this balance, leaving more ROS than antioxidants and damaging health. In general, antioxidants act by combining with ROS in a one-electron reaction. Computational models that can promptly identify antioxidant candidates are essential for supporting antioxidant detection experiments in the laboratory. In this study, we proposed a machine learning-based model for this prediction task, built from a benchmark set of sequencing data. The experiments used 10-fold cross-validation during training and were validated on three different independent datasets. Different machine learning and deep learning algorithms were evaluated on an optimal set of sequence features. Among them, Random Forest performed best at identifying antioxidant proteins. Our optimal model achieved a high accuracy of 84.6%, with balanced sensitivity (81.5%) and specificity (85.1%), for antioxidant protein identification on the training dataset. Results on the independent datasets also showed that our model compares favorably with previously published work on antioxidant protein identification.
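The abstract reports accuracy, sensitivity, and specificity for a classifier trained on features computed from primary sequences, but it does not say which features were used. The sketch below is only an illustration under stated assumptions: it uses amino acid composition as a stand-in feature (an assumption, not necessarily the authors' feature set), and the function names `amino_acid_composition` and `binary_metrics` are hypothetical. It shows how a sequence could become a fixed-length feature vector and how the three reported metrics are defined from a confusion matrix.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every feature
# vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a protein sequence.

    Assumption: amino acid composition is used here purely for
    illustration; the abstract does not name the actual features.
    """
    sequence = sequence.upper()
    counts = Counter(r for r in sequence if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for 0/1 labels.

    Label 1 is the positive class (antioxidant), 0 the negative class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }
```

In a cross-validation setting such as the 10-fold scheme mentioned above, metrics like these would typically be computed on each held-out fold and then averaged; the exact procedure used by the authors is not described in this record.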
Pages: 1-13
Number of pages: 13