Machine Learning Model for Identifying Antioxidant Proteins Using Features Calculated from Primary Sequences

Cited by: 55
Authors
Ho Thanh Lam, Luu [1, 2]
Le, Ngoc Hoang [3]
Van Tuan, Le [4]
Tran Ban, Ho [5]
Nguyen Khanh Hung, Truong [1, 4]
Nguyen, Ngan Thi Kim [6]
Huu Dang, Luong [7]
Le, Nguyen Quoc Khanh [1, 8, 9]
Affiliations
[1] Taipei Med Univ, Coll Med, Med, Taipei 110, Taiwan
[2] Childrens Hosp 2, Ho Chi Minh City 70000, Vietnam
[3] Taipei Med Univ, Coll Biomed Engn, Grad Inst Biomed Mat & Tissue Engn, Taipei 110, Taiwan
[4] Cho Ray Hosp, Orthoped & Trauma Dept, Ho Chi Minh City 70000, Vietnam
[5] Univ Med & Pharm, Dept Pediat Surg, Ho Chi Minh City 70000, Vietnam
[6] Taipei Med Univ, Sch Nutr & Hlth Sci, Taipei 110, Taiwan
[7] Univ Med & Pharm, Dept Otolaryngol, Ho Chi Minh City 70000, Vietnam
[8] Taipei Med Univ, Coll Med, Profess Master Program Artificial Intelligence Me, Taipei 106, Taiwan
[9] Taipei Med Univ, Res Ctr Artificial Intelligence Med, Taipei 106, Taiwan
Source
BIOLOGY-BASEL | 2020 / Volume 9 / Issue 10
Keywords
antioxidant proteins; machine learning; Random Forest; protein sequencing; feature selection; computational modeling; OXIDATIVE STRESS; FREE-RADICALS; PREDICTION; CLASSIFICATION; INFLAMMATION; CANCER;
DOI
10.3390/biology9100325
Chinese Library Classification number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
Simple Summary: Antioxidant compounds protect the human body from many diseases as well as from age-related degeneration. Several micronutrients identified in the last century, such as vitamins A, C, and E, have become part of everyday life. Scientists are searching for more antioxidant compounds, not only through laboratory experiments but also with the help of computers. Our research used a computational method for the rapid and economical identification of antioxidant compounds. We present a predictor that achieved an accuracy of 84.6% in detecting antioxidants. Our predictor is therefore a promising tool for discovering new antioxidant compounds.
Abstract: Antioxidant proteins play important roles in many aspects of cellular life. They protect the cell and DNA from oxidative substances (such as peroxide, nitric oxide, and oxygen free radicals), which are known as reactive oxygen species (ROS). Free radical generation and antioxidant defenses are opposing factors in the human body, and the balance between them is necessary to maintain health. An unhealthy routine or age-related degeneration can break this balance, leaving more ROS than antioxidants and causing damage to health. In general, the antioxidant mechanism is the combination of antioxidant molecules and ROS in a one-electron reaction. Computational models that can promptly identify antioxidant candidates are essential for supporting antioxidant detection experiments in the laboratory. In this study, we propose a machine learning-based model for this prediction task, built from a benchmark set of sequence data. The model was trained using 10-fold cross-validation and validated on three different independent datasets. Different machine learning and deep learning algorithms were evaluated on an optimal set of sequence features. Among them, Random Forest was the best-performing model for identifying antioxidant proteins. Our optimal model achieved an accuracy of 84.6%, with balanced sensitivity (81.5%) and specificity (85.1%), for antioxidant protein identification on the training dataset. The results on the independent datasets also indicated that our model performs well compared with previously published work on antioxidant protein identification.
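The evaluation terms used in the abstract (features calculated from a primary sequence, 10-fold cross-validation, accuracy, sensitivity, specificity) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' pipeline: amino acid composition is used here only as an assumed example of a sequence-derived feature, the function names `amino_acid_composition`, `binary_metrics`, and `k_fold_indices` are hypothetical, and no classifier is trained.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a protein sequence.

    Assumed illustrative feature; the record does not state which
    sequence features the study actually selected.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of true positives that were recovered.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of true negatives that were recovered.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for plain k-fold CV, without shuffling."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

In a real experiment each training fold would be used to fit a classifier such as a Random Forest, and `binary_metrics` would be applied to that model's predictions on the held-out fold. Because accuracy pools both classes, it falls between sensitivity and specificity and sits closer to whichever class has more samples.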
Pages: 1-13
Page count: 13