Machine Learning Model for Identifying Antioxidant Proteins Using Features Calculated from Primary Sequences

Cited by: 55
Authors
Ho Thanh Lam, Luu [1, 2]
Le, Ngoc Hoang [3]
Van Tuan, Le [4]
Tran Ban, Ho [5]
Nguyen Khanh Hung, Truong [1, 4]
Nguyen, Ngan Thi Kim [6]
Huu Dang, Luong [7]
Le, Nguyen Quoc Khanh [1, 8, 9]
Affiliations
[1] Taipei Med Univ, Coll Med, Med, Taipei 110, Taiwan
[2] Childrens Hosp 2, Ho Chi Minh City 70000, Vietnam
[3] Taipei Med Univ, Coll Biomed Engn, Grad Inst Biomed Mat & Tissue Engn, Taipei 110, Taiwan
[4] Cho Ray Hosp, Orthoped & Trauma Dept, Ho Chi Minh City 70000, Vietnam
[5] Univ Med & Pharm, Dept Pediat Surg, Ho Chi Minh City 70000, Vietnam
[6] Taipei Med Univ, Sch Nutr & Hlth Sci, Taipei 110, Taiwan
[7] Univ Med & Pharm, Dept Otolaryngol, Ho Chi Minh City 70000, Vietnam
[8] Taipei Med Univ, Coll Med, Profess Master Program Artificial Intelligence Me, Taipei 106, Taiwan
[9] Taipei Med Univ, Res Ctr Artificial Intelligence Med, Taipei 106, Taiwan
Source
BIOLOGY-BASEL | 2020 / Volume 9 / Issue 10
Keywords
antioxidant proteins; machine learning; Random Forest; protein sequencing; feature selection; computational modeling; OXIDATIVE STRESS; FREE-RADICALS; PREDICTION; CLASSIFICATION; INFLAMMATION; CANCER
DOI
10.3390/biology9100325
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Simple Summary: Antioxidant compounds protect the human body from many diseases and from age-related degeneration. Several micronutrients identified in the last century, such as vitamins A, C, and E, have become part of everyday life. Scientists are searching for more antioxidant compounds, both through laboratory experiments and with computational assistance. Our research used a computational method for the fast and economical identification of antioxidant compounds. We present a predictor that achieved an accuracy of 84.6% in detecting antioxidants, which makes it a promising tool for discovering new antioxidant compounds.
Antioxidant proteins play an important role in many aspects of cellular activity. They protect the cell and DNA from oxidative substances (such as peroxide, nitric oxide, and oxygen free radicals), which are known as reactive oxygen species (ROS). Free radical generation and antioxidant defenses are opposing factors in the human body, and the balance between them is necessary to maintain health. An unhealthy routine or age-related degeneration can break this balance, leaving more ROS than antioxidants and damaging health. In general, antioxidants act by reacting with ROS in a one-electron reaction. Computational models that promptly identify antioxidant candidates are essential for supporting laboratory experiments on antioxidant detection. In this study, we propose a machine learning-based model for this prediction task, built from a benchmark set of sequencing data. The experiments were conducted using 10-fold cross-validation during training and validated on three independent datasets. Different machine learning and deep learning algorithms were evaluated on an optimal set of sequence features. Among them, Random Forest was identified as the best-performing model for identifying antioxidant proteins. Our optimal model achieved an accuracy of 84.6%, with balanced sensitivity (81.5%) and specificity (85.1%), for antioxidant protein identification on the training dataset. Results on the independent datasets also showed the strength of our model compared with previously published work on antioxidant protein identification.
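The accuracy, sensitivity, and specificity figures quoted in the abstract are standard confusion-matrix quantities. The sketch below shows how the three are conventionally computed for a binary classifier. The function name `binary_metrics` and the counts passed to it are hypothetical, chosen only for illustration; they are not the paper's data or the authors' code.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: antioxidant proteins predicted correctly / missed.
    tn/fp: non-antioxidant proteins predicted correctly / flagged by mistake.
    """
    sensitivity = tp / (tp + fn)                 # fraction of true positives recovered
    specificity = tn / (tn + fp)                 # fraction of true negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # fraction of all predictions correct
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only (not taken from the paper).
accuracy, sensitivity, specificity = binary_metrics(tp=80, fn=20, tn=450, fp=50)
print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

Accuracy is a class-size-weighted average of sensitivity and specificity, so on an imbalanced dataset it sits closer to the metric of the larger class.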
Pages: 1 / 13
Page count: 13