An Enhanced Protein Fold Recognition for Low Similarity Datasets Using Convolutional and Skip-Gram Features With Deep Neural Network

Cited by: 11
Authors
Bankapur, Sanjay [1]
Patil, Nagamma [1]
Affiliations
[1] Natl Inst Technol Karnataka Surathkal, Dept Informat Technol, Mangalore 575025, India
Keywords
Feature extraction; Proteins; Hidden Markov models; Benchmark testing; Amino acids; Neural networks; Nanobioscience; Convolutional features; deep neural network; evolutionary profile; hidden Markov model; protein fold recognition; position-specific scoring matrix; skipped bi-gram features; ENSEMBLE CLASSIFIER; STRUCTURAL CLASS; SCORING MATRIX; PREDICTION; DATABASE; PROBABILITIES; ALIGNMENT; ACCURACY; MACHINE; SCOP;
DOI
10.1109/TNB.2020.3022456
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Protein fold recognition is one of the important tasks of structural biology; it helps address further challenges such as predicting protein tertiary structures and their functions. Many machine learning approaches have been published to identify protein folds effectively. However, very few have reported fold recognition accuracy above 80% on benchmark datasets. In this study, an effective set of global and local features is extracted using the proposed Convolutional (Conv) and SkipXGram bi-gram (SXGbg) techniques, and fold recognition is performed using the proposed deep neural network. The proposed model achieved 91.4% fold accuracy on one of the low-similarity (< 25%) datasets derived from the latest extended version, SCOPe_2.07. The model was further evaluated on three popular, publicly available benchmark datasets (DD, EDD, and TG) and obtained fold accuracies of 85.9%, 95.8%, and 88.8%, respectively. This work is the first to report fold recognition accuracy above 85% on all of these benchmark datasets. The proposed model outperformed the best state-of-the-art models by 5% to 23% on DD, 2% to 19% on EDD, and 3% to 30% on TG.
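The abstract does not define the SXGbg features, so the following is only a minimal sketch of a generic skipped bi-gram descriptor computed from a position-specific scoring matrix (PSSM), not the authors' implementation. The function name `skipped_bigram_features`, the `skip` parameter, and the normalization by the number of residue pairs are all assumptions made for illustration.

```python
import numpy as np


def skipped_bigram_features(pssm: np.ndarray, skip: int = 0) -> np.ndarray:
    """Generic skipped bi-gram descriptor (hypothetical helper, not from the paper).

    pssm: array of shape (L, A), one row per residue position and one column
          per amino-acid type (A is 20 for a standard PSSM).
    skip: number of positions skipped between the two residues of a pair;
          skip=0 gives the ordinary adjacent bi-gram.

    Returns a flat vector of length A * A whose entry (m, n) is the average of
    pssm[i, m] * pssm[i + skip + 1, n] over all valid positions i.
    """
    pssm = np.asarray(pssm, dtype=float)
    if pssm.ndim != 2:
        raise ValueError("pssm must be a 2-D array of shape (L, A)")
    if skip < 0:
        raise ValueError("skip must be non-negative")

    gap = skip + 1
    length, n_types = pssm.shape
    if length <= gap:
        # Sequence too short to form any pair at this gap.
        return np.zeros(n_types * n_types)

    left = pssm[:-gap]   # rows i
    right = pssm[gap:]   # rows i + gap
    pair_sums = left.T @ right           # shape (A, A)
    # Assumption: divide by the pair count so the vector is length-independent.
    return (pair_sums / (length - gap)).ravel()
```

Because the output length depends only on the number of amino-acid types, descriptors for several `skip` values can be concatenated into one fixed-size input for a classifier, whatever the sequence length.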
Pages: 42-49
Page count: 8