Multistage Combination Classifier Augmented Model for Protein Secondary Structure Prediction

Cited by: 1
Authors
Zhang, Xu [1]
Liu, Yiwei [2]
Wang, Yaming [3]
Zhang, Liang [4]
Feng, Lin [2]
Jin, Bo [2]
Zhang, Hongzhe [1]
Affiliations
[1] Dalian Univ Technol, Coll Mech Engn, Dalian, Peoples R China
[2] Dalian Univ Technol, Sch Innovat & Entrepreneurship, Dalian, Peoples R China
[3] Dalian Med Univ, Affiliated Hosp 1, Dalian, Peoples R China
[4] Dongbei Univ Finance & Econ, Int Business Sch, Dalian, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
genetics; biology; protein secondary structure; deep learning; combination classifier; amino acid sequence; NETWORKS
DOI
10.3389/fgene.2022.769828
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
In bioinformatics, understanding protein secondary structure is important for studying diseases and finding new treatments. Because determining protein secondary structure through physical experiments is time-consuming and expensive, pattern recognition and machine learning methods have been proposed to predict it. However, most of these methods achieve quite similar performance, which suggests that they have reached a model capacity bottleneck. Both model design and the learning process affect a model's learning capacity; we focus on the latter. To this end, we propose a framework called the Multistage Combination Classifier Augmented Model (MCCM) for protein secondary structure prediction. First, a feature extraction module extracts features with different levels of learning difficulty. Second, multistage combination classifiers learn separate decision boundaries for easy and hard samples; the hard-sample classifier penalizes the loss on hard samples, which improves prediction performance on them. Third, a sample difficulty discrimination module, based on the Dirichlet distribution and an information entropy measure, assigns samples of different learning difficulty to the corresponding classifiers. Experimental results on the publicly available CB513 benchmark dataset show that our method outperforms most state-of-the-art models.
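The abstract does not say how the Dirichlet distribution and entropy are combined, so the following is only a minimal sketch of one common way to do it, not the authors' method. It assumes each residue comes with non-negative per-class evidence, forms Dirichlet parameters as evidence plus one, takes the Dirichlet mean as the predicted class probabilities, and routes a sample to a "hard" classifier when the normalized entropy of those probabilities exceeds a threshold. The function names, the threshold of 0.5, and the evidence values are all hypothetical.

```python
import math

def dirichlet_mean(evidence):
    """Mean of Dirichlet(alpha), with alpha = evidence + 1 (an assumed parameterization)."""
    alpha = [e + 1.0 for e in evidence]
    total = sum(alpha)
    return [a / total for a in alpha]

def normalized_entropy(probs):
    """Shannon entropy divided by log(K), so the value lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def route(evidence, threshold=0.5):
    """Label a sample 'hard' when its predictive entropy exceeds the hypothetical threshold."""
    score = normalized_entropy(dirichlet_mean(evidence))
    return "hard" if score > threshold else "easy"

# Three-state example (helix, strand, coil) with made-up evidence values.
confident = [40.0, 1.0, 1.0]   # evidence concentrated on one class
ambiguous = [2.0, 2.0, 1.5]    # evidence spread almost evenly

print(route(confident), route(ambiguous))
```

In this sketch a sample with concentrated evidence has low entropy and goes to the easy classifier, while a sample with evenly spread evidence has entropy near the maximum and goes to the hard classifier.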
Pages: 11