An improved deep learning model for hierarchical classification of protein families

Cited by: 12
Authors
Sandaruwan, Pahalage Dhanushka [1]
Wannige, Champi Thusangi [1]
Affiliations
[1] Univ Ruhuna, Dept Comp Sci, Matara, Sri Lanka
Source
PLOS ONE | 2021 / Vol. 16 / Issue 10
Keywords
ROC CURVE; PREDICTION; DATABASE
DOI
10.1371/journal.pone.0258625
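Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract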
Although genes carry information, proteins are the main players in providing the functionality of a living organism. Massive numbers of different proteins are involved in every function that occurs in a cell. These amino acid sequences can be hierarchically classified into families and subfamilies according to their evolutionary relatedness and their similarities in structure or function. Laboratory experiments can characterize protein structure and function accurately. However, as the number of novel protein sequences grows rapidly, these experiments have become difficult to carry out because they are expensive, time-consuming, and laborious. Many computational methods have therefore been introduced to classify proteins and predict their functional properties. As computational techniques have improved, deep learning has come to play a key role in many areas. Deep learning models such as DeepFam and ProtCNN have recently been presented to classify proteins into their families. However, these models perform only non-hierarchical classification of proteins. In this research, we propose a deep neural network model named DeepHiFam that classifies proteins hierarchically into different levels simultaneously with high accuracy. The model achieved an accuracy of 98.38% for protein family classification and more than 80% accuracy for the classification of protein subfamilies and sub-subfamilies. Further, DeepHiFam performed well in the non-hierarchical classification of protein families, achieving accuracies of 98.62% and 96.14% on the popular Pfam and COG datasets, respectively.
Pages: 15