Beyond Homology Transfer: Deep Learning for Automated Annotation of Proteins

Authors
Mohammad Nauman
Hafeez Ur Rehman
Gianfranco Politano
Alfredo Benso
Affiliations
[1] FAST National University of Computer and Emerging Sciences, Department of Computer & Control Engineering
[2] Politecnico di Torino
Source
Journal of Grid Computing | 2019 / Vol. 17
Keywords
Protein function prediction; Sequence analysis; Deep learning
DOI
Not available
Abstract
Accurate annotation of protein functions is important for a profound understanding of molecular biology. A large number of proteins remain uncharacterized because of the sparsity of available supporting information. For a large set of uncharacterized proteins, the only type of information available is their amino acid sequence. This motivates the need for sequence-based computational techniques that can precisely annotate uncharacterized proteins. In this paper, we propose DeepSeq, a deep learning architecture that uses only protein sequence information to predict associated functions. The prediction process does not require handcrafted features; rather, the architecture automatically extracts representations from the input sequence data. Results of our experiments with DeepSeq indicate significant improvements in prediction accuracy when compared with other sequence-based methods. Our deep learning model achieves an overall validation accuracy of 86.72%, with an F1 score of 71.13%. Through DeepSeq, we achieved improved results for the protein function prediction problem using sequence-only information. Moreover, using the automatically learned features and without any changes to DeepSeq, we successfully solved a different problem, i.e., protein function localization, with no human intervention. Finally, we discuss how the same architecture can be used to solve even more complicated problems, such as prediction of 2D and 3D structure as well as protein-protein interactions.
Pages: 225-237
Page count: 12
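Illustrative note: the abstract states that DeepSeq takes only the raw amino acid sequence as input and learns its own representations, with no handcrafted features. One common way to prepare such input for a neural network is to map each residue to an integer index and pad every sequence to a fixed length, so that a learned embedding layer can take over from there. The sketch below shows that encoding step only. It is an assumption about typical preprocessing, not the authors' code; the names `AMINO_ACIDS`, `PAD`, `UNK`, and `encode_sequence`, as well as the index assignments, are hypothetical.

```python
# Illustrative only: a generic sequence encoding, not the DeepSeq authors' preprocessing.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
PAD, UNK = 0, 1  # hypothetical reserved indices: padding and unknown residue
AA_TO_INDEX = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(sequence, max_len):
    """Map a protein sequence to a fixed-length list of integer indices.

    Sequences longer than max_len are truncated; shorter ones are
    right-padded with PAD. Non-standard residue codes map to UNK.
    """
    indices = [AA_TO_INDEX.get(aa, UNK) for aa in sequence.upper()[:max_len]]
    return indices + [PAD] * (max_len - len(indices))


if __name__ == "__main__":
    # "X" is not one of the 20 standard residues, so it maps to UNK.
    print(encode_sequence("MKTAYX", max_len=8))  # [12, 10, 18, 2, 21, 1, 0, 0]
```

The integer indices carry no biological meaning by themselves; in a model of this kind, any useful representation would be learned downstream from the labeled data.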