Fast Target Set Reduction for Large-Scale Protein Function Prediction: A Multi-class Multi-label Machine Learning Approach

Times cited: 0
Authors
Lingner, Thomas [1]
Meinicke, Peter [1]
Affiliation
[1] Univ Gottingen, Inst Microbiol & Genet, Dept Bioinformat, D-37077 Gottingen, Germany
Source
ALGORITHMS IN BIOINFORMATICS, WABI 2008 | 2008 / Vol. 5251
Keywords
protein classification; large-scale; multi-class; multi-label; Pfam; homology search; metagenomics; target set reduction; protein function prediction; machine learning;
DOI
Not available
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Large-scale sequencing projects have produced vast numbers of protein sequences that must be assigned to functional categories. Currently, profile hidden Markov models and kernel-based machine learning methods provide the most accurate results for protein classification. However, classifying new sequences with these approaches is computationally expensive. We present an approach for fast scoring of protein sequences by means of a feature-based protein sequence representation and multi-class multi-label machine learning techniques. Using the Pfam database, we show that our method provides high computational efficiency and that the approach is well suited for pre-filtering of large sequence sets.
Pages: 198-209
Page count: 12