Remote homology detection based on oligomer distances

Times cited: 51
Authors
Lingner, Thomas [1]
Meinicke, Peter [1]
Affiliation
[1] Univ Gottingen, Abt Bioinformat, Inst Mikrobiol & Genet, D-37077 Gottingen, Germany
DOI
10.1093/bioinformatics/btl376
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Remote homology detection is among the most intensively researched problems in bioinformatics. Currently, discriminative approaches, especially kernel-based methods, provide the most accurate results. However, kernel methods also have several drawbacks: in many cases, predicting new sequences is computationally expensive; kernels often lack an interpretable model for analyzing characteristic sequence features; and most approaches rely on so-called hyperparameters, which complicate applying the methods across different datasets.
Results: We introduce a feature vector representation for protein sequences based on distances between short oligomers. The corresponding feature space arises from distance histograms for every possible pair of K-mers. Our distance-based approach offers important advantages in computational speed, while on common test data its prediction performance is highly competitive with state-of-the-art methods for protein remote homology detection. Furthermore, the learned model can easily be analyzed in terms of discriminative features, and, in contrast to other methods, our representation does not require any tuning of kernel hyperparameters.
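The oligomer-distance feature space described in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation: the function names (`oligomer_distance_features`, `score`, `top_features`), the convention that the distance is the offset between the start positions of two K-mers (from 1 to `max_dist`), and the use of raw, unnormalized counts are all assumptions made here for illustration. Each feature is a triple (first K-mer, second K-mer, distance), so over the 20-letter amino acid alphabet there are 20^(2K) · `max_dist` possible features, of which only the observed ones are stored. A linear model over these features scores a sequence with one sparse dot product, and its largest-magnitude weights map directly back to specific K-mer pairs and distances.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical feature key: (first K-mer, second K-mer, offset between their start positions)
Feature = Tuple[str, str, int]


def oligomer_distance_features(seq: str, k: int = 1, max_dist: int = 10) -> Counter:
    """Sparse distance histogram over all ordered K-mer pairs.

    Assumptions (not taken from the paper): the distance d is the offset between
    start positions, d runs from 1 to max_dist, and counts are not normalized.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts: Counter = Counter()
    for i, first in enumerate(kmers):
        # Look ahead at most max_dist positions, stopping at the last K-mer
        for d in range(1, min(max_dist, len(kmers) - 1 - i) + 1):
            counts[(first, kmers[i + d], d)] += 1
    return counts


def score(features: Counter, weights: Dict[Feature, float], bias: float = 0.0) -> float:
    """Linear decision value: only the sequence's non-zero features are visited."""
    return bias + sum(c * weights.get(f, 0.0) for f, c in features.items())


def top_features(weights: Dict[Feature, float], n: int = 5) -> List[Tuple[Feature, float]]:
    """Largest-magnitude weights, i.e. the most discriminative (K-mer, K-mer, distance) triples."""
    return sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)[:n]


if __name__ == "__main__":
    feats = oligomer_distance_features("ACDAC", k=1, max_dist=2)
    # Toy weights, made up for illustration; a real model would be trained
    toy_weights = {("A", "C", 1): 0.5, ("C", "D", 1): 0.25}
    print(score(feats, toy_weights))  # 1.25
    print(top_features(toy_weights, n=1))  # [(('A', 'C', 1), 0.5)]
```

For "ACDAC" with `k=1` and `max_dist=2`, the sketch counts 7 position pairs (4 at offset 1, 3 at offset 2), and ("A", "C", 1) occurs twice. In practice the weights would come from training a linear classifier on labeled sequences; because the features are explicit, no kernel hyperparameter has to be chosen.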
Pages: 2224-2231
Number of pages: 8
References (21 in total)
[1] Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Journal of Molecular Biology, 1990, 215(3): 403-410.
[2] [Anonymous]. Proc. of the Intl. Conf. on Research in Computational Molecular Biology, 2002.
[3] [Anonymous]. Encyclopedia of Biostatistics, 1998.
[4] Ben-Hur A, Brutlag D. Remote homology detection: a motif based approach. Bioinformatics, 2003, 19: i26-i33.
[5] Dong QW, Wang XL, Lin L. Application of latent semantic analysis to protein remote homology detection. Bioinformatics, 2006, 22(3): 285-290.
[6] Huang R. J Bioinform Comput B, 2005, 3: 527.
[7] Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJA. The PROSITE database. Nucleic Acids Research, 2006, 34: D227-D230.
[8] Jaakkola T, Diekhans M, Haussler D. A discriminative framework for detecting remote protein homologies. Journal of Computational Biology, 2000, 7(1-2): 95-114.
[9] Krogh A, Brown M, Mian IS, Sjolander K, Haussler D. Hidden Markov models in computational biology: applications to protein modeling. Journal of Molecular Biology, 1994, 235(5): 1501-1531.
[10] Leslie C. P S Biocomput, 2002, 266.