Engineering support vector machine kernels that recognize translation initiation sites

Cited by: 268
Authors
Zien, A
Rätsch, G
Mika, S
Schölkopf, B
Lengauer, T
Müller, KR
Affiliations
[1] GMD, SCAI, D-53754 St Augustin, Germany
[2] GMD, FIRST, D-12489 Berlin, Germany
[3] Microsoft Res, Cambridge CB2 3NH, England
DOI
10.1093/bioinformatics/16.9.799
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: In order to extract protein sequences from nucleotide sequences, it is an important step to recognize points at which regions start that code for proteins. These points are called translation initiation sites (TIS).
Results: The task of finding TIS can be modeled as a classification problem. We demonstrate the applicability of support vector machines for this task, and show how to incorporate prior biological knowledge by engineering an appropriate kernel function. With the described techniques the recognition performance can be improved by 26% over leading existing approaches. We provide evidence that existing related methods (e.g. ESTScan) could profit from advanced TIS recognition.
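The abstract frames TIS recognition as classifying candidate positions with a kernel that encodes biological knowledge. The sketch below illustrates that framing only; it is not taken from the paper. It assumes candidates are ATG codons with a fixed flanking window, and it uses a hypothetical kernel that rewards nucleotide matches clustered close together. The names `find_candidates` and `local_window_kernel`, the window sizes, and the exponents `d1` and `d2` are all assumptions made for illustration.

```python
def find_candidates(seq, left=3, right=6):
    """Return (position, window) for each ATG with full flanking context.

    Assumption: a candidate TIS is any ATG; the window covers `left`
    nucleotides upstream, the ATG itself, and `right` downstream.
    """
    candidates = []
    for i in range(left, len(seq) - 3 - right + 1):
        if seq[i:i + 3] == "ATG":
            candidates.append((i, seq[i - left:i + 3 + right]))
    return candidates


def local_window_kernel(x, y, half_width=1, d1=2, d2=1):
    """Hypothetical locality-aware kernel on equal-length windows.

    Matches are counted inside a small sliding window, raised to d1 so
    that neighbouring matches reinforce each other, summed over all
    positions, and raised to d2.
    """
    if len(x) != len(y):
        raise ValueError("windows must have equal length")
    match = [1 if a == b else 0 for a, b in zip(x, y)]
    total = 0
    for p in range(len(match)):
        lo = max(0, p - half_width)
        hi = min(len(match), p + half_width + 1)
        total += sum(match[lo:hi]) ** d1
    return total ** d2


if __name__ == "__main__":
    sequence = "CCGATGGCTAAATGCCCGGG"  # made-up example sequence
    windows = [w for _, w in find_candidates(sequence)]
    # Gram matrix over all candidate windows
    gram = [[local_window_kernel(a, b) for b in windows] for a in windows]
    for row in gram:
        print(row)
```

A Gram matrix built this way could be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, together with labels marking which candidates are true initiation sites.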
Pages: 799-807
Page count: 9
Related papers
24 in total
  • [1] Agarwal P, 1998, Proc Int Conf Intell Syst Mol Biol, V6, P2
  • [2] AGARWAL P, 1998, 2 ANN C RES COMP MOL, V2, P2
  • [3] GenBank
    Benson, DA
    Boguski, MS
    Lipman, DJ
    Ostell, J
    Ouellette, BFF
    [J]. NUCLEIC ACIDS RESEARCH, 1998, 26 (01) : 1 - 7
  • [4] PairWise and SearchWise: Finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames
    Birney, E
    Thompson, JD
    Gibson, TJ
    [J]. NUCLEIC ACIDS RESEARCH, 1996, 24 (14) : 2730 - 2739
  • [5] Bishop C. M., 1995, NEURAL NETWORKS PATT
  • [6] Boser B. E., 1992, Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, P144, DOI 10.1145/130385.130401
  • [7] Knowledge-based analysis of microarray gene expression data by using support vector machines
    Brown, MPS
    Grundy, WN
    Lin, D
    Cristianini, N
    Sugnet, CW
    Furey, TS
    Ares, M
    Haussler, D
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (01) : 262 - 267
  • [8] Prediction of complete gene structures in human genomic DNA
    Burge, C
    Karlin, S
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) : 78 - 94
  • [9] CORTES C, 1995, MACH LEARN, V20, P273, DOI 10.1023/A:1022627411411
  • [10] IDENTIFICATION OF PROTEIN CODING REGIONS BY DATABASE SIMILARITY SEARCH
    GISH, W
    STATES, DJ
    [J]. NATURE GENETICS, 1993, 3 (03) : 266 - 272