Structural analysis of regulatory DNA sequences using grammar inference and Support Vector Machine

Cited by: 23
Author
Damasevicius, Robertas [1]
Affiliation
[1] Kaunas Univ Technol, Software Engn Dept, LT-51368 Kaunas, Lithuania
Keywords
DNA sequence analysis; Grammar inference; L-grammar; Support Vector Machine; CLASSIFIER SYSTEM; MODULAR STRUCTURE; PROMOTER; RECOGNITION;
DOI
10.1016/j.neucom.2009.09.018
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Regulatory DNA sequences such as promoters or splicing sites control gene expression and are important for successful gene prediction. Such sequences can be recognized by certain patterns or motifs that are conserved within a species. These patterns have many exceptions, which makes the structural analysis of regulatory sequences a complex problem. Grammar rules can be used to describe the structure of regulatory sequences; however, deriving such rules manually is not trivial. In this paper, stochastic L-grammar rules are derived automatically from positive examples and counterexamples of regulatory sequences using genetic programming techniques. The fitness of the grammar rules is evaluated using a Support Vector Machine (SVM) classifier. The SVM is trained on known sequences to obtain a discriminating function, which is then used to evaluate a candidate grammar ruleset by determining the percentage of generated sequences that are classified correctly. Combining SVM with grammar rule inference can mitigate the lack of structural insight in machine learning approaches such as SVM. (C) 2009 Elsevier B.V. All rights reserved.
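
The evaluation loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the rule format, the names `expand`, `fitness`, `EXAMPLE_RULES` and `toy_classifier`, and all parameter values are assumptions made for illustration. In particular, `toy_classifier` is only a stand-in for the discriminating function of a trained SVM, and the genetic programming search over rulesets is omitted.

```python
import random

# Hypothetical stochastic L-grammar: each symbol maps to a list of
# (replacement, probability) alternatives. Symbols without a rule are kept.
EXAMPLE_RULES = {
    "T": [("TA", 0.5), ("T", 0.5)],
    "A": [("AT", 0.3), ("A", 0.7)],
}


def expand(axiom, rules, steps, rng):
    """Rewrite all symbols in parallel (L-system style) for `steps` rounds."""
    seq = axiom
    for _ in range(steps):
        out = []
        for sym in seq:
            options = rules.get(sym)
            if options is None:
                out.append(sym)  # no rule for this symbol: copy it unchanged
            else:
                bodies, weights = zip(*options)
                out.append(rng.choices(bodies, weights=weights, k=1)[0])
        seq = "".join(out)
    return seq


def fitness(rules, axiom, classify, n_samples=200, steps=4, seed=0):
    """Fraction of generated sequences that `classify` accepts as positive."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n_samples) if classify(expand(axiom, rules, steps, rng))
    )
    return hits / n_samples


def toy_classifier(seq):
    # Stand-in for a trained SVM decision function (assumption): accepts any
    # sequence that contains a TATA-like motif.
    return "TATA" in seq


if __name__ == "__main__":
    score = fitness(EXAMPLE_RULES, "TA", toy_classifier)
    print(f"fitness of example ruleset: {score:.2f}")
```

Passing the classifier in as a plain callable keeps the sketch independent of any particular SVM library; a real decision function could be substituted without changing `fitness`. The function returns a fraction between 0 and 1 rather than a percentage.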
Pages: 633-638
Number of pages: 6