Splice site identification by idlBNs

Cited by: 29
Authors
Castelo, Robert [1]
Guigo, Roderic [1]
Affiliations
[1] Univ Pompeu Fabra, Ctr Regulacio Genom, Inst Municipal Invest Med, Grp Recerca Informat Biomed, Barcelona 08003, Spain
DOI
10.1093/bioinformatics/bth932
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Computational identification of functional sites in nucleotide sequences is at the core of many algorithms for the analysis of genomic data. This identification is based on statistical parameters estimated from a training set. Often, because of the huge number of parameters, it is difficult to obtain consistent estimators. To simplify the estimation problem, one imposes independence assumptions between the nucleotides along the site. However, this can potentially limit the minimum value of the estimation error.
Results: In this paper, we introduce a novel method for identifying functional sites that finds a reasonable set of independence assumptions among the nucleotides, supported by the data, and uses it to identify the sites by their likelihood ratio. More importantly, in many practical situations it is capable of improving its performance as the training sample size increases. We apply the method to the identification of splice sites, and further evaluate its effect within the context of exon and gene prediction.
Pages: 69-76
Number of pages: 8