Finding Motifs in DNA Sequences Using Low-Dispersion Sequences

Cited by: 49
Authors
Wang, Xun [1]
Miao, Ying [1]
Cheng, Minquan [2]
Affiliations
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki 3058577, Japan
[2] Guangxi Normal Univ, Dept Math, Guilin, Peoples R China
Funding
National Science Foundation (United States)
Keywords
random projection; developed almost difference family; uniform projection; motif finding; low-dispersion sequence; expectation maximization; projection; discovery; binding
DOI
10.1089/cmb.2013.0054
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motif finding problems, abstracted as the planted (l, d)-motif finding problem, are a major task in molecular biology, namely finding functioning units and genes. In 2002, the random projection algorithm was introduced to solve the challenging (15, 4)-motif finding problem by using randomly chosen templates. Two years later, a so-called uniform projection algorithm was developed to improve the random projection algorithm by means of low-dispersion sequences generated by coverings. In this article, we introduce an improved projection algorithm called the low-dispersion projection algorithm, which uses low-dispersion sequences generated by developed almost difference families. Compared with the random projection algorithm, the low-dispersion projection algorithm can solve the (l, d)-motif finding problem with fewer templates without decreasing the success rate.
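
The projection step that these algorithms share can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `project`, `bucket_lmers`, and `random_template` are hypothetical, and the template here is drawn uniformly at random in the style of the 2002 random projection algorithm. The low-dispersion construction from developed almost difference families is not reproduced. A template is assumed to be a set of k positions out of l; every l-mer is hashed by the characters at those positions, so occurrences of a planted motif whose mutations miss the template fall into the same bucket.

```python
import random
from collections import defaultdict


def project(lmer, template):
    """Keep only the characters of lmer at the positions listed in template."""
    return "".join(lmer[i] for i in template)


def bucket_lmers(sequences, l, template):
    """Hash every l-mer of every sequence into a bucket keyed by its projection."""
    buckets = defaultdict(list)
    for seq_index, seq in enumerate(sequences):
        for start in range(len(seq) - l + 1):
            key = project(seq[start:start + l], template)
            buckets[key].append((seq_index, start))
    return buckets


def random_template(l, k, rng):
    """Choose k of the l positions uniformly at random (assumed random-projection style)."""
    return tuple(sorted(rng.sample(range(l), k)))


if __name__ == "__main__":
    # Two toy sequences, each consisting of one 6-mer; they differ only at position 2.
    seqs = ["ACGTAC", "ACCTAC"]
    template = (0, 1, 3)  # this template skips the mutated position
    buckets = bucket_lmers(seqs, 6, template)
    print(dict(buckets))
    print(random_template(15, 7, random.Random(0)))
```

In a full projection algorithm, buckets holding many l-mers would then seed a refinement step such as expectation maximization. Choosing the templates so that they cover the positions more evenly than random draws is where the uniform and low-dispersion variants differ from this sketch.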
Pages: 320-329
Page count: 10
References
18 entries in total
[1] [Anonymous], 1992, RANDOM NUMBER GENERA
[2] Bailey TL, 1995, MACH LEARN, V21, P51, DOI 10.1007/BF00993379
[3] Beth T., 1999, DESIGN THEORY, VI
[4] Buhler, J; Tompa, M. Finding motifs using random projections [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2002, 9 (02): 225-242
[5] Das, Modan K.; Dai, Ho-Kwok. A survey of DNA motif finding algorithms [J]. BMC BIOINFORMATICS, 2007, 8 (Suppl 7)
[6] Fern X.Z., 2003, ICML, P186
[7] Galas, DJ; Schmitz, A. DNAASE FOOTPRINTING - SIMPLE METHOD FOR DETECTION OF PROTEIN-DNA BINDING SPECIFICITY [J]. NUCLEIC ACIDS RESEARCH, 1978, 5 (09): 3157-3170
[9] Gionis A, 1999, PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P518
[10] Hertz, GZ; Stormo, GD. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences [J]. BIOINFORMATICS, 1999, 15 (7-8): 563-577