Discovering Motifs in Biological Sequences Using the Micron Automata Processor

被引:27
作者
Roy, Indranil [1 ]
Aluru, Srinivas [1 ]
机构
[1] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
基金
美国国家科学基金会;
关键词
Computational biology; finite automaton; graph algorithms; hardware acceleration; motif detection; ALGORITHMS;
D O I
10.1109/TCBB.2015.2430313
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26, 11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the micron automata processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39, 18) and (40, 17). The paper serves as a useful guide to solving problems using this new accelerator technology.
引用
收藏
页码:99 / 111
页数:13
相关论文
共 28 条
[1]  
Bailey TL., 1994, P 2 INT C INT SYST M, V2, P28
[2]  
Bandyopadhyay Shibdas, 2012, IEEE Int Conf Comput Adv Bio Med Sci, P1
[3]  
Becchi M., 2009, THESIS WASHINGTON U
[4]  
Carvalho A., 2005, Asia-Pacific Bioinformatics Conference, P273
[5]  
Chin F., 2005, Proceedings of the 3rd Asia-Pacific Bioinformatics Conference, P261
[6]   Fast and practical algorithms for planted (l, d) motif search [J].
Davila, Jaime ;
Balla, Sudha ;
Rajasekaran, Sanguthevar .
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2007, 4 (04) :544-552
[7]   qPMS7: A Fast Algorithm for Finding (l, d)-Motifs in DNA and Protein Sequences [J].
Dinh, Hieu ;
Rajasekaran, Sanguthevar ;
Davila, Jaime .
PLOS ONE, 2012, 7 (07)
[8]   PMS5: an efficient exact algorithm for the (l, d)-motif finding problem [J].
Dinh, Hieu ;
Rajasekaran, Sanguthevar ;
Kundeti, Vamsi K. .
BMC BIOINFORMATICS, 2011, 12
[9]  
Dlugosch P., 2014, IEEE T PARALL DISTR, V99, P1
[10]  
Eskin Eleazar, 2002, Bioinformatics, V18 Suppl 1, pS354