Scalable parallel word search in multicore/multiprocessor systems

Cited by: 0
Authors
Frank Drews
Jens Lichtenberg
Lonnie Welch
Affiliation
[1] Ohio University,School of Electrical Engineering and Computer Science
Source
The Journal of Supercomputing | 2010 / Vol. 51
Keywords
Biological word discovery; Parallel algorithms; Cache-awareness; Lock-free data partitioning; Multicore/multiprocessor systems
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
This paper presents a parallel algorithm for fast word search to determine the set of biological words of an input DNA sequence. The algorithm is designed to scale well on state-of-the-art multiprocessor/multicore systems for large inputs and large maximum word sizes. The pattern exhibited by many sequential solutions to this problem is a repetitive execution over a large input DNA sequence, and the generation of large amounts of output data to store and retrieve the words determined by the algorithm. As we show, this pattern does not lend itself to straightforward standard parallelization techniques. The proposed algorithm aims to achieve three major goals to overcome the drawbacks of embarrassingly parallel solution techniques: (i) to impose a high degree of cache locality on a problem that, by nature, tends to exhibit nonlocal access patterns, (ii) to be lock-free or to largely reduce the need for data access locking, and (iii) to enable an even distribution of the overall processing load among multiple threads. We present an implementation and performance evaluation of the proposed algorithm on DNA sequences of various sizes for different organisms on a dual-processor quad-core system with a total of 8 cores. We compare the performance of the parallel word search implementation with a sequential implementation and with an embarrassingly parallel implementation. The results show that the proposed algorithm far outperforms the embarrassingly parallel strategy and achieves speed-ups of up to 6.9 on our 8-core test system.
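
The abstract does not describe the partitioning scheme itself, so the following is only a minimal sketch of the general idea behind goal (ii), not the authors' algorithm. It assumes that words are assigned to workers by their first nucleotide, so that each worker writes only to its own private counter and no locking is needed. The names `count_partition` and `parallel_word_count` are hypothetical, and because CPython threads share an interpreter lock, the sketch illustrates the data layout rather than any real speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

ALPHABET = "ACGT"  # assumption: one partition per leading nucleotide


def count_partition(sequence, max_len, prefix):
    """Count every word of length 1..max_len that starts with `prefix`.

    The returned counter is private to this call, so no lock is needed.
    """
    counts = Counter()
    n = len(sequence)
    start = sequence.find(prefix)
    while start != -1:
        for length in range(len(prefix), max_len + 1):
            if start + length > n:
                break  # word would run past the end of the sequence
            counts[sequence[start:start + length]] += 1
        start = sequence.find(prefix, start + 1)
    return counts


def parallel_word_count(sequence, max_len, workers=4):
    """Count words with one task per prefix, then merge the results.

    Partitions are disjoint by first letter, so merging never combines
    two partial counts for the same word.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda p: count_partition(sequence, max_len, p), ALPHABET
        )
        merged = Counter()
        for part in partials:
            merged.update(part)
    return merged


if __name__ == "__main__":
    for word, count in sorted(parallel_word_count("ACGTACGT", 2).items()):
        print(word, count)
```

Partitioning by a single leading letter balances the load only when the four nucleotides occur about equally often; a longer prefix or a hash of the word would be one way to even out the partitions, which is the concern behind goal (iii).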
Pages: 58-75
Page count: 17