Scalable parallel word search in multicore/multiprocessor systems

Cited by: 7

Authors:

Drews, Frank ^{[1]}

Lichtenberg, Jens ^{[1]}

Welch, Lonnie ^{[1]}

Affiliations:

[1] Ohio Univ, Sch Elect Engn & Comp Sci, Athens, OH 45701 USA

Source:

JOURNAL OF SUPERCOMPUTING | 2010 / Vol. 51 / Issue 01

Keywords:

Biological word discovery; Parallel algorithms; Cache-awareness; Lock-free data partitioning; Multicore/multiprocessor systems; FACTOR-BINDING-SITES; GENOME; CONSTRUCTION; SEQUENCES;

DOI:

10.1007/s11227-009-0308-3

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

This paper presents a parallel algorithm for fast word search to determine the set of biological words of an input DNA sequence. The algorithm is designed to scale well on state-of-the-art multiprocessor/multicore systems for large inputs and large maximum word sizes. The pattern exhibited by many sequential solutions to this problem is a repetitive execution over a large input DNA sequence, and the generation of large amounts of output data to store and retrieve the words determined by the algorithm. As we show, this pattern does not lend itself to straightforward standard parallelization techniques. The proposed algorithm aims to achieve three major goals to overcome the drawbacks of embarrassingly parallel solution techniques: (i) to impose a high degree of cache locality on a problem that, by nature, tends to exhibit nonlocal access patterns, (ii) to be lock-free or to largely reduce the need for data access locking, and (iii) to enable an even distribution of the overall processing load among multiple threads. We present an implementation and performance evaluation of the proposed algorithm on DNA sequences of various sizes for different organisms on a dual-processor quad-core system with a total of 8 cores. We compare the performance of the parallel word search implementation with a sequential implementation and with an embarrassingly parallel implementation. The results show that the proposed algorithm far outperforms the embarrassingly parallel strategy and achieves speed-ups of up to 6.9 on our 8-core test system.
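The lock-free partitioning goal named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify; it only assumes one plausible scheme in which each worker owns a disjoint set of word prefixes, so no two workers ever write the same table entry and no locking is needed. The function names, the `prefix_len` parameter, and the round-robin prefix assignment are all hypothetical. Python threads also do not give real multicore speed-up for this workload, so the sketch shows the data partitioning only, not the performance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def count_owned_words(seq, max_len, owned_prefixes, prefix_len):
    """Count words of length prefix_len..max_len that start with an owned prefix.

    Each worker writes only to its private Counter, so no locks are required.
    """
    counts = Counter()
    n = len(seq)
    for i in range(n - prefix_len + 1):
        if seq[i:i + prefix_len] not in owned_prefixes:
            continue  # this word belongs to another worker
        for k in range(prefix_len, max_len + 1):
            if i + k > n:
                break
            counts[seq[i:i + k]] += 1
    return counts


def parallel_word_counts(seq, max_len, workers=4, prefix_len=1):
    """Hypothetical prefix-partitioned word count over the alphabet ACGT."""
    prefixes = ["".join(p) for p in product("ACGT", repeat=prefix_len)]
    # Round-robin assignment of prefixes to workers (an assumption, chosen
    # only to spread the prefixes evenly; real load depends on the sequence).
    groups = [set(prefixes[w::workers]) for w in range(workers)]
    groups = [g for g in groups if g]
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        partials = list(
            pool.map(
                lambda g: count_owned_words(seq, max_len, g, prefix_len),
                groups,
            )
        )
    merged = {}
    for part in partials:
        merged.update(part)  # key sets are disjoint by construction
    return merged


if __name__ == "__main__":
    print(parallel_word_counts("ACGTACGT", max_len=2))
```

Because the prefix sets are disjoint, the final merge is a plain union of the per-worker tables rather than a reduction over shared keys. Windows that start with a character outside `ACGT` (for example `N`) are skipped in this sketch.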

Pages: 58-75

Number of pages: 18