An efficient similarity search based on indexing in large DNA databases

Cited by: 7

Authors:

Jeong, In-Seon [1]

Park, Kyoung-Wook [1]

Kang, Seung-Ho [1]

Lim, Hyeong-Seok [1]

Affiliations:

[1] Chonnam Natl Univ, Sch Elect & Comp Eng, Kwangju 500757, South Korea

Source:

COMPUTATIONAL BIOLOGY AND CHEMISTRY | 2010 / Vol. 34 / Issue 02

Keywords:

Similarity search; Approximate string matching; Indexing; DNA sequence; Homology search

DOI:

10.1016/j.compbiolchem.2010.03.007

Chinese Library Classification (CLC) number:

Q [Biological sciences]

Discipline classification code:

07; 0710; 09

Abstract:

Index-based search algorithms are an important part of genomic search, and the way the index is constructed is key to how well such an algorithm computes similarities between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. It requires little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, a sequence is partitioned into equal-length windows. We select the likely subsequences by computing their Hamming distance to the query sequence. The algorithm then transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters, including the positional information of the characters in the subsequences. Our experimental results show that the algorithm runs faster than other heuristic algorithms based on index structures, while being as accurate as those algorithms. (C) 2010 Elsevier Ltd. All rights reserved.
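
The three steps named in the abstract (equal-length windows, a Hamming-distance filter, and a frequency-plus-position vector) can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not specify the transformation, so the feature layout here (per-base frequency and mean normalized position) is an assumption, as are the choice of window length equal to the query length and the hypothetical names `windows`, `hamming`, `feature_vector`, and `candidates`.

```python
from typing import List, Tuple

ALPHABET = "ACGT"  # assumed DNA alphabet; ambiguity codes such as N are ignored


def windows(seq: str, w: int) -> List[Tuple[int, str]]:
    """Split seq into consecutive, non-overlapping windows of length w.

    A trailing fragment shorter than w is dropped.
    """
    return [(i, seq[i:i + w]) for i in range(0, len(seq) - w + 1, w)]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def feature_vector(sub: str) -> List[float]:
    """Map a subsequence to an 8-dimensional vector (assumed layout).

    For each base: its relative frequency, followed by the mean of its
    positions divided by the subsequence length (0.0 if the base is absent).
    """
    n = len(sub)
    vec: List[float] = []
    for base in ALPHABET:
        positions = [i for i, c in enumerate(sub) if c == base]
        count = len(positions)
        vec.append(count / n)
        vec.append(sum(positions) / (count * n) if count else 0.0)
    return vec


def candidates(seq: str, query: str, max_dist: int) -> List[Tuple[int, List[float]]]:
    """Keep windows within max_dist mismatches of the query, with their vectors."""
    w = len(query)  # assumption: window length equals query length
    return [
        (start, feature_vector(sub))
        for start, sub in windows(seq, w)
        if hamming(sub, query) <= max_dist
    ]
```

Calling `candidates("ACGTACGAACGT", "ACGT", 1)` keeps all three windows, while a threshold of 0 keeps only the two exact matches. In a full system the resulting vectors would presumably be stored in a multidimensional index; that part is outside this sketch.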

Citation

Pages: 131-136

Page count: 6

