An efficient similarity search based on indexing in large DNA databases

Cited by: 7
Authors
Jeong, In-Seon [1]
Park, Kyoung-Wook [1]
Kang, Seung-Ho [1]
Lim, Hyeong-Seok [1]
Affiliations
[1] Chonnam Natl Univ, Sch Elect & Comp Eng, Kwangju 500757, South Korea
Keywords
Similarity search; Approximate string matching; Indexing; DNA sequence; Homology search
DOI
10.1016/j.compbiolchem.2010.03.007
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Index-based search algorithms are an important part of genomic search, and the way the index is constructed is key to how well such an algorithm computes similarities between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. It requires little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, a sequence is partitioned into equal-length windows. We select the likely subsequences by computing their Hamming distance to the query sequence. The algorithm then transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters, including the positional information of the characters in the subsequences. Our experimental results show that the algorithm runs faster than other heuristic algorithms based on index structures, while being as accurate as those algorithms. (C) 2010 Elsevier Ltd. All rights reserved.
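The abstract names three steps (equal-length windows, Hamming-distance filtering against the query, and a frequency vector that also carries positional information) but does not give the exact transformation. The following is a minimal Python sketch of those steps under stated assumptions, not the authors' implementation: the function names, the non-overlapping window layout, the `max_mismatches` threshold, and the use of the mean 1-based position of each base as the "positional information" are all hypothetical choices made for illustration.

```python
ALPHABET = "ACGT"  # assumption: sequences contain only these four bases


def windows(sequence, w):
    """Split a sequence into non-overlapping windows of length w.

    A trailing fragment shorter than w is dropped (an assumption of this sketch).
    """
    return [sequence[i:i + w] for i in range(0, len(sequence) - w + 1, w)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def candidate_windows(sequence, query, max_mismatches):
    """Keep (window index, window) pairs within max_mismatches of the query."""
    w = len(query)
    return [
        (i, win)
        for i, win in enumerate(windows(sequence, w))
        if hamming(win, query) <= max_mismatches
    ]


def feature_vector(subseq):
    """Map a subsequence to an 8-dimensional vector.

    The first four entries are the counts of A, C, G, T. The last four are
    the mean 1-based position of each base (0.0 if the base is absent),
    which is a hypothetical stand-in for the paper's positional information.
    """
    counts = [subseq.count(c) for c in ALPHABET]
    mean_pos = []
    for c in ALPHABET:
        positions = [i + 1 for i, ch in enumerate(subseq) if ch == c]
        mean_pos.append(sum(positions) / len(positions) if positions else 0.0)
    return counts + mean_pos
```

In this sketch, `candidate_windows("ACGTACGATTTT", "ACGT", 1)` keeps the first two windows and discards `"TTTT"`, and each surviving window could then be passed to `feature_vector` and stored in whatever multidimensional index is used.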
Pages: 131-136
Number of pages: 6