An efficient similarity search based on indexing in large DNA databases

Cited by: 7
Authors
Jeong, In-Seon [1]
Park, Kyoung-Wook [1]
Kang, Seung-Ho [1]
Lim, Hyeong-Seok [1]
Affiliation
[1] Chonnam Natl Univ, Sch Elect & Comp Eng, Kwangju 500757, South Korea
Keywords
Similarity search; Approximate string matching; Indexing; DNA sequence; Homology search
DOI
10.1016/j.compbiolchem.2010.03.007
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Index-based search algorithms are an important part of genomic search, and index construction is the key to how an index-based search algorithm computes the similarity between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. The method requires little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, each sequence is partitioned into equal-length windows. Likely subsequences are then selected by computing their Hamming distance to the query sequence. The algorithm transforms the subsequences in each window into a multidimensional vector space by indexing the character frequencies together with the positional information of the characters in the subsequences. Our experimental results show that the algorithm runs faster than other index-based heuristic algorithms while being as accurate as those algorithms. (C) 2010 Elsevier Ltd. All rights reserved.
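The abstract names three ingredients: equal-length windowing, a Hamming-distance test against the query, and a transform of each window into a vector of character frequencies with positional information. The Python sketch below is not the authors' implementation; it is a minimal illustration under stated assumptions. All function names are hypothetical. The positional information is approximated by counting each base separately in the two halves of a window, because the abstract does not give the exact transform. The vectors serve as a filter: for equal-length strings, half the L1 distance between such count vectors never exceeds the Hamming distance, so a window whose bound already exceeds the threshold k can be skipped before the exact Hamming check.

```python
from typing import Dict, List, Tuple

ALPHABET = "ACGT"
Vector = Tuple[int, ...]


def partition(sequence: str, w: int) -> List[Tuple[int, str]]:
    """Split a sequence into non-overlapping windows of length w.

    A trailing fragment shorter than w is dropped (an assumption of this sketch).
    """
    return [(i, sequence[i:i + w]) for i in range(0, len(sequence) - w + 1, w)]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def positional_frequency_vector(s: str, segments: int = 2) -> Vector:
    """Hypothetical transform: per-base counts in each of `segments` consecutive parts of s.

    Splitting the window into parts stands in for the positional information
    mentioned in the abstract; the paper's exact encoding may differ.
    """
    bounds = [len(s) * k // segments for k in range(segments + 1)]
    vec: List[int] = []
    for k in range(segments):
        part = s[bounds[k]:bounds[k + 1]]
        vec.extend(part.count(c) for c in ALPHABET)
    return tuple(vec)


def lower_bound(u: Vector, v: Vector) -> int:
    """Half the L1 distance between two count vectors.

    Each mismatching position changes the counts by at most 2 in total,
    so this value never exceeds the Hamming distance of the underlying windows.
    """
    return sum(abs(x - y) for x, y in zip(u, v)) // 2


def build_index(sequence: str, w: int) -> Dict[int, Tuple[Vector, str]]:
    """Map each window offset to its count vector and its text."""
    return {pos: (positional_frequency_vector(win), win) for pos, win in partition(sequence, w)}


def search(index: Dict[int, Tuple[Vector, str]], query: str, k: int) -> List[Tuple[int, int]]:
    """Return (offset, Hamming distance) for every window within distance k of the query."""
    qv = positional_frequency_vector(query)
    hits: List[Tuple[int, int]] = []
    for pos, (vec, win) in index.items():
        if len(win) != len(query):
            continue  # the bound and the Hamming distance assume equal lengths
        if lower_bound(vec, qv) > k:
            continue  # pruned using the index vectors alone
        d = hamming(win, query)
        if d <= k:
            hits.append((pos, d))
    return hits


if __name__ == "__main__":
    idx = build_index("ACGTACGTTTGCAAGT", 4)
    print(search(idx, "ACGA", 1))  # [(0, 1), (4, 1)]
```

With w = 4 and k = 1, the windows TTGC and AAGT are rejected by the vector bound (bounds 3 and 2) without a character-by-character comparison, and only the two ACGT windows reach the exact Hamming check.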
Pages: 131-136
Number of pages: 6
Related papers
50 in total
  • [41] An Indexing Framework for Efficient Visual Exploratory Subgraph Search in Graph Databases
    Wang, Chaohui
    Xie, Miao
    Bhowmick, Sourav S.
    Choi, Byron
    Xiao, Xiaokui
    Zhou, Shuigeng
    2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019: 1666-1669
  • [42] Efficient geometry-based similarity search of 3D spatial databases
    Keim, DA
    SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999: SIGMOD99: PROCEEDINGS OF THE 1999 ACM SIGMOD - INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 1999: 419-430
  • [43] ISIS: A New Approach for Efficient Similarity Search in Sparse Databases
    Cui, Bin
    Zhao, Jiakui
    Cong, Gao
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PT II, PROCEEDINGS, 2010, 5982: 231+
  • [44] A Hierarchical Bitmap Indexing Method for Similarity Search in High-Dimensional Multimedia Databases
    Nang, Jongho
    Park, Joohyoun
    Yang, Jihoon
    Kim, Saejoon
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2010, 26 (02): 393-407
  • [45] Anticipatory DTW for Efficient Similarity Search in Time Series Databases
    Assent, Ira
    Wichterich, Marc
    Krieger, Ralph
    Kremer, Hardy
    Seidl, Thomas
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2009, 2 (01)
  • [46] An adaptive index structure for similarity search in large image databases
    Wu, P
    Manjunath, BS
    INTERNET MULTIMEDIA MANAGEMENT SYSTEMS II, 2001, 4519: 32-41
  • [47] MidiFind: Similarity Search and Popularity Mining in Large MIDI Databases
    Xia, Guangyu
    Huang, Tongbo
    Ma, Yifei
    Dannenberg, Roger
    Faloutsos, Christos
    SOUND, MUSIC, AND MOTION, 2014, 8905: 259-276
  • [48] Efficient indexing in trajectory databases
    Cha, Chang-Il
    Kim, Sang-Wook
    Won, Jung-Im
    Lee, Junghoon
    Bae, Duck-Ho
    International Journal of Database Theory and Application, 2008, 1 (01): 21-28
  • [49] A novel indexing approach for efficient and fast similarity search of captured motions
    Li, Chuanjun
    Prabhakaran, B.
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2006, 3918: 689-698
  • [50] Indexing expensive functions for efficient multi-dimensional similarity search
    Chen, Hanxiong
    Liu, Jianquan
    Furuse, Kazutaka
    Yu, Jeffrey Xu
    Ohbo, Nobuo
    KNOWLEDGE AND INFORMATION SYSTEMS, 2011, 27 (02): 165-192