Alignment-free sequence comparison using absent words

被引:17
|
作者
Charalampopoulos, Panagiotis [1 ]
Crochemore, Maxime [1 ]
Fici, Gabriele [2 ]
Mercas, Robert [3 ]
Pissis, Solon R. [1 ]
机构
[1] Kings Coll London, Dept Informat, London, England
[2] Univ Palermo, Dipt Matemat & Informat, Palermo, Italy
[3] Loughborough Univ, Dept Comp Sci, Loughborough, Leics, England
关键词
Sequence comparison; Absent words; Forbidden words; Circular words; q-grams;
D O I
10.1016/j.ic.2018.06.002
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not occur in the sequence. An absent word is minimal if all its proper factors occur in the sequence. Here we present the first linear-time and linear-space algorithm to compare two sequences by considering all their minimal absent words. In the process, we present results of combinatorial interest, and also extend the proposed techniques to compare circular sequences. We also present an algorithm that, given a word x of length n, computes the largest integer for which all factors of x of that length occur in some minimal absent word of x in time and space O(n). Finally, we show that the known asymptotic upper bound on the number of minimal absent words of a word is tight. (C) 2018 Elsevier Inc. All rights reserved.
引用
收藏
页码:57 / 68
页数:12
相关论文
共 50 条
  • [1] Alignment-free sequence comparison - a review
    Vinga, S
    Almeida, J
    BIOINFORMATICS, 2003, 19 (04) : 513 - 523
  • [2] Alignment-free sequence comparison using joint frequency and position information of k-words
    Han, Gyu-Bum
    Chung, Byung Chang
    Cho, Dong-Ho
    2017 39TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2017, : 3880 - 3883
  • [3] Multiple alignment-free sequence comparison
    Ren, Jie
    Song, Kai
    Sun, Fengzhu
    Deng, Minghua
    Reinert, Gesine
    BIOINFORMATICS, 2013, 29 (21) : 2690 - 2698
  • [4] Extraction of high quality k-words for alignment-free sequence comparison
    Gunasinghe, Upuli
    Alahakoon, Damminda
    Bedingfield, Susan
    JOURNAL OF THEORETICAL BIOLOGY, 2014, 358 : 31 - 51
  • [5] An Algorithm for Alignment-free Sequence Comparison using Logical Match
    Shanker, Sanil
    Austin, Jim
    Sherly, Elizabeth
    2010 2ND INTERNATIONAL CONFERENCE ON COMPUTER AND AUTOMATION ENGINEERING (ICCAE 2010), VOL 3, 2010, : 536 - 538
  • [6] A probabilistic measure for alignment-free sequence comparison
    Pham, TD
    Zuegg, J
    BIOINFORMATICS, 2004, 20 (18) : 3455 - 3461
  • [7] Benchmarking of alignment-free sequence comparison methods
    Zielezinski, Andrzej
    Girgis, Hani Z.
    Bernard, Guillaume
    Leimeister, Chris-Andre
    Tang, Kujin
    Dencker, Thomas
    Lau, Anna Katharina
    Roehling, Sophie
    Choi, Jae Jin
    Waterman, Michael S.
    Comin, Matteo
    Kim, Sung-Hou
    Vinga, Susana
    Almeida, Jonas S.
    Chan, Cheong Xin
    James, Benjamin T.
    Sun, Fengzhu
    Morgenstern, Burkhard
    Karlowski, Wojciech M.
    GENOME BIOLOGY, 2019, 20 (1)
  • [8] Benchmarking of alignment-free sequence comparison methods
    Andrzej Zielezinski
    Hani Z. Girgis
    Guillaume Bernard
    Chris-Andre Leimeister
    Kujin Tang
    Thomas Dencker
    Anna Katharina Lau
    Sophie Röhling
    Jae Jin Choi
    Michael S. Waterman
    Matteo Comin
    Sung-Hou Kim
    Susana Vinga
    Jonas S. Almeida
    Cheong Xin Chan
    Benjamin T. James
    Fengzhu Sun
    Burkhard Morgenstern
    Wojciech M. Karlowski
    Genome Biology, 20
  • [9] Alignment-free genomic sequence comparison using FCGR and signal processing
    Lichtblau, Daniel
    BMC BIOINFORMATICS, 2019, 20 (01)
  • [10] Alignment-free genomic sequence comparison using FCGR and signal processing
    Daniel Lichtblau
    BMC Bioinformatics, 20