Alignment-free sequence comparison using absent words

被引:17
|
作者
Charalampopoulos, Panagiotis [1 ]
Crochemore, Maxime [1 ]
Fici, Gabriele [2 ]
Mercas, Robert [3 ]
Pissis, Solon R. [1 ]
机构
[1] Kings Coll London, Dept Informat, London, England
[2] Univ Palermo, Dipt Matemat & Informat, Palermo, Italy
[3] Loughborough Univ, Dept Comp Sci, Loughborough, Leics, England
关键词
Sequence comparison; Absent words; Forbidden words; Circular words; q-grams;
D O I
10.1016/j.ic.2018.06.002
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not occur in the sequence. An absent word is minimal if all its proper factors occur in the sequence. Here we present the first linear-time and linear-space algorithm to compare two sequences by considering all their minimal absent words. In the process, we present results of combinatorial interest, and also extend the proposed techniques to compare circular sequences. We also present an algorithm that, given a word x of length n, computes the largest integer for which all factors of x of that length occur in some minimal absent word of x in time and space O(n). Finally, we show that the known asymptotic upper bound on the number of minimal absent words of a word is tight. (C) 2018 Elsevier Inc. All rights reserved.
引用
收藏
页码:57 / 68
页数:12
相关论文
共 50 条
  • [41] An effective extension of the applicability of alignment-free biological sequence comparison algorithms with Hadoop
    Giuseppe Cattaneo
    Umberto Ferraro Petrillo
    Raffaele Giancarlo
    Gianluca Roscigno
    The Journal of Supercomputing, 2017, 73 : 1467 - 1483
  • [42] Alignment-Free Sequence Comparison Based on Next-Generation Sequencing Reads
    Song, Kai
    Ren, Jie
    Zhai, Zhiyuan
    Liu, Xuemei
    Deng, Minghua
    Sun, Fengzhu
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2013, 20 (02) : 64 - 79
  • [43] Alignment-Free Sequence Comparison: A Systematic Survey From a Machine Learning Perspective
    Bohnsack, Katrin Sophie
    Kaden, Marika
    Abel, Julia
    Villmann, Thomas
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (01) : 119 - 135
  • [44] Numerical Characterization of DNA Sequences for Alignment-free Sequence Comparison-A Review
    Ramanathan, Natarajan
    Ramamurthy, Jayalakshmi
    Natarajan, Ganapathy
    COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING, 2022, 25 (03) : 365 - 380
  • [45] Study on the Relation between Virus and Host Cell by Alignment-free Sequence Comparison
    Liu Xue-mei
    Zang Xiang
    Huang Tian-lai
    Yang Zhe
    Li Wen
    Ye Yu-zhong
    Hu Shan
    Li Jing
    PROCEEDINGS OF 2016 12TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2016, : 391 - 394
  • [46] An alignment-free model for comparison of regulatory sequences
    Koohy, Hashem
    Dyer, Nigel P.
    Reid, John E.
    Koentges, Georgy
    Ott, Sascha
    BIOINFORMATICS, 2010, 26 (19) : 2391 - 2397
  • [47] Local decoding of sequences and alignment-free comparison
    Didier, Gilles
    Laprevotte, Ivan
    Pupin, Maude
    Henaut, Alain
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2006, 13 (08) : 1465 - 1476
  • [48] Alignment-free protein interaction network comparison
    Ali, Waqar
    Rito, Tiago
    Reinert, Gesine
    Sun, Fengzhu
    Deane, Charlotte M.
    BIOINFORMATICS, 2014, 30 (17) : I430 - I437
  • [49] Alignment-free method for DNA sequence clustering using Fuzzy integral similarity
    Saw, Ajay Kumar
    Raj, Garima
    Das, Manashi
    Talukdar, Narayan Chandra
    Tripathy, Binod Chandra
    Nandi, Soumyadeep
    SCIENTIFIC REPORTS, 2019, 9 (1)
  • [50] Positional difference and Frequency (PdF) based alignment-free technique for genome sequence comparison
    Dey, Sudeshna
    Ghosh, Papri
    Das, Subhram
    JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 2024, 42 (23): : 12660 - 12688