Efficient Computation of Sequence Mappability

被引:3
|
作者
Alzamel, Mai [1 ,2 ]
Charalampopoulos, Panagiotis [1 ]
Iliopoulos, Costas S. [1 ]
Kociumaka, Tomasz [3 ]
Pissis, Solon P. [1 ]
Radoszewski, Jakub [3 ]
Straszynski, Juliusz [3 ]
机构
[1] Kings Coll London, Dept Informat, London, England
[2] King Saud Univ, Dept Comp Sci, Riyadh, Saudi Arabia
[3] Univ Warsaw, Inst Informat, Warsaw, Poland
来源
STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2018 | 2018年 / 11147卷
关键词
Sequence mappability; Hamming distance; Genome sequencing; Longest common substring with k mismatches; Suffix tree; SUFFIX TREE CONSTRUCTION; ALGORITHM;
D O I
10.1007/978-3-030-00479-8_2
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Sequence mappability is an important task in genome resequencing. In the (k, m)-mappability problem, for a given sequence T of length n, our goal is to compute a table whose ith entry is the number of indices j not equal i such that length-m substrings of T starting at positions i and j have at most k mismatches. Previous works on this problem focused on heuristic approaches to compute a rough approximation of the result or on the case of k = 1. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that works in O(n min{m(k), log(k+1)n}) time and O(n) space for k = O(1). It requires a careful adaptation of the technique of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. We also show O(n(2))- time algorithms to compute all results for a fixed m and all k = 0, ..., m or a fixed k and all m = k, ..., n - 1. Finally we show that the (k, m)mappability problem cannot be solved in strongly subquadratic time for k,m = circle minus(log n) unless the Strong Exponential Time Hypothesis fails.
引用
收藏
页码:12 / 26
页数:15
相关论文
共 50 条
  • [1] Efficient Computation of Sequence Mappability
    Charalampopoulos, Panagiotis
    Iliopoulos, Costas S.
    Kociumaka, Tomasz
    Pissis, Solon P.
    Radoszewski, Jakub
    Straszynski, Juliusz
    ALGORITHMICA, 2022, 84 (05) : 1418 - 1440
  • [2] Efficient Computation of Sequence Mappability
    Panagiotis Charalampopoulos
    Costas S. Iliopoulos
    Tomasz Kociumaka
    Solon P. Pissis
    Jakub Radoszewski
    Juliusz Straszyński
    Algorithmica, 2022, 84 : 1418 - 1440
  • [3] Faster Computation of Genome Mappability
    Hooshmand, Sahar
    Abedin, Paniz
    Gibney, Daniel
    Aluru, Srinivas
    Thankachan, Sharma, V
    ACM-BCB'18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2018, : 537 - 537
  • [4] Faster Computation of Genome Mappability with one Mismatch
    Hooshmand, Sahar
    Abedin, Paniz
    Gibney, Daniel
    Aluru, Srinivas
    Thankachan, Sharma V.
    2018 IEEE 8TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL ADVANCES IN BIO AND MEDICAL SCIENCES (ICCABS), 2018,
  • [5] Fast Computation and Applications of Genome Mappability
    Derrien, Thomas
    Estelle, Jordi
    Marco Sola, Santiago
    Knowles, David G.
    Raineri, Emanuele
    Guigo, Roderic
    Ribeca, Paolo
    PLOS ONE, 2012, 7 (01):
  • [6] GenMap: ultra-fast computation of genome mappability
    Pockrandt, Christopher
    Alzamel, Mai
    Iliopoulos, Costas S.
    Reinert, Knut
    BIOINFORMATICS, 2020, 36 (12) : 3687 - 3692
  • [7] Faster Algorithms for 1-Mappability of a Sequence
    Alzamel, Mai
    Charalampopoulos, Panagiotis
    Iliopoulos, Costas S.
    Pissis, Solon P.
    Radoszewski, Jakub
    Sung, Wing-Kin
    COMBINATORIAL OPTIMIZATION AND APPLICATIONS, COCOA 2017, PT II, 2017, 10628 : 109 - 121
  • [8] Faster algorithms for 1-mappability of a sequence
    Alzamel, Mai
    Charalampopoulos, Panagiotis
    Iliopoulos, Costas S.
    Pissis, Solon P.
    Radoszewski, Jakub
    Sung, Wing-Kin
    THEORETICAL COMPUTER SCIENCE, 2020, 812 (812) : 2 - 12
  • [10] Efficient computation of balancedness in binary sequence generators
    García-Mochales, P
    Fúster-Sabater, A
    STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2004, 3246 : 269 - 270