conLSH: Context based Locality Sensitive Hashing for mapping of noisy SMRT reads

Cited by: 8
Authors
Chakraborty, Angana [1]
Bandyopadhyay, Sanghamitra [1]
Affiliations
[1] Indian Stat Inst, Machine Intelligence Unit, Kolkata, India
Keywords
Locality Sensitive Hashing; Sequence analysis; Single Molecule Real-Time (SMRT) sequencing; Sequence alignment; PacBio dataset; Algorithm; NEAREST-NEIGHBOR; ALIGNMENT; ALGORITHMS;
DOI
10.1016/j.compbiolchem.2020.107206
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Single Molecule Real-Time (SMRT) sequencing is a recent advancement in next-generation sequencing technology developed by Pacific Biosciences (PacBio). It produces a flood of long, noisy reads, and getting the most out of them demands new methods. To deal with the high error probability of SMRT data, this article proposes a novel contextual Locality Sensitive Hashing (conLSH) based algorithm that can effectively align noisy SMRT reads to the reference genome. Here, sequences are hashed together based not only on their closeness but also on the similarity of their contexts. The algorithm has an $O(n^{\rho+1})$ space requirement, where $n$ is the number of sequences in the corpus and $\rho$ is a constant. The indexing time and querying time are bounded by $O\left(\frac{n^{\rho+1} \ln n}{\ln(1/P_2)}\right)$ and $O(n^{\rho})$, respectively, where $P_2 > 0$ is a probability value. This algorithm is particularly useful for retrieving similar sequences, a widely used task in biology. The proposed conLSH-based aligner is compared with rHAT, a popular aligner for SMRT reads, and is found to outperform it in both speed and memory requirements. In particular, it takes approximately 24.2% less processing time while saving about 70.3% in peak memory for the H. sapiens PacBio dataset.
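The idea of hashing on context as well as closeness can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`make_context_hash`, `build_index`, `query`), the parameters `k`, `context`, and `seed`, and the assumption of equal-length sequences are all hypothetical choices made for illustration. Each hash function samples `k` positions and reads a small window around each one, so two sequences land in the same bucket only if they agree on the sampled symbols and on their neighbours.

```python
import random


def make_context_hash(seq_len, k, context, seed=0):
    """Build one hash function that samples k positions plus their context.

    Hypothetical helper: k, context, and seed are illustrative parameters,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    # Only pick positions whose full context window fits inside the sequence.
    positions = sorted(rng.sample(range(context, seq_len - context), k))

    def h(seq):
        # Join the (2 * context + 1)-symbol window around each sampled position.
        return "|".join(seq[p - context:p + context + 1] for p in positions)

    return h


def build_index(seqs, hashers):
    """Build one hash table per hash function, mapping keys to sequence ids."""
    tables = [dict() for _ in hashers]
    for idx, s in enumerate(seqs):
        for table, h in zip(tables, hashers):
            table.setdefault(h(s), []).append(idx)
    return tables


def query(read, tables, hashers):
    """Return ids of indexed sequences that share a bucket with the read."""
    candidates = set()
    for table, h in zip(tables, hashers):
        candidates.update(table.get(h(read), []))
    return candidates


# Toy usage with two equal-length sequences and three hash tables.
seqs = ["ACGTACGTACGT", "TTTTTTTTTTTT"]
hashers = [make_context_hash(12, k=2, context=1, seed=s) for s in range(3)]
tables = build_index(seqs, hashers)
hits = query("ACGTACGTACGT", tables, hashers)
```

Using several independently seeded hash functions, as in the toy usage, is the usual LSH way to raise the chance that a noisy read collides with its true origin in at least one table; how many tables and how large a context to use are tuning choices this sketch does not settle.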
Pages: 8