conLSH: Context based Locality Sensitive Hashing for mapping of noisy SMRT reads

Cited by: 8
Authors
Chakraborty, Angana [1]
Bandyopadhyay, Sanghamitra [1]
Affiliation
[1] Indian Stat Inst, Machine Intelligence Unit, Kolkata, India
Keywords
Locality Sensitive Hashing; Sequence analysis; Single Molecule Real-Time (SMRT) sequencing; Sequence alignment; PacBio dataset; Algorithm; NEAREST-NEIGHBOR; ALIGNMENT; ALGORITHMS
DOI
10.1016/j.compbiolchem.2020.107206
CLC number
Q [Biological sciences]
Subject classification code
07; 0710; 09
Abstract
Single Molecule Real-Time (SMRT) sequencing is a recent advance in next-generation sequencing technology developed by Pacific Biosciences (PacBio). It produces a flood of long, noisy reads, and making the most of them demands cutting-edge research. To deal with the high error rate of SMRT data, this article proposes a novel algorithm based on contextual Locality Sensitive Hashing (conLSH) that can effectively align noisy SMRT reads to the reference genome. Here, sequences are hashed together based not only on their closeness but also on the similarity of their context. The algorithm has an $O(n^{\rho+1})$ space requirement, where $n$ is the number of sequences in the corpus and $\rho$ is a constant. The indexing time and querying time are bounded by $O(n^{\rho+1} \cdot \ln n / \ln(1/P_2))$ and $O(n^{\rho})$ respectively, where $P_2 > 0$ is a probability value. This algorithm is particularly useful for retrieving similar sequences, a widely used task in biology. The proposed conLSH-based aligner is compared with rHAT, a popular aligner for SMRT reads, and outperforms it in both speed and memory requirements. In particular, it takes approximately 24.2% less processing time while using about 70.3% less peak memory on the H. sapiens PacBio dataset.
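The abstract does not spell out the hashing scheme, so the following is only a minimal Python sketch, under stated assumptions, of what hashing on context can look like in a bit-sampling style LSH index. Each key concatenates K sampled positions of a fixed-length DNA string together with a window of `context` neighbouring bases on each side, and L independent tables are probed for candidates. The class `ContextLSH`, the helper `num_concatenated_hashes`, and all parameter names are hypothetical; this is not the authors' conLSH implementation. Choosing K = ceil(ln n / ln(1/P_2)) follows the usual LSH parameterisation, which is assumed here to mirror the ln n / ln(1/P_2) factor in the indexing bound; the number of tables is left as a free parameter.

```python
import math
import random
from collections import defaultdict


def num_concatenated_hashes(n: int, p2: float) -> int:
    """Hypothetical helper: K = ceil(ln n / ln(1/p2)), the usual LSH choice,
    so that p2**K <= 1/n for a dissimilar pair."""
    if not 0.0 < p2 < 1.0:
        raise ValueError("p2 must lie strictly between 0 and 1")
    return max(1, math.ceil(math.log(n) / math.log(1.0 / p2)))


class ContextLSH:
    """Hypothetical context-aware LSH index over fixed-length DNA strings.

    Each of `num_tables` tables samples `k` positions; the key of a sequence
    joins the bases within +/- `context` of every sampled position.
    """

    def __init__(self, length: int, k: int, num_tables: int, context: int, seed: int = 0):
        rng = random.Random(seed)
        self.context = context
        # k sampled centre positions per table (sampling with replacement).
        self.positions = [
            sorted(rng.randrange(length) for _ in range(k)) for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    def _key(self, seq: str, positions: list) -> str:
        # Concatenate the context window around each sampled position.
        windows = []
        for i in positions:
            lo = max(0, i - self.context)
            hi = min(len(seq), i + self.context + 1)
            windows.append(seq[lo:hi])
        return "|".join(windows)

    def add(self, name: str, seq: str) -> None:
        # Store the sequence under one key per table.
        for pos, table in zip(self.positions, self.tables):
            table[self._key(seq, pos)].append(name)

    def query(self, seq: str) -> set:
        # Candidates are all sequences sharing a full key in at least one table.
        hits = set()
        for pos, table in zip(self.positions, self.tables):
            hits.update(table.get(self._key(seq, pos), []))
        return hits


if __name__ == "__main__":
    print(num_concatenated_hashes(1000, 0.5))  # 10
    idx = ContextLSH(length=10, k=3, num_tables=4, context=1, seed=42)
    idx.add("ref1", "ACGTACGTAC")
    idx.add("ref2", "TTTTGGGGCC")
    print(idx.query("ACGTACGTAC"))  # {'ref1'}
    # A read with one substitution may or may not collide, depending on
    # whether any sampled window covers the changed base.
    print(idx.query("ACGTACGTAA"))
```

Because every key must match exactly, a wider `context` makes collisions rarer and candidate lists shorter; using several tables is the standard LSH way to recover recall for noisy queries such as SMRT reads.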
Pages: 8
Related papers (50 in total)
  • [21] Accelerating Large Scale Centroid-Based Clustering with Locality Sensitive Hashing
    McConville, Ryan
    Cao, Xin
    Liu, Weiru
    Miller, Paul
    2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016: 649-660
  • [22] An adaptive mean shift clustering algorithm based on locality-sensitive hashing
    Zhang, Xinhong
    Cui, Yanbin
    Li, Duoyi
    Liu, Xianxing
    Zhang, Fan
    OPTIK, 2012, 123 (20): 1891-1894
  • [23] Query by Humming by Using Locality Sensitive Hashing based on Combination of Pitch and Note
    Wang, Qiang
    Guo, Zhiyuan
    Liu, Gang
    Guo, Jun
    Lu, Yueming
    2012 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), 2012: 302-307
  • [24] Locality-Sensitive Hashing Scheme Based on Heap Sort of Hash Bucket
    Fang, Bo
    Hua, Zhongyun
    Huang, Hejiao
    14TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND EDUCATION (ICCSE 2019), 2019: 5-10
  • [25] Privacy-preserving Distributed Service Recommendation based on Locality-Sensitive Hashing
    Qi, Lianyong
    Xiang, Haolong
    Dou, Wanchun
    Yang, Chi
    Qin, Yongrui
    Zhang, Xuyun
    2017 IEEE 24TH INTERNATIONAL CONFERENCE ON WEB SERVICES (ICWS 2017), 2017: 49-56
  • [26] Chrysanthemum Petal Similarity Evaluation Based on Multi-probe Locality Sensitive Hashing
    Yuan P.
    Zhai Z.
    Qian S.
    Xu H.
    Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery, 2019, 50 (07): 208-215
  • [27] A robust method based on locality sensitive hashing for K-nearest neighbors searching
    Cheng, Dongdong
    Huang, Jinlong
    Zhang, Sulan
    Wu, Quanwang
    WIRELESS NETWORKS, 2024, 30 (05): 4195-4208
  • [28] MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
    Wang, Jingjing
    Lin, Chen
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2015, 2015
  • [29] Efficient Data Stream Clustering with Sliding Windows based on Locality-Sensitive Hashing
    Youn, Jonghem
    Shim, Junho
    Lee, Sang-Goo
    IEEE ACCESS, 2018, 6: 63757-63776
  • [30] Parallel set similarity join on big data based on Locality-Sensitive Hashing
    Sohrabi, Mohammad Karim
    Azgomi, Hosseion
    SCIENCE OF COMPUTER PROGRAMMING, 2017, 145: 1-12