SALT: a fast, memory-efficient and SNP-aware short read alignment tool

Authors
Quan, Wei [1]
Liu, Bo [1]
Wang, Yadong [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
Source
2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM) | 2019
Keywords
HTS; alignment; SNP-aware; BWT; SPACE
DOI
Not available
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
DNA sequence alignment tools play an essential role in genomics and genetics. Alignment accuracy directly affects the accuracy of downstream analyses such as variant calling, so reads must be mapped to the reference genome both rapidly and accurately, which has made read alignment an essential topic in bioinformatics. Conventional read aligners map reads to a linear reference genome (such as the GRCh38 primary assembly). However, a linear reference genome represents the genomes of only one or a few individuals and lacks population-level variation information, which can introduce bias and reduce the sensitivity and accuracy of mapping. Recently, a few aligners have begun to map reads to a graph that captures the entire human genome along with a large number of variants. Compared with linear-reference aligners, however, storing and indexing all genetic variants requires a large amount of memory (RAM) and leads to extremely long runtimes. Aligning reads to a graph-based index that includes the whole set of variants is, in theory, an NP-hard problem. Considering only SNP information reduces the complexity of the index and improves alignment speed. Herein, we present an SNP-aware alignment tool (SALT) that aligns reads to a reference genome incorporating an SNP database. SALT is benchmarked on both simulated reads and a real dataset. The results demonstrate that SALT maps reads to the reference genome efficiently and significantly improves accuracy and sensitivity. Moreover, incorporating SNP information into read alignment helps to discover novel variants. SALT is distributed under the GNU General Public License (GPL).
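To make the idea of SNP-aware matching concrete, the following is a minimal sketch and not SALT's actual algorithm or index (the keywords suggest SALT is BWT-based, which this toy scan is not). It assumes a hypothetical SNP table, `snps`, that maps 0-based reference positions to sets of known alternate alleles, and counts a read base as a match if it equals either the reference base or a known alternate allele at that position. The function names `snp_aware_mismatches` and `best_offset` are illustrative only.

```python
from typing import Dict, Set, Tuple


def snp_aware_mismatches(read: str, ref: str, offset: int,
                         snps: Dict[int, Set[str]]) -> int:
    """Count mismatches of `read` placed at ref[offset:].

    A read base counts as a match if it equals the reference base or is a
    known alternate allele at that position in the (hypothetical) SNP table.
    """
    if offset < 0 or offset + len(read) > len(ref):
        raise ValueError("read does not fit in the reference at this offset")
    mismatches = 0
    for i, base in enumerate(read):
        pos = offset + i
        if base == ref[pos]:
            continue  # agrees with the linear reference
        if base in snps.get(pos, ()):
            continue  # agrees with a known SNP allele
        mismatches += 1
    return mismatches


def best_offset(read: str, ref: str,
                snps: Dict[int, Set[str]]) -> Tuple[int, int]:
    """Brute-force scan: return (offset, mismatches) with the fewest mismatches.

    Ties are resolved in favour of the leftmost offset.
    """
    candidates = range(len(ref) - len(read) + 1)
    return min(
        ((off, snp_aware_mismatches(read, ref, off, snps)) for off in candidates),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    snps = {5: {"T"}}   # assumed known SNP: reference C, alternate T at position 5
    read = "ATGT"       # carries the alternate allele
    print(best_offset(read, ref, {}))    # linear reference only
    print(best_offset(read, ref, snps))  # SNP-aware
```

With the SNP table supplied, the read that carries the alternate allele is placed without a penalty at the position where it truly originates; without the table, the same read incurs a mismatch there and ties with another placement. This is the kind of reference bias the abstract describes.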
Pages: 1774-1779
Page count: 6