Improving alignment accuracy on homopolymer regions for semiconductor-based sequencing technologies

Cited by: 21
Authors
Feng, Weixing [1]
Zhao, Sen [1]
Xue, Dingkai [1]
Song, Fengfei [1]
Li, Ziwei [1]
Chen, Duojiao [1]
He, Bo [1]
Hao, Yangyang [2]
Wang, Yadong [3]
Liu, Yunlong [1, 2]
Affiliations
[1] Harbin Engn Univ, Automat Coll, Harbin 150001, Heilongjiang, Peoples R China
[2] Indiana Univ Sch Med, Ctr Computat Biol & Bioinformat, Indianapolis, IN 46202 USA
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150001, Heilongjiang, Peoples R China
Keywords
Homopolymer; Ion Torrent/Proton; Bayesian; Alignment; ION TORRENT; GENOME;
DOI
10.1186/s12864-016-2894-9
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Ion Torrent and Ion Proton are semiconductor-based sequencing technologies that offer rapid sequencing and low upfront and operating costs because they avoid modified nucleotides and optical measurements. Despite these advantages, Ion semiconductor sequencing suffers from markedly reduced accuracy at genomic loci that contain homopolymers, that is, runs of the same nucleotide. This limitation significantly reduces its effectiveness in biological applications that aim to identify genetic variants accurately. Results: In this study, we propose a Bayesian inference-based method that takes advantage of the distributions of the electrical voltage signals measured for all homopolymers of a given length. By cross-referencing the homopolymer length in the reference genome with the voltage signal distribution derived from the experiment, the proposed integrated model significantly improves alignment accuracy around homopolymer regions. Conclusions: Beyond improving alignment accuracy in homopolymer regions for semiconductor-based sequencing technologies, similar strategies could also be applied to other high-throughput sequencing technologies that share similar limitations.
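The idea described under Results, combining the reference homopolymer length with per-length signal distributions, can be illustrated with a minimal Python sketch. This is not the authors' model: the Gaussian form of the signal distributions, the means and standard deviations in `models`, the `ref_weight` prior, and the function names are all hypothetical choices made for illustration only.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution (assumed form of the signal model)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def length_posterior(signal, signal_models, ref_length, ref_weight=0.9):
    """Posterior P(length | signal) over candidate homopolymer lengths.

    signal_models: {length: (mean, sd)}, hypothetical per-length signal
        distributions, e.g. estimated from all homopolymers of that length.
    ref_length: homopolymer length in the reference genome at this locus.
    ref_weight: hypothetical prior probability that the read agrees with
        the reference; the remainder is spread evenly over other lengths.
    """
    if ref_length not in signal_models:
        raise ValueError("ref_length must be one of the modelled lengths")
    others = [n for n in signal_models if n != ref_length]
    if not others:
        return {ref_length: 1.0}

    # Prior: favour the reference length, share the rest uniformly.
    prior = {n: (1.0 - ref_weight) / len(others) for n in others}
    prior[ref_length] = ref_weight

    # Bayes' rule: posterior is proportional to likelihood times prior.
    unnorm = {
        n: gaussian_pdf(signal, *signal_models[n]) * prior[n]
        for n in signal_models
    }
    total = sum(unnorm.values())
    return {n: v / total for n, v in unnorm.items()}


# Hypothetical signal models: the mean grows with length, and the spread
# widens for longer homopolymers, which is what makes them hard to call.
models = {n: (float(n), 0.1 + 0.05 * n) for n in range(1, 9)}

# A signal of 4.45 is closer to 4 than to 5, but the reference says 5.
posterior = length_posterior(4.45, models, ref_length=5)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

With a flat prior (`ref_weight=1/8` for these eight candidate lengths) the same signal is assigned to length 4, so in this toy setting the reference-informed prior is what changes the call.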
Pages: 7