Robust and scalable barcoding for massively parallel long-read sequencing

Cited by: 6
Authors
Ezpeleta, Joaquin [1, 2]
Garcia Labari, Ignacio [1]
Villanova, Gabriela Vanina [3, 4]
Bulacio, Pilar [1, 2]
Lavista-Llanos, Sofia [1]
Posner, Victoria [4]
Krsticevic, Flavia [5]
Arranz, Silvia [4]
Tapia, Elizabeth [1, 2]
Affiliations
[1] Ctr Int Franco Argentino Ciencias Informac & Sist, Rosario, Argentina
[2] Univ Nacl Rosario, Fac Ciencias Exactas Ingn & Agr, Rosario, Argentina
[3] Consejo Nacl Invest Cient & Tecn, Rosario, Argentina
[4] Univ Nacl Rosario, Fac Ciencias Bioquim & Farmaceut, Ctr Cient Tecnol Educ Acuario Rio Parana, Lab Mixto Biotecnol Acuat, Rosario, Argentina
[5] Hebrew Univ Jerusalem, Robert H Smith Fac Agr Food & Environm, Jerusalem, Israel
Keywords
DNA; POLYMERASE; DIAGNOSIS; CODES; PCR;
DOI
10.1038/s41598-022-11656-0
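Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract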
Nucleic-acid barcoding is an enabling technique for many applications, but its use remains limited in emerging long-read sequencing technologies with intrinsically low raw accuracy. Here, we apply so-called NS-watermark barcodes, whose error correction capability was previously validated in silico, in a proof of concept where we synthesize 3840 NS-watermark barcodes and use them to asymmetrically tag and simultaneously sequence amplicons from two evolutionarily distant species (namely Bordetella pertussis and Drosophila mojavensis) on the ONT MinION platform. To our knowledge, this is the largest number of distinct, non-random tags ever sequenced in parallel and the first report of microarray-based synthesis as a source for large oligonucleotide pools for barcoding. We recovered the identity of more than 86% of the barcodes, with a crosstalk rate of 0.17% (i.e., one misassignment every 584 reads). This falls in the range of the index hopping rate of established, high-accuracy Illumina sequencing, despite the increased number of tags and the relatively low accuracy of both microarray-based synthesis and long-read sequencing. The robustness of NS-watermark barcodes, together with their scalable design and compatibility with low-cost massive synthesis, makes them promising for present and future sequencing applications requiring massive labeling, such as long-read single-cell RNA-Seq.
Pages: 10
Related papers
28 in total
[1]  
[Anonymous], 1968, Information Theory and Reliable Communications
[2]   Discrimination of primer 3′-nucleotide mismatch by Taq DNA polymerase during polymerase chain reaction [J].
Ayyadevara, S ;
Thaden, JJ ;
Reis, RJS .
ANALYTICAL BIOCHEMISTRY, 2000, 284 (01) :11-18
[3]   Versatile design and synthesis platform for visualizing genomes with Oligopaint FISH probes [J].
Beliveau, Brian J. ;
Joyce, Eric F. ;
Apostolopoulos, Nicholas ;
Yilmaz, Feyza ;
Fonseka, Chamith Y. ;
McCole, Ruth B. ;
Chang, Yiming ;
Li, Jin Billy ;
Senaratne, Tharanga Niroshini ;
Williams, Benjamin R. ;
Rouillard, Jean-Marie ;
Wu, Chao-ting .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2012, 109 (52) :21301-21306
[4]  
Benvenuto C.J., 2012, Galois field in cryptography, V1, P1
[5]   Levenshtein error-correcting barcodes for multiplexed DNA sequencing [J].
Buschmann, Tilo ;
Bystrykh, Leonid V. .
BMC BIOINFORMATICS, 2013, 14
[6]   Reliable communication over channels with insertions, deletions, and substitutions [J].
Davey, MC ;
MacKay, DJC .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2001, 47 (02) :687-698
[7]   Decoding algorithms for nonbinary LDPC codes over GF(q) [J].
Declercq, David ;
Fossorier, Marc .
IEEE TRANSACTIONS ON COMMUNICATIONS, 2007, 55 (04) :633-643
[8]   Designing robust watermark barcodes for multiplex long-read sequencing [J].
Ezpeleta, Joaquin ;
Krsticevic, Flavia J. ;
Bulacio, Pilar ;
Tapia, Elizabeth .
BIOINFORMATICS, 2017, 33 (06) :807-813
[9]   Nested duplex PCR to detect Bordetella pertussis and Bordetella parapertussis and its application in diagnosis of pertussis in nonmetropolitan southeast Queensland, Australia [J].
Farrell, DJ ;
Daggard, G ;
Mukkur, TKS .
JOURNAL OF CLINICAL MICROBIOLOGY, 1999, 37 (03) :606-610
[10]   Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells [J].
Gupta, Ishaan ;
Collier, Paul G. ;
Haase, Bettina ;
Mahfouz, Ahmed ;
Joglekar, Anoushka ;
Floyd, Taylor ;
Koopmans, Frank ;
Barres, Ben ;
Smit, August B. ;
Sloan, Steven A. ;
Luo, Wenjie ;
Fedrigo, Olivier ;
Ross, M. Elizabeth ;
Tilgner, Hagen U. .
NATURE BIOTECHNOLOGY, 2018, 36 (12) :1197-+