ReSeq simulates realistic Illumina high-throughput sequencing data

Cited by: 11
Authors
Schmeing, Stephan [1, 2]
Robinson, Mark D. [1, 2]
Affiliations
[1] Univ Zurich, Inst Mol Life Sci, Winterthurerstr 190, CH-8057 Zurich, Switzerland
[2] SIB Swiss Inst Bioinformat, Winterthurerstr 190, CH-8057 Zurich, Switzerland
Keywords
Simulation; Genomic; High-throughput sequencing; Illumina; ERROR PROFILES; RNA-SEQ; BIAS; QUALITY; BENCHMARKING; DISCOVERY; RESOURCE; GENOMES; SNP
DOI
10.1186/s13059-021-02265-7
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
In high-throughput sequencing data, performance comparisons between computational tools are essential for making informed decisions at each step of a project. Simulations are a critical part of method comparisons, but for standard Illumina sequencing of genomic DNA, they are often oversimplified, which leads to optimistic results for most tools. ReSeq improves the authenticity of synthetic data by extracting and reproducing key components from real data. Major advancements are the inclusion of systematic errors, a fragment-based coverage model and sampling-matrix estimates based on two-dimensional margins. These improvements lead to more faithful performance evaluations. ReSeq is available at https://github.com/schmeing/ReSeq.
Pages: 37