Mapinsights: deep exploration of quality issues and error profiles in high-throughput sequence data

Cited by: 6
Authors
Das, Subrata [1]
Biswas, Nidhan K. [1]
Basu, Analabha [1]
Affiliations
[1] National Institute of Biomedical Genomics, Kalyani 741251, West Bengal, India
Keywords
GENERATION; DISCOVERY; FRAMEWORK; GENOMICS; BASE
DOI
10.1093/nar/gkad539
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
High-throughput sequencing (HTS) has revolutionized science by enabling rapid detection of genomic variants at base-pair resolution. Consequently, it poses the challenging problem of identifying technical artifacts, i.e. hidden non-random error patterns. Understanding the properties of sequencing artifacts holds the key to separating true variants from false positives. Here, we develop Mapinsights, a toolkit that performs quality control (QC) analysis of sequence alignment files and can detect outliers based on sequencing artifacts of HTS data at a deeper resolution than existing methods. Mapinsights performs a cluster analysis based on novel and existing QC features derived from the sequence alignment for outlier detection. We applied Mapinsights to community-standard open-source datasets and identified various quality issues, including technical errors related to sequencing cycles, sequencing chemistry and sequencing libraries, and errors across various orthogonal sequencing platforms. Mapinsights also enables identification of anomalies related to sequencing depth. A logistic regression-based model built on the features of Mapinsights shows high accuracy in detecting 'low-confidence' variant sites. Quantitative estimates and probabilistic arguments provided by Mapinsights can be used to identify errors, bias and outlier samples, and can also help improve the reliability of variant calls.
Pages: E75 / E75
Page count: 18