Improvement of large copy number variant detection by whole genome nanopore sequencing

Cited by: 6
Authors
Cuenca-Guardiola, Javier [1]
de la Morena-Barrio, Belen [2,4]
Garcia, Juan L. [3]
Sanchis-Juan, Alba [4,5]
Corral, Javier [2]
Fernandez-Breis, Jesualdo T. [1]
Affiliations
[1] Univ Murcia, Fac Informat, Dept Informat & Sistemas, CEIR Campus Mare Nostrum,IMIB Arrixaca, Campus Espinardo, Murcia 30100, Spain
[2] Univ Murcia, Hosp Univ Morales Meseguer, Ctr Reg Hemodonac, Serv Hematol & Oncol Med,IMIB Arrixaca,CIBERER, Ronda Garay S-N, Murcia 30003, Spain
[3] Univ Salamanca, Univ Hosp Salamanca, Dept Hematol, Inst Invest Biomed IBSAL,Dept Med,Canc Res Ctr IBM, Salamanca, Spain
[4] Univ Cambridge, Dept Haematol, Cambridge Biomed Campus, Cambridge CB2 0PT, England
[5] Cambridge Univ Hosp NHS Fdn, NIHR BioResource, Cambridge Biomed Campus, Cambridge CB2 0QQ, England
Keywords
Nanopore; Structural variant; Third-generation sequencing; SERPINC1; STRUCTURAL VARIATION; HYBRIDIZATION; BROWSER;
DOI
10.1016/j.jare.2022.10.012
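Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code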
07 ; 0710 ; 09 ;
Abstract
Introduction: Whole-genome sequencing using nanopore technologies can uncover structural variants, which are DNA rearrangements larger than 50 base pairs. Nanopore technologies can also characterize their boundaries with single-base accuracy, owing to the kilobase-long reads that encompass either full variants or their junctions. Other methods, such as next-generation short read sequencing or PCR assays, are limited in their capabilities to detect or characterize structural variants. However, the existing software for nanopore sequencing data analysis still reports incomplete variant sets, which also contain erroneous calls, a considerable obstacle for the molecular diagnosis or accurate genotyping of populations.
Methods: We compared multiple factors affecting variant calling, such as reference genome version, aligner (minimap2, NGMLR, and lra) choice, and variant caller combinations (Sniffles, CuteSV, SVIM, and NanoVar), to find the optimal group of tools for calling large (>50 kb) deletions and duplications, using data from seven patients exhibiting gross gene defects on SERPINC1 and from a reference variant set as the control. The goal was to obtain the most complete, yet reasonably specific group of large variants using a single cell of PromethION sequencing, which yielded lower depth coverage than short-read sequencing. We also used a custom method for the statistical analysis of the coverage value to refine the resulting datasets.
Results: We found that for large deletions and duplications (>50 kb), the existing software performed worse than for smaller ones, in terms of both sensitivity and specificity, and newer tools had not improved this. Our novel software, disCoverage, could polish variant callers' results, improving specificity by up to 62% and sensitivity by 15%, the latter requiring other data or samples.
Conclusion: We analyzed the current situation of >50-kb copy number variants with nanopore sequencing, which could be improved. The methods presented in this work could help to identify the known deletions and duplications in a set of patients, while also helping to filter out erroneous calls for these variants, which might aid the efforts to characterize a not-yet well-known fraction of genetic variability in the human genome.
© 2023 The Authors. Published by Elsevier B.V. on behalf of Cairo University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
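The abstract mentions refining variant calls through a statistical analysis of coverage. As a rough illustration only, the sketch below checks whether the read depth inside a candidate region is consistent with a reported deletion or duplication. It is not the disCoverage algorithm, whose details are not given here; the function names, the use of medians, and the 0.75 and 1.25 thresholds are all assumptions chosen for illustration.

```python
from statistics import median


def coverage_ratio(region_depths, background_depths):
    """Median depth inside the candidate region divided by median background depth."""
    background = median(background_depths)
    if background == 0:
        raise ValueError("background depth is zero; ratio is undefined")
    return median(region_depths) / background


def supports_call(sv_type, ratio, del_max=0.75, dup_min=1.25):
    """Return True if the depth ratio agrees with the reported variant type.

    Hypothetical thresholds: a heterozygous deletion is expected near 0.5
    and a heterozygous duplication near 1.5 of the background depth.
    """
    if sv_type == "DEL":
        return ratio <= del_max
    if sv_type == "DUP":
        return ratio >= dup_min
    raise ValueError(f"unsupported variant type: {sv_type}")


# Toy per-bin depths (made-up numbers, not data from the study).
background = [20, 21, 19, 20, 22, 18]
deleted_region = [10, 11, 9, 10]
flat_region = [20, 19, 21]

del_ratio = coverage_ratio(deleted_region, background)
flat_ratio = coverage_ratio(flat_region, background)

keep_deletion = supports_call("DEL", del_ratio)    # depth drop backs the call
keep_false_call = supports_call("DEL", flat_ratio)  # no depth drop, call is filtered
```

In this toy case the first call is retained and the second is discarded, which is the kind of filtering that can raise specificity when a caller reports large deletions or duplications that the depth signal does not support.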
Pages: 145-158
Number of pages: 14