Adaptive Savitzky-Golay Filters for Analysis of Copy Number Variation Peaks from Whole-Exome Sequencing Data

Cited by: 6
Authors
Ochieng, Peter Juma [1]
Maroti, Zoltan [2, 3]
Dombi, Jozsef [1]
Kresz, Miklos [4, 5, 6]
Bekesi, Jozsef [1]
Kalmar, Tibor [2, 3]
Affiliations
[1] Univ Szeged, Inst Informat, 2 Arpad Ter, H-6720 Szeged, Hungary
[2] Univ Szeged, Albert Szent Gyorgy Hlth Ctr, Dept Pediat, H-6725 Szeged, Hungary
[3] Univ Szeged, Pediat Hlth Ctr, H-6725 Szeged, Hungary
[4] InnoRenew CoE, Livade 6, Izola 6310, Slovenia
[5] Univ Primorska, Andrej Marusic Inst, Muzejski Trg 2, Koper 6000, Slovenia
[6] Univ Szeged, Dept Appl Informat, Boldogasszony Sgt 6, H-6725 Szeged, Hungary
Keywords
copy number variation; read depth; adaptive Savitzky-Golay
DOI
10.3390/info14020128
Chinese Library Classification number
TP [Automation and computer technology]
Discipline classification code
0812
Abstract
Copy number variation (CNV) is a form of structural variation in the human genome that provides medical insight into complex human diseases. Although whole-genome sequencing is becoming more affordable, whole-exome sequencing (WES) remains an important tool in clinical diagnostics. Because sparse, target-enrichment-based WES data are discontinuous and have unique characteristics, the analysis and detection of CNV peaks remain difficult. Savitzky-Golay (SG) smoothing is well known as a fast and efficient smoothing method, yet no study has documented its use for CNV peak detection. The effectiveness of the classical SG filter depends on the proper selection of the window length and polynomial degree, which should match the scale of the peak; for peaks with a high rate of change, a poorly matched filter can be of limited use. Based on the Savitzky-Golay algorithm, this paper introduces a novel adaptive method for smoothing irregular peak distributions. The proposed method achieves high-precision noise reduction by dynamically modifying the results of the prior smoothing in order to adjust the parameters automatically. It also offers an additional feature extraction technique based on density and Euclidean distance. The performance evaluation shows that adaptive Savitzky-Golay filtering outperforms both classical Savitzky-Golay filtering and other peer filtering methods. According to the experimental results, the method effectively detects CNV peaks across all genomic segments for both short and long tags, with minimal peak height fidelity values (i.e., low estimation bias). These results demonstrate how well the adaptive Savitzky-Golay filtering method works and how its use in CNV peak detection can complement existing CNV peak analysis techniques.
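The abstract's point that the classical SG filter depends on the window length can be illustrated with a minimal sketch. It implements only the classical, fixed-parameter filter in pure Python, not the paper's adaptive scheme. The function names, the edge handling (repeating boundary values), and the synthetic read-depth trace are all assumptions made for illustration and do not come from the paper.

```python
import math


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def savgol_coeffs(window, degree):
    """Weights that give the least-squares polynomial fit at the window centre."""
    if window % 2 == 0 or window <= degree:
        raise ValueError("window must be odd and larger than degree")
    half = window // 2
    xs = range(-half, half + 1)
    n = degree + 1
    # Normal-equation matrix A^T A, where A[i][k] = xs[i] ** k.
    ata = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    # The centre value is the fitted constant term, so take z = (A^T A)^-1 e_0.
    z = solve(ata, [1.0] + [0.0] * degree)
    return [sum(z[k] * x ** k for k in range(n)) for x in xs]


def savgol_smooth(signal, window, degree):
    """Apply a fixed-parameter SG filter; boundary values are repeated at the edges."""
    weights = savgol_coeffs(window, degree)
    half = window // 2
    last = len(signal) - 1
    smoothed = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = min(max(i + j - half, 0), last)
            acc += w * signal[idx]
        smoothed.append(acc)
    return smoothed


# Hypothetical read-depth trace: flat baseline plus one narrow Gaussian peak.
depth = [10.0 + 40.0 * math.exp(-((i - 50) ** 2) / (2 * 2.0 ** 2)) for i in range(101)]
narrow = savgol_smooth(depth, window=5, degree=2)
wide = savgol_smooth(depth, window=31, degree=2)
print(f"true peak {max(depth):.2f}, window=5 {max(narrow):.2f}, window=31 {max(wide):.2f}")
```

In this sketch the wide window flattens the narrow peak far more than the short window does. This is the kind of peak height bias that motivates choosing the window to match the peak scale.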
Pages: 21
Related papers
45 in total
  • [31] Whole-genome sequencing of copy number variation analysis in Ethiopian cattle reveals adaptations to diverse environments
    Ayalew, Wondossen
    Wu, Xiaoyun
    Tarekegn, Getinet Mekuriaw
    Tessema, Tesfaye Sisay
    Chu, Min
    Liang, Chunnian
    Naboulsi, Rakan
    Van Damme, Renaud
    Bongcam-Rudloff, Erik
    Ping, Yan
    BMC GENOMICS, 2024, 25 (01)
  • [32] dpGMM: A Dirichlet Process Gaussian Mixture Model for Copy Number Variation Detection in Low-Coverage Whole-Genome Sequencing Data
    Li, Yaoyao
    Zhang, Junying
    Yuan, Xiguo
    Li, Junping
    IEEE ACCESS, 2020, 8 : 27973 - 27985
  • [33] A shortest path-based approach for copy number variation detection from next-generation sequencing data
    Liu, Guojun
    Yang, Hongzhi
    Yuan, Xiguo
    FRONTIERS IN GENETICS, 2023, 13
  • [34] ClinSV: clinical grade structural and copy number variant detection from whole genome sequencing data
    Minoche, Andre E.
    Lundie, Ben
    Peters, Greg B.
    Ohnesorg, Thomas
    Pinese, Mark
    Thomas, David M.
    Zankl, Andreas
    Roscioli, Tony
    Schonrock, Nicole
    Kummerfeld, Sarah
    Burnett, Leslie
    Dinger, Marcel E.
    Cowley, Mark J.
    GENOME MEDICINE, 2021, 13 (01)
  • [36] Copy number and sequence variation in rDNA of Daphnia pulex from natural populations: insights from whole-genome sequencing
    Elguweidi, Abir
    Crease, Teresa
    G3-GENES GENOMES GENETICS, 2024, 14 (07)
  • [37] SCCNV: A Software Tool for Identifying Copy Number Variation From Single-Cell Whole-Genome Sequencing
    Dong, Xiao
    Zhang, Lei
    Hao, Xiaoxiao
    Wang, Tao
    Vijg, Jan
    FRONTIERS IN GENETICS, 2020, 11
  • [38] Similar genomic proportions of copy number variation within gray wolves and modern dog breeds inferred from whole genome sequencing
    Serres-Armero, Aitor
    Povolotskaya, Inna S.
    Quilez, Javier
    Ramirez, Oscar
    Santpere, Gabriel
    Kuderna, Lukas F. K.
    Hernandez-Rodriguez, Jessica
    Fernandez-Callejo, Marcos
    Gomez-Sanchez, Daniel
    Freedman, Adam H.
    Fan, Zhenxin
    Novembre, John
    Navarro, Arcadi
    Boyko, Adam
    Wayne, Robert
    Vila, Carles
    Lorente-Galdos, Belen
    Marques-Bonet, Tomas
    BMC GENOMICS, 2017, 18
  • [40] BagGMM: Calling copy number variation by bagging multiple Gaussian mixture models from tumor and matched normal next-generation sequencing data
    Li, Yaoyao
    Zhang, Junying
    Yuan, Xiguo
    DIGITAL SIGNAL PROCESSING, 2019, 88 : 90 - 100