Cnngeno: A high-precision deep learning based strategy for the calling of structural variation genotype

Cited by: 4
Authors
Bai, Ruofei [1]
Ling, Cheng [1]
Cai, Lei [1]
Gao, Jingyang [1]
Affiliations
[1] Beijing Univ Chem Technol, Dept Comp Sci & Technol, Beijing, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Genotype calling; Structural variations; Next-generation data; Convolutional neural network; Bootstrapping strategy; PAIRED-END; DISCOVERY;
DOI
10.1016/j.compbiolchem.2020.107417
Chinese Library Classification number
Q [Biological sciences];
Discipline classification codes
07; 0710; 09;
Abstract
Genotype plays a significant role in determining an organism's characteristics, and genotype calling has been greatly accelerated by sequencing technologies. However, most parametric statistical models cannot call genotypes effectively, because calling is influenced by the size of structural variations and by coverage fluctuations in the sequencing data. In this study, we propose Cnngeno, a new method for calling the genotypes of deletions from next-generation sequencing data. Cnngeno converts sequencing data into images and classifies the genotypes from these images using a convolutional neural network (CNN). Moreover, Cnngeno adopts the convolutional bootstrapping strategy to improve its robustness to noisy labels. The results show that Cnngeno achieves higher precision in genotype calling than other existing methods. Cnngeno is open source and available at https://github.com/BRF123/Cnngeno.
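The abstract names a bootstrapping strategy for noisy labels but does not give its formula. As a rough illustration only, the sketch below shows the general "soft bootstrapping" idea from the noisy-label literature, in which the training target is a blend of the given label and the model's own prediction. It is not taken from the Cnngeno code; the three-class genotype labelling, the function names, and the `beta` value are all assumptions made for this example.

```python
import math

# Assumed three-class labelling for a deletion (hypothetical, for illustration).
GENOTYPES = ("0/0", "0/1", "1/1")


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_bootstrap_loss(logits, label_index, beta=0.95):
    """Cross-entropy against a blend of the given label and the prediction.

    beta = 1.0 recovers ordinary cross-entropy; a smaller beta places less
    trust in the (possibly noisy) label. beta is an assumed hyperparameter.
    """
    probs = softmax(logits)
    loss = 0.0
    for k, q in enumerate(probs):
        t = 1.0 if k == label_index else 0.0      # one-hot training label
        target = beta * t + (1.0 - beta) * q      # blended soft target
        loss -= target * math.log(max(q, 1e-12))  # guard against log(0)
    return loss


# Example: the network strongly predicts "0/0" but the label says "0/1".
label = GENOTYPES.index("0/1")
plain = soft_bootstrap_loss([5.0, 0.0, 0.0], label, beta=1.0)
blended = soft_bootstrap_loss([5.0, 0.0, 0.0], label, beta=0.5)
```

With a blended target, a confident prediction that disagrees with a label is penalised less than under plain cross-entropy, which is the sense in which such a loss tolerates mislabelled training examples.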
Pages: 6