Identification of Low-Confidence Regions in the Pig Reference Genome (Sscrofa 10.2)

Cited by: 22
Authors
Warr, Amanda [1 ,2 ]
Robert, Christelle [1 ,2 ]
Hume, David [1 ,2 ]
Archibald, Alan L. [1 ,2 ]
Deeb, Nader [3 ]
Watson, Mick [1 ,2 ]
Affiliations
[1] Univ Edinburgh, Roslin Inst, Div Genet & Genom, Edinburgh, Midlothian, Scotland
[2] Univ Edinburgh, Royal Dick Sch Vet Studies, Edinburgh EH9 1QH, Midlothian, Scotland
[3] Genus Plc, Hendersonville, TN USA
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC);
Keywords
COPY NUMBER VARIATION; ADAPTATION; VARIANTS; DBSNP;
DOI
10.3389/fgene.2015.00338
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification codes
071007 ; 090102 ;
Abstract
Many applications of high-throughput sequencing rely on the availability of an accurate reference genome. Variant calling often produces large data sets that cannot realistically be validated and that may contain large numbers of false positives. Errors in the reference assembly increase the number of false positives. While resources are available to aid in filtering variants from human data, no such resources yet exist for other species, so strict filtering techniques must be employed, and these are more likely to exclude true positives. This work assesses the accuracy of the pig reference genome (Sscrofa10.2) using whole-genome sequencing (WGS) reads from the Duroc sow on whose genome the assembly was based. Indicators of structural variation, including high regional coverage, unexpected insert sizes, improper pairing, and homozygous variants, were used to identify low-quality (LQ) regions of the assembly. Low-coverage (LC) regions were also identified and analyzed separately. The LQ regions covered 13.85% of the genome, the LC regions covered 26.6% of the genome, and combined (LQLC) they covered 33.07% of the genome. Over half of dbSNP variants were located in the LQLC regions. Of the copy number variable regions identified in a previous study, 86.3% were located in the LQLC regions. The regions were also enriched for gene predictions from RNA-seq data, with 42.98% falling in the LQLC regions. Excluding variants in the LQ, LC, or LQLC regions from future analyses will help reduce the number of false-positive variant calls. Researchers using WGS data should be aware that the current pig reference genome does not give an accurate representation of the copy number of alleles in the original Duroc sow's genome.
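The coverage figures and the suggested exclusion step can be illustrated with a small sketch. It is not the authors' pipeline: the function names, the single-chromosome simplification, the half-open coordinates, and the toy intervals are all assumptions made for illustration. It merges overlapping regions, reports the fraction of the sequence covered by each set and by their union, and drops variant positions that fall inside the combined regions. By inclusion-exclusion, and assuming all three percentages share the same denominator, the reported figures imply that LQ and LC regions overlap on about 13.85 + 26.6 - 33.07 = 7.38% of the genome; the toy data below show the same pattern.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(intervals, sequence_length):
    """Fraction of the sequence covered by the union of the intervals."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / sequence_length


def exclude_variants(positions, intervals):
    """Keep only the variant positions that lie outside every interval."""
    merged = merge_intervals(intervals)
    starts = [start for start, _ in merged]
    kept = []
    for pos in positions:
        # Index of the last interval starting at or before pos, if any.
        i = bisect_right(starts, pos) - 1
        inside = i >= 0 and pos < merged[i][1]
        if not inside:
            kept.append(pos)
    return kept


# Hypothetical toy data on one 1,000 bp sequence (not from the paper).
SEQUENCE_LENGTH = 1000
lq_regions = [(100, 200), (150, 250)]   # "low quality" regions
lc_regions = [(200, 400), (900, 950)]   # "low coverage" regions
lqlc_regions = lq_regions + lc_regions  # combined set, merged on use

lq_frac = covered_fraction(lq_regions, SEQUENCE_LENGTH)
lc_frac = covered_fraction(lc_regions, SEQUENCE_LENGTH)
lqlc_frac = covered_fraction(lqlc_regions, SEQUENCE_LENGTH)
overlap_frac = lq_frac + lc_frac - lqlc_frac  # inclusion-exclusion

variant_positions = [50, 120, 399, 400, 925]
kept_variants = exclude_variants(variant_positions, lqlc_regions)
```

In practice, regions would be kept per chromosome (for example in BED files) and an established interval tool would replace these helper functions; the sketch only shows why the combined percentage is smaller than the sum of the two separate ones.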
Pages: 8