High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs

Times cited: 65
Authors
Dilthey, Alexander T. [1, 2]
Gourraud, Pierre-Antoine [3, 4]
Mentzer, Alexander J. [1]
Cereb, Nezih [5]
Iqbal, Zamin [1]
McVean, Gil [1, 6]
Affiliations
[1] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford, England
[2] NHGRI, NIH, Bethesda, MD 20892 USA
[3] UCSF, Dept Neurol, San Francisco, CA USA
[4] Univ Nantes, Nantes Univ Hosp, INSERM, Unit ATIP 1064, Avenir Team 6, Nantes, France
[5] Histogenetics, Ossining, NY USA
[6] Univ Oxford, Li Ka Shing Ctr Hlth Informat & Discovery, Oxford, England
Funding
European Research Council; Wellcome Trust (UK)
Keywords
HIGH-RESOLUTION HLA; CLASS-I; SUSCEPTIBILITY
DOI
10.1371/journal.pcbi.1005151
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 × 100 bp and 11 samples with 2 × 250 bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data.
We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a significant challenge to practical application.
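The third step above, picking the most likely pair of alleles at a locus under a simple likelihood model, can be illustrated with a minimal Python sketch. This is not HLA*PRG's own model. It assumes a deliberately simplified setting: the graph-based read mapping is replaced by reads already placed at known offsets on equal-length, pre-aligned toy allele sequences; base errors are independent with a fixed rate; and each read is equally likely to come from either allele of the pair. The allele names, sequences, error rate, and function names (`read_log_likelihood`, `genotype_log_likelihood`, `most_likely_genotype`) are all hypothetical.

```python
import math
from itertools import combinations_with_replacement


def read_log_likelihood(read, start, allele_seq, error_rate=0.01):
    """log P(read | allele) for a read placed at `start` on `allele_seq`.

    Hypothetical toy error model: independent bases, match = 1 - e,
    mismatch = e / 3 (any of the three other bases equally likely).
    """
    ll = 0.0
    for i, base in enumerate(read):
        if allele_seq[start + i] == base:
            ll += math.log(1.0 - error_rate)
        else:
            ll += math.log(error_rate / 3.0)
    return ll


def genotype_log_likelihood(reads, seq1, seq2, error_rate=0.01):
    """log P(reads | allele pair), assuming independent reads, each drawn
    from either allele with probability 1/2."""
    total = 0.0
    for read, start in reads:
        l1 = read_log_likelihood(read, start, seq1, error_rate)
        l2 = read_log_likelihood(read, start, seq2, error_rate)
        # Numerically stable log(0.5 * exp(l1) + 0.5 * exp(l2)).
        m = max(l1, l2)
        total += m + math.log(0.5 * math.exp(l1 - m) + 0.5 * math.exp(l2 - m))
    return total


def most_likely_genotype(reads, alleles, error_rate=0.01):
    """Score every unordered allele pair (homozygous pairs included) and
    return the best pair together with its log-likelihood."""
    best_pair, best_ll = None, -math.inf
    for a1, a2 in combinations_with_replacement(sorted(alleles), 2):
        ll = genotype_log_likelihood(reads, alleles[a1], alleles[a2], error_rate)
        if ll > best_ll:
            best_pair, best_ll = (a1, a2), ll
    return best_pair, best_ll


# Toy usage: three hypothetical 12-bp alleles and reads from a toyA/toyB sample.
alleles = {
    "toyA": "ACGTACGTACGT",
    "toyB": "ACGTTCGTACGA",
    "toyC": "ACGAACGTTCGT",
}
reads = [
    ("ACGTACGT", 0),  # toyA[0:8]
    ("ACGTTCGT", 0),  # toyB[0:8]
    ("ACGTACGT", 4),  # toyA[4:12]
    ("TCGTACGA", 4),  # toyB[4:12]
]
pair, ll = most_likely_genotype(reads, alleles)
print(pair)  # ('toyA', 'toyB')
```

Using `combinations_with_replacement` keeps homozygous genotypes in the search. With n candidate alleles it scores n(n+1)/2 pairs, so exhaustive pairwise scoring grows quadratically with the size of the candidate set.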
Pages: 16