PGG.Han: the Han Chinese genome database and analysis platform

Cited by: 61
Authors
Gao, Yang [1,2]
Zhang, Chao [1 ]
Yuan, Liyun [1 ]
Ling, YunChao [1 ]
Wang, Xiaoji [1 ]
Liu, Chang [1 ]
Pan, Yuwen [1 ]
Zhang, Xiaoxi [1,2]
Ma, Xixian [1 ]
Wang, Yuchen [1 ]
Lu, Yan [1,3]
Yuan, Kai [1 ]
Ye, Wei [1 ]
Qian, Jiaqiang [1 ]
Chang, Huidan [1 ]
Cao, Ruifang [1 ]
Yang, Xiao [1 ]
Ma, Ling [1 ]
Ju, Yuanhu [1 ]
Dai, Long [1 ]
Tang, Yuanyuan [1 ]
Zhang, Guoqing [1 ]
Xu, Shuhua [1,2,3,4]
Affiliations
[1] Univ Chinese Acad Sci, Chinese Acad Sci,Key Lab Computat Biol, Biomed Big Data Ctr,CAS MPG Partner Inst Computat, Shanghai Inst Nutr & Hlth,Shanghai Inst Biol Sci, Shanghai 200031, Peoples R China
[2] ShanghaiTech Univ, Sch Life Sci & Technol, Shanghai 201210, Peoples R China
[3] Collaborat Innovat Ctr Genet & Dev, Shanghai 200438, Peoples R China
[4] Chinese Acad Sci, Ctr Excellence Anim Evolut & Genet, Kunming 650223, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
POPULATION; VARIANTS; ANCESTRY; PANEL; RARE;
DOI
10.1093/nar/gkz829
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
As the largest ethnic group in the world, the Han Chinese population is nonetheless underrepresented in global efforts to catalogue the genomic variability of natural populations. Here, we developed the PGG.Han, a population genome database to serve as the central repository for the genomic data of the Han Chinese Genome Initiative (Phase I). In its current version, the PGG.Han archives whole-genome sequences or high-density genome-wide single-nucleotide variants (SNVs) of 114 783 Han Chinese individuals (a.k.a. the Han100K), representing geographical sub-populations covering 33 of the 34 administrative divisions of China, as well as Singapore. The PGG.Han provides: (i) an interactive interface for visualization of the fine-scale genetic structure of the Han Chinese population; (ii) genome-wide allele frequencies of hierarchical sub-populations; (iii) ancestry inference for individual samples and controlling population stratification based on nested ancestry informative markers (AIMs) panels; (iv) population-structure-aware shared control data for genotype-phenotype association studies (e.g. GWASs) and (v) a Han-Chinese-specific reference panel for genotype imputation. Computational tools are implemented into the PGG.Han, and an online user-friendly interface is provided for data analysis and results visualization. The PGG.Han database is freely accessible via http://www.pgghan.org or https://www.hanchinesegenomes.org.
Pages: D971-D976
Page count: 6