NyuWa Genome resource: A deep whole-genome sequencing-based variation profile and reference panel for the Chinese population

Cited by: 48
Authors
Zhang, Peng [1]
Li, Yanyan [1]
Luo, Huaxia [1, 2]
Wang, You [1, 2]
Wang, Jiajia [1, 3]
Zheng, Yu [1, 3]
Niu, Yiwei [1, 3]
Shi, Yirong [1, 4]
Zhou, Honghong [1]
Song, Tingrui [1]
Kang, Quan [1]
Xu, Tao [2]
He, Shunmin [1, 3]
Affiliations
[1] Chinese Acad Sci, Inst Biophys, Ctr Big Data Res Hlth, Key Lab RNA Biol, Beijing 100101, Peoples R China
[2] Chinese Acad Sci, Inst Biophys, CAS Ctr Excellence Biomacromol, Natl Lab Biomacromol, Beijing 100101, Peoples R China
[3] Univ Chinese Acad Sci, Coll Life Sci, Beijing 100049, Peoples R China
[4] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Source
CELL REPORTS | 2021 / Volume 37 / Issue 07
Funding
National Key Research and Development Program of China;
Keywords
RARE VARIANTS; HAN CHINESE; DATABASE; IMPUTATION; MUTATIONS; ANCESTRY; PATTERNS; HISTORY; ROBUST; GENE;
DOI
10.1016/j.celrep.2021.110017
Chinese Library Classification number
Q2 [Cell Biology];
Discipline classification code
071009; 090102;
Abstract
The lack of haplotype reference panels and whole-genome sequencing resources specific to the Chinese population has greatly hindered genetic studies in the world's largest population. Here, we present the NyuWa genome resource, based on deep (26.2x) sequencing of 2,999 Chinese individuals, and construct a NyuWa reference panel of 5,804 haplotypes and 19.3 million variants, which is a high-quality publicly available Chinese population-specific reference panel with thousands of samples. Compared with other panels, the NyuWa reference panel reduces the Han Chinese imputation error rate by a margin ranging from 30% to 51%. Population structure and imputation simulation tests support the applicability of one integrated reference panel for northern and southern Chinese. In addition, a total of 22,504 loss-of-function variants in coding and noncoding genes are identified, including 11,493 novel variants. These results highlight the value of the NyuWa genome resource in facilitating genetic research in Chinese and Asian populations.
Pages: 20
Related papers
76 items in total
[1]   Fast model-based estimation of ancestry in unrelated individuals [J].
Alexander, David H. ;
Novembre, John ;
Lange, Kenneth .
GENOME RESEARCH, 2009, 19 (09) :1655-1664
[2]   [Anonymous], 2015, Nature, DOI 10.1038/NATURE15393
[3]   The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans [J].
Ardlie, Kristin G. ;
DeLuca, David S. ;
Segre, Ayellet V. ;
Sullivan, Timothy J. ;
Young, Taylor R. ;
Gelfand, Ellen T. ;
Trowbridge, Casandra A. ;
Maller, Julian B. ;
Tukiainen, Taru ;
Lek, Monkol ;
Ward, Lucas D. ;
Kheradpour, Pouya ;
Iriarte, Benjamin ;
Meng, Yan ;
Palmer, Cameron D. ;
Esko, Tonu ;
Winckler, Wendy ;
Hirschhorn, Joel N. ;
Kellis, Manolis ;
MacArthur, Daniel G. ;
Getz, Gad ;
Shabalin, Andrey A. ;
Li, Gen ;
Zhou, Yi-Hui ;
Nobel, Andrew B. ;
Rusyn, Ivan ;
Wright, Fred A. ;
Lappalainen, Tuuli ;
Ferreira, Pedro G. ;
Ongen, Halit ;
Rivas, Manuel A. ;
Battle, Alexis ;
Mostafavi, Sara ;
Monlong, Jean ;
Sammeth, Michael ;
Mele, Marta ;
Reverter, Ferran ;
Goldmann, Jakob M. ;
Koller, Daphne ;
Guigo, Roderic ;
McCarthy, Mark I. ;
Dermitzakis, Emmanouil T. ;
Gamazon, Eric R. ;
Im, Hae Kyung ;
Konkashbaev, Anuar ;
Nicolae, Dan L. ;
Cox, Nancy J. ;
Flutre, Timothee ;
Wen, Xiaoquan ;
Stephens, Matthew .
SCIENCE, 2015, 348 (6235) :648-660
[4]   Imputation of Rare Variants in Next-Generation Association Studies [J].
Asimit, Jennifer L. ;
Zeggini, Eleftheria .
HUMAN HEREDITY, 2012, 74 (3-4) :196-204
[5]   Insights into human genetic variation and population history from 929 diverse genomes [J].
Bergstrom, Anders ;
McCarthy, Shane A. ;
Hui, Ruoyun ;
Almarri, Mohamed A. ;
Ayub, Qasim ;
Danecek, Petr ;
Chen, Yuan ;
Felkel, Sabine ;
Hallast, Pille ;
Kamm, Jack ;
Blanche, Helene ;
Deleuze, Jean-Francois ;
Cann, Howard ;
Mallick, Swapan ;
Reich, David ;
Sandhu, Manjinder S. ;
Skoglund, Pontus ;
Scally, Aylwyn ;
Xue, Yali ;
Durbin, Richard ;
Tyler-Smith, Chris .
SCIENCE, 2020, 367 (6484) :1339-+
[6]   Trimmomatic: a flexible trimmer for Illumina sequence data [J].
Bolger, Anthony M. ;
Lohse, Marc ;
Usadel, Bjoern .
BIOINFORMATICS, 2014, 30 (15) :2114-2120
[7]   The impact of rare and low-frequency genetic variants in common disease [J].
Bomba, Lorenzo ;
Walter, Klaudia ;
Soranzo, Nicole .
GENOME BIOLOGY, 2017, 18
[8]   The ChinaMAP analytics of deep whole genome sequences in 10,588 individuals [J].
Cao, Yanan ;
Li, Lin ;
Xu, Min ;
Feng, Zhimin ;
Sun, Xiaohui ;
Lu, Jieli ;
Xu, Yu ;
Du, Peina ;
Wang, Tiange ;
Hu, Ruying ;
Ye, Zhen ;
Shi, Lixin ;
Tang, Xulei ;
Yan, Li ;
Gao, Zhengnan ;
Chen, Gang ;
Zhang, Yinfei ;
Chen, Lulu ;
Ning, Guang ;
Bi, Yufang ;
Wang, Weiqing .
CELL RESEARCH, 2020, 30 (09) :717-731
[9]   Sequencing an Ashkenazi reference panel supports population-targeted personal genomics and illuminates Jewish and European origins [J].
Carmi, Shai ;
Hui, Ken Y. ;
Kochav, Ethan ;
Liu, Xinmin ;
Xue, James ;
Grady, Fillan ;
Guha, Saurav ;
Upadhyay, Kinnari ;
Ben-Avraham, Dan ;
Mukherjee, Semanti ;
Bowen, B. Monica ;
Thomas, Tinu ;
Vijai, Joseph ;
Cruts, Marc ;
Froyen, Guy ;
Lambrechts, Diether ;
Plaisance, Stephane ;
Van Broeckhoven, Christine ;
Van Damme, Philip ;
Van Marck, Herwig ;
Barzilai, Nir ;
Darvasi, Ariel ;
Offit, Kenneth ;
Bressman, Susan ;
Ozelius, Laurie J. ;
Peter, Inga ;
Cho, Judy H. ;
Ostrer, Harry ;
Atzmon, Gil ;
Clark, Lorraine N. ;
Lencz, Todd ;
Pe'er, Itsik .
NATURE COMMUNICATIONS, 2014, 5
[10]   Second-generation PLINK: rising to the challenge of larger and richer datasets [J].
Chang, Christopher C. ;
Chow, Carson C. ;
Tellier, Laurent C. A. M. ;
Vattikuti, Shashaank ;
Purcell, Shaun M. ;
Lee, James J. .
GIGASCIENCE, 2015, 4