The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual

Cited by: 12
Authors
Chao, Kuan-Hao [1, 2, 5]
Zimin, Aleksey V. [2, 3]
Pertea, Mihaela [2, 3]
Salzberg, Steven L. [1, 2, 3, 4, 6]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Ctr Computat Biol, Baltimore, MD 21218 USA
[3] Johns Hopkins Univ, Dept Biomed Engn, Baltimore, MD 21218 USA
[4] Johns Hopkins Univ, Dept Biostat, Baltimore, MD 21211 USA
[5] 3100 Wyman Pk Dr,Wyman Pk Bldg,Room S217, Baltimore, MD 21211 USA
[6] 3100 Wyman Pk Dr,Wyman Pk Bldg,Room S220, Baltimore, MD 21211 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
genome assembly; annotation; DNA sequencing; reference genome; variant calling; DNA-SEQUENCE; CHROMOSOME; DIVERSITY; POPULATIONS; DATABASE;
DOI
10.1093/g3journal/jkac321
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
We used long-read DNA sequencing to assemble the genome of a Southern Han Chinese male. We organized the sequence into chromosomes and filled in gaps using the recently completed T2T-CHM13 genome as a guide, yielding a gap-free genome, Han1, containing 3,099,707,698 bases. Using the T2T-CHM13 annotation as a reference, we mapped all genes onto the Han1 genome and identified additional gene copies, generating a total of 60,708 putative genes, of which 20,003 are protein-coding. A comprehensive comparison between the genes revealed that 235 protein-coding genes were substantially different between the individuals, with frameshifts or truncations affecting the protein-coding sequence. Most of these were heterozygous variants in which one gene copy was unaffected. This represents the first gene-level comparison between two finished, annotated individual human genomes.
Pages: 9