Pan-human consensus genome significantly improves the accuracy of RNA-seq analyses

Cited by: 7
Authors
Kaminow, Benjamin [1, 2]
Ballouz, Sara [1, 3, 4]
Gillis, Jesse [1, 5, 6]
Dobin, Alexander [1 ]
Affiliations
[1] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
[2] Weill Cornell Grad Sch Med Sci, Triinst PhD Program Computat Biol & Med, New York, NY 10065 USA
[3] Garvan Inst Med Res, Garvan Weizmann Ctr Cellular Genom, Darlinghurst, NSW 2010, Australia
[4] Univ New South Wales, Sch Med Sci, Sydney, NSW 2052, Australia
[5] Univ Toronto, Dept Physiol, Toronto, ON M5S 1A8, Canada
[6] Univ Toronto, Terrence Donnelly Ctr Cellular & Biomol Res, Toronto, ON M5S 1A8, Canada
Funding
National Institutes of Health (NIH), USA
Keywords
PANGENOME GRAPHS; VARIANTS; ALIGNMENT; DATABASE; BIAS;
DOI
10.1101/gr.275613.121
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010 ; 081704 ;
Abstract
The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor of the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. To find the best haploid genome representation, we constructed consensus genomes at the pan-human, superpopulation, and population levels, using variant information from The 1000 Genomes Project Consortium. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of approximately two to three when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensuses resulted in little to no increase over using the pan-human consensus, suggesting a limit in the utility of incorporating a more specific genomic variation. Replacing the reference with consensus genomes impacts functional analyses, such as differential expressions of isoforms, genes, and splice junctions.
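The consensus-genome idea in the abstract can be illustrated with a minimal sketch. It assumes a consensus is formed by substituting the alternate allele into the reference wherever that allele is the major one (frequency above 0.5) in the chosen group of individuals. The names `Snv` and `build_consensus` are hypothetical, only single-nucleotide variants are handled, and this record does not describe the authors' actual pipeline, so treat the code as an illustration rather than their method.

```python
from typing import Iterable, NamedTuple


class Snv(NamedTuple):
    """A single-nucleotide variant (hypothetical record type for this sketch)."""
    pos: int         # 0-based position in the reference sequence
    ref: str         # base present in the reference
    alt: str         # alternate base
    alt_freq: float  # alternate allele frequency in the chosen group


def build_consensus(reference: str, variants: Iterable[Snv],
                    threshold: float = 0.5) -> str:
    """Return the reference with every major alternate allele substituted in.

    Assumption: "major" means alt_freq > threshold. Indels and multi-allelic
    sites are ignored to keep the sketch short.
    """
    seq = list(reference)
    for v in variants:
        if seq[v.pos] != v.ref:
            raise ValueError(f"reference mismatch at position {v.pos}")
        if v.alt_freq > threshold:
            seq[v.pos] = v.alt
    return "".join(seq)


# Toy data: the first variant is a major allele, the second is not.
toy_variants = [Snv(1, "C", "T", 0.8), Snv(4, "A", "G", 0.2)]
toy_consensus = build_consensus("ACGTAC", toy_variants)
```

Computing `alt_freq` over all individuals, over one superpopulation, or over one population would correspond to the pan-human, superpopulation, and population consensus levels named in the abstract.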
Pages: 738-749
Number of pages: 12