A new rhesus macaque assembly and annotation for next-generation sequencing analyses

Times cited: 135
Authors
Zimin, Aleksey V. [1]
Cornish, Adam S. [2]
Maudhoo, Mnirnal D. [2]
Gibbs, Robert M. [2]
Zhang, Xiongfei [2]
Pandey, Sanjit [2]
Meehan, Daniel T. [2]
Wipfler, Kristin [2]
Bosinger, Steven E. [3]
Johnson, Zachary P. [3]
Tharp, Gregory K. [3]
Marcais, Guillaume [1]
Roberts, Michael [1]
Ferguson, Betsy [4]
Fox, Howard S. [5]
Treangen, Todd [6,7]
Salzberg, Steven L. [6,7]
Yorke, James A. [1]
Norgren, Robert B., Jr. [2]
Affiliations
[1] Univ Maryland, Inst Phys Sci & Technol, College Pk, MD 20742 USA
[2] Univ Nebraska Med Ctr, Dept Genet Cell Biol & Anat, Omaha, NE 68198 USA
[3] Emory Univ, Robert W Woodruff Hlth Sci Ctr, Yerkes Natl Primate Res Ctr, Nonhuman Primate Genom Core, Atlanta, GA 30322 USA
[4] Oregon Hlth & Sci Univ, Oregon Natl Primate Res Ctr, Primate Genet Program, Div Neurosci, Beaverton, OR 97006 USA
[5] Univ Nebraska Med Ctr, Dept Pharmacol & Expt Neurosci, Omaha, NE 68198 USA
[6] Johns Hopkins Univ, Sch Med, Ctr Computat Biol, Baltimore, MD 21205 USA
[7] Johns Hopkins Univ, Sch Med, Dept Biomed Engn, Baltimore, MD 21205 USA
Source
BIOLOGY DIRECT | 2014 / Vol. 9
Keywords
Macaca mulatta; Rhesus macaque; Genome; Assembly; Annotation; Transcriptome; Next-generation sequencing; MAJOR HISTOCOMPATIBILITY COMPLEX; RADIATION HYBRID MAP; RNA-SEQ; NONHUMAN PRIMATE; GENOME ANNOTATIONS; ALIGNMENT PROGRAM; EVOLUTIONARY; PROTEINS; SEARCH; IDENTIFICATION
DOI
10.1186/1745-6150-9-20
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available, but it is substantially more fragmented than rheMac2, with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High-quality assembly and annotation files are required for a wide range of studies, including expression, genetic and evolutionary analyses.

Results: We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts, which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy compared with the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for aligning reads produced by RNA sequencing expression studies.
Conclusions: The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates.

Reviewers: This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.
Pages: 15