GASS: genome structural annotation for Eukaryotes based on species similarity

Times cited: 3
Authors
Wang, Ying [1]
Chen, Lina [1]
Song, Nianfeng [1]
Lei, Xiaoye [1]
Affiliations
[1] Xiamen Univ, Sch Informat Sci & Technol, Dept Automat, Xiamen 361005, Fujian, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Structural genome annotation; Computational method; Species similarity; Dynamic programming; Rhesus genome; RHESUS MACAQUE; RNA-SEQ; GENE PREDICTION; DATABASE; EVOLUTIONARY; ALIGNMENT; TRANSCRIPTS; ASSEMBLIES; EXPRESSION; PROJECT;
DOI
10.1186/s12864-015-1353-3
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: With the development of high-throughput sequencing techniques, more and more genomes have been sequenced and assembled. However, annotating a genome's structure rapidly and precisely remains challenging. Current eukaryotic genome annotation requires abundant and varied supporting data, such as species-specific and cross-species protein sequences, ESTs, cDNA and RNA-Seq data. Collecting these data and merging their analytical results into a consistent, complete annotation is a complex, time-consuming and costly task.
Results: We propose a fast and easy-to-use computational tool, GASS (Genome Annotation based on Species Similarity). It annotates a eukaryotic genome using only the annotations of another, similar species. By aligning the exon sequences of an annotated similar species to the un-annotated genome, GASS detects the optimal transcript annotations with a shortest-path model. In our study, GASS was used to produce rhesus annotations from the human annotations. The resulting annotations were evaluated by comparing them directly with the two existing rhesus annotation databases (RefSeq and Ensembl) and by aligning them with three rhesus RNA-Seq datasets. The experimental results showed that GASS exactly recovered more than 65% of RefSeq exons and splice junctions. GASS's sensitivity was higher than RefSeq's and close to Ensembl's. GASS had higher specificity than Ensembl at the gene, transcript, exon and splice junction levels. We also found mis-assemblies in the rheMac3 genome, which led to 2 bp shifts in annotated exon boundary positions and, in turn, to incomplete canonical splice sites in the RefSeq annotations. These findings were further supported by various data sources.
Conclusions: GASS quickly produces structural genome annotations of sufficient abundance and accuracy. Because GASS is simple and fast to run, small labs can obtain a quick view of the genome annotations for an un-annotated species without having to create, collect, analyze and synthesize additional data sources, or wait several months for annotations from professional organizations. GASS can be applied in many study settings, such as the analysis of RNA-Seq datasets from species whose genome drafts are available but whose annotations are not.
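The abstract states only that GASS chooses transcript annotations with a shortest-path model over exon alignments; it does not give the graph or the cost function. The sketch below is therefore an illustration of the general idea, not the authors' implementation. It assumes each source-species exon has one or more candidate alignment hits on the same strand of the target genome, treats hits as nodes of a directed acyclic graph, and finds the cheapest collinear chain by dynamic programming. The names `Hit`, `best_chain` and `SKIP_PENALTY`, and the cost terms themselves, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Hit:
    exon: int        # index of the source-species exon within its transcript
    start: int       # start on the target genome (0-based, inclusive)
    end: int         # end on the target genome (exclusive)
    identity: float  # alignment identity in [0, 1]


SKIP_PENALTY = 1.0  # hypothetical cost for each source exon left unmatched


def best_chain(hits: List[Hit], n_exons: int) -> List[Hit]:
    """Minimum-cost collinear chain of hits (shortest path in a DAG)."""
    hits = sorted(hits, key=lambda h: (h.start, h.end))
    cost: List[float] = []
    prev: List[Optional[int]] = []
    for i, h in enumerate(hits):
        # Edge from the virtual source: exons before h.exon are skipped.
        best, back = SKIP_PENALTY * h.exon, None
        for j in range(i):
            g = hits[j]
            # Edge g -> h requires exon order and genomic order to agree.
            if g.exon < h.exon and g.end <= h.start:
                c = cost[j] + SKIP_PENALTY * (h.exon - g.exon - 1)
                if c < best:
                    best, back = c, j
        cost.append(best + (1.0 - h.identity))  # node cost: mismatch fraction
        prev.append(back)

    # Edge to the virtual sink: exons after the last hit are skipped.
    end_i, end_cost = None, float("inf")
    for i, h in enumerate(hits):
        total = cost[i] + SKIP_PENALTY * (n_exons - 1 - h.exon)
        if total < end_cost:
            end_i, end_cost = i, total

    # Walk the back-pointers to recover the chosen chain in genomic order.
    chain = []
    while end_i is not None:
        chain.append(hits[end_i])
        end_i = prev[end_i]
    return chain[::-1]
```

Under these assumptions, a distant duplicate hit for an exon is rejected whenever choosing it would force a neighbouring exon to be skipped, because the skip penalty outweighs a small gain in identity. A real pipeline would also need to handle the reverse strand, intron length limits and splice-site signals, none of which are modelled here.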
Pages: 14