GAD: A Python Script for Dividing Genome Annotation Files into Feature-Based Files

Cited by: 1
Authors
Yasser, Norhan [1]
Karam, Ahmed [1]
Affiliations
[1] Agr Res Ctr, Agr Genet Engn Res Inst, Giza, Egypt
Keywords
Genome annotation; Extraction; Features; GFF3; GTF; BED; DATABASE; RESOURCE;
DOI
10.1007/s12539-020-00378-4
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Nowadays, manipulating and analyzing genomic data stored in publicly accessible repositories has become a daily task in genomics and bioinformatics laboratories. Owing to the enormous advances in genome sequencing and the emergence of many projects, bioinformaticians have pushed for the creation of a variety of programs and pipelines that automatically analyze such big data, in particular gene annotation pipelines. Handling annotation files with simple, easy-to-use programs is very important, particularly for non-developers, because it accelerates genomic data analysis. One of the first tasks in working with genomic annotation files is extracting different features. In this regard, we have developed GAD in Python as a fast, easy, and controlled script that handles annotation files such as GFF3 and GTF. GAD is a cross-platform graphical interface tool used to extract genome features such as intergenic regions and upstream and downstream genes. GAD also finds all ambiguous sequence ontology names and either extracts them or treats them as genes or transcripts. The results are produced in a variety of file formats supported by other bioinformatics programs, such as BED, GTF, GFF3, and FASTA. GAD can handle different genomes of large size and any number of files with minimal user effort. Therefore, our script could be integrated into various pipelines in genomic laboratories to accelerate data analysis.
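To make the idea of dividing an annotation file into feature-based files concrete, here is a minimal Python sketch. It is not GAD's code and does not reflect its interface. The function names `split_gff3_by_feature` and `to_bed` and the example records are hypothetical, chosen only for illustration. The sketch assumes well-formed, tab-separated GFF3 with nine columns, groups records by feature type (column 3), and converts one group to BED lines.

```python
from collections import defaultdict


def split_gff3_by_feature(lines):
    """Group GFF3 records by feature type (column 3). Hypothetical helper."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and directive/comment lines such as "##gff-version 3".
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # ignore malformed records in this sketch
        groups[fields[2]].append(fields)
    return dict(groups)


def to_bed(fields):
    """Convert one 9-column GFF3 record to a 6-column BED line."""
    seqid, _source, ftype, start, end, _score, strand, _phase, attrs = fields
    # Use the ID attribute as the BED name, falling back to the feature type.
    name = ftype
    for pair in attrs.split(";"):
        if pair.startswith("ID="):
            name = pair[3:]
            break
    # GFF3 coordinates are 1-based and closed; BED is 0-based and half-open,
    # so only the start shifts by one.
    bed_start = str(int(start) - 1)
    bed_strand = strand if strand in ("+", "-") else "."
    # The BED score is set to a placeholder of 0 in this sketch.
    return "\t".join([seqid, bed_start, end, name, "0", bed_strand])


# Made-up example records, for illustration only.
example = (
    "##gff-version 3\n"
    "chr1\tdemo\tgene\t100\t500\t.\t+\t.\tID=gene1\n"
    "chr1\tdemo\tmRNA\t100\t500\t.\t+\t.\tID=tx1;Parent=gene1\n"
    "chr1\tdemo\texon\t100\t200\t.\t+\t.\tID=exon1;Parent=tx1\n"
    "chr1\tdemo\texon\t300\t500\t.\t+\t.\tID=exon2;Parent=tx1\n"
)

groups = split_gff3_by_feature(example.splitlines())
bed_exons = [to_bed(record) for record in groups["exon"]]
```

Each key of `groups` could then be written to its own output file, one per feature type. Derived features such as intergenic regions would need an additional pass over sorted gene coordinates, which this sketch does not attempt.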
Pages: 377-381
Number of pages: 5