A pipeline of programs for collecting and analyzing group II intron retroelement sequences from GenBank

Cited by: 9
Authors
Abebe, Michael [1]
Candales, Manuel A. [1]
Duong, Adrian [1]
Hood, Keyar S. [1]
Li, Tony [1]
Neufeld, Ryan A. E. [1]
Shakenov, Abat [1]
Sun, Runda [1]
Wu, Li [1]
Jarding, Ashley M. [1]
Semper, Cameron [1]
Zimmerly, Steven [1]
Affiliations
[1] Univ Calgary, Dept Biol Sci, Calgary, AB T2N 1N4, Canada
Funding
Canadian Institutes of Health Research;
Keywords
Bacteria; Genomes; Retroelement; Reverse transcriptase; Ribozyme; SELF-SPLICING INTRONS; REVERSE TRANSCRIPTASES; DIVERSITY; BACTERIA; DOMAIN;
DOI
10.1186/1759-8753-4-28
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Background: Accurate and complete identification of mobile elements is a challenging task in the current era of sequencing, given their large numbers and frequent truncations. Group II intron retroelements, which consist of a ribozyme and an intron-encoded protein (IEP), are usually identified in bacterial genomes through their IEP; however, the RNA component that defines the intron boundaries is often difficult to identify because of a lack of strong sequence conservation corresponding to the RNA structure. Compounding the problem of boundary definition is the fact that a majority of group II intron copies in bacteria are truncated.
Results: Here we present a pipeline of 11 programs that collect and analyze group II intron sequences from GenBank. The pipeline begins with a BLAST search of GenBank using a set of representative group II IEPs as queries. Subsequent steps download the corresponding genomic sequences and flanks, filter out non-group II introns, assign introns to phylogenetic subclasses, filter out incomplete and/or non-functional introns, and assign IEP sequences and RNA boundaries to the full-length introns. In the final step, the redundancy in the data set is reduced by grouping introns into sets of >= 95% identity, with one example sequence chosen to be the representative.
Conclusions: These programs should be useful for comprehensive identification of group II introns in sequence databases as data continue to rapidly accumulate.
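The final redundancy-reduction step described in the abstract (grouping introns into sets of >= 95% identity with one representative each) can be illustrated with a minimal Python sketch. This is not the authors' program: the function names are hypothetical, identity is assumed to be measured position by position over pre-aligned sequences of equal length, and the first sequence encountered in each set is assumed to serve as its representative.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical positions in two pre-aligned, equal-length sequences.

    Assumption: the sequences were aligned beforehand; no alignment is done here.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def cluster_by_identity(
    seqs: Dict[str, str], threshold: float = 0.95
) -> Dict[str, List[str]]:
    """Greedy grouping: each sequence joins the first set whose representative
    it matches at or above the threshold; otherwise it starts a new set."""
    clusters: Dict[str, List[str]] = {}
    for name, seq in seqs.items():
        for rep in clusters:
            if percent_identity(seq, seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is similar enough.
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    base = "ACGT" * 10                      # hypothetical 40-nt toy sequence
    toy = {
        "intron_a": base,
        "intron_b": "T" + base[1:],         # 1 mismatch in 40 positions
        "intron_c": "TTTT" + base[4:],      # 3 mismatches in 40 positions
    }
    for rep, members in cluster_by_identity(toy).items():
        print(rep, members)
```

A greedy scheme like this depends on input order, so a real workflow would likely sort the sequences first (for example, longest first) and compute identity from a proper pairwise alignment.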
Pages: 9