RepeatModeler2 for automated genomic discovery of transposable element families

Cited by: 2088
Authors
Flynn, Jullien M. [1]
Hubley, Robert [2]
Goubert, Clement [1]
Rosen, Jeb [2]
Clark, Andrew G. [1]
Feschotte, Cedric [1]
Smit, Arian F. [2]
Affiliations
[1] Cornell Univ, Dept Mol Biol & Genet, Ithaca, NY 14853 USA
[2] Inst Syst Biol, Seattle, WA 98109 USA
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
genome annotation; mobile genetic elements; transposon families; de novo identification; sequence; classification; program; plant
DOI
10.1073/pnas.1921046117
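Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract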
The accelerating pace of genome sequencing throughout the tree of life is driving the need for improved unsupervised annotation of genome components such as transposable elements (TEs). Because the types and sequences of TEs are highly variable across species, automated TE discovery and annotation are challenging and time-consuming tasks. A critical first step is the de novo identification and accurate compilation of sequence models representing all of the unique TE families dispersed in the genome. Here we introduce RepeatModeler2, a pipeline that greatly facilitates this process. This program brings substantial improvements over the original version of RepeatModeler, one of the most widely used tools for TE discovery. In particular, this version incorporates a module for structural discovery of complete long terminal repeat (LTR) retroelements, which are widespread in eukaryotic genomes but recalcitrant to automated identification because of their size and sequence complexity. We benchmarked RepeatModeler2 on three model species with diverse TE landscapes and high-quality, manually curated TE libraries: Drosophila melanogaster (fruit fly), Danio rerio (zebrafish), and Oryza sativa (rice). In these three species, RepeatModeler2 identified approximately 3 times more consensus sequences matching with >95% sequence identity and sequence coverage to the manually curated sequences than the original RepeatModeler. As expected, the greatest improvement is for LTR retroelements. Thus, RepeatModeler2 represents a valuable addition to the genome annotation toolkit that will enhance the identification and study of TEs in eukaryotic genome sequences.
Pages: 9451-9457
Number of pages: 7