The Dfam database of repetitive DNA families

Cited by: 441
Authors
Hubley, Robert [1]
Finn, Robert D. [2]
Clements, Jody [3]
Eddy, Sean R. [4]
Jones, Thomas A. [4]
Bao, Weidong [5]
Smit, Arian F. A. [1]
Wheeler, Travis J. [6]
Affiliations
[1] Inst Syst Biol, Seattle, WA 98109 USA
[2] European Bioinformat Inst EMBL EBI, European Mol Biol Lab, Wellcome Trust Genome Campus, Cambridge CB10 1RQ, England
[3] HHMI Janelia Res Campus, Ashburn, VA 20147 USA
[4] Harvard Univ, Howard Hughes Med Inst, Cambridge, MA 02138 USA
[5] Genet Informat Res Inst, Los Altos, CA 94022 USA
[6] Univ Montana, Missoula, MT 59812 USA
Funding
US National Institutes of Health;
Keywords
DE-NOVO IDENTIFICATION; INTERSPERSED REPEATS; ELEMENTS; ORGANIZATION; MATRICES; REPBASE; SEARCH; MOUSE; SINES;
DOI
10.1093/nar/gkv1272
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of known repeat families from four new organisms (mouse, zebrafish, fly and nematode). We describe improvements to coverage, and to our methods for identifying and reducing false annotation. We also describe updates to the website interface. The Dfam website has moved to http://dfam.org. Seed alignments, profile HMMs, hit lists and other underlying data are available for download.
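The abstract says each Dfam family is represented by a seed multiple sequence alignment and a profile HMM. The following is a deliberately simplified sketch of how a nucleotide profile relates to its seed alignment. It is not Dfam's or HMMER's actual build procedure. It estimates only match-state emission probabilities: match columns are chosen with a gap-fraction rule, counts get add-one pseudocounts, and an ungapped window is scored by log-odds against a uniform background. A real profile HMM also has insert and delete states with transition probabilities, and tools such as HMMER additionally apply sequence weighting and prior-based parameter estimation. The function names, the 0.5 gap threshold, the uniform background, and the toy alignment are all hypothetical.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGT"
GAP_CHARS = "-."


def match_columns(alignment, max_gap_fraction=0.5):
    """Indices of columns kept as match states (hypothetical gap-fraction rule)."""
    n_seqs = len(alignment)
    columns = []
    for j in range(len(alignment[0])):
        gaps = sum(1 for seq in alignment if seq[j] in GAP_CHARS)
        if gaps / n_seqs <= max_gap_fraction:
            columns.append(j)
    return columns


def match_emissions(alignment, pseudocount=1.0, max_gap_fraction=0.5):
    """Per-match-column nucleotide probabilities with add-`pseudocount` smoothing."""
    if not alignment or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    seqs = [seq.upper() for seq in alignment]
    profile = []
    for j in match_columns(seqs, max_gap_fraction):
        # Count only A/C/G/T in this column; gaps and ambiguity codes are ignored.
        counts = Counter(seq[j] for seq in seqs if seq[j] in NUCLEOTIDES)
        total = sum(counts.values()) + pseudocount * len(NUCLEOTIDES)
        profile.append({nt: (counts[nt] + pseudocount) / total for nt in NUCLEOTIDES})
    return profile


def log_odds(profile, window, background=0.25):
    """Ungapped log2-odds score of an A/C/G/T-only window (no indels modelled)."""
    if len(window) != len(profile):
        raise ValueError("window length must equal the number of match columns")
    return sum(math.log2(col[nt] / background) for col, nt in zip(profile, window.upper()))


seed = ["ACGT-A", "ACGT-A", "ACCT-A", "ACGTGA"]  # toy seed alignment, not Dfam data
profile = match_emissions(seed)
print(len(profile))               # 5 match columns (column 4 is 75% gaps)
print(round(profile[2]["G"], 3))  # 0.5
print(log_odds(profile, "ACGTA") > log_odds(profile, "TTTTT"))  # True
```

The consensus-like window scores positively while an unrelated window scores negatively. This positive-versus-negative split, in a much richer and gapped form, is what lets a family profile flag candidate repeat copies in a genome.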
Pages: D81-D89
Number of pages: 9