High-fidelity (repeat) consensus sequences from short reads using combined read clustering and assembly

Cited by: 3
Authors
Mann, Ludwig [1]
Balasch, Kristin [1]
Schmidt, Nicola [1]
Heitkam, Tony [1,2]
Affiliations
[1] Tech Univ Dresden, Fac Biol, D-01069 Dresden, Germany
[2] Karl Franzens Univ Graz, Inst Biol, NAWI Graz, A-8010 Graz, Austria
Keywords
Repetitive DNA; Transposable elements; Consensus sequences; Repeat assembly; Repeat clustering; eccDNA; Ribosomal DNA; rDNA; Non-model organisms; MALE-FERTILE; GENOME; DNA; TRANSCRIPTION; PLANTS;
DOI
10.1186/s12864-023-09948-4
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Despite the many cheap and fast ways to generate genomic data, good and exact genome assembly is still a problem, with especially the repeats being vastly underrepresented and often misassembled. As short reads in low coverage are already sufficient to represent the repeat landscape of any given genome, many read cluster algorithms were brought forward that provide repeat identification and classification. But how can trustworthy, reliable and representative repeat consensuses be derived from unassembled genomes?
Results: Here, we combine methods from repeat identification and genome assembly to derive these robust consensuses. We test several use cases, such as (1) consensus building from clustered short reads of non-model genomes, (2) from genome-wide amplification setups, and (3) specific repeat-centred questions, such as the linked vs. unlinked arrangement of ribosomal genes. In all our use cases, the derived consensuses are robust and representative. To evaluate overall performance, we compare our high-fidelity repeat consensuses to RepeatExplorer2-derived contigs and check, if they represent real transposable elements as found in long reads. Our results demonstrate that it is possible to generate useful, reliable and trustworthy consensuses from short reads by a combination from read cluster and genome assembly methods in an automatable way.
Conclusion: We anticipate that our workflow opens the way towards more efficient and less manual repeat characterization and annotation, benefitting all genome studies, but especially those of non-model organisms.
Pages: 11