MetaCC allows scalable and integrative analyses of both long-read and short-read metagenomic Hi-C data

Cited by: 6
Authors
Du, Yuxuan [1]
Sun, Fengzhu [1]
Affiliations
[1] Univ Southern Calif, Dept Quantitat & Computat Biol, Los Angeles, CA 90007 USA
Keywords
SP NOV; GENOME; ALGORITHM
DOI
10.1038/s41467-023-41209-6
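Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract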
Metagenomic Hi-C (metaHi-C) can identify contig-to-contig relationships based on their proximity within the same physical cell. Shotgun libraries in metaHi-C experiments can be constructed by next-generation sequencing (short-read metaHi-C) or by more recent third-generation sequencing (long-read metaHi-C). However, all existing metaHi-C analysis methods were developed and benchmarked on short-read metaHi-C datasets, and there is much room for improvement in scalability and stability, especially for long-read metaHi-C data. Here we report MetaCC, an efficient and integrative framework for analyzing both short-read and long-read metaHi-C datasets. MetaCC outperforms existing methods on normalization and binning. In particular, the MetaCC normalization module, named NormCC, is more than 3000 times faster than the current state-of-the-art method HiCzin on a complex wastewater dataset. When applied to one sheep gut long-read metaHi-C dataset, the MetaCC binning module can retrieve 709 high-quality genomes with the largest species diversity using a single sample, including an expansion of five uncultured members from the order Erysipelotrichales, and is the only binner that can recover the genome of the important species Bacteroides vulgatus. Further plasmid analyses reveal that MetaCC binning is able to capture multi-copy plasmids. The authors develop an integrative and scalable framework to eliminate systematic biases and retrieve high-quality metagenome-assembled genomes using either long-read or short-read metagenomic Hi-C data.
Pages: 12