MetaCC allows scalable and integrative analyses of both long-read and short-read metagenomic Hi-C data

Cited by: 6
Authors
Du, Yuxuan [1]
Sun, Fengzhu [1]
Affiliations
[1] Univ Southern Calif, Dept Quantitat & Computat Biol, Los Angeles, CA 90007 USA
Keywords
SP NOV; GENOME; ALGORITHM
DOI
10.1038/s41467-023-41209-6
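Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09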
Abstract
Metagenomic Hi-C (metaHi-C) can identify contig-to-contig relationships based on their proximity within the same physical cell. Shotgun libraries in metaHi-C experiments can be constructed by next-generation sequencing (short-read metaHi-C) or by more recent third-generation sequencing (long-read metaHi-C). However, all existing metaHi-C analysis methods were developed and benchmarked on short-read metaHi-C datasets, and there is considerable room for more scalable and stable analyses, especially for long-read metaHi-C data. Here we report MetaCC, an efficient and integrative framework for analyzing both short-read and long-read metaHi-C datasets. MetaCC outperforms existing methods on normalization and binning. In particular, the MetaCC normalization module, named NormCC, is more than 3000 times faster than the current state-of-the-art method, HiCzin, on a complex wastewater dataset. When applied to a sheep gut long-read metaHi-C dataset, the MetaCC binning module retrieves 709 high-quality genomes with the largest species diversity from a single sample, including an expansion of five uncultured members of the order Erysipelotrichales, and is the only binner that recovers the genome of one important species, Bacteroides vulgatus. Further plasmid analyses reveal that MetaCC binning is able to capture multi-copy plasmids. The authors develop an integrative and scalable framework to eliminate systematic biases and retrieve high-quality metagenome-assembled genomes using either long-read or short-read metagenomic Hi-C data.
Pages: 12