MetaCC allows scalable and integrative analyses of both long-read and short-read metagenomic Hi-C data

Cited by: 6
Authors
Du, Yuxuan [1]
Sun, Fengzhu [1]
Affiliation
[1] Univ Southern Calif, Dept Quantitat & Computat Biol, Los Angeles, CA 90007 USA
Keywords
SP NOV; GENOME; ALGORITHM;
DOI
10.1038/s41467-023-41209-6
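Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]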
Discipline classification codes
07; 0710; 09
Abstract
Metagenomic Hi-C (metaHi-C) can identify contig-to-contig relationships based on their physical proximity within the same cell. Shotgun libraries in metaHi-C experiments can be constructed using next-generation sequencing (short-read metaHi-C) or, more recently, third-generation sequencing (long-read metaHi-C). However, all existing metaHi-C analysis methods were developed and benchmarked on short-read metaHi-C datasets, and there is considerable room to make analyses more scalable and stable, especially for long-read metaHi-C data. Here we report MetaCC, an efficient and integrative framework for analyzing both short-read and long-read metaHi-C datasets. MetaCC outperforms existing methods in normalization and binning. In particular, the MetaCC normalization module, NormCC, is more than 3000 times faster than the current state-of-the-art method, HiCzin, on a complex wastewater dataset. When applied to a long-read metaHi-C dataset from the sheep gut, the MetaCC binning module retrieves 709 high-quality genomes with the greatest species diversity from a single sample, including an expansion of five uncultured members of the order Erysipelotrichales, and is the only binner that recovers the genome of an important species, Bacteroides vulgatus. Further plasmid analyses reveal that MetaCC binning can capture multi-copy plasmids. The authors develop an integrative and scalable framework to eliminate systematic biases and retrieve high-quality metagenome-assembled genomes using either long-read or short-read metagenomic Hi-C data.
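The contig-to-contig relationships mentioned in the abstract come from Hi-C read pairs whose two ends align to different contigs. As a minimal, hypothetical sketch (not MetaCC's implementation), assuming each read pair has already been aligned and reduced to the names of the two contigs its ends hit, raw inter-contig contact counts could be tallied as follows. The function name `contact_counts` and the input format are illustrative assumptions.

```python
from collections import Counter


def contact_counts(read_pairs):
    """Tally raw inter-contig Hi-C contacts.

    Hypothetical input: an iterable of (contig_a, contig_b) tuples, one per
    Hi-C read pair, naming the contig each end aligned to.
    """
    counts = Counter()
    for a, b in read_pairs:
        if a == b:
            # Both ends on the same contig: no contig-to-contig signal.
            continue
        # Store each link under a sorted key so (a, b) and (b, a) are merged.
        key = (a, b) if a <= b else (b, a)
        counts[key] += 1
    return counts


pairs = [("c1", "c2"), ("c2", "c1"), ("c1", "c1"), ("c3", "c2")]
print(contact_counts(pairs))
```

Raw counts like these still carry systematic biases, which, according to the abstract, MetaCC's normalization module is designed to remove before binning.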
Pages: 12