Augmenting bacterial similarity measures using a graph-based genome representation

Cited by: 0
Authors
Ramanan, Vivek [1 ,2 ]
Sarkar, Indra Neil [1 ,2 ,3 ]
Affiliations
[1] Brown Univ, Ctr Computat Mol Biol, Providence, RI 02912 USA
[2] Brown Univ, Ctr Biomed Informat, Providence, RI 02912 USA
[3] Rhode Isl Qual Inst, Providence, RI 02908 USA
Keywords
synteny; genome analysis; microbiome; HELICOBACTER-PYLORI; CANCER; COLONIZATION; MICROBIOTA; INFECTION; DISEASE; RISK;
DOI
10.1128/msystems.00497-24
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification code
071005; 100705;
Abstract
Relationships between bacterial taxa are traditionally defined using 16S rRNA nucleotide similarity or average nucleotide identity. Improvements in sequencing technology provide additional pairwise information on genome sequences, which may provide valuable information on genomic relationships. Mapping orthologous gene locations between genome pairs, known as synteny, is typically implemented in the discovery of new species and has not been systematically applied to bacterial genomes. Using a data set of 378 bacterial genomes, we developed and tested a new measure of synteny similarity between a pair of genomes, which was scaled onto 16S rRNA distance using covariance matrices. Based on the input gene functions used (i.e., core, antibiotic resistance, and virulence), we observed varying topological arrangements of bacterial relationship networks by applying (i) complete linkage hierarchical clustering and (ii) K-nearest neighbor graph structures to synteny-scaled 16S data. Our metric improved clustering quality relative to state-of-the-art average nucleotide identity metrics while preserving clustering assignments for the highest similarity relationships. Our findings indicate that syntenic relationships provide more granular and interpretable relationships for within-genera taxa compared to pairwise similarity measures, particularly in functional contexts.
IMPORTANCE
Given the prevalence and necessity of the 16S rRNA measure in bacterial identification and analysis, this additional analysis adds a functional and synteny-based layer to the identification of relatives and clustering of bacterial genomes. It is also of computational interest to model the bacterial genome as a graph structure, which presents new avenues of genomic analysis for bacteria and their closely related strains and species.
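The two downstream steps named in the abstract, complete linkage hierarchical clustering and a K-nearest neighbor graph, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the genome labels, the toy distance values, and the function names `complete_linkage` and `knn_graph` are hypothetical and do not come from the paper, and the covariance-based scaling of synteny onto 16S rRNA distance is not reproduced here.

```python
from itertools import combinations

# Hypothetical pairwise distances between four genomes (toy values, not from the paper).
LABELS = ["A", "B", "C", "D"]
DIST = [
    [0.0, 0.1, 0.6, 0.7],
    [0.1, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.2],
    [0.7, 0.8, 0.2, 0.0],
]


def complete_linkage(dist, n_clusters):
    """Agglomerative clustering where cluster distance is the largest pairwise distance."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: max(
                dist[i][j] for i in clusters[p[0]] for j in clusters[p[1]]
            ),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(sorted(c) for c in clusters)


def knn_graph(dist, k):
    """Directed k-nearest-neighbor graph: each node maps to its k closest other nodes."""
    graph = {}
    for i, row in enumerate(dist):
        others = sorted((j for j in range(len(row)) if j != i), key=lambda j: row[j])
        graph[i] = others[:k]
    return graph


clusters = complete_linkage(DIST, n_clusters=2)
neighbors = knn_graph(DIST, k=1)
print([[LABELS[i] for i in c] for c in clusters])
print({LABELS[i]: [LABELS[j] for j in nbrs] for i, nbrs in neighbors.items()})
```

With these toy values, genomes A and B fall into one cluster and C and D into the other, and each genome's single nearest neighbor is its cluster partner. Real analyses would more likely rely on an established clustering library than on this hand-rolled version.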
Pages: 14