Leveraging Clustering Techniques to Facilitate Metagenomic Analysis

Cited by: 1
Authors
Ennis, Damien [1]
Dascalu, Sergiu [1]
Harris, Frederick C., Jr. [1]
Affiliations
[1] Univ Nevada, Dept Comp Sci & Engn, Reno, NV 89557 USA
Funding
National Science Foundation (USA)
Keywords
Metagenomics; Clustering; K-means; Machine learning; Self-organizing map; SEARCH
DOI
10.1080/10798587.2015.1073887
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Machine learning clustering algorithms provide excellent methods for conducting metagenomic analysis with efficiency. This study uses two machine learning algorithms, the self-organizing map and the K-means algorithms, to cluster data from an environmental sample collected from a hot springs habitat and to provide a visual analysis of that data. A data processing pipeline is described that uses the clustering algorithms to identify which reference genomes should be included for further analysis in determining possible organisms that are present in a metagenomic sample. The clustering revealed probable candidates for additional analysis, including a thermophilic, anaerobic bacterium, which is likely to be found in a hot springs environment and serves to validate the functionality of these tools. The machine learning techniques discussed here can serve as a launching point for elucidating protein sequences that could serve as possible reference comparisons to a specific metagenomic sample and lead to further study.
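As a rough illustration of the K-means step mentioned in the abstract, the sketch below clusters a handful of toy DNA reads by their dinucleotide composition. The abstract does not state which features, parameters, or software the authors used, so everything here is an assumption made for illustration: the dinucleotide feature vectors, the hypothetical `reads` list, the helper names `kmer_profile` and `kmeans`, and the deterministic choice of the first `k` points as starting centroids. The self-organizing map is not shown.

```python
from itertools import product


def kmer_profile(seq, k=2):
    """Return the normalized k-mer frequency vector of a DNA sequence (assumed feature)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {km: 0 for km in kmers}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for very short reads
    return [counts[km] / total for km in kmers]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points serve as initial centroids (illustrative choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy reads: two AT-rich and two GC-rich fragments.
reads = [
    "ATATATTAAATATTA",
    "GCGCGGCCGCGGGCC",
    "TTATAATATATTAAT",
    "CGCGCCGGCGCGCGG",
]
profiles = [kmer_profile(r) for r in reads]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

In a pipeline of the kind the abstract describes, cluster membership would then guide which reference genomes to examine further; how that selection is made is not specified in this record.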
Pages: 153-165
Page count: 13
Related papers
50 in total
  • [1] Ultrafast clustering algorithms for metagenomic sequence analysis
    Li, Weizhong
    Fu, Limin
    Niu, Beifang
    Wu, Sitao
    Wooley, John
    BRIEFINGS IN BIOINFORMATICS, 2012, 13 (06) : 656 - 668
  • [2] Rapid analysis of metagenomic data using signature-based clustering
    Chappell, Timothy
    Geva, Shlomo
    Hogan, James M.
    Huygens, Flavia
    Rathnayake, Irani U.
    Rudd, Stephen
    Kelly, Wayne
    Perrin, Dimitri
    BMC BIOINFORMATICS, 2018, 19
  • [3] Disambiguating a Soft Metagenomic Clustering
    Nihalani, Rahul
    Zola, Jaroslaw
    Aluru, Srinivas
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2025,
  • [4] Rapid analysis of metagenomic data using signature-based clustering
    Timothy Chappell
    Shlomo Geva
    James M. Hogan
    Flavia Huygens
    Irani U. Rathnayake
    Stephen Rudd
    Wayne Kelly
    Dimitri Perrin
    BMC Bioinformatics, 19
  • [5] A Systematic Comparative Analysis of Clustering Techniques
    Gupta, Satinder Bal
    Yadav, Rajkumar
    Gupta, Shivani
    APPLIED COMPUTER SYSTEMS, 2020, 25 (02) : 87 - 104
  • [6] Enhanced Cross-Validation Methods Leveraging Clustering Techniques
    Yucelbas, Cuneyt
    Yucelbas, Sule
    TRAITEMENT DU SIGNAL, 2023, 40 (06) : 2649 - 2660
  • [7] An accurate and exact clustering algorithm for next generation sequencing metagenomic sequences
    Bhat, Ashaq Hussain
    Nguyen, Tu N.
    Cengiz, Korhan
    Prabhu, Puniethaa
    MATHEMATICAL METHODS IN THE APPLIED SCIENCES, 2021,
  • [8] A Deep Embedded Clustering Algorithm for the Binning of Metagenomic Sequences
    Bao, Huynh Quang
    Vinh, Le Van
    Hoai, Tran Van
    IEEE ACCESS, 2022, 10 : 54348 - 54357
  • [9] Separating metagenomic short reads into genomes via clustering
    Tanaseichuk, Olga
    Borneman, James
    Jiang, Tao
    ALGORITHMS FOR MOLECULAR BIOLOGY, 2012, 7
  • [10] Clustering of Metagenomic Data by Combining Different Distance Functions
    Bonet, Isis
    Escobar, Adriana
    Mesa-Munera, Andrea
    Alzate, Juan Fernando
    ACTA POLYTECHNICA HUNGARICA, 2017, 14 (03) : 223 - 236