CD-MAWS: An Alignment-Free Phylogeny Estimation Method Using Cosine Distance on Minimal Absent Word Sets

Cited by: 6
Authors
Anjum, Naser [1]
Nabil, Raian Latif [1]
Rafi, Rakibul Islam [1]
Bayzid, Md. Shamsuzzoha [1]
Rahman, M. Saifur [1]
Affiliations
[1] Bangladesh Univ Engn & Technol, Dept Comp Sci & Engn, Dhaka 1000, Bangladesh
Keywords
Alignment-free methods; absent word; phylogeny; AFProject; EVOLUTIONARY DISTANCES; GENOME; SEQUENCES; ACCURATE; TREES;
DOI
10.1109/TCBB.2021.3136792
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Multiple sequence alignment has been the traditional and well-established approach to sequence analysis and comparison, though it is time- and memory-consuming. As the scale of sequencing data keeps growing, faster yet accurate alignment-free methods are becoming increasingly important. Several alignment-free sequence analysis methods have been established in the literature in recent years; they extract numerical features from genomic data to analyze sequences and to estimate phylogenetic relationships among genes and species. The Minimal Absent Word (MAW) is an effective concept for representing the characteristics of a sequence in an alignment-free manner. In this study, we present CD-MAWS, a distance measure based on the cosine of the angle between composition vectors constructed using minimal absent words, for sequence analysis in a computationally inexpensive manner. We have benchmarked CD-MAWS on several AFProject datasets, namely the Fish mtDNA, E. coli, Plants, Shigella and Yersinia datasets, and found it to perform quite well. Applied to several other biological datasets, such as mammal mtDNA, bacterial genomes and viral genomes, CD-MAWS resolved phylogenetic relationships as well as or better than state-of-the-art alignment-free methods such as Mash, Skmer, Co-phylog and kSNP3.
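The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the composition vectors are built, so the sketch assumes a binary presence/absence vector over the union of the two MAW sets, in which case the cosine similarity reduces to |A ∩ B| / sqrt(|A| · |B|). The function names `minimal_absent_words` and `cosine_maw_distance`, the `max_len` cap, and the brute-force enumeration are all illustrative assumptions suited only to short sequences.

```python
import math


def minimal_absent_words(seq, max_len=6, alphabet="ACGT"):
    """Brute-force MAWs of length 2..max_len (illustrative, not efficient).

    A word w is a minimal absent word of seq if w does not occur in seq
    while both w[:-1] and w[1:] do occur.
    """
    # All substrings (factors) of seq up to max_len characters.
    factors = {
        seq[i:j]
        for i in range(len(seq))
        for j in range(i + 1, min(i + max_len, len(seq)) + 1)
    }
    maws = set()
    for left in factors:
        if len(left) >= max_len:
            continue  # extending would exceed the length cap
        for b in alphabet:
            w = left + b  # w[:-1] == left occurs by construction
            if w not in factors and w[1:] in factors:
                maws.add(w)
    return maws


def cosine_maw_distance(maws_a, maws_b):
    """1 - cosine similarity of binary MAW indicator vectors (an assumption)."""
    if not maws_a or not maws_b:
        return 1.0  # convention chosen here for empty sets
    shared = len(maws_a & maws_b)
    return 1.0 - shared / math.sqrt(len(maws_a) * len(maws_b))


if __name__ == "__main__":
    a = minimal_absent_words("ACGTACGGTACC")
    b = minimal_absent_words("ACGTTCGGTACA")
    print(round(cosine_maw_distance(a, b), 4))
```

Pairwise distances computed this way could be fed to any distance-based tree builder (for example neighbor joining); which tree-building step the paper uses is not stated in the abstract.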
Pages: 196-205
Page count: 10
Related papers
2 in total
  • [1] An alignment-free method for phylogeny estimation using maximum likelihood
    Zahin, Tasfia
    Abrar, Md. Hasin
    Jewel, Mizanur Rahman
    Tasnim, Tahrina
    Bayzid, Md. Shamsuzzoha
    Rahman, Atif
    BMC BIOINFORMATICS, 2025, 26 (01):
  • [2] KINN: An alignment-free accurate phylogeny reconstruction method based on inner distance distributions of k-mer pairs in biological sequences
    Tang, Runbin
    Yu, Zuguo
    Li, Jinyan
    MOLECULAR PHYLOGENETICS AND EVOLUTION, 2023, 179