Alignment-free distance measure based on return time distribution for sequence analysis: Applications to clustering, molecular phylogeny and subtyping

Cited by: 42
Authors
Kolekar, Pandurang [1 ]
Kale, Mohan [2 ]
Kulkarni-Kale, Urmila [1 ]
Affiliations
[1] Univ Pune, Bioinformat Ctr, Pune 411007, Maharashtra, India
[2] Univ Pune, Dept Stat, Pune 411007, Maharashtra, India
Keywords
Return time distribution; Alignment-free method; Molecular phylogeny; Dengue subtyping; Sequence analysis; Bioinformatics; DENGUE VIRUS TYPE-1; NATURAL-POPULATIONS; PROTEIN SEQUENCES; DNA-SEQUENCES; INFERENCE; EVOLUTION; RECOMBINATION; CONSTRUCTION; FLAVIVIRIDAE; SENSITIVITY;
DOI
10.1016/j.ympev.2012.07.003
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The data deluge of the post-genomic era demands the development of novel data mining tools. Existing molecular phylogeny analysis (MPA) methods, developed for individual gene/protein sequences, are alignment-based. However, the size of genomic data and the uncertainties associated with alignments necessitate the development of alignment-free methods for MPA. Derivation of distances between sequences is an important step in both alignment-dependent and alignment-free methods. Various alignment-free distance measures based on oligonucleotide frequencies, information content, compression techniques, etc. have been proposed. However, these distance measures do not account for the relative order of components, viz. nucleotides or amino acids. A new distance measure based on the concept of the 'return time distribution' (RTD) of k-mers is proposed, which accounts for both sequence composition and the relative order of components. Statistical parameters of the RTDs are used to derive a distance function. The resultant distance matrix is used for clustering and phylogeny using Neighbor-joining. Its performance for MPA and subtyping was evaluated using simulated data generated by block-bootstrap, receiver operating characteristics and leave-one-out cross validation methods. The proposed method was successfully applied for MPA of the family Flaviviridae and subtyping of Dengue viruses. It is observed that the method retains resolution for classification and subtyping of viruses at varying levels of sequence similarity and taxonomic hierarchy. (C) 2012 Elsevier Inc. All rights reserved.
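The idea of an RTD-based distance can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which statistical parameters or which distance function are used, so the choice of mean and standard deviation, the Euclidean distance, the definition of a return time as the difference between successive start positions, and the zero-filling of rare k-mers are all assumptions. The function names (`return_times`, `rtd_vector`, `rtd_distance`, `distance_matrix`) are hypothetical.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev


def return_times(seq, kmer):
    """Gaps between successive start positions of kmer in seq (overlaps allowed).

    Assumption: a return time is the difference between consecutive start
    positions; other conventions (e.g. gap minus one) are possible.
    """
    k = len(kmer)
    positions = [i for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer]
    return [b - a for a, b in zip(positions, positions[1:])]


def rtd_vector(seq, k=1, alphabet="ACGT"):
    """Mean and standard deviation of return times for every possible k-mer."""
    vec = []
    for letters in product(alphabet, repeat=k):
        rt = return_times(seq, "".join(letters))
        # Assumption: k-mers seen fewer than twice contribute zeros.
        vec.append(mean(rt) if rt else 0.0)
        vec.append(pstdev(rt) if rt else 0.0)
    return vec


def rtd_distance(seq_a, seq_b, k=1):
    """Euclidean distance between two RTD summary vectors (an assumed choice)."""
    va, vb = rtd_vector(seq_a, k), rtd_vector(seq_b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


def distance_matrix(seqs, k=1):
    """Symmetric pairwise distance matrix, usable as input to Neighbor-joining."""
    return [[rtd_distance(a, b, k) for b in seqs] for a in seqs]
```

The resulting matrix could then be passed to any Neighbor-joining implementation for clustering. Because return times depend on where k-mers occur and not only on how often, two sequences with identical k-mer counts can still be separated by this kind of measure.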
Pages: 510-522
Number of pages: 13
Related papers
4 items in total
  • [1] WNV Typer: A server for genotyping of West Nile viruses using an alignment-free method based on a return time distribution
    Kolekar, Pandurang
    Hake, Nilesh
    Kale, Mohan
    Kulkarni-Kale, Urmila
    JOURNAL OF VIROLOGICAL METHODS, 2014, 198 : 41 - 55
  • [2] An alignment-free measure based on physicochemical properties of amino acids for protein sequence comparison
    Zhao, Yunxiu
    Xue, Xiaolong
    Xie, Xiaoli
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2019, 80 : 10 - 15
  • [3] Comparative analysis of alignment-free genome clustering and whole genome alignment-based phylogenomic relationship of coronaviruses
    Kirichenko, Anastasiya D.
    Poroshina, Anastasiya A.
    Sherbakov, Dmitry Yu
    Sadovsky, Michael G.
    Krutovsky, Konstantin, V
    PLOS ONE, 2022, 17 (03):
  • [4] KINN: An alignment-free accurate phylogeny reconstruction method based on inner distance distributions of k-mer pairs in biological sequences
    Tang, Runbin
    Yu, Zuguo
    Li, Jinyan
    MOLECULAR PHYLOGENETICS AND EVOLUTION, 2023, 179