Numerical Characterization of DNA Sequences for Alignment-free Sequence Comparison - A Review

Cited by: 2
Authors
Ramanathan, Natarajan [1]
Ramamurthy, Jayalakshmi [2]
Natarajan, Ganapathy [3]
Affiliations
[1] Sri Sarada Niketan Coll Women, Dept Chem, Karur 639005, Tamil Nadu, India
[2] Sri Sarada Niketan Coll Women, Dept Comp Sci, Karur 639005, Tamil Nadu, India
[3] Univ Wisconsin, Dept Mech Engn & Ind Engn, Platteville, WI 53818 USA
Keywords
Numerical characterization; DNA sequences; alignment-free; sequence comparison; phylogenetic analysis; peptide-based vaccines; 2-D GRAPHICAL REPRESENTATION; CHAOS-GAME REPRESENTATION; RNA SECONDARY STRUCTURES; SIMILARITY ANALYSIS; GENOMIC SEQUENCES; CODING REGIONS; SIMILARITY/DISSIMILARITY; CLASSIFICATION; DESCRIPTORS; NUCLEOTIDE
DOI
10.2174/1386207324666210811101437
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Biological macromolecules, namely DNA, RNA, and protein, have their building blocks organized in a particular sequence, and this sequential arrangement encodes the evolutionary history of the organism (species). Hence, biological sequences have been used to study evolutionary relationships among species. This is usually carried out by Multiple Sequence Alignment (MSA). Due to certain limitations of MSA, alignment-free sequence comparison methods were developed. The present review covers alignment-free sequence comparison methods based on the numerical characterization of DNA sequences. Discussion: The graphical representation of DNA sequences by chaos game representation and by other 2-dimensional and 3-dimensional methods is discussed. The evolution of numerical characterization from the various graphical representations, and the application of the DNA invariants thus computed in phylogenetic analysis, are presented. The extension of molecular descriptor computation in chemometrics to the calculation of a new set of DNA invariants, and their use in alignment-free sequence comparison in an N-dimensional space and in the construction of phylogenetic trees, are also reviewed. Conclusion: The phylogenetic trees constructed by the alignment-free sequence comparison methods using DNA invariants were found to be better than those constructed using alignment-based tools such as PHYLIP and ClustalW. One of the graphical representation methods has now been extended to study viral sequences of infectious diseases, combining numerical characterization and graphical representation to identify conserved regions for the design of peptide-based vaccines.
Pages: 365-380
Number of pages: 16