Using Huffman Coding Method to Visualize and Analyze DNA Sequences

Cited by: 18
Authors
Qi, Zhao-Hui [1]
Li, Ling [2]
Qi, Xiao-Qin [1]
Affiliations
[1] Shijiazhuang Tiedao Univ, Coll Informat Sci & Technol, Shijiazhuang 050043, Hebei, Peoples R China
[2] Zhejiang Shuren Univ, Basic Courses Dept, Hangzhou 310015, Zhejiang, Peoples R China
Keywords
Huffman coding method; graphical representation; DNA sequence; sequence analysis; 2D GRAPHICAL REPRESENTATION; CHAOS-GAME REPRESENTATION; NUMERICAL CHARACTERIZATION; DUAL NUCLEOTIDES; H-CURVES; SIMILARITY/DISSIMILARITY; CLASSIFICATION; DESCRIPTORS; INVARIANTS; MATRIX
DOI
10.1002/jcc.21906
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
On the basis of the Huffman coding method, we propose a new graphical representation of DNA sequences. The representation can avoid degeneracy and loss of information in the transfer of data from a DNA sequence to its graphical representation. A multicomponent vector, whose components are derived from the graphical representation of the DNA primary sequence, is then introduced to characterize DNA sequences quantitatively. An examination of similarities and dissimilarities among the complete coding sequences of the beta-globin gene of 11 species and among six ND6 proteins shows the utility of the scheme. (C) 2011 Wiley Periodicals, Inc. J Comput Chem 32: 3233-3240, 2011
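For orientation, the sketch below illustrates only the standard Huffman coding step that the abstract names, applied to nucleotide frequencies. The abstract does not say how the resulting codes are mapped to a curve or how the multicomponent vector is built, so none of that is reproduced here. The function names `huffman_codes` and `encode` and the example sequence are illustrative assumptions, not taken from the paper.

```python
import heapq
from collections import Counter


def huffman_codes(sequence):
    """Return a {symbol: bitstring} Huffman code for the symbols in `sequence`.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    counts = Counter(sequence)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(counts)): "0"}

    # Heap entries are (weight, tie_breaker, {symbol: partial_code}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1

    return heap[0][2]


def encode(sequence, codes):
    """Concatenate the code of every symbol in `sequence`."""
    return "".join(codes[s] for s in sequence)


if __name__ == "__main__":
    seq = "AAAAAAAACCCCGGT"  # made-up example sequence
    codes = huffman_codes(seq)
    print(codes)
    print(encode(seq, codes))
```

More frequent nucleotides receive shorter codes, and because the code is prefix-free the bit string decodes back to a unique sequence. This is consistent with the abstract's claim about avoiding loss of information, although the paper's own construction may differ.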
Pages: 3233-3240
Number of pages: 8