Normalized Lempel-Ziv complexity and its application in bio-sequence analysis

Cited by: 24
Authors
Zhang, Yi [1]
Hao, Junkang [2]
Zhou, Changjie [1]
Chang, Kai [3]
Affiliations
[1] Hebei Univ Sci & Technol, Dept Math, Shijiazhuang 050018, HeBei, Peoples R China
[2] Hebei Univ Sci & Technol, Dept Phys Educ, Shijiazhuang 050018, HeBei, Peoples R China
[3] Beijing Inst Technol, Dept Automat Control, Sch Informat Sci & Technol, Beijing 100081, Peoples R China
Keywords
Lempel-Ziv complexity; Normalized; Similarity; Relationship tree; DNA primary sequences; RNA secondary structures; Similarity analysis; Graphical representation
DOI
10.1007/s10910-008-9512-2
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
In this article, we propose a new method for measuring DNA similarity based on a normalized Lempel-Ziv complexity scheme. The method weakens the effect of sequence length on the complexity measurement and saves computation time. First, a DNA sequence is transformed into three (0,1)-sequences by a scheme that distinguishes "A" from "non-A", "G" from "non-G", and "C" from "non-C" bases, respectively. Then, the normalized Lempel-Ziv complexities of the three (0,1)-sequences constitute a 3D vector. Finally, using the 3D vector, one may characterize DNA sequences and compute a similarity matrix for them. An examination of the similarities within two sets of DNA sequences illustrates the utility of the method in local and global similarity analysis.
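The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not state the normalization formula or the distance used for the similarity matrix, so the sketch assumes the commonly used normalization c(n) / (n / log2 n) and a Euclidean distance between the 3D vectors. All function names (`indicator`, `lz76_complexity`, `normalized_lz`, `dna_vector`, `distance_matrix`) are hypothetical.

```python
import math


def indicator(seq: str, base: str) -> str:
    """Map a DNA sequence to a (0,1)-sequence: 1 for `base`, 0 for any other base."""
    return "".join("1" if b == base else "0" for b in seq.upper())


def lz76_complexity(s: str) -> int:
    """Count the phrases in the Lempel-Ziv (1976) exhaustive parsing of s."""
    n = len(s)
    i = 0  # start index of the current phrase
    count = 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from text that starts
        # earlier (the copy source may overlap the phrase itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def normalized_lz(s: str) -> float:
    """ASSUMED normalization: c(n) / (n / log2 n); the paper's scheme may differ."""
    n = len(s)
    if n < 2:
        return 0.0
    return lz76_complexity(s) / (n / math.log2(n))


def dna_vector(seq: str) -> tuple:
    """3D vector of normalized complexities for the A, G and C indicator sequences."""
    return tuple(normalized_lz(indicator(seq, base)) for base in "AGC")


def distance_matrix(seqs: list) -> list:
    """ASSUMED dissimilarity: Euclidean distance between the 3D vectors."""
    vectors = [dna_vector(s) for s in seqs]
    return [[math.dist(u, v) for v in vectors] for u in vectors]


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    toy = ["ATGGTGCACCTGACT", "ATGGTGCATCTGACT", "ATGCTGACTGATGCT"]
    for row in distance_matrix(toy):
        print(["%.3f" % d for d in row])
```

Smaller entries in the resulting matrix indicate more similar sequences under these assumptions. Because each sequence is reduced to three numbers, comparing many sequences requires no alignment, which is consistent with the computation-time claim in the abstract.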
Pages: 1203-1212
Page count: 10