Numerical encoding of DNA sequences by chaos game representation with application in similarity comparison

Cited by: 66
Authors
Hoang, Tung [1]
Yin, Changchuan [1]
Yau, Stephen S.-T. [2]
Affiliations
[1] Univ Illinois, Dept Math Stat & Comp Sci, Chicago, IL 60607 USA
[2] Tsinghua Univ, Dept Math Sci, Beijing 100084, Peoples R China
Keywords
Chaos game representation; Digital signal processing; Clustal Omega; PHYLOGENETIC ANALYSIS; INFLUENZA-A; PREDICTION; ALIGNMENT; EVOLUTION;
DOI
10.1016/j.ygeno.2016.08.002
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Numerical encoding plays an important role in the computational analysis of DNA sequences: numerical values are associated with the corresponding symbolic characters. After numerical representation, digital signal processing methods can be used to analyze DNA sequences. To reflect the biological properties of the original sequence, it is vital that the representation is one-to-one. Chaos Game Representation (CGR) is an iterative mapping technique that assigns each nucleotide in a DNA sequence to a position on the plane, which allows the DNA sequence to be depicted as an image. Using CGR, a biological sequence can be transformed one-to-one into a numerical sequence that preserves the main features of the original sequence. In this research, we propose to encode DNA sequences by treating 2D CGR coordinates as complex numbers, and we apply digital signal processing methods to analyze their evolutionary relationships. Computational experiments indicate that this approach gives results comparable to those of the state-of-the-art multiple sequence alignment method, Clustal Omega, and is significantly faster. The MATLAB code for our method can be accessed from: www.mathworks.com/matlabcentral/fileexchange/57152. (C) 2016 Elsevier Inc. All rights reserved.
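The encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code. The corner assignment (A, C, G, T on the corners of the unit square), the starting point 0.5 + 0.5i, and the helper names `cgr_encode`, `cgr_decode`, and `power_spectrum` are all assumptions made for illustration, and the plain discrete Fourier transform stands in for whichever signal processing steps the paper actually applies.

```python
import cmath

# Assumed corner layout (illustrative only; the paper's own layout may differ).
CORNERS = {
    "A": complex(0, 0),
    "C": complex(0, 1),
    "G": complex(1, 1),
    "T": complex(1, 0),
}


def cgr_encode(seq, start=complex(0.5, 0.5)):
    """Map each nucleotide to a CGR point stored as a complex number."""
    points = []
    z = start
    for base in seq.upper():
        z = (z + CORNERS[base]) / 2  # move halfway toward the base's corner
        points.append(z)
    return points


def cgr_decode(points):
    """Recover the sequence: the quadrant of each point identifies its base."""
    out = []
    for z in points:
        corner = complex(1 if z.real > 0.5 else 0, 1 if z.imag > 0.5 else 0)
        out.append(next(b for b, c in CORNERS.items() if c == corner))
    return "".join(out)


def power_spectrum(signal):
    """Naive O(n^2) DFT power spectrum |X[k]|^2 of a complex signal."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        x_k = sum(
            s * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, s in enumerate(signal)
        )
        spectrum.append(abs(x_k) ** 2)
    return spectrum


points = cgr_encode("ACGT")
spectrum = power_spectrum(points)
```

Because each point lands in the quadrant of its nucleotide's corner, `cgr_decode(cgr_encode(s))` returns `s` for short sequences, which is one way to see the one-to-one property; for long sequences, floating-point precision limits how far back a single point can be unwound. Spectra of different sequences could then be compared with a distance measure of the reader's choice.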
Pages: 134-142
Page count: 9