Alignment-free sequence comparison for virus genomes based on location correlation coefficient

Cited by: 7
Authors
He, Lily [1]
Sun, Siyang [2]
Zhang, Qianyue [2]
Bao, Xiaona [1]
Li, Peter K. [3]
Affiliations
[1] Beijing Univ Civil Engn & Architecture, Sch Sci, Beijing 102616, Peoples R China
[2] Renmin Univ China, High Sch, Beijing 100080, Peoples R China
[3] Tsinghua Univ, Sch Life Sci, Beijing 100084, Peoples R China
Keywords
SARS-CoV-2; Alignment-free; Correlation measure; DNA sequence;
DOI
10.1016/j.meegid.2021.105106
Chinese Library Classification number
R51 [Infectious diseases];
Subject classification code
100401;
Abstract
Coronaviruses (especially SARS-CoV-2) are characterized by rapid mutation and wide spread. As these characteristics easily lead to global pandemics, studying the evolutionary relationship between viruses is essential for clinical diagnosis. DNA sequencing has played an important role in evolutionary analysis. Recent alignment-free methods can overcome the problems of traditional alignment-based methods, which consume both time and space. This paper proposes a novel alignment-free method called the correlation coefficient feature vector (CCFV), which defines a correlation measure of the L-step delay of a nucleotide location from its location in the original DNA sequence. The numerical feature is a 16 × L-dimensional numerical vector describing the distribution characteristics of the nucleotide positions in a DNA sequence. The proposed L-step delay correlation measure is interestingly related to some types of (L + 1)-spaced mers. Unlike traditional gene comparison, our method avoids the computational complexity of multiple sequence alignment, and hence improves the speed of sequence comparison. Our method is applied to evolutionary analysis of the common human viruses including SARS-CoV-2, Dengue virus, Hepatitis B virus, and human rhinovirus and achieves the same or even better results than alignment-based methods. Especially for SARS-CoV-2, our method also confirms that bats are potential intermediate hosts of SARS-CoV-2.
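The abstract does not give the exact formula for the correlation measure, so the following is only a minimal sketch of one possible reading, not the authors' CCFV definition. It assumes that, for each ordered nucleotide pair (a, b) and each delay from 1 to L, the feature is the Pearson correlation between the 0/1 indicator sequence of a and the indicator sequence of b shifted by that delay, which gives 16 × L values. The function names (`delay_correlation_vector`, `euclidean`) and the choice of Euclidean distance for comparing two vectors are hypothetical and chosen for illustration.

```python
from math import sqrt

NUCLEOTIDES = "ACGT"


def indicator(seq, base):
    """Return a 0/1 list marking the positions where `base` occurs in `seq`."""
    return [1.0 if ch == base else 0.0 for ch in seq]


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if undefined."""
    n = len(x)
    if n == 0:
        return 0.0
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0  # constant indicator: correlation is undefined, use 0
    return cov / sqrt(vx * vy)


def delay_correlation_vector(seq, max_delay):
    """Assumed feature vector: 16 pair correlations for each delay 1..max_delay."""
    seq = seq.upper()
    indicators = {b: indicator(seq, b) for b in NUCLEOTIDES}
    features = []
    for lag in range(1, max_delay + 1):
        for a in NUCLEOTIDES:
            for b in NUCLEOTIDES:
                x = indicators[a][:-lag]  # base a at position i
                y = indicators[b][lag:]   # base b at position i + lag
                features.append(pearson(x, y))
    return features


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    v1 = delay_correlation_vector("ACGTACGTACGTTTGA", 3)
    v2 = delay_correlation_vector("ACGTTCGAACGTATGA", 3)
    print(len(v1), euclidean(v1, v2))
```

Under this reading, a pairwise distance matrix over such vectors could be passed to a standard tree-building method in place of a multiple sequence alignment; which distance and clustering method the paper actually uses is not stated in the abstract.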
Pages: 13
Related papers
50 in total
  • [1] Alignment-free sequence comparison method based on whole genomes and its application to virus phylogeny
    College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
    Unknown
    Tien Tzu Hsueh Pao, 2006, 2 (277-281):
  • [2] Alignment-free sequence comparison - a review
    Vinga, S
    Almeida, J
    BIOINFORMATICS, 2003, 19 (04) : 513 - 523
  • [3] Multiple alignment-free sequence comparison
    Ren, Jie
    Song, Kai
    Sun, Fengzhu
    Deng, Minghua
    Reinert, Gesine
    BIOINFORMATICS, 2013, 29 (21) : 2690 - 2698
  • [4] A probabilistic measure for alignment-free sequence comparison
    Pham, TD
    Zuegg, J
    BIOINFORMATICS, 2004, 20 (18) : 3455 - 3461
  • [5] Benchmarking of alignment-free sequence comparison methods
    Zielezinski, Andrzej
    Girgis, Hani Z.
    Bernard, Guillaume
    Leimeister, Chris-Andre
    Tang, Kujin
    Dencker, Thomas
    Lau, Anna Katharina
    Roehling, Sophie
    Choi, Jae Jin
    Waterman, Michael S.
    Comin, Matteo
    Kim, Sung-Hou
    Vinga, Susana
    Almeida, Jonas S.
    Chan, Cheong Xin
    James, Benjamin T.
    Sun, Fengzhu
    Morgenstern, Burkhard
    Karlowski, Wojciech M.
    GENOME BIOLOGY, 2019, 20 (1)
  • [6] Benchmarking of alignment-free sequence comparison methods
    Andrzej Zielezinski
    Hani Z. Girgis
    Guillaume Bernard
    Chris-Andre Leimeister
    Kujin Tang
    Thomas Dencker
    Anna Katharina Lau
    Sophie Röhling
    Jae Jin Choi
    Michael S. Waterman
    Matteo Comin
    Sung-Hou Kim
    Susana Vinga
    Jonas S. Almeida
    Cheong Xin Chan
    Benjamin T. James
    Fengzhu Sun
    Burkhard Morgenstern
    Wojciech M. Karlowski
    Genome Biology, 20
  • [7] Study on the Relation between Virus and Host Cell by Alignment-free Sequence Comparison
    Liu Xue-mei
    Zang Xiang
    Huang Tian-lai
    Yang Zhe
    Li Wen
    Ye Yu-zhong
    Hu Shan
    Li Jing
    PROCEEDINGS OF 2016 12TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2016, : 391 - 394
  • [8] Weighted measures based on maximizing deviation for alignment-free sequence comparison
    Qian, Kun
    Luan, Yihui
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2017, 481 : 235 - 242
  • [9] A Geometric Interpretation for Local Alignment-Free Sequence Comparison
    Behnam, Ehsan
    Waterman, Michael S.
    Smith, Andrew D.
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2013, 20 (07) : 471 - 485
  • [10] Alignment-Free Sequence Comparison With Multiple k Values
    Qian, Ying
    Zhang, Yu
    Zhang, Jiongmin
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2021, 18 (05) : 1841 - 1849