Alignment-free inference of hierarchical and reticulate phylogenomic relationships

Cited by: 58
Authors
Bernard, Guillaume [2]
Chan, Cheong Xin [3]
Chan, Yao-ban [4]
Chua, Xin-Yi [5]
Cong, Yingnan [1]
Hogan, James M. [6]
Maetschke, Stefan R. [7]
Ragan, Mark A. [8]
Affiliations
[1] Univ Queensland, Inst Mol Biosci, 306 Carmody Rd, Brisbane, Qld 4072, Australia
[2] Inst Mol Biosci, Brisbane, Qld, Australia
[3] Univ Queensland, Brisbane, Qld, Australia
[4] Univ Melbourne, Melbourne, Vic, Australia
[5] QFAB Bioinformat, Melbourne, Vic, Australia
[6] Queensland Univ Technol, Comp Sci, Brisbane, Qld, Australia
[7] IBM Res Australia, Brisbane, Qld, Australia
[8] Inst Mol Biosci, Computat Gen, Brisbane, Qld, Australia
Funding
Australian Research Council
Keywords
alignment-free; phylogenomics; lateral genetic transfer; k-mer; D2; statistics; TF-IDF; LATERAL GENETIC TRANSFER; FEATURE FREQUENCY PROFILES; MICROBIAL EVOLUTION; MAMMALIAN ENHANCERS; SEQUENCE ALIGNMENT; SURROGATE METHODS; WORD MATCHES; TREE; RECONSTRUCTION; SIMILARITY;
DOI
10.1093/bib/bbx067
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
We are amidst an ongoing flood of sequence data arising from the application of high-throughput technologies, and a concomitant fundamental revision in our understanding of how genomes evolve individually and within the biosphere. Workflows for phylogenomic inference must accommodate data that are not only much larger than before, but often more error prone and perhaps misassembled, or not assembled in the first place. Moreover, genomes of microbes, viruses and plasmids evolve not only by tree-like descent with modification but also by incorporating stretches of exogenous DNA. Thus, next-generation phylogenomics must address computational scalability while rethinking the nature of orthogroups, the alignment of multiple sequences and the inference and comparison of trees. New phylogenomic workflows have begun to take shape based on so-called alignment-free (AF) approaches. Here, we review the conceptual foundations of AF phylogenetics for the hierarchical (vertical) and reticulate (lateral) components of genome evolution, focusing on methods based on k-mers. We reflect on what seems to be successful, and on where further development is needed.
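As a minimal illustration of the k-mer-based comparison the abstract refers to, the sketch below counts overlapping k-mers (words of length k) in two sequences and computes the classic D2 statistic, the sum over all words w of X_w * Y_w, where X_w and Y_w are the counts of w in each sequence; this equals the number of pairs of matching k-mer occurrences. The function names, the choice of k and the toy sequences are hypothetical and for illustration only; this is not the review's implementation, and standardized variants of D2 used in practice are not shown.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers in seq (case-insensitive)."""
    seq = seq.upper()
    # A sequence of length n has n - k + 1 overlapping k-mers (none if k > n).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """D2 statistic: sum over shared words w of X_w * Y_w,
    i.e. the number of pairs of matching k-mer occurrences."""
    x = kmer_counts(seq_a, k)
    y = kmer_counts(seq_b, k)
    # Words absent from either sequence contribute zero, so only shared words are summed.
    return sum(count * y[word] for word, count in x.items() if word in y)


if __name__ == "__main__":
    # Hypothetical toy sequences: ACG and CGT each occur twice in the first
    # and once in the second, so D2 = 2*1 + 2*1.
    print(d2("ACGTACGT", "ACGTTT", 3))  # → 4
```

D2 is a similarity score that grows with sequence length, which is one reason alignment-free workflows typically normalize or standardize word-count statistics before turning them into distances for tree inference.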
Pages: 426-435
Page count: 10