Alignments and Trimmed Alignments: Their Characteristics and Phylogenetic Trees

Cited by: 0
Authors
Wu, Guang [1, 2]
Yan, Shaomin [2 ]
Affiliations
[1] Guangxi Acad Sci, Green Biotransformat & Biomfg Innovat Team, Nanning, Peoples R China
[2] Guangxi Acad Sci, Inst Biol Sci & Technol, Guangxi Key Lab Biorefinery, Nanning, Peoples R China
Source
2024 24TH INTERNATIONAL CONFERENCE ON TRANSPARENT OPTICAL NETWORKS, ICTON 2024 | 2024
Keywords
alignment; substitution matrix; phylogenetic tree; trim
DOI
10.1109/ICTON62926.2024.10647366
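Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract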
Multiple sequence alignment is important in bioinformatics. In this study, we (1) aligned 3228 coronavirus spike proteins using MAFFT, MUSCLE, Clustal W, and Clustal X with the BLOSUM, Gonnet, and PAM substitution matrices; (2) trimmed the alignments with trimAl using the automated1, gappyout, strict, and strictplus methods; and (3) applied ME, ML, MP, NJ, and UPGMA to the untrimmed and trimmed alignments to construct 140 phylogenetic trees. We evaluated the results using (a) amino acid composition, (b) alignment statistics, (c) alignment order and positions relative to the reference sequence, (d) phylogenetic information of the alignment, (e) Robinson-Foulds distance, (f) locality of the reference sequence across phylogenetic trees, and (g) the effects of different methods, assessed with model II one-way ANOVA. The results show that (i) the alignment algorithms mainly affect the order of sequences in the alignment, (ii) MUSCLE produces the longest alignment, (iii) substitution matrices likely affect the alignment length under the same alignment algorithm, (iv) automated1 and strict generate the same result, (v) the longer the alignment, the heavier the trimming, (vi) trimming preserves the phylogenetic information, (vii) BLOSUM and Gonnet likely construct phylogenetic trees with zero Robinson-Foulds distance, (viii) the locality of the reference sequence at the branch level varies across phylogenetic trees, and (ix) trimming generates identical sequences.
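To make criterion (e) concrete: the Robinson-Foulds distance between two trees on the same set of sequences counts the bipartitions (splits) of that set that appear in one tree but not in the other, so a distance of zero means the two unrooted topologies agree. The following is a minimal sketch of that definition only, not the authors' implementation; the nested-tuple tree representation, the function names, and the toy leaf labels are all assumptions made for illustration.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions of an unrooted tree.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always has the same key
    regardless of where the tree happens to be rooted.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)  # arbitrary but deterministic reference leaf
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if ref in clade else clade
        # Skip trivial splits that separate fewer than two leaves.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))


# Toy trees over five hypothetical sequences A..E.
tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "E"))
tree_c = (("A", "B"), ("C", ("D", "E")))  # tree_a with a different rooting

print(robinson_foulds(tree_a, tree_b))
print(robinson_foulds(tree_a, tree_c))
```

Here `tree_a` and `tree_c` differ only in rooting, so they share every split, while `tree_a` and `tree_b` disagree on which of B or C groups with A. For trees with thousands of leaves, an established phylogenetics library would be a more practical choice than this recursive sketch.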
Pages: 6