nRCFV: a new, dataset-size-independent metric to quantify compositional heterogeneity in nucleotide and amino acid datasets

Times cited: 6
Authors
Fleming, James F. [1]
Struck, Torsten H. [1]
Affiliation
[1] Univ Oslo, Nat Hist Museum, Sars Gata 1, Oslo, Norway
Keywords
Phylogenetics; Compositional heterogeneity; Bioinformatics software; MODEL SELECTION; EVOLUTION; PHYLOGENETICS; GENE
DOI
10.1186/s12859-023-05270-8
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Compositional heterogeneity, where the proportions of nucleotides or amino acids are not broadly similar across the dataset, causes a great many phylogenetic artefacts. Whilst a variety of methods can identify it post hoc, few metrics exist to quantify compositional heterogeneity before the computationally intensive task of phylogenetic tree reconstruction. Here we assess the efficacy of one such existing, widely used metric, Relative Composition Frequency Variability (RCFV), using both real and simulated data.
Results: Our results show that RCFV can be biased by sequence length, the number of taxa, and the number of possible character states within the dataset. However, we also find that missing data does not appear to have an appreciable effect on RCFV. We discuss the theory behind this and its consequences for the future use of the RCFV value, and propose a new metric, nRCFV, which accounts for these biases. Alongside this, we present new software, nRCFV_Reader, which calculates both RCFV and nRCFV.
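The abstract names RCFV but does not restate its formula. As an illustrative sketch only, the Python below assumes the commonly used definition RCFV = Σ_i Σ_j |μ_ij − μ_j| / n, where μ_ij is the frequency of character state j in taxon i, μ_j is the mean of μ_ij over all taxa, and n is the number of taxa. It also assumes that gaps and ambiguity symbols are simply skipped when counting frequencies. The function names `rcfv` and `random_alignment` are hypothetical; they are not the nRCFV_Reader interface, and the sketch does not attempt nRCFV, whose normalisation is not given in this record.

```python
import random
from collections import Counter


def rcfv(alignment: dict[str, str], states: str = "ACGT") -> float:
    """Relative Composition Frequency Variability (illustrative sketch).

    Assumed definition (not stated in the record above):
        RCFV = sum_i sum_j |mu_ij - mu_j| / n
    mu_ij: frequency of state j in taxon i
    mu_j:  mean of mu_ij over all n taxa
    Symbols not in `states` (gaps, '?', 'N', ...) are skipped (an assumption).
    """
    n = len(alignment)
    taxon_freqs = []
    for name, seq in alignment.items():
        # Count only recognised character states for this taxon.
        counts = Counter(c for c in seq.upper() if c in states)
        total = sum(counts.values())
        if total == 0:
            raise ValueError(f"taxon {name!r} has no valid characters")
        taxon_freqs.append({s: counts[s] / total for s in states})
    # Mean frequency of each state across all taxa.
    mean = {s: sum(f[s] for f in taxon_freqs) / n for s in states}
    # Sum absolute deviations over taxa and states, then divide by taxon count.
    return sum(abs(f[s] - mean[s]) for f in taxon_freqs for s in states) / n


def random_alignment(n_taxa: int, length: int, states: str = "ACGT",
                     seed: int = 0) -> dict[str, str]:
    """Hypothetical helper: every taxon is drawn from the SAME uniform
    composition, so any nonzero RCFV it yields is pure sampling noise."""
    rng = random.Random(seed)
    return {f"taxon{i}": "".join(rng.choice(states) for _ in range(length))
            for i in range(n_taxa)}


if __name__ == "__main__":
    print(rcfv({"t1": "AAAA", "t2": "CCCC"}))   # 1.0 (disjoint compositions)
    print(rcfv({"t1": "ACGT", "t2": "AC-GT"}))  # 0.0 (gap is ignored)
    # Identical underlying composition, increasing sequence length:
    for length in (10, 100, 1000):
        print(length, round(rcfv(random_alignment(20, length)), 4))
```

In the last loop every taxon shares one composition, yet RCFV should typically shrink as sequences get longer, because short sequences estimate μ_ij with more sampling noise. This is one simple way to see the sequence-length bias the abstract reports; it is not a reproduction of the authors' simulations. For amino acid data, pass the 20 standard one-letter codes as `states`.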
Pages: 25