A comprehensive review and evaluation of species richness estimation

Cited by: 1
Authors
Schmitz, Johanna Elena [1, 2, 3]
Rahmann, Sven [1, 2]
Affiliations
[1] Algorithm Bioinformat, Ctr Bioinformat Saar, Saarland Informat Campus, D-66123 Saarbrucken, Germany
[2] Saarland Univ, Fak MI, Saarland Informat Campus, D-66123 Saarbrucken, Germany
[3] Saarbrucken Grad Sch Comp Sci, Saarland Informat Campus, D-66123 Saarbrucken, Germany
Keywords
species richness; diversity estimation; upsampling; immune repertoire; microbiome; comparative evaluation; CAPTURE PROBABILITIES VARY; POPULATION-SIZE; NUMBER; DIVERSITY; SAMPLE; EXTRAPOLATION; RAREFACTION; POISSON; MODELS
DOI
10.1093/bib/bbaf158
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: The statistical problem of estimating the total number of distinct species in a population (or distinct elements in a multiset), given only a small sample, occurs in various areas, ranging from the unseen species problem in ecology to estimating the diversity of immune repertoires. Accurately estimating the true richness from very small samples is challenging, in particular for highly diverse populations with many rare species. Depending on the application, different estimation strategies have been proposed that incorporate explicit or implicit assumptions about either the species distribution or about the sampling process. These methods are scattered across the literature, and an extensive overview of their assumptions, methodology, and performance is currently lacking.
Results: We comprehensively review and evaluate a variety of existing methods on real and simulated data with different compositions of rare and abundant species. Our evaluation shows that, depending on species composition, different methods provide the most accurate richness estimates. Simple methods based on the observed number of singletons yield accurate asymptotic lower bounds for several of the tested simulated species compositions, but tend to underestimate the true richness for heterogeneous populations and small samples containing 1% to 5% of the population. When the population size is known, upsampling (extrapolating) estimators such as PreSeq and RichnEst yield accurate estimates of the total species richness in a sample that is up to 10 times larger than the observed sample.
Availability: Source code for data simulation and richness estimation is available at https://gitlab.com/rahmannlab/speciesrichness.
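As an illustration of the singleton-based lower bounds mentioned in the Results, the following minimal sketch computes the bias-corrected Chao1 estimator, S_obs + f1(f1 - 1) / (2(f2 + 1)), where f1 and f2 are the numbers of species seen exactly once and exactly twice. The abstract does not state which singleton-based methods were evaluated, so the choice of Chao1 is an assumption, and the function name `chao1` and the toy sample are hypothetical and not taken from the authors' repository.

```python
from collections import Counter


def chao1(sample):
    """Bias-corrected Chao1 lower bound on species richness.

    `sample` is an iterable of species labels, one entry per observed individual.
    This is an illustrative sketch, not the implementation used in the paper.
    """
    abundances = Counter(sample)                  # species label -> count in the sample
    freq_of_freq = Counter(abundances.values())   # count k -> number of species seen k times
    s_obs = len(abundances)                       # observed number of distinct species
    f1 = freq_of_freq.get(1, 0)                   # singletons: species seen exactly once
    f2 = freq_of_freq.get(2, 0)                   # doubletons: species seen exactly twice
    # Many singletons relative to doubletons suggest many species are still unseen.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy sample (hypothetical): 5 observed species, 2 singletons, 2 doubletons.
toy_sample = ["a", "a", "a", "b", "b", "c", "d", "e", "e"]
print(chao1(toy_sample))
```

With no singletons the estimate equals the observed richness, which matches the intuition that the sample has likely captured most species; with many singletons and few doubletons the estimate grows, but it remains a lower bound rather than a point estimate of the true richness.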
Pages: 16