Optimal sequence similarity thresholds for clustering of molecular operational taxonomic units in DNA metabarcoding studies

Cited by: 31
Authors
Bonin, Aurelie [1, 2]
Guerrieri, Alessia [1]
Ficetola, Gentile Francesco [1, 3]
Affiliations
[1] Univ Milan, Dept Environm Sci & Policy, Milan, Italy
[2] Argaly, Batiment Clean Space, St Helene Du Lac, France
[3] Univ Grenoble Alpes, Univ Savoie Mt Blanc, Lab Ecol Alpine, LECA,CNRS, Grenoble, France
Funding
European Research Council;
Keywords
alpha diversity; COI; metabarcoding marker; MOTU over-merging; MOTU over-splitting; sequence variant; COMMUNITY STRUCTURE; EXTRACELLULAR DNA; SOIL; DIVERSITY;
DOI
10.1111/1755-0998.13709
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Clustering approaches are pivotal to handle the many sequence variants obtained in DNA metabarcoding data sets, and therefore they have become a key step of metabarcoding analysis pipelines. Clustering often relies on a sequence similarity threshold to gather sequences into molecular operational taxonomic units (MOTUs), each of which ideally represents a homogeneous taxonomic entity (e.g., a species or a genus). However, the choice of the clustering threshold is rarely justified, and its impact on MOTU over-splitting or over-merging even less tested. Here, we evaluated clustering threshold values for several metabarcoding markers under different criteria: limitation of MOTU over-merging, limitation of MOTU over-splitting, and trade-off between over-merging and over-splitting. We extracted sequences from a public database for nine markers, ranging from generalist markers targeting Bacteria or Eukaryota, to more specific markers targeting a class or a subclass (e.g., Insecta, Oligochaeta). Based on the distributions of pairwise sequence similarities within species and within genera, and on the rates of over-splitting and over-merging across different clustering thresholds, we were able to propose threshold values minimizing the risk of over-splitting, that of over-merging, or offering a trade-off between the two risks. For generalist markers, high similarity thresholds (0.96-0.99) are generally appropriate, while more specific markers require lower values (0.85-0.96). These results do not support the use of a fixed clustering threshold. Instead, we advocate careful examination of the most appropriate threshold based on the research objectives, the potential costs of over-splitting and over-merging, and the features of the studied markers.
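The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the similarity measure (fraction of identical positions between equal-length sequences), the single-linkage clustering, the two rate definitions, and the toy sequences and species labels are all assumptions chosen for illustration. Here the over-splitting rate is taken as the share of species spread over more than one MOTU, and the over-merging rate as the share of MOTUs containing more than one species.

```python
from itertools import combinations


def similarity(a, b):
    """Fraction of identical positions (assumes equal-length, aligned sequences)."""
    if len(a) != len(b):
        raise ValueError("toy similarity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster(seqs, threshold):
    """Single-linkage clustering: link every pair with similarity >= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]


def split_merge_rates(seqs, species, threshold):
    """Return (over-splitting rate, over-merging rate) under the assumed definitions."""
    motus = cluster(seqs, threshold)
    motus_per_species = {}
    species_per_motu = {}
    for motu, sp in zip(motus, species):
        motus_per_species.setdefault(sp, set()).add(motu)
        species_per_motu.setdefault(motu, set()).add(sp)
    over_split = sum(len(m) > 1 for m in motus_per_species.values()) / len(motus_per_species)
    over_merged = sum(len(s) > 1 for s in species_per_motu.values()) / len(species_per_motu)
    return over_split, over_merged


# Hypothetical toy data: two variants of species A, one of B, one of C.
SEQS = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "CCCCCCCCCC"]
SPECIES = ["A", "A", "B", "C"]

for t in (0.85, 0.95):
    print(t, split_merge_rates(SEQS, SPECIES, t))
```

On this toy data, the lower threshold merges species A and B into one MOTU, while the higher threshold splits the two variants of species A, which mirrors the two risks the abstract weighs against each other.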
Pages: 368-381
Number of pages: 14
Related papers
58 in total
[11]   On the unreliability of published DNA sequences [J].
Bridge, PD ;
Roberts, PJ ;
Spooner, BM ;
Panchal, G .
NEW PHYTOLOGIST, 2003, 160 (01) :43-48
[12]   Divergence thresholds and divergent biodiversity estimates: can metabarcoding reliably describe zooplankton communities? [J].
Brown, Emily A. ;
Chain, Frederic J. J. ;
Crease, Teresa J. ;
MacIsaac, Hugh J. ;
Cristescu, Melania E. .
ECOLOGY AND EVOLUTION, 2015, 5 (11) :2234-2251
[13]   From environmental DNA sequences to ecological conclusions: How strong is the influence of methodological choices? [J].
Calderon-Sanou, Irene ;
Munkemuller, Tamara ;
Boyer, Frederic ;
Zinger, Lucie ;
Thuiller, Wilfried .
JOURNAL OF BIOGEOGRAPHY, 2020, 47 (01) :193-206
[14]   Lake Sedimentary DNA Research on Past Terrestrial and Aquatic Biodiversity: Overview and Recommendations [J].
Capo, Eric ;
Giguet-Covex, Charline ;
Rouillard, Alexandra ;
Nota, Kevin ;
Heintzman, Peter D. ;
Vuillemin, Aurele ;
Ariztegui, Daniel ;
Arnaud, Fabien ;
Belle, Simon ;
Bertilsson, Stefan ;
Bigler, Christian ;
Bindler, Richard ;
Brown, Antony G. ;
Clarke, Charlotte L. ;
Crump, Sarah E. ;
Debroas, Didier ;
Englund, Goran ;
Ficetola, Gentile Francesco ;
Garner, Rebecca E. ;
Gauthier, Joanna ;
Gregory-Eaves, Irene ;
Heinecke, Liv ;
Herzschuh, Ulrike ;
Ibrahim, Anan ;
Kisand, Veljo ;
Kjaer, Kurt H. ;
Lammers, Youri ;
Littlefair, Joanne ;
Messager, Erwan ;
Monchamp, Marie-Eve ;
Olajos, Fredrik ;
Orsi, William ;
Pedersen, Mikkel W. ;
Rijal, Dilli P. ;
Rydberg, Johan ;
Spanbauer, Trisha ;
Stoof-Leichsenring, Kathleen R. ;
Taberlet, Pierre ;
Talas, Liisi ;
Thomas, Camille ;
Walsh, David A. ;
Wang, Yucheng ;
Willerslev, Eske ;
van Woerkom, Anne ;
Zimmermann, Heike H. ;
Coolen, Marco J. L. ;
Epp, Laura S. ;
Domaizon, Isabelle ;
Alsos, Inger G. ;
Parducci, Laura .
QUATERNARY, 2021, 4 (01)
[15]  
Chen W., 2020, ENVIRON DNA, V2, P115, DOI 10.1002/EDN3.79
[16]   The effects of parameter choice on defining molecular operational taxonomic units and resulting ecological analyses of metabarcoding data [J].
Clare, Elizabeth L. ;
Chain, Frederic J. J. ;
Littlefair, Joanne E. ;
Cristescu, Melania E. .
GENOME, 2016, 59 (11) :981-990
[17]   High-throughput sequencing on preservative ethanol is effective at jointly examining infraspecific and taxonomic diversity, although bioinformatics pipelines do not perform equally [J].
Couton, Marjorie ;
Baud, Aurelien ;
Daguin-Thiebaut, Claire ;
Corre, Erwan ;
Comtet, Thierry ;
Viard, Frederique .
ECOLOGY AND EVOLUTION, 2021, 11 (10) :5533-5546
[18]   Towards robust and repeatable sampling methods in eDNA-based studies [J].
Dickie, Ian A. ;
Boyer, Stephane ;
Buckley, Hannah L. ;
Duncan, Richard P. ;
Gardner, Paul P. ;
Hogg, Ian D. ;
Holdaway, Robert J. ;
Lear, Gavin ;
Makiola, Andreas ;
Morales, Sergio E. ;
Powell, Jeff R. ;
Weaver, Louise .
MOLECULAR ECOLOGY RESOURCES, 2018, 18 (05) :940-952
[19]   Optimizing techniques to capture and extract environmental DNA for detection and quantification of fish [J].
Eichmiller, Jessica J. ;
Miller, Loren M. ;
Sorensen, Peter W. .
MOLECULAR ECOLOGY RESOURCES, 2016, 16 (01) :56-68
[20]   Validation and Development of COI Metabarcoding Primers for Freshwater Macroinvertebrate Bioassessment [J].
Elbrecht, Vasco ;
Leese, Florian .
FRONTIERS IN ENVIRONMENTAL SCIENCE, 2017, 5