Optimal sequence similarity thresholds for clustering of molecular operational taxonomic units in DNA metabarcoding studies

Cited by: 31
Authors
Bonin, Aurelie [1, 2]
Guerrieri, Alessia [1]
Ficetola, Gentile Francesco [1, 3]
Affiliations
[1] Univ Milan, Dept Environm Sci & Policy, Milan, Italy
[2] Argaly, Batiment Clean Space, St Helene Du Lac, France
[3] Univ Grenoble Alpes, Univ Savoie Mt Blanc, Lab Ecol Alpine, LECA,CNRS, Grenoble, France
Funding
European Research Council;
Keywords
alpha diversity; COI; metabarcoding marker; MOTU over-merging; MOTU over-splitting; sequence variant; COMMUNITY STRUCTURE; EXTRACELLULAR DNA; SOIL; DIVERSITY;
DOI
10.1111/1755-0998.13709
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Clustering approaches are pivotal for handling the many sequence variants obtained in DNA metabarcoding data sets, and they have therefore become a key step of metabarcoding analysis pipelines. Clustering often relies on a sequence similarity threshold to gather sequences into molecular operational taxonomic units (MOTUs), each of which ideally represents a homogeneous taxonomic entity (e.g., a species or a genus). However, the choice of the clustering threshold is rarely justified, and its impact on MOTU over-splitting or over-merging is tested even less often. Here, we evaluated clustering threshold values for several metabarcoding markers under different criteria: limitation of MOTU over-merging, limitation of MOTU over-splitting, and trade-off between over-merging and over-splitting. We extracted sequences from a public database for nine markers, ranging from generalist markers targeting Bacteria or Eukaryota to more specific markers targeting a class or a subclass (e.g., Insecta, Oligochaeta). Based on the distributions of pairwise sequence similarities within species and within genera, and on the rates of over-splitting and over-merging across different clustering thresholds, we propose threshold values that minimize the risk of over-splitting, minimize the risk of over-merging, or offer a trade-off between the two risks. For generalist markers, high similarity thresholds (0.96-0.99) are generally appropriate, while more specific markers require lower values (0.85-0.96). These results do not support the use of a fixed clustering threshold. Instead, we advocate careful examination of the most appropriate threshold based on the research objectives, the potential costs of over-splitting and over-merging, and the features of the studied markers.
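To make the over-splitting versus over-merging trade-off concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a toy similarity measure (fraction of matching positions between equal-length sequences) and a simplified pairwise proxy for the two error rates: a same-species pair falling below the threshold counts as over-split, and a different-species pair at or above the threshold counts as over-merged. The function names `similarity` and `error_rates` and the example sequences are hypothetical.

```python
from itertools import combinations


def similarity(a, b):
    """Toy similarity (an assumption): fraction of matching positions."""
    if len(a) != len(b):
        raise ValueError("this toy measure assumes equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def error_rates(records, threshold):
    """Return (over_splitting, over_merging) pairwise proxies.

    records: list of (species_label, sequence) tuples.
    over_splitting: share of same-species pairs with similarity < threshold.
    over_merging: share of different-species pairs with similarity >= threshold.
    """
    split = same = merged = diff = 0
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        sim = similarity(s1, s2)
        if sp1 == sp2:
            same += 1
            split += sim < threshold
        else:
            diff += 1
            merged += sim >= threshold
    return (split / same if same else 0.0,
            merged / diff if diff else 0.0)


# Hypothetical toy data: two species, two sequences each.
records = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "ACGTACGGAT"),
    ("species_B", "ACGAACGGAT"),
]

for t in (0.85, 0.95):
    print(t, error_rates(records, t))
```

On this toy data, raising the threshold from 0.85 to 0.95 removes the over-merged pair but splits both species, which illustrates why a single fixed threshold may not suit every marker or research objective.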
Pages: 368-381
Number of pages: 14