Big data in genomic research for big questions with examples from COVID-19 and other zoonoses

Times cited: 1
Authors
Wassenaar, Trudy M. [1 ]
Ussery, David W. [2 ]
Rosel, Adriana Cabal [3 ]
Affiliations
[1] Mol Microbiol & Genom Consultants, Tannenstr 7, D-55576 Zotzenheim, Germany
[2] Univ Arkansas Med Sci, Dept Biomed Informat, 4301 W Markham St, Little Rock, AR 72205 USA
[3] Austrian Agcy Hlth & Food Safety, Inst Med Microbiol & Hyg, Div Publ Hlth, Wahringerstr 25a, A-1096 Vienna, Austria
Funding
National Science Foundation (USA)
Keywords
omics; genomics; zoonoses; COVID-19; Salmonella; scientific publishing; big data; Salmonella enterica; microbiome; coli
DOI
10.1093/jambio/lxac055
Chinese Library Classification (CLC) numbers
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Omics research inevitably involves the collection and analysis of big data, which can only be handled by automated approaches. Here we point out that the analysis of big data in genomics imposes certain requirements, such as specialized software, quality control of input data, and simplification of results for visualization. Such simplification causes a loss of information, as exemplified here for phylogenetic trees. Clear communication of big data analyses can be enhanced by novel visualization strategies. The interpretation of findings is sometimes hampered when microbiologists do not fully understand the dedicated analytical tools, while the researchers performing these analyses may lack a full overview of the biology of the microbes under study. These issues are illustrated here using SARS-CoV-2 and Salmonella enterica as zoonotic examples. Whereas jargon should be avoided or explained in scientific communication, nomenclature that groups similar organisms and distinguishes them from more distant relatives is not only essential but also influences the interpretation of results. Unfortunately, changes in taxonomically accepted names are now so frequent that they hamper rather than assist research, as illustrated by the difficulties faced by microbiome studies. Nomenclature used to group viral isolates, as is done for SARS-CoV-2, is also not without difficulties. Some weaknesses in current omics research stem from poor-quality data or biased databases, and these problems can be magnified by machine learning approaches. Moreover, the overall body of scientific publications can now itself be considered "big data", as illustrated by the avalanche of COVID-19-related publications. The peer-review model of scientific publishing is barely coping with this novel situation, resulting in retractions and the publication of bogus works. The avalanche of scientific publications that originated from the current pandemic can obstruct literature searches, and this will unfortunately continue over time.
Pages: 13