Improving the performance of machine learning models for biotechnology: The quest for deus ex machina

Cited by: 11
Authors
Mey, Friederike [1]
Clauwaert, Jim [2]
Van Huffel, Kirsten [1]
Waegeman, Willem [2]
De Mey, Marjan [1]
Affiliations
[1] Univ Ghent, Dept Biotechnol, Ctr Synthet Biol CSB, B-9000 Ghent, Belgium
[2] Univ Ghent, Dept Data Anal & Math Modelling, KERMIT, B-9000 Ghent, Belgium
Funding
EU Horizon 2020; Research Foundation Flanders (Belgium)
Keywords
Machine learning; Biotechnology; Synthetic biology; Model evaluation; Metabolic pathways; Gene expression; Prediction; Noise; Flux
DOI
10.1016/j.biotechadv.2021.107858
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Machine learning is becoming an integral part of the Design-Build-Test-Learn cycle in biotechnology. Machine learning models learn from collected datasets such as omics data and predict a defined outcome, which has led to both production improvements and predictive tools in the field. Robust prediction of the behavior of microbial cell factories and production processes not only greatly increases our understanding of the function of such systems, but also provides significant savings of development time. However, many pitfalls when modeling biological data (bad fit, noisy data, model instability, low data quantity and imbalances in the data) cause models to suffer in their performance. Here we provide an accessible, in-depth analysis of the problems created by these pitfalls, as well as means of their detection and mediation, with a focus on supervised learning. Assessing the state of the art, we show that, currently, in-depth analyses of model performance are often absent and must be improved. This review provides a toolbox for the analysis of model robustness and performance, and simultaneously proposes a standard for the community to facilitate future work. It is further accompanied by an interactive online tutorial on the discussed issues.
Pages: 10