Machine learning-based protein crystal detection for monitoring of crystallization processes enabled with large-scale synthetic data sets of photorealistic images

Cited by: 10
Authors
Bischoff, Daniel [1 ]
Walla, Brigitte [1 ]
Weuster-Botz, Dirk [1 ]
Affiliation
[1] Tech Univ Munich, Inst Biochem Engn, Boltzmannstr 15, D-85748 Garching, Germany
Keywords
Protein crystallization; Automated image analysis; Synthetic data sets; Deep learning; PARTICLE-SIZE DISTRIBUTIONS; ONLINE MEASUREMENT; SHAPE; SEGMENTATION
DOI
10.1007/s00216-022-04101-8
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Since preparative chromatography is a sustainability challenge due to large amounts of consumables used in downstream processing of biomolecules, protein crystallization offers a promising alternative as a purification method. While the limited crystallizability of proteins often restricts a broad application of crystallization as a purification method, advances in molecular biology, as well as computational methods are pushing the applicability towards integration in biotechnological downstream processes. However, in industrial and academic settings, monitoring protein crystallization processes non-invasively by microscopic photography and automated image evaluation remains a challenging problem. Recently, the identification of single crystal objects using deep learning has been the subject of increased attention for various model systems. However, the advancement of crystal detection using deep learning for biotechnological applications is limited: robust models obtained through supervised machine learning tasks require large-scale and high-quality data sets usually obtained in large projects through extensive manual labeling, an approach that is highly error-prone for dense systems of transparent crystals. For the first time, recent trends involving the use of synthetic data sets for supervised learning are transferred, thus generating photorealistic images of virtual protein crystals in suspension (PCS) through the use of ray tracing algorithms, accompanied by specialized data augmentations modelling experimental noise. Further, it is demonstrated that state-of-the-art models trained with the large-scale synthetic PCS data set outperform similar fine-tuned models based on the average precision metric on a validation data set, followed by experimental validation using high-resolution photomicrographs from stirred tank protein crystallization processes.
Pages: 6379-6391
Number of pages: 13