Machine learning-based protein crystal detection for monitoring of crystallization processes enabled with large-scale synthetic data sets of photorealistic images

Cited by: 10
Authors
Bischoff, Daniel [1]
Walla, Brigitte [1]
Weuster-Botz, Dirk [1]
Affiliations
[1] Tech Univ Munich, Inst Biochem Engn, Boltzmannstr 15, D-85748 Garching, Germany
Keywords
Protein crystallization; Automated image analysis; Synthetic data sets; Deep learning; PARTICLE-SIZE DISTRIBUTIONS; ONLINE MEASUREMENT; SHAPE; SEGMENTATION;
DOI
10.1007/s00216-022-04101-8
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Since preparative chromatography is a sustainability challenge due to large amounts of consumables used in downstream processing of biomolecules, protein crystallization offers a promising alternative as a purification method. While the limited crystallizability of proteins often restricts a broad application of crystallization as a purification method, advances in molecular biology, as well as computational methods, are pushing the applicability towards integration in biotechnological downstream processes. However, in industrial and academic settings, monitoring protein crystallization processes non-invasively by microscopic photography and automated image evaluation remains a challenging problem. Recently, the identification of single crystal objects using deep learning has been the subject of increased attention for various model systems. However, the advancement of crystal detection using deep learning for biotechnological applications is limited: robust models obtained through supervised machine learning tasks require large-scale and high-quality data sets usually obtained in large projects through extensive manual labeling, an approach that is highly error-prone for dense systems of transparent crystals. For the first time, recent trends involving the use of synthetic data sets for supervised learning are transferred, thus generating photorealistic images of virtual protein crystals in suspension (PCS) through the use of ray tracing algorithms, accompanied by specialized data augmentations modelling experimental noise. Further, it is demonstrated that state-of-the-art models trained with the large-scale synthetic PCS data set outperform similar fine-tuned models based on the average precision metric on a validation data set, followed by experimental validation using high-resolution photomicrographs from stirred tank protein crystallization processes.
Pages: 6379-6391
Page count: 13
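
Illustrative note: the abstract compares detection models by the average precision metric. The following is a minimal Python sketch of how average precision at a single IoU threshold can be computed for one image. It is not taken from the paper: the box format, the 0.5 threshold, the greedy matching rule, and the function names `iou` and `average_precision` are all assumptions made for illustration, and the authors' actual evaluation protocol may differ (for example, by averaging over several IoU thresholds).

```python
from typing import List, Tuple

# Assumed box format (hypothetical): (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    preds: List[Tuple[Box, float]],  # (box, confidence score) pairs
    truths: List[Box],               # ground-truth boxes
    iou_thr: float = 0.5,            # assumed threshold, not from the paper
) -> float:
    """Area under the interpolated precision-recall curve for one image."""
    if not truths:
        return 0.0

    matched = [False] * len(truths)
    hits = []  # True for a true positive, False for a false positive

    # Visit predictions from most to least confident and greedily match
    # each one to the best still-unmatched ground-truth box.
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, truth in enumerate(truths):
            if matched[j]:
                continue
            overlap = iou(box, truth)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[best_j] = True
            hits.append(True)
        else:
            hits.append(False)

    # Running precision and recall after each prediction.
    tp = 0
    precisions, recalls = [], []
    for k, hit in enumerate(hits, start=1):
        tp += int(hit)
        precisions.append(tp / k)
        recalls.append(tp / len(truths))

    # Make precision non-increasing from left to right (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangles under the interpolated curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8), ((20, 20, 30, 30), 0.7)]
    print(average_precision(preds, truths))
```

In this sketch a false positive ranked between two correct detections lowers the score, which is why the metric rewards models that assign high confidence only to real crystals.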