Machine learning-based protein crystal detection for monitoring of crystallization processes enabled with large-scale synthetic data sets of photorealistic images

Cited by: 10
Authors
Bischoff, Daniel [1]
Walla, Brigitte [1]
Weuster-Botz, Dirk [1]
Affiliations
[1] Tech Univ Munich, Inst Biochem Engn, Boltzmannstr 15, D-85748 Garching, Germany
Keywords
Protein crystallization; Automated image analysis; Synthetic data sets; Deep learning; PARTICLE-SIZE DISTRIBUTIONS; ONLINE MEASUREMENT; SHAPE; SEGMENTATION;
DOI
10.1007/s00216-022-04101-8
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Since preparative chromatography is a sustainability challenge due to large amounts of consumables used in downstream processing of biomolecules, protein crystallization offers a promising alternative as a purification method. While the limited crystallizability of proteins often restricts a broad application of crystallization as a purification method, advances in molecular biology, as well as computational methods, are pushing the applicability towards integration in biotechnological downstream processes. However, in industrial and academic settings, monitoring protein crystallization processes non-invasively by microscopic photography and automated image evaluation remains a challenging problem. Recently, the identification of single crystal objects using deep learning has been the subject of increased attention for various model systems. However, the advancement of crystal detection using deep learning for biotechnological applications is limited: robust models obtained through supervised machine learning tasks require large-scale and high-quality data sets usually obtained in large projects through extensive manual labeling, an approach that is highly error-prone for dense systems of transparent crystals. For the first time, recent trends involving the use of synthetic data sets for supervised learning are transferred, thus generating photorealistic images of virtual protein crystals in suspension (PCS) through the use of ray tracing algorithms, accompanied by specialized data augmentations modelling experimental noise. Further, it is demonstrated that state-of-the-art models trained with the large-scale synthetic PCS data set outperform similar fine-tuned models based on the average precision metric on a validation data set, followed by experimental validation using high-resolution photomicrographs from stirred tank protein crystallization processes.
Pages: 6379-6391
Page count: 13
Source: Analytical and Bioanalytical Chemistry, 2022, 414: 6379-6391