Deep Reinforcement Learning-Based Self-Optimization of Flow Chemistry

Cited by: 0
Authors
Yewale, Ashish [1 ]
Yang, Yihui [2 ]
Nazemifard, Neda [2 ]
Papageorgiou, Charles D. [2 ]
Rielly, Chris D. [1 ]
Benyahia, Brahim [1 ]
Affiliations
[1] Loughborough Univ, Dept Chem Engn, Loughborough LE11 3TU, Leicestershire, England
[2] Takeda Pharmaceut Int Co, Synthet Mol Proc Dev Proc Engn & Technol, Cambridge, MA 02139 USA
Source
ACS ENGINEERING AU | 2025 / Vol. 5 / No. 3
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
flow chemistry; self-optimization; deep reinforcement learning; deep deterministic policy gradient; adaptive hyperparameter tuning; Bayesian optimization; WIDE DYNAMIC-MODEL; MULTIOBJECTIVE OPTIMIZATION; SELECTION; DRUG;
DOI
10.1021/acsengineeringau.5c00004
Chinese Library Classification (CLC)
TQ [Chemical Industry];
Discipline Code
0817;
Abstract
The development of effective synthetic pathways is critical in many industrial sectors. The growing adoption of flow chemistry has opened new opportunities for more cost-effective and environmentally friendly manufacturing technologies. However, the development of effective flow chemistry processes is still hampered by labor- and experiment-intensive methodologies and poor or suboptimal performance. In this context, integrating advanced machine learning strategies into chemical process optimization can significantly reduce experimental burdens and enhance overall efficiency. This paper demonstrates the capabilities of deep reinforcement learning (DRL) as an effective self-optimization strategy for the flow synthesis of an imine, a key building block of many compounds such as pharmaceuticals and heterocyclic products. A deep deterministic policy gradient (DDPG) agent was designed to iteratively interact with its environment, the flow reactor, and learn to deliver optimal operating conditions. A mathematical model of the reactor was developed from new experimental data to train the agent and evaluate alternative self-optimization strategies. To optimize the DDPG agent's training performance, different hyperparameter tuning methods were investigated and compared, including trial-and-error and Bayesian optimization. Most importantly, a novel adaptive dynamic hyperparameter tuning scheme was implemented to further enhance the training performance and optimization outcome of the agent. The performance of the proposed DRL strategy was compared against state-of-the-art gradient-free methods, namely SnobFit and Nelder-Mead. Finally, the outcomes of the different self-optimization strategies were tested experimentally. The proposed DDPG agent outperformed its self-optimization counterparts: it tracked the global solution more closely and reduced the number of required experiments by approximately 50% and 75% relative to Nelder-Mead and SnobFit, respectively. These findings hold significant promise for the chemical engineering community, offering a robust, efficient, and sustainable approach to optimizing flow chemistry processes and paving the way for broader integration of data-driven methods in process design and operation.
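The agent-environment interaction loop that the abstract describes can be sketched in a few lines. Everything below is an illustrative assumption, not the paper's method: the `reactor_yield` quadratic is a toy surrogate for the imine reactor model, the two normalized inputs stand in for operating conditions such as temperature and residence time, and the random-perturbation "policy" is a placeholder for the trained DDPG actor-critic networks and replay buffer.

```python
import random

def reactor_yield(temp, res_time):
    # Toy surrogate for imine yield (hypothetical, illustrative only):
    # peaks at 1.0 when temp=0.7 and res_time=0.5 on normalized [0, 1] inputs.
    return 1.0 - (temp - 0.7) ** 2 - (res_time - 0.5) ** 2

def self_optimize(n_experiments=200, noise=0.3, seed=0):
    """Agent-environment loop: propose conditions, observe the reward,
    remember the best. A DDPG agent would replace the random proposal
    with actor-network actions updated from critic gradients."""
    rng = random.Random(seed)
    best_action, best_reward = None, float("-inf")
    action = (rng.random(), rng.random())      # initial random conditions
    for _ in range(n_experiments):
        reward = reactor_yield(*action)        # "run" the flow experiment
        if reward > best_reward:
            best_action, best_reward = action, reward
        # exploration: perturb the best-known conditions, clipped to bounds
        action = tuple(
            min(1.0, max(0.0, a + rng.gauss(0, noise)))
            for a in best_action
        )
    return best_action, best_reward

best, r = self_optimize()
```

The point of the sketch is the structure the paper exploits: each "experiment" is one environment step, so a policy that learns from past rewards (rather than perturbing blindly, as here) can cut the experiment count, which is the basis of the reported savings over Nelder-Mead and SnobFit.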
Pages: 247-266 (20 pages)