Data-Driven Strategies for Accelerated Materials Design

Citations: 283
Authors
Pollice, Robert [1, 2]
Gomes, Gabriel dos Passos [1, 2]
Aldeghi, Matteo [1, 2, 3]
Hickman, Riley J. [1, 2]
Krenn, Mario [1, 2, 3]
Lavigne, Cyrille [1, 2]
Lindner-D'Addario, Michael [1, 2]
Nigam, AkshatKumar [1, 2]
Ser, Cher Tian [1, 2]
Yao, Zhenpeng [1, 2]
Aspuru-Guzik, Alan [1, 2, 3, 4]
Affiliations
[1] Univ Toronto, Dept Chem, Chem Phys Theory Grp, Toronto, ON M5S 3H6, Canada
[2] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3H6, Canada
[3] Vector Inst Artificial Intelligence, Toronto, ON M5G 1M1, Canada
[4] Canadian Inst Adv Res CIFAR, Toronto, ON M5G, Canada
Funding
Natural Sciences and Engineering Research Council of Canada; Austrian Science Fund; Swiss National Science Foundation
Keywords
LIGHT-EMITTING-DIODES; CLEAN ENERGY PROJECT; ORGANIC PHOTOVOLTAICS; COMPUTATIONAL DISCOVERY; SELECTION BIAS; MICROARRAY; CANDIDATES; BATTERIES;
DOI
10.1021/acs.accounts.0c00785
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The ongoing revolution of the natural sciences by the advent of machine learning and artificial intelligence has sparked significant interest in the material science community in recent years. The intrinsically high dimensionality of the space of realizable materials makes traditional approaches ineffective for large-scale explorations. Modern data science and machine learning tools developed for increasingly complicated problems are an attractive alternative. An imminent climate catastrophe calls for a clean energy transformation by overhauling current technologies within only several years of possible action available. Tackling this crisis requires the development of new materials at an unprecedented pace and scale. For example, organic photovoltaics have the potential to replace existing silicon-based materials to a large extent and open up new fields of application. In recent years, organic light-emitting diodes have emerged as state-of-the-art technology for digital screens and portable devices and are enabling new applications with flexible displays. Reticular frameworks allow the atom-precise synthesis of nanomaterials and promise to revolutionize the field by the potential to realize multifunctional nanoparticles with applications from gas storage, gas separation, and electrochemical energy storage to nanomedicine. In the recent decade, significant advances in all these fields have been facilitated by the comprehensive application of simulation and machine learning for property prediction, property optimization, and chemical space exploration enabled by considerable advances in computing power and algorithmic efficiency. In this Account, we review the most recent contributions of our group in this thriving field of machine learning for material science. We start with a summary of the most important material classes our group has been involved in, focusing on small molecules as organic electronic materials and crystalline materials.
Specifically, we highlight the data-driven approaches we employed to speed up discovery and derive material design strategies. Subsequently, our focus lies on the data-driven methodologies our group has developed and employed, elaborating on high-throughput virtual screening, inverse molecular design, Bayesian optimization, and supervised learning. We discuss the general ideas, their working principles, and their use cases with examples of successful implementations in data-driven material discovery and design efforts. Furthermore, we elaborate on potential pitfalls and remaining challenges of these methods. Finally, we provide a brief outlook for the field as we foresee increasing adaptation and implementation of large scale data-driven approaches in material discovery and design campaigns.
Pages: 849-860
Page count: 12