A Design-to-Device Pipeline for Data-Driven Materials Discovery

Cited by: 89
Authors
Cole, Jacqueline M. [1, 2, 3, 4]
Affiliations
[1] Univ Cambridge, Dept Phys, Cavendish Lab, Cambridge CB3 0HE, England
[2] Univ Cambridge, Dept Chem Engn & Biotechnol, Cambridge CB3 0HE, England
[3] STFC Rutherford Appleton Lab, ISIS Neutron & Muon Source, Didcot OX11 0QX, Oxon, England
[4] Univ Oxford, Math Inst, Oxford OX2 6GG, England
DOI
10.1021/acs.accounts.9b00470
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
CONSPECTUS: The world needs new materials to stimulate the chemical industry in key sectors of our economy: environment and sustainability, information storage, optical telecommunications, and catalysis. Yet, nearly all functional materials are still discovered by "trial-and-error", of which the lack of predictability affords a major materials bottleneck to technological innovation. The average "molecule-to-market" lead time for materials discovery is currently 20 years. This is far too long for industrial needs, as highlighted by the Materials Genome Initiative, which has ambitious targets of up to 4-fold reductions in average molecule-to-market lead times. Such a large step change in progress can only be realistically achieved if one adopts an entirely new approach to materials discovery. Fortunately, a fundamentally new approach to materials discovery has been emerging, whereby data science with artificial intelligence offers a prospective solution to speed up these average molecule-to-market lead times. This approach is known as data-driven materials discovery. Its broad prospects have only recently become a reality, given the timely and major advances in "big data", artificial intelligence, and high-performance computing (HPC). Access to massive data sets has been stimulated by government-regulated open-access requirements for data and literature. Natural-language processing (NLP) and machine-learning (ML) tools that can mine data and find patterns therein are becoming mainstream. Exascale HPC capabilities that can aid data mining and pattern recognition and also generate their own data from calculations are now within our grasp. These timely advances present an ideal opportunity to develop data-driven materials-discovery strategies to systematically design and predict new chemicals for a given device application. 
This Account shows how data science can afford materials discovery via a four-step "design-to-device" pipeline that entails (1) data extraction, (2) data enrichment, (3) material prediction, and (4) experimental validation. Massive databases of cognate chemical and property information are first forged from "chemistry-aware" natural-language-processing tools, such as ChemDataExtractor, and enriched using machine-learning methods and high-throughput quantum-chemical calculations. New materials for a bespoke application can then be predicted by mining these databases with algorithmic encodings of relationships between chemical structures and physical properties that are known to deliver functional materials. These may take the form of classification, enumeration, or machine-learning algorithms. A data-mining workflow short-lists these predictions to a handful of lead candidate materials that go forward to experimental validation. This design-to-device approach is being developed to offer a roadmap for the accelerated discovery of new chemicals for functional applications. Case studies presented demonstrate its utility for photovoltaic, optical, and catalytic applications. While this Account is focused on applications in the physical sciences, the generic pipeline discussed is readily transferable to other scientific disciplines such as biology and medicine.
Pages: 599-610
Page count: 12