Data-driven materials research enabled by natural language processing and information extraction

Cited by: 179
Authors
Olivetti, Elsa A. [1]
Cole, Jacqueline M. [2, 3, 4]
Kim, Edward [5]
Kononova, Olga [6, 7]
Ceder, Gerbrand [6, 7]
Han, Thomas Yong-Jin [8]
Hiszpanski, Anna M. [8]
Affiliations
[1] MIT, Dept Mat Sci & Engn, Cambridge, MA 02139 USA
[2] Univ Cambridge, Dept Phys, Cavendish Lab, JJ Thomson Ave, Cambridge CB3 0HE, England
[3] Rutherford Appleton Lab, ISIS Neutron & Muon Source, Harwell Sci & Innovat Campus, Didcot OX11 0QX, Oxon, England
[4] Univ Cambridge, Dept Chem Engn & Biotechnol, West Cambridge Site,Philippa Fawcett Dr, Cambridge CB3 0AS, England
[5] Xero, Sci Evaluat & Measurement, Toronto, ON M5H 4G1, Canada
[6] Univ Calif Berkeley, Dept Mat Sci & Engn, Berkeley, CA 94720 USA
[7] Lawrence Berkeley Natl Lab, Mat Sci Div, Berkeley, CA 94720 USA
[8] Lawrence Livermore Natl Lab, Div Mat Sci, Livermore, CA 94550 USA
Funding
US National Science Foundation; UK Science and Technology Facilities Council;
Keywords
RECOGNITION; DESIGN; INFRASTRUCTURE; DISCOVERY; KNOWLEDGE; PLATFORM; SYSTEM; GENOME;
DOI
10.1063/5.0021106
Chinese Library Classification
O59 [Applied Physics];
Abstract
Given the emergence of data science and machine learning throughout all aspects of society, but particularly in the scientific domain, there is increased importance placed on obtaining data. Data in materials science are particularly heterogeneous, based on the significant range in materials classes that are explored and the variety of materials properties that are of interest. This leads to data that span many orders of magnitude, and these data may manifest as numerical text or image-based information, which requires quantitative interpretation. The ability to automatically consume and codify the scientific literature across domains, enabled by techniques adapted from the field of natural language processing, therefore has immense potential to unlock and generate the rich datasets necessary for data science and machine learning. This review focuses on the progress and practices of natural language processing and text mining of materials science literature and highlights opportunities for extracting information beyond the text, such as that contained in figures and tables in articles. We discuss, with examples, several reasons to pursue natural language processing for materials, including data compilation, hypothesis development, and understanding the trends within and across fields. Current and emerging natural language processing methods along with their applications to materials science are detailed. We then discuss natural language processing and data challenges within the materials science domain where future directions may prove valuable.
Pages: 19