Data-driven materials research enabled by natural language processing and information extraction

Cited by: 161
Authors
Olivetti, Elsa A. [1 ]
Cole, Jacqueline M. [2 ,3 ,4 ]
Kim, Edward [5 ]
Kononova, Olga [6 ,7 ]
Ceder, Gerbrand [6 ,7 ]
Han, Thomas Yong-Jin [8 ]
Hiszpanski, Anna M. [8 ]
Affiliations
[1] MIT, Dept Mat Sci & Engn, Cambridge, MA 02139 USA
[2] Univ Cambridge, Dept Phys, Cavendish Lab, JJ Thomson Ave, Cambridge CB3 0HE, England
[3] Rutherford Appleton Lab, ISIS Neutron & Muon Source, Harwell Sci & Innovat Campus, Didcot OX11 0QX, Oxon, England
[4] Univ Cambridge, Dept Chem Engn & Biotechnol, West Cambridge Site, Philippa Fawcett Dr, Cambridge CB3 0AS, England
[5] Xero, Sci Evaluat & Measurement, Toronto, ON M5H 4G1, Canada
[6] Univ Calif Berkeley, Dept Mat Sci & Engn, Berkeley, CA 94720 USA
[7] Lawrence Berkeley Natl Lab, Mat Sci Div, Berkeley, CA 94720 USA
[8] Lawrence Livermore Natl Lab, Div Mat Sci, Livermore, CA 94550 USA
Funding
US National Science Foundation; UK Science and Technology Facilities Council
Keywords
RECOGNITION; DESIGN; INFRASTRUCTURE; DISCOVERY; KNOWLEDGE; PLATFORM; SYSTEM; GENOME;
DOI
10.1063/5.0021106
Chinese Library Classification (CLC) number
O59 [Applied Physics]
Abstract
Given the emergence of data science and machine learning throughout society, and particularly in the scientific domain, obtaining data has become increasingly important. Data in materials science are especially heterogeneous, owing to the wide range of materials classes explored and the variety of materials properties of interest. As a result, these data span many orders of magnitude and may appear as numerical text or as image-based information, both of which require quantitative interpretation. The ability to automatically consume and codify the scientific literature across domains, enabled by techniques adapted from the field of natural language processing, therefore has immense potential to unlock and generate the rich datasets that data science and machine learning require. This review focuses on the progress and practices of natural language processing and text mining of the materials science literature, and highlights opportunities for extracting information beyond the text itself, such as that contained in figures and tables. We discuss, with examples, several motivations for pursuing natural language processing in materials science, including data compilation, hypothesis development, and understanding trends within and across fields. Current and emerging natural language processing methods, along with their applications to materials science, are described in detail. We then discuss natural language processing and data challenges within the materials science domain and identify where future directions may prove valuable.
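To make "codifying the literature" concrete, the sketch below turns free-text sentences into structured (material, property, value, unit) records. It is a toy, rule-based illustration under stated assumptions, not the review's method: the single regular expression, the three-property vocabulary, and the names `PATTERN`, `Record`, and `extract` are all hypothetical. Production text-mining pipelines typically rely on trained named-entity recognition models rather than one hand-written pattern.

```python
import re
from dataclasses import dataclass

# Hypothetical sentence template: "<property> of <material> is/was [about] <number> <unit>".
# The property list and units are illustrative assumptions only.
PATTERN = re.compile(
    r"(?P<prop>band gap|melting point|conductivity)\s+of\s+"
    r"(?P<mat>[A-Z][A-Za-z0-9]*)\s+(?:is|was)\s+"
    r"(?:about\s+)?(?P<value>-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*"
    r"(?P<unit>eV|K|S/cm)"
)


@dataclass
class Record:
    """One structured fact mined from text (hypothetical schema)."""
    material: str
    prop: str
    value: float
    unit: str


def extract(text: str) -> list[Record]:
    """Return every (material, property, value, unit) match found in `text`."""
    return [
        Record(m["mat"], m["prop"], float(m["value"]), m["unit"])
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "The band gap of TiO2 is 3.2 eV. "
        "The conductivity of Cu is 5.96e5 S/cm. "
        "The melting point of Si was 1687 K."
    )
    for rec in extract(sample):
        print(rec)
```

Parsing values as floats, including scientific notation, is what lets quantities that span many orders of magnitude land in a single numeric column; a real pipeline would also need unit normalization and material-name disambiguation.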
Pages: 19