Challenges and Advances in Information Extraction from Scientific Literature: a Review

Cited by: 0
Authors
Zhi Hong
Logan Ward
Kyle Chard
Ben Blaiszik
Ian Foster
Affiliations
[1] University of Chicago
[2] Argonne National Laboratory
Source
JOM | 2021 / Vol. 73
Keywords
Information extraction; Text mining; Scientific data
DOI
Not available
Abstract
Scientific articles have long been the primary means of disseminating scientific discoveries. Over the centuries, valuable data and potentially groundbreaking insights have been collected and buried deep in the mountain of publications. In materials engineering, such data are spread across technical handbooks, specification sheets, journal articles, and laboratory notebooks in myriad formats. Extracting information from papers on a large scale has been a tedious and time-consuming job to which few researchers have wanted to devote their limited time and effort, yet it is an activity that is essential for modern data-driven design practices. In recent years, the computer science community has made significant progress on techniques for automated information extraction from free text. Yet transformative application of these techniques to scientific literature remains elusive, due not to a lack of interest or effort but to technical and logistical challenges. Using the challenges in the materials science literature as a driving motivation, we review the gaps between state-of-the-art information extraction methods and the practical application of such methods to scientific texts, and offer a comprehensive overview of work that can be undertaken to close these gaps.
Pages: 3383-3400
Page count: 17