Information extraction from scientific articles: a survey

Cited by: 0
Authors
Zara Nasar
Syed Waqar Jaffry
Muhammad Kamran Malik
Affiliation
[1] University of the Punjab, Punjab University College of Information Technology
Source
Scientometrics | 2018 / Vol. 117
Keywords
Metadata extraction; Key-insights extraction; Text mining; Information extraction; Machine learning; Research articles; Scientific literature
DOI
Not available
Abstract
Over the last few decades, with the advent of the World Wide Web (WWW), the world has been flooded with data. This data carries potential information that, once extracted, can be used for the betterment of humanity. Information can be extracted from this data through manual or automatic analysis. Manual analysis is neither scalable nor efficient, whereas automatic analysis relies on computing mechanisms that support information extraction over large volumes of data. The WWW has also accelerated the growth of scientific literature, which makes literature review a laborious and time-consuming task for researchers. Hence there is a pressing need to automatically extract potential information from the immense body of scientific articles in order to automate the literature review process. This study therefore aims to present the overall progress in automatic information extraction from scientific articles. The information extracted from scientific articles is classified into two broad categories, i.e., metadata and key-insights. Because available benchmark datasets play a significant role in the development of this research domain, existing datasets for both categories are extensively reviewed. Research studies that have applied various computational approaches to these datasets are then consolidated. The major computational approaches include rule-based approaches, Hidden Markov Models, Conditional Random Fields, Support Vector Machines, Naïve Bayes classification, and deep learning approaches. Currently, multiple ongoing projects focus on constructing datasets tailored to specific information needs from scientific articles. Hence, this study covers the state of the art in information extraction from scientific articles.
This study also consolidates evolving datasets as well as various toolkits and code-bases that can be used for information extraction from scientific articles.
Pages: 1931-1990
Page count: 59