A Literature Review on Methods for the Extraction of Usage Statements of Software and Data

Cited by: 11
Authors
Krueger, Frank [1, 2]
Schindler, David [3 ]
Affiliations
[1] Univ Rostock, Res Data Management, Inst Commun Engn, Rostock, Germany
[2] Univ Rostock, Mobile Multimedia Informat Syst Grp, Rostock, Germany
[3] Univ Rostock, Inst Commun Engn, Rostock, Germany
Keywords
Software; Bibliographies; Data mining; Manuals; Standards; Supervised learning; Object recognition; software and data citation; named entity recognition; literature review; IMPACT;
DOI
10.1109/MCSE.2019.2943847
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Software and data have become major components of modern research, which is also reflected by an increased number of software usages. Knowledge about used software and data would provide researchers with a better understanding of the results of a scientific investigation and thus foster its reproducibility. Software and data are, however, often not formally cited; instead, their usage is mentioned in the main text. In order to assess the state of the art in the extraction of such usage statements, we performed a literature review. We provide an overview of the existing methods for the identification of usage statements of software and data in scientific articles. This analysis mainly focuses on technical approaches, the employed corpora, and the purpose of the investigation itself. We found four different classes of approaches that are used in the literature: 1) term search, 2) manual extraction, 3) rule-based extraction, and 4) extraction based on supervised learning.
Pages: 26-38
Number of pages: 13