A simple and fast method for Named Entity context extraction from patents

Cited by: 15
Authors
Puccetti, Giovanni [1]
Chiarello, Filippo [2]
Fantoni, Gualtiero [3]
Affiliations
[1] Scuola Normale Super Pisa, Piazza Cavalieri 7, I-56126 Pisa, Italy
[2] Dept Energy Syst Terr & Construct Engn, Largo Lucio Lazzarino 2, I-56122 Pisa, Italy
[3] Dept Civil & Ind Engn, Largo Lucio Lazzarino 2, I-56122 Pisa, Italy
Keywords
Natural Language Processing; Information retrieval; Patents; FRAMEWORK
DOI
10.1016/j.eswa.2021.115570
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Extracting relevant technical information from patents or technical literature is as valuable as it is challenging. The task involves extracting highly relevant information from a corpus of documents with a particular structure and a mix of technical and legal jargon. Patents are the widest free source of technical information in which homogeneous entities can be found. From a technical perspective, existing approaches rely on Named Entity Recognition (NER) and use Machine Learning techniques for Natural Language Processing (NLP). However, because of the large amount of data, the complexity of the lexicon, the peculiarity of the structure, and the scarcity of examples available to train the machine learning system, new approaches should be studied. NER methods are improving in performance in many contexts, but a gap remains when dealing with technical documentation. The aim of this work is to automatically create training sets for NER systems by exploiting the nature and structure of patents, an open and massive source of technical documentation. In particular, we focus on collecting the contexts in which users of the invention appear within patents. We then measure to what extent we achieve our goal and discuss how well our method generalizes to other entities and documents.
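The abstract does not spell out how the training sets are built, so the following is only a minimal sketch of one way such contexts could be collected, assuming a small seed list of user terms is matched against patent sentences and each match is tagged in BIO format. The function names (`label_sentence`, `build_training_set`), the `USER` tag, and the seed terms are hypothetical and are not taken from the paper.

```python
import re
from typing import List, Tuple

TaggedSentence = List[Tuple[str, str]]


def label_sentence(sentence: str, seed_terms: List[str]) -> TaggedSentence:
    """Tag every occurrence of a seed term with BIO labels (hypothetical USER tag)."""
    # Simple tokenizer: runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)

    # Try longer seed terms first so multi-word terms win over their sub-terms.
    for term in sorted(seed_terms, key=lambda s: -len(s.split())):
        term_tokens = term.lower().split()
        n = len(term_tokens)
        for i in range(len(tokens) - n + 1):
            span_is_free = all(label == "O" for label in labels[i:i + n])
            if lowered[i:i + n] == term_tokens and span_is_free:
                labels[i] = "B-USER"
                for j in range(i + 1, i + n):
                    labels[j] = "I-USER"
    return list(zip(tokens, labels))


def build_training_set(sentences: List[str], seed_terms: List[str]) -> List[TaggedSentence]:
    """Keep only the sentences in which at least one seed term was found."""
    tagged = [label_sentence(s, seed_terms) for s in sentences]
    return [pairs for pairs in tagged if any(tag != "O" for _, tag in pairs)]


if __name__ == "__main__":
    # Illustrative sentences, not drawn from any real patent.
    example_sentences = [
        "The device helps a nurse to lift the patient.",
        "The bolt is made of steel.",
    ]
    for tagged_sentence in build_training_set(example_sentences, ["nurse", "patient"]):
        print(tagged_sentence)
```

Sentences without any seed match are dropped, so the output holds only contexts in which a candidate user entity appears. Exact string matching is a deliberate simplification here; it is not a claim about the matching strategy used by the authors.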
Pages: 9