Machine learning and natural language processing on the patent corpus: Data, tools, and new measures

Cited by: 48
Authors
Balsmeier, Benjamin [1]
Assaf, Mohamad [2,3]
Chesebro, Tyler [4]
Fierro, Gabe [4]
Johnson, Kevin [4]
Johnson, Scott [4]
Li, Guan-Cheng [2]
Lueck, Sonja [5]
O'Reagan, Doug [2]
Yeh, Bill [4]
Zang, Guangzheng [4]
Fleming, Lee [2]
Affiliations
[1] Univ Luxembourg, Ctr Res Econ & Management, Esch Sur Alzette, Luxembourg
[2] Univ Calif Berkeley, Coleman Fung Inst Engn Leadership, Berkeley, CA 94720 USA
[3] Amer Univ Beirut, Dept Elect & Comp Engn, Beirut, Lebanon
[4] Univ Calif Berkeley, Elect Engn & Comp Sci, Berkeley, CA USA
[5] Univ Paderborn, Dept Econ, Paderborn, Germany
Funding
U.S. National Science Foundation
Keywords
database; disambiguation; machine learning; natural language processing; patent; social networks; NETWORKS
DOI
10.1111/jems.12259
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02
Abstract
Drawing upon recent advances in machine learning and natural language processing, we introduce new tools that automatically ingest, parse, disambiguate, and build an updated database using U.S. patent data. The tools identify unique inventor, assignee, and location entities mentioned on each granted U.S. patent from 1976 to 2016. We describe data flow, algorithms, user interfaces, descriptive statistics, and a novelty measure based on the first appearance of a word in the patent corpus. We illustrate an automated coinventor network mapping tool and visualize trends in patenting over the last 40 years. Data and documentation can be found at https://console.cloud.google.com/launcher/partners/patents-public-data.
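The abstract names two computations that are easy to state but harder to picture: a novelty measure based on the first appearance of a word in the patent corpus, and a coinventor network. The Python sketch below illustrates both under stated assumptions; it is not the authors' code. The record layout (patent ID, grant year, inventor IDs, text), the naive tokenizer, the ordering of patents by grant year and then ID, and the function names new_words_by_patent and coinventor_edges are all hypothetical. Inventor IDs are assumed to be already disambiguated, which is the step the paper's tools actually perform.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (patent_id, grant_year, disambiguated inventor IDs, text)
Patent = Tuple[str, int, List[str], str]


def tokenize(text: str) -> List[str]:
    # Naive tokenizer (assumption): split on whitespace, strip punctuation, lowercase.
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def new_words_by_patent(patents: Iterable[Patent]) -> Dict[str, Set[str]]:
    """For each patent, return the words that appear in the corpus for the first time there."""
    seen: Set[str] = set()
    firsts: Dict[str, Set[str]] = {}
    # Chronological order; ties within a grant year are broken by patent ID (assumption).
    for pid, _year, _inventors, text in sorted(patents, key=lambda p: (p[1], p[0])):
        words = set(tokenize(text))
        firsts[pid] = words - seen  # words never seen in any earlier patent
        seen |= words
    return firsts


def coinventor_edges(patents: Iterable[Patent]) -> Dict[Tuple[str, str], int]:
    """Count shared patents for each unordered pair of coinventors."""
    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    for _pid, _year, inventors, _text in patents:
        # Sorting gives each pair one canonical (a, b) key.
        for a, b in combinations(sorted(set(inventors)), 2):
            edges[(a, b)] += 1
    return dict(edges)


if __name__ == "__main__":
    corpus: List[Patent] = [
        ("P1", 1976, ["alice", "bob"], "A rotary engine with a valve."),
        ("P2", 1977, ["bob", "carol"], "A valve for a laser."),
        ("P3", 1977, ["alice", "bob"], "Laser diode array."),
    ]
    novel = new_words_by_patent(corpus)
    print(sorted(novel["P2"]))  # ['for', 'laser']
    print(coinventor_edges(corpus))  # {('alice', 'bob'): 2, ('bob', 'carol'): 1}
```

Processing patents in chronological order credits each word to the earliest patent that uses it; breaking same-year ties by patent ID is purely an assumption here. A real pipeline would also need to decide on stop-word handling and on which text fields (abstract, claims, full text) to scan.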
Pages: 535-553
Number of pages: 19
Related papers
50 in total
  • [31] Automotive fault nowcasting with machine learning and natural language processing
    Pavlopoulos, John
    Romell, Alv
    Curman, Jacob
    Steinert, Olof
    Lindgren, Tony
    Borg, Markus
    Randl, Korbinian
    Machine Learning, 2024, 113 : 843 - 861
  • [32] Using Natural Language Processing and Machine Learning to Identify Opioids in Electronic Health Record Data
    McDermott, Sean P.
    Wasan, Ajay D.
    JOURNAL OF PAIN RESEARCH, 2023, 16 : 2133 - 2140
  • [33] Machine learning and Natural Language Processing of social media data for event detection in smart cities
    Hodorog, Andrei
    Petri, Ioan
    Rezgui, Yacine
    SUSTAINABLE CITIES AND SOCIETY, 2022, 85
  • [34] An Intelligent Patent Summary System Deploying Natural Language Processing and Machining Learning
    Trappey, A. J. C.
    Trappey, C. V.
    Wang, J. W. -C.
    Wu, J. -L.
    TRANSDISCIPLINARY ENGINEERING METHODS FOR SOCIAL INNOVATION OF INDUSTRY 4.0, 2018, 7 : 1204 - 1213
  • [35] An introduction to Deep Learning in Natural Language Processing: Models, techniques, and tools
    Lauriola, Ivano
    Lavelli, Alberto
    Aiolli, Fabio
    NEUROCOMPUTING, 2022, 470 : 443 - 456
  • [36] Review of Natural Language Processing for Corpus Linguistics
    Zhao, Qiuying
    CORPUS PRAGMATICS, 2022, 6 (04) : 311 - 314
  • [37] Visual tools for natural language processing
    Gaizauskas, R
    Rodgers, PJ
    Humphreys, K
    JOURNAL OF VISUAL LANGUAGES AND COMPUTING, 2001, 12 (04) : 375 - 412
  • [38] Natural language processing for learner corpus research
    Kyle, Kristopher
    INTERNATIONAL JOURNAL OF LEARNER CORPUS RESEARCH, 2021, 7 (01) : 1 - 16
  • [39] GIS, Big Data, and a Tweet Corpus Operationalized via Natural Language Processing
    Corso, Anthony J.
    Alsudais, Kareem
    AMCIS 2015 PROCEEDINGS, 2015
  • [40] Natural language processing and machine learning to assist radiation oncology incident learning
    Mathew, Felix
    Wang, Hui
    Montgomery, Logan
    Kildea, John
    MEDICAL PHYSICS, 2021, 48 (08) : 4704 - 4705