Machine learning and natural language processing on the patent corpus: Data, tools, and new measures

Cited by: 48
Authors
Balsmeier, Benjamin [1]
Assaf, Mohamad [2, 3]
Chesebro, Tyler [4]
Fierro, Gabe [4]
Johnson, Kevin [4]
Johnson, Scott [4]
Li, Guan-Cheng [2]
Lueck, Sonja [5]
O'Reagan, Doug [2]
Yeh, Bill [4]
Zang, Guangzheng [4]
Fleming, Lee [2]
Affiliations
[1] Univ Luxembourg, Ctr Res Econ & Management, Esch Sur Alzette, Luxembourg
[2] Univ Calif Berkeley, Coleman Fung Inst Engn Leadership, Berkeley, CA 94720 USA
[3] Amer Univ Beirut, Dept Elect & Comp Engn, Beirut, Lebanon
[4] Univ Calif Berkeley, Elect Engn & Comp Sci, Berkeley, CA USA
[5] Univ Paderborn, Dept Econ, Paderborn, Germany
Funding
U.S. National Science Foundation
Keywords
database; disambiguation; machine learning; natural language processing; patent; social networks; NETWORKS;
DOI
10.1111/jems.12259
Chinese Library Classification (CLC) number
F [Economics]
Discipline classification code
02
Abstract
Drawing upon recent advances in machine learning and natural language processing, we introduce new tools that automatically ingest, parse, disambiguate, and build an updated database using U.S. patent data. The tools identify unique inventor, assignee, and location entities mentioned on each granted U.S. patent from 1976 to 2016. We describe data flow, algorithms, user interfaces, descriptive statistics, and a novelty measure based on the first appearance of a word in the patent corpus. We illustrate an automated coinventor network mapping tool and visualize trends in patenting over the last 40 years. Data and documentation can be found at https://console.cloud.google.com/launcher/partners/patents-public-data.
Pages: 535-553
Number of pages: 19
Related papers
50 in total
  • [22] Fracking Twitter: Utilizing machine learning and natural language processing tools for identifying coalition and causal narratives
    Pattison, Andrew
    Cipolli, William
    Marichal, Jose
    Cherniakov, Christopher
    POLITICS & POLICY, 2023, 51 (05) : 755 - 774
  • [23] The parallel corpus for information extraction based on natural language processing and machine translation
    He, Honghua
    EXPERT SYSTEMS, 2019, 36 (05)
  • [24] Domain Adaptation of General Natural Language Processing Tools for a Patent Claim Visualization System
    Andersson, Linda
    Lupu, Mihai
    Hanbury, Allan
    MULTIDISCIPLINARY INFORMATION RETRIEVAL, 2013, 8201 : 70 - 82
  • [25] Interview Bot Development with Natural Language Processing and Machine Learning
    Siswanto, Joko
    Suakanto, Sinung
    Andriani, Made
    Hardiyanti, Margareta
    Kusumasari, Tien Febriyanti
    INTERNATIONAL JOURNAL OF TECHNOLOGY, 2022, 13 (02) : 274 - 285
  • [26] Machine learning in medicine: a practical introduction to natural language processing
    Harrison, Conrad J.
    Sidey-Gibbons, Chris J.
    BMC MEDICAL RESEARCH METHODOLOGY, 2021, 21 (01)
  • [27] Application of Natural Language Processing and Machine Learning to Radiology Reports
    Jeon, Seoungdeok
    Colburn, Zachary
    Sakai, Joshua
    Hung, Ling-Hong
    Yeung, Ka Yee
    12TH ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS (ACM-BCB 2021), 2021,
  • [28] Automotive fault nowcasting with machine learning and natural language processing
    Pavlopoulos, John
    Romell, Alv
    Curman, Jacob
    Steinert, Olof
    Lindgren, Tony
    Borg, Markus
    Randl, Korbinian
    MACHINE LEARNING, 2024, 113 (02) : 843 - 861
  • [29] Machine learning in medicine: a practical introduction to natural language processing
    Harrison, Conrad J.
    Sidey-Gibbons, Chris J.
    BMC MEDICAL RESEARCH METHODOLOGY, 21
  • [30] Railroad accident analysis by machine learning and natural language processing
    Bridgelall, Raj
    Tolliver, Denver D.
    JOURNAL OF RAIL TRANSPORT PLANNING & MANAGEMENT, 2024, 29