Machine learning and natural language processing on the patent corpus: Data, tools, and new measures

Cited by: 48
Authors
Balsmeier, Benjamin [1]
Assaf, Mohamad [2, 3]
Chesebro, Tyler [4]
Fierro, Gabe [4]
Johnson, Kevin [4]
Johnson, Scott [4]
Li, Guan-Cheng [2]
Lueck, Sonja [5]
O'Reagan, Doug [2]
Yeh, Bill [4]
Zang, Guangzheng [4]
Fleming, Lee [2]
Affiliations
[1] Univ Luxembourg, Ctr Res Econ & Management, Esch Sur Alzette, Luxembourg
[2] Univ Calif Berkeley, Coleman Fung Inst Engn Leadership, Berkeley, CA 94720 USA
[3] Amer Univ Beirut, Dept Elect & Comp Engn, Beirut, Lebanon
[4] Univ Calif Berkeley, Elect Engn & Comp Sci, Berkeley, CA USA
[5] Univ Paderborn, Dept Econ, Paderborn, Germany
Funding
National Science Foundation (U.S.)
Keywords
database; disambiguation; machine learning; natural language processing; patent; social networks; networks
DOI
10.1111/jems.12259
Chinese Library Classification number
F [Economics]
Discipline classification code
02
Abstract
Drawing upon recent advances in machine learning and natural language processing, we introduce new tools that automatically ingest, parse, disambiguate, and build an updated database using U.S. patent data. The tools identify unique inventor, assignee, and location entities mentioned on each granted U.S. patent from 1976 to 2016. We describe data flow, algorithms, user interfaces, descriptive statistics, and a novelty measure based on the first appearance of a word in the patent corpus. We illustrate an automated coinventor network mapping tool and visualize trends in patenting over the last 40 years. Data and documentation can be found at https://console.cloud.google.com/launcher/partners/patents-public-data.
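The novelty measure mentioned in the abstract, based on the first appearance of a word in the patent corpus, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function name `first_appearance_novelty`, the `(patent_id, year, text)` input tuples, the lowercase alphabetic tokenizer, and the choice to score each patent by the count of words it introduces.

```python
import re


def first_appearance_novelty(patents):
    """Count, per patent, the words that appear for the first time in the corpus.

    `patents` is an iterable of (patent_id, year, text) tuples (a hypothetical
    input format). Patents are processed in chronological order, with ties
    broken by patent_id.
    """
    seen = set()      # every word observed in earlier patents
    novelty = {}      # patent_id -> number of first-appearance words
    for patent_id, year, text in sorted(patents, key=lambda p: (p[1], p[0])):
        # Simplistic tokenizer (an assumption): lowercase alphabetic runs.
        words = set(re.findall(r"[a-z]+", text.lower()))
        novelty[patent_id] = len(words - seen)
        seen |= words
    return novelty


# Toy corpus, invented purely for illustration.
toy_corpus = [
    ("B", 1980, "a laser diode"),
    ("A", 1976, "a diode circuit"),
    ("C", 1990, "laser diode circuit"),
]
scores = first_appearance_novelty(toy_corpus)
```

In this toy corpus, patent "A" comes first and introduces all three of its words, "B" introduces only "laser", and "C" introduces none. A real pipeline would likely need stop-word handling, stemming, and filtering of OCR errors, which this sketch omits.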
Pages: 535-553
Number of pages: 19