Machine learning and natural language processing on the patent corpus: Data, tools, and new measures

Cited by: 48
Authors
Balsmeier, Benjamin [1]
Assaf, Mohamad [2, 3]
Chesebro, Tyler [4]
Fierro, Gabe [4]
Johnson, Kevin [4]
Johnson, Scott [4]
Li, Guan-Cheng [2]
Lueck, Sonja [5]
O'Reagan, Doug [2]
Yeh, Bill [4]
Zang, Guangzheng [4]
Fleming, Lee [2]
Affiliations
[1] Univ Luxembourg, Ctr Res Econ & Management, Esch Sur Alzette, Luxembourg
[2] Univ Calif Berkeley, Coleman Fung Inst Engn Leadership, Berkeley, CA 94720 USA
[3] Amer Univ Beirut, Dept Elect & Comp Engn, Beirut, Lebanon
[4] Univ Calif Berkeley, Elect Engn & Comp Sci, Berkeley, CA USA
[5] Univ Paderborn, Dept Econ, Paderborn, Germany
Funding
U.S. National Science Foundation
Keywords
database; disambiguation; machine learning; natural language processing; patent; social networks; NETWORKS
DOI
10.1111/jems.12259
Chinese Library Classification
F [Economics]
Discipline classification code
02
Abstract
Drawing upon recent advances in machine learning and natural language processing, we introduce new tools that automatically ingest, parse, disambiguate, and build an updated database using U.S. patent data. The tools identify unique inventor, assignee, and location entities mentioned on each granted U.S. patent from 1976 to 2016. We describe data flow, algorithms, user interfaces, descriptive statistics, and a novelty measure based on the first appearance of a word in the patent corpus. We illustrate an automated coinventor network mapping tool and visualize trends in patenting over the last 40 years. Data and documentation can be found at https://console.cloud.google.com/launcher/partners/patents-public-data.
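The novelty measure mentioned in the abstract rests on the first appearance of a word in the patent corpus. A minimal sketch of that idea follows; it is not the authors' implementation. The function name `new_word_counts`, the (patent_id, grant_date, text) tuple layout, the lowercase alphabetic tokenizer, and the tie-break by patent ID for patents granted on the same date are all assumptions made for illustration, and the sample records are invented.

```python
import re


def new_word_counts(patents):
    """Count, per patent, the words not seen in any earlier patent.

    `patents` is an iterable of (patent_id, grant_date, text) tuples, with
    grant_date as an ISO "YYYY-MM-DD" string (hypothetical layout).
    """
    seen = set()    # every word observed so far in the corpus
    counts = {}     # patent_id -> number of first-appearance words
    # Process in chronological order; ties broken by patent_id (assumption).
    for patent_id, _grant_date, text in sorted(patents, key=lambda p: (p[1], p[0])):
        # Naive tokenizer: lowercase runs of ASCII letters (assumption).
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts[patent_id] = len(words - seen)
        seen |= words
    return counts


# Invented sample records, deliberately listed out of date order.
sample = [
    ("US2", "1977-01-01", "A laser diode emitter"),
    ("US1", "1976-01-01", "A diode circuit"),
    ("US3", "1978-01-01", "A diode laser circuit"),
]
print(new_word_counts(sample))
```

Under these assumptions the earliest patent is credited with all of its words, so in practice a burn-in period at the start of the corpus would likely be needed before the counts become informative.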
Pages: 535-553
Page count: 19
Related papers
14 total
  • [1] Independent boards and innovation
    Balsmeier, Benjamin
    Fleming, Lee
    Manso, Gustavo
    [J]. JOURNAL OF FINANCIAL ECONOMICS, 2017, 123 (03) : 536 - 557
  • [2] Does Going Public Affect Innovation?
    Bernstein, Shai
    [J]. JOURNAL OF FINANCE, 2015, 70 (04) : 1364 - 1403
  • [3] Carayol N., 2009, CAHIERS GRETHA 2009
  • [4] Fleming L., 2004, 20132 USPTO
  • [5] Hall B., 2001, NATL BUREAU EC RES W
  • [6] Hall B., 2012, 17773 NBER
  • [7] Lai Ronald., 2009, The careers and co-authorship networks of U.S. patent-holders, DOI 10.2200/S00428ED1V01Y201207WBE002
  • [8] Disambiguation and co-authorship networks of the US patent inventor database (1975-2010)
    Li, Guan-Cheng
    Lai, Ronald
    D'Amour, Alexander
    Doolin, David M.
    Sun, Ye
    Torvik, Vetle I.
    Yu, Amy Z.
    Fleming, Lee
    [J]. RESEARCH POLICY, 2014, 43 (06) : 941 - 955
  • [9] Marco Alan C., 2015, Working Paper No. 2015-1.
  • [10] Monath N., 2015, PAT VIEW C