DeepPatent: patent classification with convolutional neural networks and word embedding

Authors
Shaobo Li
Jie Hu
Yuxin Cui
Jianjun Hu
Affiliations
[1] Guizhou University,Key Laboratory of Advanced Manufacturing Technology of Ministry of Education
[2] Guizhou University,School of Mechanical Engineering
[3] University of South Carolina,Department of Computer Science and Engineering
Source
Scientometrics | 2018 / Vol. 117
Keywords
Patent classification; Text classification; Convolutional neural network; Machine learning; Word embedding; 94-02; Y;
Abstract
Patent classification is an essential task in patent information management and patent knowledge mining. However, this task is still largely done manually because current algorithms perform unsatisfactorily. Recently, deep learning methods such as convolutional neural networks (CNNs) have led to great progress in image processing and speech recognition, but they have yet to be applied to patent classification. We propose DeepPatent, a deep learning algorithm for patent classification based on CNNs and word vector embedding. We evaluated the algorithm on the standard patent classification benchmark dataset CLEF-IP and compared it with other algorithms from the CLEF-IP competition. Experiments showed that DeepPatent, which extracts features automatically, achieved a classification precision of 83.98%, outperforming all existing algorithms that used the same information for training. It also exceeded the precision of the state-of-the-art patent classifier (83.50%), even though that classifier relies on 4,000 characters from the description section and extensive feature engineering, whereas DeepPatent uses only the title and abstract. We further tested DeepPatent on USPTO-2M, a patent classification benchmark dataset we contributed, which contains 2,000,147 records obtained by cleaning 2,679,443 raw US utility patent documents spanning 637 categories at the subclass level. On this dataset, DeepPatent achieved a precision of 73.88%.
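The abstract describes DeepPatent only at a high level (word embeddings fed into a CNN). As a rough illustration of that general idea, here is a minimal pure-Python sketch of a generic single-layer text-CNN forward pass: embedding lookup, convolution with several filter widths, ReLU, max-over-time pooling, and a softmax over categories. It is not the authors' implementation. The embedding size, filter widths, filter counts, random initialization, vocabulary, and example labels are all hypothetical, and training is not shown.

```python
import math
import random


def embed(tokens, vocab, table):
    """Look up one word vector per token; unknown words use row 0 (a hypothetical <unk> vector)."""
    return [table[vocab.get(t, 0)] for t in tokens]


def conv1d_relu(seq, filt, bias):
    """Slide a filter spanning len(filt) words over the sequence and apply ReLU at each position."""
    width = len(filt)
    out = []
    for i in range(len(seq) - width + 1):
        s = bias
        for k in range(width):
            s += sum(w * x for w, x in zip(filt[k], seq[i + k]))
        out.append(max(0.0, s))
    return out


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def init_params(vocab, dim, widths, n_filters, n_classes, seed=0):
    """Random (untrained) parameters; every size here is a hypothetical choice."""
    rng = random.Random(seed)

    def g():
        return rng.gauss(0.0, 0.1)

    emb = [[g() for _ in range(dim)] for _ in range(len(vocab) + 1)]  # row 0 = <unk>
    filters = [([[g() for _ in range(dim)] for _ in range(w)], 0.0)
               for w in widths for _ in range(n_filters)]
    n_feats = len(filters)
    W = [[g() for _ in range(n_feats)] for _ in range(n_classes)]
    b = [0.0] * n_classes
    return {"vocab": vocab, "emb": emb, "filters": filters, "W": W, "b": b}


def text_cnn_forward(tokens, params):
    """Return class probabilities for one tokenized input (e.g. title plus abstract)."""
    seq = embed(tokens, params["vocab"], params["emb"])
    dim = len(params["emb"][0])
    max_w = max(len(f) for f, _ in params["filters"])
    while len(seq) < max_w:  # zero-pad short inputs so every filter fits at least once
        seq.append([0.0] * dim)
    # Max-over-time pooling: each filter contributes its single strongest response.
    feats = [max(conv1d_relu(seq, f, bias)) for f, bias in params["filters"]]
    logits = [sum(w * x for w, x in zip(row, feats)) + bias
              for row, bias in zip(params["W"], params["b"])]
    return softmax(logits)


if __name__ == "__main__":
    labels = ["A61K", "G06F", "H04L"]  # hypothetical subclass labels
    words = "method system data network compound composition computer".split()
    vocab = {w: i + 1 for i, w in enumerate(words)}
    params = init_params(vocab, dim=8, widths=(2, 3), n_filters=4, n_classes=len(labels))
    probs = text_cnn_forward("a network system for data".split(), params)
    print(labels[probs.index(max(probs))], [round(p, 3) for p in probs])
```

Because the weights are random, the predicted label in the demo is meaningless. The sketch only shows how a variable-length token sequence is reduced to a fixed-length feature vector (one pooled value per filter) before classification, which is what lets a CNN of this kind work directly from raw title and abstract text without hand-engineered features.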
Pages: 721–744