Patent document categorization based on semantic structural information

Cited by: 42
Authors
Kim, Jae-Ho [1]
Choi, Key-Sun [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Comp Sci, SWRC, Taejon 305701, South Korea
Keywords
patent categorization; text categorization; k-NN; semantic tag; structural information
DOI
10.1016/j.ipm.2007.02.002
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The number of patent documents is rising rapidly worldwide, creating a need for automatic categorization systems to replace time-consuming, labor-intensive manual categorization. Because accurate patent classification is crucial when searching for relevant existing patents in a given field, patent categorization is an important and useful task. Patent documents are structured documents with characteristics that distinguish them from general documents, and these traits should be taken into account in the categorization process. In this paper, we automatically categorize Japanese patent documents, focusing on their structure: patents are organized into claims, purposes, effects, embodiments of the invention, and so on. We propose a patent document categorization method based on the k-NN (k-Nearest Neighbour) approach. To retrieve similar documents from a training document set, specific components denoting so-called semantic elements, such as claim, purpose, and application field, are compared instead of the whole texts. Because these components are identified by various user-defined tags, all of the components are first clustered into several semantic elements. These semantically clustered structural components serve as the basic features for patent categorization. We achieve a 74% improvement in categorization performance over a baseline system that does not use the structural information of the patent. © 2007 Published by Elsevier Ltd.
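The idea of comparing semantic elements rather than whole texts can be illustrated with a minimal sketch. It is not the authors' implementation: the element names, the equal weights, the whitespace tokenizer, the term-frequency cosine similarity, and the similarity-weighted vote are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter, defaultdict

# Hypothetical semantic elements and weights (not given in the abstract).
ELEMENT_WEIGHTS = {"claim": 1.0, "purpose": 1.0, "application_field": 1.0}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def element_similarity(doc_a: dict, doc_b: dict, weights=ELEMENT_WEIGHTS) -> float:
    """Weighted sum of per-element similarities (assumed combination rule)."""
    total = 0.0
    for element, weight in weights.items():
        # Each element is compared only with the same element of the other patent.
        vec_a = Counter(doc_a.get(element, "").lower().split())
        vec_b = Counter(doc_b.get(element, "").lower().split())
        total += weight * cosine(vec_a, vec_b)
    return total


def knn_categorize(query: dict, training: list, k: int = 3):
    """Return the category with the largest similarity-weighted vote
    among the k most similar training documents, or None if there are none."""
    scored = sorted(
        ((element_similarity(query, doc), label) for doc, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return max(votes, key=votes.get) if votes else None
```

Here `training` is a list of `(document, category)` pairs, where each document is a dict mapping a semantic element name to its text. A whole-text baseline would correspond to a single element holding the full document.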
Pages: 1200-1215
Page count: 16