Vector space model for patent documents with hierarchical class labels

Cited by: 6
Authors
Chen, Yen-Liang [1]
Chiu, Yu-Ting [1]
Affiliation
[1] Natl Cent Univ, Dept Informat Management, Chungli 320, Taiwan
Keywords
document classification; feature selection; hierarchical class label; vector space model (VSM)
DOI
10.1177/0165551512437635
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
A vector space model (VSM) composed of selected important features is a common way to represent documents, including patent documents. Patent documents, however, have special characteristics that make it difficult to apply traditional feature selection methods directly: (a) it is difficult to find common terms for patent documents in different categories; and (b) the class label of a patent document is hierarchical rather than flat. Hence, in this article we propose a new approach that includes a hierarchical feature selection (HFS) algorithm, which can be used to select more representative features with greater discriminative ability to represent a set of patent documents with hierarchical class labels. The performance of the proposed method is evaluated on two document sets containing 2400 and 9600 patent documents, respectively, from whose titles and abstracts we extract candidate terms. The experimental results reveal that a VSM whose features are selected by a proportional selection process gives better coverage, while a VSM whose features are selected by a weighted-summed selection process gives higher accuracy.
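The abstract names a proportional and a weighted-summed selection process but does not define either. The following is a minimal Python sketch under stated assumptions, not the authors' HFS algorithm. Class labels are assumed to be IPC-style paths such as ("H", "H04"). The per-category term score in the hypothetical `level_scores`, the size-proportional quota rule in the hypothetical `proportional_selection`, and the per-level weights in the hypothetical `weighted_summed_selection` are all illustrative choices. TF-IDF is used only as one common way to weight the resulting VSM.

```python
import math
from collections import Counter, defaultdict

# Each document is (tokens, label). The label is a hierarchical path such as
# ("H", "H04"), i.e. section then class (hypothetical IPC-style format).


def level_scores(docs, level):
    """Per-category term scores at one hierarchy level.

    Hypothetical score: fraction of the category's documents containing the
    term minus the fraction of all other documents containing it.
    """
    n = len(docs)
    sizes = Counter(label[: level + 1] for _, label in docs)
    df = defaultdict(Counter)  # category -> term -> document frequency
    for tokens, label in docs:
        df[label[: level + 1]].update(set(tokens))
    total = Counter()  # term -> document frequency over the whole corpus
    for counts in df.values():
        total.update(counts)
    scores = {}
    for cat, counts in df.items():
        outside = n - sizes[cat]
        scores[cat] = {
            term: c / sizes[cat] - ((total[term] - c) / outside if outside else 0.0)
            for term, c in counts.items()
        }
    return scores, sizes


def proportional_selection(docs, k, level):
    """Split a budget of about k features across the categories at `level`
    in proportion to category size, then take each category's top terms.
    Rounding and the minimum quota of 1 can make the result exceed k."""
    scores, sizes = level_scores(docs, level)
    selected = []
    for cat in sorted(scores):
        quota = max(1, round(k * sizes[cat] / len(docs)))
        ranked = sorted(scores[cat].items(), key=lambda kv: (-kv[1], kv[0]))
        picked = 0
        for term, _ in ranked:
            if picked == quota:
                break
            if term not in selected:
                selected.append(term)
                picked += 1
    return selected


def weighted_summed_selection(docs, k, weights):
    """Score each term by a weighted sum, over hierarchy levels, of its best
    per-category score at that level; keep the global top k."""
    combined = defaultdict(float)
    for level, w in enumerate(weights):
        scores, _ = level_scores(docs, level)
        best = {}
        for per_cat in scores.values():
            for term, s in per_cat.items():
                best[term] = max(best.get(term, s), s)
        for term, s in best.items():
            combined[term] += w * s
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def tfidf_vectors(docs, features):
    """Represent each document as TF-IDF weights over the selected features."""
    n = len(docs)
    df = Counter()
    for tokens, _ in docs:
        df.update(set(tokens) & set(features))
    vectors = []
    for tokens, _ in docs:
        tf = Counter(tokens)
        vectors.append([tf[f] * math.log(n / df[f]) if df[f] else 0.0 for f in features])
    return vectors


# Hypothetical toy corpus standing in for tokenized titles and abstracts.
docs = [
    ("wireless antenna signal".split(), ("H", "H04")),
    ("antenna signal receiver".split(), ("H", "H04")),
    ("battery cell electrode".split(), ("H", "H01")),
    ("drug compound dosage".split(), ("A", "A61")),
]
prop = proportional_selection(docs, k=4, level=1)
wsum = weighted_summed_selection(docs, k=4, weights=(0.3, 0.7))
print(prop, wsum)
print(tfidf_vectors(docs, prop))
```

On this toy corpus the proportional variant draws at least one term from every class, while the weighted-summed variant ranks all terms on one combined score and skips class H01 entirely. This only illustrates the coverage idea in general terms; it does not reproduce the paper's experiments or its accuracy findings.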
Pages: 222-233
Page count: 12