Improving Documents Classification with Semantic Features

Cited by: 2
Authors
Bai Rujiang [1]
Liao Junhua [1]
Affiliation
[1] Shandong Univ Technol Lib, Zibo, Peoples R China
Source
PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON ELECTRONIC COMMERCE AND SECURITY, VOL I | 2009
Keywords
text classification; ontology; RDF; SVM
DOI
10.1109/ISECS.2009.231
Chinese Library Classification number
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
Successful text classification is highly dependent on the representations used. Currently, most approaches to text classification adopt the 'bag-of-words' (BOW) document representation, in which the frequency of occurrence of each word is considered the most important feature, but this method ignores important semantic relationships between key terms. In this paper, we propose a system that uses ontologies and Natural Language Processing techniques to index texts. The traditional BOW matrix is replaced by a "Bag of Concepts" (BOC). For this purpose, we developed fully automated methods for mapping keywords to their corresponding ontology concepts. Support Vector Machine (SVM), a successful machine learning technique, is used for classification. Experimental results show that our proposed method does improve text classification performance significantly.
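The contrast between the BOW and BOC representations described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the `TERM_TO_CONCEPT` dictionary is a hypothetical toy stand-in for an ontology, the function names are invented for illustration, and the choice to keep unmapped tokens unchanged is an assumption. The SVM classification step is omitted.

```python
import re
from collections import Counter

# Hypothetical toy "ontology": surface term -> concept label.
# These entries are illustrative assumptions, not taken from the paper.
TERM_TO_CONCEPT = {
    "car": "Vehicle",
    "automobile": "Vehicle",
    "truck": "Vehicle",
    "physician": "Doctor",
    "doctor": "Doctor",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Count raw word occurrences (the BOW representation)."""
    return Counter(tokenize(text))


def bag_of_concepts(text, mapping=TERM_TO_CONCEPT):
    """Count concept occurrences (a simple BOC representation).

    Tokens with no entry in the mapping are kept unchanged,
    which is an assumption made for this sketch.
    """
    return Counter(mapping.get(token, token) for token in tokenize(text))


doc_a = "The physician bought a car"
doc_b = "A doctor sold an automobile"

# Features the two documents have in common under each representation.
shared_words = set(bag_of_words(doc_a)) & set(bag_of_words(doc_b))
shared_concepts = set(bag_of_concepts(doc_a)) & set(bag_of_concepts(doc_b))

print(sorted(shared_words))
print(sorted(shared_concepts))
```

Under BOW the two sentences overlap only on the function word "a", while under BOC they also share the `Doctor` and `Vehicle` features, which is the kind of semantic relationship the abstract says word counts ignore.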
Pages: 640-643
Page count: 4