Using WordNet for text categorization

Cited by: 0
Authors
Elberrichi, Zakaria
Rahmoun, Abdelattif
Bentaalah, Mohamed Amine
Keywords
20 Newsgroups; ontology; Reuters-21578; text categorization; WordNet; cosine distance
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper explores a method that uses WordNet concepts to categorize text documents. The bag-of-words representation commonly used for text is unsatisfactory because it ignores possible relations between terms. The proposed method extracts generic concepts from WordNet for all the terms in the text, then combines them with the terms in different ways to form a new representative vector. The effects of this method are examined in several experiments using the multivariate chi-square to reduce the dimensionality, the cosine distance, and two benchmark corpora, the Reuters-21578 newswire articles and the 20 Newsgroups data, for evaluation. The proposed method is especially effective in raising the macro-averaged F1 value, which increased from 0.649 to 0.714 for Reuters-21578 and from 0.667 to 0.719 for 20 Newsgroups.
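The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy hypernym table stands in for real WordNet lookups, the "append one concept per term" rule is only one assumed way of combining concepts with terms, and the function names are hypothetical. The comparison is computed as cosine similarity, from which a cosine distance can be taken as one minus the value.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet: maps a term to one generic concept.
# A real system would query WordNet hypernyms instead of this toy table.
TOY_HYPERNYMS = {
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "truck": "vehicle",
}


def concept_enriched_vector(tokens, hypernyms=TOY_HYPERNYMS):
    """Count the terms, then add one concept feature per term that has one.

    This "terms plus concepts" combination is an assumption made for
    illustration; the abstract mentions several combination strategies.
    """
    vector = Counter(tokens)
    for token in tokens:
        concept = hypernyms.get(token)
        if concept is not None:
            vector["concept:" + concept] += 1
    return vector


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = ["dog", "barks"]
doc2 = ["cat", "meows"]

# Plain bag of words: the documents share no term, so similarity is zero.
plain = cosine_similarity(Counter(doc1), Counter(doc2))

# With concepts added, both documents share the "animal" feature.
enriched = cosine_similarity(
    concept_enriched_vector(doc1), concept_enriched_vector(doc2)
)
```

In this toy case the two documents share no surface term, so the plain bag-of-words vectors are orthogonal, while the concept-enriched vectors overlap on the shared generic concept. That is the kind of relation between terms the abstract says a bag-of-words representation ignores.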
Pages: 16-24
Page count: 9