Using WordNet for text categorization

Times cited: 0
Authors
Elberrichi, Zakaria
Rahmoun, Abdelattif
Bentaalah, Mohamed Amine
Keywords
20 Newsgroups; ontology; Reuters-21578; text categorization; WordNet; cosine distance
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper explores a method that uses WordNet concepts to categorize text documents. The bag-of-words model commonly used for text representation is unsatisfactory because it ignores possible relations between terms. The proposed method extracts generic concepts from WordNet for all the terms in a text and then combines them with the terms in different ways to form a new representative vector. The effects of this method are examined in several experiments that use multivariate chi-square to reduce dimensionality, the cosine distance, and two benchmark corpora for evaluation: the Reuters-21578 newswire articles and the 20 Newsgroups data. The proposed method is especially effective at raising the macro-averaged F1 value, which increased from 0.649 to 0.714 on Reuters-21578 and from 0.667 to 0.719 on 20 Newsgroups.
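The abstract names the pieces of the pipeline without giving formulas. As a rough illustration only, the Python sketch below shows them under stated assumptions; it is not the authors' implementation. A hypothetical toy table (TOY_HYPERNYMS) stands in for WordNet (a real lookup could use NLTK's `wordnet.synsets()` and `Synset.hypernyms()`). The strategy names "add", "replace", and "only" are assumed labels for the "different ways" of combining terms with concepts. `chi_square` is the standard single-feature 2x2 statistic, not necessarily the multivariate variant used in the paper, and macro-averaged F1 is taken as the unweighted mean of per-class F1.

```python
import math
from collections import Counter

# HYPOTHETICAL toy stand-in for WordNet: each term maps to one more generic
# concept (a hypernym). A real pipeline would look these up in WordNet.
TOY_HYPERNYMS = {
    "car": "motor_vehicle",
    "truck": "motor_vehicle",
    "apple": "edible_fruit",
    "banana": "edible_fruit",
}


def concept_vector(tokens, strategy="add"):
    """Build a sparse term/concept frequency vector for one document.

    The strategy names are assumptions, not taken from the paper:
      "add"     - keep every term and also count its concept
      "replace" - replace each term that has a concept by that concept
      "only"    - keep concepts only and drop the terms
    """
    vec = Counter()
    for tok in tokens:
        concept = TOY_HYPERNYMS.get(tok)
        if strategy == "add":
            vec[tok] += 1
            if concept:
                vec[concept] += 1
        elif strategy == "replace":
            vec[concept or tok] += 1
        elif strategy == "only":
            if concept:
                vec[concept] += 1
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return vec


def chi_square(a, b, c, d):
    """Standard 2x2 chi-square score of one feature for one class.

    a: docs in the class containing the feature
    b: docs outside the class containing the feature
    c: docs in the class without the feature
    d: docs outside the class without the feature
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    # Counter returns 0 for missing keys, so non-shared features add nothing.
    dot = sum(weight * v[key] for key, weight in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    doc_a, doc_b = ["car", "road"], ["truck", "road"]
    print("bag of words", cosine(Counter(doc_a), Counter(doc_b)))
    for s in ("add", "replace", "only"):
        print(s, cosine(concept_vector(doc_a, s), concept_vector(doc_b, s)))
```

With the toy table, "car road" and "truck road" share only "road" as plain bags of words (cosine 0.5). Under the "add" strategy both vectors also contain "motor_vehicle", which raises the cosine to 2/3: this is the kind of term relation the abstract says the bag-of-words model ignores.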
Pages: 16-24
Number of pages: 9