Automatic Category Theme Identification and Hierarchy Generation for Chinese Text Categorization

Cited by: 0
Authors
Hsin-Chang Yang
Chung-Hong Lee
Affiliations
[1] Chang Jung University, Department of Information Management
[2] National Kaohsiung University of Applied Sciences, Department of Electrical Engineering
Source
Journal of Intelligent Information Systems | 2005 / Vol. 25
Keywords
automatic category theme identification; automatic category hierarchy generation; text categorization; self-organizing maps; text mining
DOI
Not available
Abstract
Research on text mining has recently attracted considerable attention from both industry and academia. Text mining is concerned with discovering unknown patterns or knowledge in a large text repository. The problem is not easy to tackle because the texts under consideration are semi-structured or even unstructured. Many approaches have been devised for mining various kinds of knowledge from texts. One important aspect of text mining is automatic text categorization, which assigns a text document to a predefined category if the document falls within the theme of that category. Traditionally, the categories are arranged hierarchically to support effective searching and indexing as well as easy comprehension by human readers. The category themes and their hierarchical structure have mostly been determined by human experts. In this work, we developed an approach that automatically generates category themes and reveals the hierarchical structure among them. We also used the generated structure to categorize text documents. A self-organizing map was trained on the document collection to form two feature maps. These maps were then analyzed to obtain the category themes and their structure. Although the test corpus contains documents written in Chinese, the proposed approach can be applied to documents written in any language, provided that such documents can be transformed into a list of separated terms.
Pages: 47-67
Page count: 20
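
As a rough illustration of the step the abstract describes (training a self-organizing map on documents that have been turned into lists of separated terms, then reading labels off the trained map), here is a minimal pure-Python sketch. It is not the authors' implementation. The toy corpus, the binary term vectors, the grid size, the decay schedules, and the helper names `train_som`, `best_matching_unit`, and `label_maps` are all assumptions made for this example. Labeling one map with documents and another with terms is one plausible reading of "two feature maps"; the abstract does not spell out how the maps are built.

```python
import math
import random


def best_matching_unit(weights, v):
    """Return the grid position whose weight vector is closest to v."""
    return min(
        weights,
        key=lambda pos: sum((a - b) ** 2 for a, b in zip(weights[pos], v)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small SOM; returns a dict mapping (row, col) -> weight vector."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # linearly decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighborhood
        for v in vectors:
            bmu = best_matching_unit(weights, v)
            for pos, w in weights.items():
                d = math.hypot(pos[0] - bmu[0], pos[1] - bmu[1])
                if d <= radius:
                    h = math.exp(-(d * d) / (2 * radius * radius))
                    for i in range(dim):
                        w[i] += lr * h * (v[i] - w[i])
    return weights


def label_maps(weights, vectors, vocab):
    """Label neurons with documents and with terms (assumed labeling scheme)."""
    doc_map = {pos: [] for pos in weights}
    word_map = {pos: [] for pos in weights}
    for j, v in enumerate(vectors):
        doc_map[best_matching_unit(weights, v)].append(j)
    for i, term in enumerate(vocab):
        # Assign each term to the neuron where its weight component is largest.
        best = max(weights, key=lambda pos: weights[pos][i])
        word_map[best].append(term)
    return doc_map, word_map


# Hypothetical toy corpus: each document is already a list of separated terms.
docs = [
    ["stock", "market", "bank"],
    ["stock", "market", "bank"],
    ["bank", "market"],
    ["football", "goal", "team"],
    ["team", "goal"],
]
vocab = sorted({t for d in docs for t in d})
vectors = [[1.0 if t in d else 0.0 for t in vocab] for d in docs]

weights = train_som(vectors, rows=3, cols=3)
doc_map, word_map = label_maps(weights, vectors, vocab)
```

After training, `doc_map` lists the document indices attached to each neuron and `word_map` lists the terms attached to each neuron. For Chinese text, a word segmentation step would be needed first to produce the lists of separated terms that this sketch takes as given.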