Automatic Category Theme Identification and Hierarchy Generation for Chinese Text Categorization

Authors
Hsin-Chang Yang
Chung-Hong Lee
Affiliations
[1] Chang Jung University,Department of Information Management
[2] National Kaohsiung University of Applied Sciences,Department of Electrical Engineering
Source
Journal of Intelligent Information Systems, 2005, Vol. 25
Keywords
automatic category theme identification; automatic category hierarchy generation; text categorization; self-organizing maps; text mining;
Abstract
Research on text mining has recently attracted considerable attention from both industry and academia. Text mining is concerned with discovering unknown patterns or knowledge in a large text repository. The problem is hard to tackle because the texts under consideration are semi-structured or even unstructured. Many approaches have been devised for mining various kinds of knowledge from texts. One important aspect of text mining is automatic text categorization, which assigns a text document to a predefined category if the document falls under the theme of that category. Traditionally, the categories are arranged hierarchically to support effective searching and indexing and to make them easy for people to comprehend. The category themes and their hierarchical structure have mostly been determined by human experts. In this work, we developed an approach that automatically generates category themes and reveals the hierarchical structure among them. We also used the generated structure to categorize text documents. A self-organizing map was trained on the document collection to form two feature maps. These maps were then analyzed to obtain the category themes and their structure. Although the test corpus contains documents written in Chinese, the proposed approach can be applied to documents written in any language, provided that such documents can be transformed into a list of separated terms.
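The abstract's central step, training a self-organizing map on term vectors and reading two maps off the trained neurons, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`train_som`, `best_matching_units`, `document_map`, `term_map`), the binary term vectors, the linear decay schedules, and the 0.5 weight threshold are all assumptions chosen for illustration, and the abstract does not say how the two feature maps are actually derived.

```python
import numpy as np


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    n_samples, dim = data.shape
    weights = rng.random((rows * cols, dim))
    # Grid coordinates of each neuron, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * n_samples
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n_samples):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # assumed linear decay
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # assumed linear decay with a floor
            # Best-matching unit: the neuron whose weights are closest to the sample.
            bmu = np.argmin(np.linalg.norm(weights - data[i], axis=1))
            dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            # Pull every neuron toward the sample, scaled by its grid distance to the BMU.
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights


def best_matching_units(data, weights):
    """Index of the closest neuron for every row of data."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(d, axis=1)


def document_map(bmus, n_neurons):
    """Hypothetical first map: the document indices assigned to each neuron."""
    return [np.flatnonzero(bmus == k).tolist() for k in range(n_neurons)]


def term_map(weights, vocab, threshold=0.5):
    """Hypothetical second map: terms whose weight on a neuron exceeds the threshold."""
    return [[vocab[j] for j in np.flatnonzero(w > threshold)] for w in weights]


# Toy corpus: each row is a binary term vector over `vocab` (made-up data).
vocab = ["som", "map", "stock", "market"]
docs = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 1, 1],
                 [0, 0, 1, 1]], dtype=float)

weights = train_som(docs, rows=2, cols=2)
bmus = best_matching_units(docs, weights)
print(document_map(bmus, n_neurons=4))
print(term_map(weights, vocab))
```

Documents with identical term vectors always land on the same neuron, so the document-side listing groups related documents while the term-side listing suggests candidate theme words for each group. Building a hierarchy over those groups, as the abstract describes, would need a further analysis step that is not sketched here.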
Pages: 47-67
Number of pages: 20
Related papers
50 in total
  • [11] Stemming Malay Text and Its Application in Automatic Text Categorization
    Yasukawa, Michiko
    Lim, Hui Tian
    Yokoo, Hidetoshi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2009, E92D (12) : 2351 - 2359
  • [12] Using kNN model for automatic text categorization
    Guo, GD
    Wang, H
    Bell, D
    Bi, YX
    Greer, K
    SOFT COMPUTING, 2006, 10 (05) : 423 - 430
  • [13] Using kNN model for automatic text categorization
    Gongde Guo
    Hui Wang
    David Bell
    Yaxin Bi
    Kieran Greer
    Soft Computing, 2006, 10 : 423 - 430
  • [14] Fully Automatic Text Categorization by Exploiting WordNet
    Li, Jianqiang
    Zhao, Yu
    Liu, Bo
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2009, 5839 : 1 - 12
  • [15] Automatic text categorization based on angle distribution
    Liu, T
    Guo, J
    Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9, 2005, : 3797 - 3801
  • [16] Automatic Assamese Text Categorization Using WordNet
    Sarmah, Jumi
    Barman, Anup Kumar
    Sarma, Shikhar Kr.
    2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2013, : 85 - 89
  • [17] Improving linear classifier for Chinese text categorization
    Tsay, JJ
    Wang, JD
    INFORMATION PROCESSING & MANAGEMENT, 2004, 40 (02) : 223 - 237
  • [18] Chinese text categorization based on CCIPCA and SMO
    Li, Xin-Fu
    He, Hai-Bin
    Zhao, Lei-Lei
    PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 2514 - 2518
  • [19] Text Categorization for Generation of a Historical Shipbuilding Ontology
    Artemova, Galina
    Boyarsky, Kirill
    Gouzevitch, Dmitri
    Gusarova, Natalia
    Dobrenko, Natalia
    Kanevsky, Eugeny
    Petrova, Daria
    KNOWLEDGE ENGINEERING AND THE SEMANTIC WEB, KESW 2014, 2014, 468 : 1 - 14
  • [20] An Improved Feature Weighting Strategy in Chinese Text Categorization
    Song, Jia
    Qin, Sijun
    Zhang, Pengzhou
    PROCEEDINGS OF THE 2015 6TH INTERNATIONAL CONFERENCE ON MANUFACTURING SCIENCE AND ENGINEERING, 2016, 32 : 202 - 208