Automatic category theme identification and hierarchy generation for Chinese text categorization

Cited by: 5
Authors
Yang, HC [1]
Lee, CH [2]
Affiliations
[1] Chang Jung Univ, Dept Informat Management, Tainan, Taiwan
[2] Natl Kaohsiung Univ Appl Sci, Dept Elect Engn, Kaohsiung, Taiwan
Keywords
automatic category theme identification; automatic category hierarchy generation; text categorization; self-organizing maps; text mining
DOI
10.1007/s10844-005-0859-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Research on text mining has recently attracted considerable attention from both industry and academia. Text mining is concerned with discovering unknown patterns or knowledge in a large text repository. The problem is hard to tackle because the texts under consideration are semi-structured or even unstructured. Many approaches have been devised for mining various kinds of knowledge from texts. One important aspect of text mining is automatic text categorization, which assigns a text document to a predefined category if the document falls within the theme of that category. Traditionally, the categories are arranged in a hierarchy to support effective searching and indexing and to make them easy for humans to comprehend. The category themes and their hierarchical structure have mostly been determined by human experts. In this work, we developed an approach that automatically generates category themes and reveals the hierarchical structure among them. We also used the generated structure to categorize text documents. A self-organizing map was trained on the document collection to form two feature maps. These maps were then analyzed to obtain the category themes and their structure. Although the test corpus contains documents written in Chinese, the proposed approach can be applied to documents written in any language, provided that such documents can be transformed into a list of separated terms.
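The abstract does not give the training details, so the following is only a minimal sketch of the general idea: documents that have already been segmented into terms are turned into binary term vectors and used to train a small self-organizing map. Everything concrete here is an assumption made for illustration, not the authors' method: the toy English terms, the map size, the learning schedule, the helper names `train_som` and `best_matching_unit`, and the way `doc_map` and `term_map` are derived as two views of the trained map.

```python
import math
import random

def best_matching_unit(weights, v):
    """Return the grid position whose weight vector is closest to v."""
    return min(weights,
               key=lambda pos: sum((a - b) ** 2 for a, b in zip(weights[pos], v)))

def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small self-organizing map on equal-length numeric vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per map neuron, indexed by (row, col).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)   # neighborhood shrinks over time
        for v in vectors:
            bmu = best_matching_unit(weights, v)
            for pos, w in weights.items():
                d = math.hypot(pos[0] - bmu[0], pos[1] - bmu[1])
                if d <= radius:
                    h = math.exp(-(d * d) / (2 * radius * radius))
                    for i in range(dim):
                        w[i] += lr * h * (v[i] - w[i])
    return weights

# Hypothetical, already-segmented documents (toy stand-ins for Chinese terms).
docs = [
    ["stock", "market", "bank"],
    ["bank", "loan", "market"],
    ["match", "team", "goal"],
    ["team", "coach", "goal"],
]
vocab = sorted({term for doc in docs for term in doc})
vectors = [[1.0 if term in doc else 0.0 for term in vocab] for doc in docs]

weights = train_som(vectors, rows=3, cols=3)

# View 1 (assumed): each document is labeled with its best-matching neuron.
doc_map = [best_matching_unit(weights, v) for v in vectors]
# View 2 (assumed): each term is labeled with the neuron that weights it most.
term_map = {term: max(weights, key=lambda pos: weights[pos][i])
            for i, term in enumerate(vocab)}
```

Neurons that attract both documents and terms could then be read as candidate category themes; how the paper actually analyzes its two maps to build the hierarchy is not specified in the abstract.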
Pages: 47-67
Page count: 21
Related papers
  • [1] Automatic Category Theme Identification and Hierarchy Generation for Chinese Text Categorization
    Hsin-Chang Yang
    Chung-Hong Lee
    Journal of Intelligent Information Systems, 2005, 25 : 47 - 67
  • [2] The Chinese text categorization system with association rule and category priority
    Chiang, Ding-An
    Keh, Huan-Chao
    Huang, Hui-Hua
    Chyr, Derming
    EXPERT SYSTEMS WITH APPLICATIONS, 2008, 35 (1-2) : 102 - 110
  • [3] Exploiting hierarchy in text categorization
    Weigend A.S.
    Wiener E.D.
    Pedersen J.O.
    Information Retrieval, 1999, 1 (3) : 193 - 216
  • [4] Research on Chinese Text Automatic Categorization Based on VSM
    Tong Xiao-Jun
    Cui Ming-Gen
    Song Guo-Long
    2007 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-15, 2007, : 3863 - +
  • [5] Automatic category generation for text documents by self-organizing maps
    Yang, HC
    Lee, CH
    IJCNN 2000: PROCEEDINGS OF THE IEEE-INNS-ENNS INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOL III, 2000, : 581 - 586
  • [6] Automatic text categorization with learning logic
    Al-Mubaid, H
    Siddiqui, MS
    COMPUTER APPLICATIONS IN INDUSTRY AND ENGINEERING, 2003, : 178 - 183
  • [7] Automatic generation of text categorization rules in a hybrid method based on machine learning
    Lana-Serrano, Sara
    Villena-Roman, Julio
    Collada-Perez, Sonia
    Carlos Gonzalez-Cristobal, Jose
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2011, (47) : 231 - 237
  • [8] Automatic text categorization: Case study
    Corrêa, RF
    Ludermir, TB
    VII BRAZILIAN SYMPOSIUM ON NEURAL NETWORKS, PROCEEDINGS, 2002, : 150 - 150
  • [9] Automatic expert identification using a text categorization technique in knowledge management systems
    Yang, Kun-Woo
    Huh, Soon-Young
    EXPERT SYSTEMS WITH APPLICATIONS, 2008, 34 (02) : 1445 - 1455
  • [10] Automatic text categorization and its application to text retrieval
    Lam, W
    Ruiz, M
    Srinivasan, P
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1999, 11 (06) : 865 - 879