Automatic Category Theme Identification and Hierarchy Generation for Chinese Text Categorization

Cited by: 0
Authors
Hsin-Chang Yang
Chung-Hong Lee
Affiliations
[1] Chang Jung University, Department of Information Management
[2] National Kaohsiung University of Applied Sciences, Department of Electrical Engineering
Source
Journal of Intelligent Information Systems | 2005 / Vol. 25
Keywords
automatic category theme identification; automatic category hierarchy generation; text categorization; self-organizing maps; text mining
DOI
Not available
Abstract
Research on text mining has recently attracted considerable attention from both industry and academia. Text mining is concerned with discovering unknown patterns or knowledge in a large text repository. The problem is hard to tackle because the texts under consideration are semi-structured or even unstructured. Many approaches have been devised for mining various kinds of knowledge from texts. One important aspect of text mining is automatic text categorization, which assigns a text document to a predefined category if the document falls within the theme of that category. Traditionally, categories are arranged hierarchically to support effective searching and indexing and to make them easy for humans to comprehend. The category themes and their hierarchical structure have mostly been determined by human experts. In this work, we developed an approach that automatically generates category themes and reveals the hierarchical structure among them. We also used the generated structure to categorize text documents. A self-organizing map was trained on the document collection to form two feature maps. These maps were then analyzed to obtain the category themes and their structure. Although the test corpus contains documents written in Chinese, the proposed approach can be applied to documents written in any language, provided that such documents can be transformed into a list of separated terms.
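The abstract does not give the training details, so the following is only a minimal sketch of the general idea: a small self-organizing map is trained on binary document-term vectors and then labeled twice, once with documents and once with terms. The toy corpus, the grid size, the decay schedules, and the helper names (`train_som`, `best_matching_unit`, `label_maps`) are all assumptions made for illustration, and the two labeled maps are one plausible reading of the "two feature maps", not the authors' stated procedure.

```python
import math
import random


def best_matching_unit(weights, vec):
    """Return the index of the neuron whose weight vector is closest to vec."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - x) ** 2 for w, x in zip(weights[j], vec)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map on equal-length vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per neuron, initialised uniformly in [0, 1).
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # assumed linear decay
        radius = max(radius0 * (1.0 - frac), 0.5)  # assumed shrinking radius
        for vec in vectors:
            win_r, win_c = divmod(best_matching_unit(weights, vec), cols)
            for j, w in enumerate(weights):
                r, c = divmod(j, cols)
                dist2 = (r - win_r) ** 2 + (c - win_c) ** 2
                # Gaussian neighbourhood around the winning neuron.
                h = math.exp(-dist2 / (2.0 * radius * radius))
                for k in range(dim):
                    w[k] += lr * h * (vec[k] - w[k])
    return weights


def label_maps(weights, vectors, terms):
    """Label neurons with documents and with terms (illustrative assumption)."""
    doc_map = {}
    for i, vec in enumerate(vectors):
        doc_map.setdefault(best_matching_unit(weights, vec), []).append(i)
    word_map = {}
    for k, term in enumerate(terms):
        # Assign each term to the neuron with the largest weight for it.
        j = max(range(len(weights)), key=lambda n: weights[n][k])
        word_map.setdefault(j, []).append(term)
    return doc_map, word_map


# Hypothetical toy corpus: rows are documents, columns follow `terms`.
terms = ["stock", "market", "bank", "goal", "match", "team"]
docs = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
weights = train_som(docs, rows=2, cols=2)
doc_map, word_map = label_maps(weights, docs, terms)
print("documents per neuron:", doc_map)
print("terms per neuron:", word_map)
```

Neurons that attract both a group of documents and a group of terms could then be read as candidate category themes. How the themes and their hierarchy are actually derived from the maps is not described in the abstract and is not attempted here.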
Pages: 47-67
Page count: 20