Text Document Clustering: The Application of Cluster Analysis to Textual Document

Cited by: 0
Authors
Reddy, Venkata Srikanth [1]
Kinnicutt, Patrick [1]
Lee, Roger [1]
Affiliations
[1] Cent Michigan Univ, Dept Comp Sci, Mt Pleasant, MI 48859 USA
Source
2016 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE & COMPUTATIONAL INTELLIGENCE (CSCI) | 2016
Keywords
clustering; text; word sequence; group of words; apriori algorithm; space model; efficiency;
DOI
10.1109/CSCI.2016.221
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Gathering the data most relevant to one's needs from the huge collection of data on the internet is very difficult. To make it easier, we propose an application of text clustering, the automatic grouping of text documents into clusters such that documents within a cluster are similar to one another but dissimilar to documents in other clusters. Most existing text clustering algorithms use the traditional vector space model, which treats a document as a group of words and ignores the word sequences in it, even though the meaning of natural language strongly depends on them. Our first objective is to implement, in Java, a clustering algorithm named Clustering based on Frequent Word Sequences. Frequent word sequences can provide compact and valuable information about the text documents. Our second objective is to use an association rule miner [13] to find the frequent two-word sets that satisfy the minimum support, using the Apriori algorithm [2,5]. Our results show that the resulting compact documents are more accurate and precise than those produced by the regular method.
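The frequent two-word-set step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' Java implementation: the function name frequent_word_pairs, the whitespace tokenization, and the definition of support as the fraction of documents containing both words are all assumptions made for illustration. The sketch follows the general Apriori idea that a pair can only be frequent if each of its words is frequent on its own.

```python
from collections import Counter
from itertools import combinations


def frequent_word_pairs(documents, min_support):
    """Return {frozenset({w1, w2}): support} for word pairs meeting min_support.

    Support is assumed here to be the fraction of documents that contain
    both words (hypothetical definition; the paper may define it differently).
    """
    n = len(documents)
    if n == 0:
        return {}

    # Treat each document as a set of lowercase, whitespace-separated words.
    word_sets = [set(doc.lower().split()) for doc in documents]

    # Apriori pass 1: keep only words that are frequent on their own.
    word_counts = Counter(w for ws in word_sets for w in ws)
    frequent_words = {w for w, c in word_counts.items() if c / n >= min_support}

    # Apriori pass 2: build candidate pairs from frequent words only,
    # then count the documents in which each pair occurs.
    pair_counts = Counter()
    for ws in word_sets:
        kept = sorted(ws & frequent_words)
        pair_counts.update(combinations(kept, 2))

    return {
        frozenset(pair): count / n
        for pair, count in pair_counts.items()
        if count / n >= min_support
    }


docs = [
    "text clustering groups documents",
    "text clustering uses word sequences",
    "frequent word sequences matter",
]
pairs = frequent_word_pairs(docs, min_support=0.6)
```

With these three toy documents and a minimum support of 0.6, only the pairs that occur in at least two documents survive. Note that this sketch counts unordered co-occurrence, so it ignores word order, which is exactly the information the frequent-word-sequence approach in the abstract is meant to retain.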
Pages: 1174 - 1179
Page count: 6
Related papers
50 in total
  • [1] Text Document Clustering: The Application of Cluster Analysis to Textual Document
    2016, Institute of Electrical and Electronics Engineers Inc., United States
  • [2] Nonnegative factor analysis for text document clustering
    Skovajsova, Lenka
    Mokris, Igor
    PROCEEDINGS OF THE 9TH WSEAS INTERNATIONAL CONFERENCE ON SIMULATION, MODELLING AND OPTIMIZATION, 2009, : 345 - +
  • [3] Text Document Preprocessing and Dimension Reduction Techniques for Text Document Clustering
    Kadhim, Ammar Ismael
    Cheah, Yu-N
    Ahamed, Nurul Hashimah
    PROCEEDINGS 2014 4TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE WITH APPLICATIONS IN ENGINEERING AND TECHNOLOGY ICAIET 2014, 2014, : 69 - 73
  • [4] Text document clustering and the space of concept on text document automatically generated
    Fu, WP
    Wu, B
    He, Q
    Shi, ZZ
    2001 INTERNATIONAL CONFERENCES ON INFO-TECH AND INFO-NET PROCEEDINGS, CONFERENCE A-G: INFO-TECH & INFO-NET: A KEY TO BETTER LIFE, 2001, : C107 - C112
  • [5] Textual Document Clustering in Traditional and Modern Approaches
    Yafooz, Wael M. S.
    Abu Bakar, Zainab
    Mithun, Ahamed M.
    2018 IEEE CONFERENCE ON SYSTEMS, PROCESS AND CONTROL (ICSPC), 2018, : 159 - 164
  • [6] Textual Document Clustering using Topic Models
    Sun, Xiaoping
    2014 10TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG), 2014, : 1 - 4
  • [7] Text document clustering based on neighbors
    Luo, Congnan
    Li, Yanjun
    Chung, Soon M.
    DATA & KNOWLEDGE ENGINEERING, 2009, 68 (11) : 1271 - 1288
  • [8] Ontologies improve text document clustering
    Hotho, A
    Staab, S
    Stumme, G
    THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2003, : 541 - 544
  • [9] Text Document Clustering with Metric Learning
    Wang, Jinlong
    Wu, Shunyao
    Huy Quan Vu
    Li, Gang
    SIGIR 2010: PROCEEDINGS OF THE 33RD ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH DEVELOPMENT IN INFORMATION RETRIEVAL, 2010, : 783 - 784
  • [10] INTER-DOCUMENT REFERENCE DETECTION AS AN ALTERNATIVE TO FULL TEXT SEMANTIC ANALYSIS IN DOCUMENT CLUSTERING
    De Maziere, Patrick A.
    Van Hulle, Marc M.
    2013 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2013,