Text Document Clustering: The Application of Cluster Analysis to Textual Document

Cited by: 0

Authors:

Reddy, Venkata Srikanth [1]

Kinnicutt, Patrick [1]

Lee, Roger [1]

Affiliations:

[1] Cent Michigan Univ, Dept Comp Sci, Mt Pleasant, MI 48859 USA

Source:

2016 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE & COMPUTATIONAL INTELLIGENCE (CSCI) | 2016

Keywords:

clustering; text; word sequence; group of words; apriori algorithm; space model; efficiency;

DOI:

10.1109/CSCI.2016.221

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Gathering the data most relevant to one's needs from the huge collection of data on the Internet is very difficult. To make it easier, we propose an application called text clustering, which automatically groups text documents into clusters so that documents within a cluster are similar to one another but dissimilar to documents in other clusters. Most existing text clustering algorithms use the traditional vector space model, which treats a document as a group of words and ignores the word sequences in the document, even though the meaning of natural language strongly depends on them. Our first objective is to implement, in Java, a clustering algorithm named Clustering based on Frequent Word Sequences. Frequent word sequences can provide compact and valuable information about text documents. Our second objective is to use an association rule miner [13] to find the frequent two-word sets that satisfy the minimum support using the Apriori algorithm [2,5]. Our results will show that the final compact documents are more accurate and precise than documents produced by the regular method.
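To illustrate the second objective, the following is a minimal Java sketch of Apriori-style mining of frequent two-word sets. It is not the authors' implementation: the class name `FrequentPairs`, the method `frequentPairs`, the choice to measure support as the number of documents containing both words, and the tokenized input format are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not taken from the paper.
class FrequentPairs {
    /**
     * Returns every two-word set whose document support is at least minSupport.
     * Each set is keyed as "a b" with the two words in alphabetical order.
     * Support is assumed here to mean the number of documents containing both words.
     */
    static Map<String, Integer> frequentPairs(List<List<String>> docs, int minSupport) {
        // Pass 1: count the number of documents in which each word occurs.
        Map<String, Integer> wordSupport = new HashMap<>();
        List<TreeSet<String>> docSets = new ArrayList<>();
        for (List<String> doc : docs) {
            TreeSet<String> unique = new TreeSet<>(doc); // sorted, duplicates removed
            docSets.add(unique);
            for (String w : unique) {
                wordSupport.merge(w, 1, Integer::sum);
            }
        }

        // Pass 2: Apriori pruning. A pair can be frequent only if both of its
        // words are frequent, so infrequent words never form candidate pairs.
        Map<String, Integer> pairSupport = new HashMap<>();
        for (TreeSet<String> unique : docSets) {
            List<String> frequent = new ArrayList<>();
            for (String w : unique) {
                if (wordSupport.get(w) >= minSupport) {
                    frequent.add(w);
                }
            }
            for (int i = 0; i < frequent.size(); i++) {
                for (int j = i + 1; j < frequent.size(); j++) {
                    pairSupport.merge(frequent.get(i) + " " + frequent.get(j), 1, Integer::sum);
                }
            }
        }

        // Keep only the pairs that reach the minimum support.
        pairSupport.values().removeIf(s -> s < minSupport);
        return pairSupport;
    }
}
```

For example, with the three tokenized documents `[text, clustering, data]`, `[text, clustering]`, and `[data, mining]` and a minimum support of 2, the only surviving pair is `clustering text`, with support 2. Note that this sketch treats a pair as an unordered set, as the abstract's "two-word sets" suggests; mining ordered word sequences would require keeping word positions.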


Pages: 1174-1179

Page count: 6

