Analysis of web clustering based on genetic algorithm with latent semantic indexing technology

Cited by: 2
Authors
Song, Wei [1]
Park, Soon Cheol [1]
Affiliation
[1] Chonbuk Natl Univ Korea, Div Elect & Informat Engn, Chonju, South Korea
Source
ALPIT 2007: PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON ADVANCED LANGUAGE PROCESSING AND WEB INFORMATION TECHNOLOGY | 2007
Keywords
DOI
10.1109/ALPIT.2007.77
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
This paper constructs a latent semantic text model that uses a genetic algorithm (GA) for web clustering. The main difficulty in applying GA to text clustering is that the feature space has thousands or even tens of thousands of dimensions. Latent semantic indexing (LSI) is a successful technique that explores the latent semantic structure in textual data; it also reduces this large space to a smaller one and provides a robust space for clustering. GA is a search technique that efficiently evolves toward an optimal solution to a problem. Evolved in the reduced latent semantic indexing model, GA can improve clustering accuracy and speed, which makes it well suited to real-time clustering. We use the SSTRESS criterion to analyze the dissimilarity between the original term-by-document corpus matrix and its approximate decomposition matrix at different ranks, and relate that dissimilarity to the performance of our algorithm in the reduced space. The superiority of GA applied in the LSI model over K-means and conventional GA in the vector space model (VSM) is demonstrated by good clustering results on Reuters text.
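The idea of evolving a clustering with a GA in an already reduced space can be illustrated with a minimal sketch. This is not the authors' implementation: the chromosome encoding (one cluster label per document), the fitness (negative within-cluster sum of squared distances), the tournament selection, the one-point crossover, and all names and parameter values below are assumptions chosen for illustration. The input vectors are assumed to be documents that have already been projected into a low-dimensional LSI space.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fitness(labels, docs, k):
    """Negative within-cluster sum of squared distances (higher is better)."""
    total = 0.0
    for c in range(k):
        members = [d for d, lab in zip(docs, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sq_dist(m, centroid) for m in members)
    return -total


def ga_cluster(docs, k, pop_size=30, generations=80, mutation_rate=0.1, seed=0):
    """Evolve a label assignment for docs (needs at least two documents).

    Hypothetical operators: tournament selection of size 3, one-point
    crossover, per-gene random-reset mutation, and two-member elitism.
    """
    rng = random.Random(seed)
    n = len(docs)

    def score(chrom):
        return fitness(chrom, docs, k)

    # Each chromosome assigns one cluster label to every document.
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        children = [chrom[:] for chrom in pop[:2]]  # keep the two best as-is
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
    return max(pop, key=score)


if __name__ == "__main__":
    # Toy 2-D "reduced" document vectors forming two well-separated groups.
    docs = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
            [5.0, 5.1], [5.1, 5.0], [5.05, 5.05]]
    labels = ga_cluster(docs, k=2)
    print(labels, fitness(labels, docs, 2))
```

Because every fitness evaluation costs time proportional to the number of documents times the vector length, running the search on short LSI vectors rather than full term vectors is where a speed gain of the kind described in the abstract would come from in a sketch like this.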
Pages: 21 / +
Page count: 2
Related papers
50 in total
  • [31] Vantage Point Latent Semantic Indexing for multimedia web document search
    Srikanth, D.
    Sakthivel, S.
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 5) : 10587 - 10594
  • [32] Web personalization using Extended Boolean operations with latent semantic indexing
    Nakov, P
    ARTIFICIAL INTELLIGENCE: METHODOLOGY, SYSTEMS, APPLICATIONS, PROCEEDINGS, 2000, 1904 : 189 - 198
  • [33] Latent Semantic Analysis (LSA) Based Object Recognition and Clustering
    Hebballi, Vinaykumar
    Rojit, Vidhu
    2015 INTERNATIONAL CONFERENCE ON GREEN COMPUTING AND INTERNET OF THINGS (ICGCIOT), 2015 : 416 - 421
  • [34] An Indexing Algorithm Based on Clustering of Minutia Cylinder Codes for Fast Latent Fingerprint Identification
    Perez-Sanchez, Ismay
    Cervantes, Barbara
    Angel Medina-Perez, Miguel
    Monroy, Raul
    Loyola-Gonzalez, Octavio
    Garcia, Salvador
    Herrera, Francisco
    IEEE ACCESS, 2021, 9 : 85488 - 85499
  • [35] Automatic Indexing of Journal Abstracts with Latent Semantic Analysis
    Adams, Joel Robert
    Bedrick, Steven
    EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION, 2015, 9283 : 200 - 208
  • [36] Quantitative cross impact analysis with latent semantic indexing
    Thorleuchter, Dirk
    Van den Poel, Dirk
    EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (02) : 406 - 411
  • [37] Random indexing of text samples for latent semantic analysis
    Kanerva, P
    Kristoferson, J
    Holst, H
    PROCEEDINGS OF THE TWENTY-SECOND ANNUAL CONFERENCE OF THE COGNITIVE SCIENCE SOCIETY, 2000 : 1036 - 1036
  • [38] Personal information retrieval based on latent semantic indexing
    Yang, Z
    Deng, GS
    PROCEEDINGS OF 2002 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE & ENGINEERING, VOLS I AND II, 2002 : 287 - 291
  • [39] Using latent semantic indexing for literature based discovery
    Gordon, MD
    Dumais, S
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1998, 49 (08) : 674 - 685
  • [40] Automatic text summarization based on latent semantic indexing
    Ai, Dongmei
    Zheng, Yuchao
    Zhang, Dezheng
    ARTIFICIAL LIFE AND ROBOTICS, 2010, 15 (01) : 25 - 29