Self organization of a massive document collection

Cited by: 521
Authors
Kohonen, T [1]
Kaski, S [1]
Lagus, K [1]
Salojärvi, J [1]
Honkela, J [1]
Paatero, V [1]
Saarela, A [1]
Affiliation
[1] Aalto Univ, Neural Networks Res Ctr, FIN-02150 Espoo, Finland
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2000 / Vol. 11 / No. 03
Funding
Academy of Finland
Keywords
data mining; exploratory data analysis; knowledge discovery; large databases; parallel implementation; random projection; self-organizing map (SOM); textual documents;
DOI
10.1109/72.846729
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the self-organizing map (SOM) algorithm. Statistical representations of the documents' vocabularies are used as the feature vectors. The main goal of our work has been to scale up the SOM algorithm so that it can deal with large amounts of high-dimensional data. In a practical experiment we mapped 6 840 568 patent abstracts onto a 1 002 240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
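
The feature pipeline named in the abstract (weighted word histogram, random projection, nearest SOM node) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the weighting scheme or how the projection matrix is built, so the per-word weights, the Gaussian unit-length projection columns, and all function names below are assumptions chosen for illustration. The paper's setting uses 500 output dimensions; the demo uses 8 to stay small.

```python
import math
import random
from collections import Counter


def weighted_histogram(tokens, vocab, weights):
    """Word counts over `vocab`, each scaled by a per-word weight (assumed scheme)."""
    counts = Counter(tokens)
    return [counts[w] * weights.get(w, 1.0) for w in vocab]


def random_projection_columns(n_words, dim, seed=0):
    """One random unit-length vector of size `dim` per vocabulary word.

    Assumption: Gaussian entries normalized to unit length; the abstract
    only says that random projections were used.
    """
    rng = random.Random(seed)
    cols = []
    for _ in range(n_words):
        col = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in col))
        cols.append([v / norm for v in col])
    return cols


def project(hist, cols):
    """Project a histogram: sum of word columns weighted by histogram entries."""
    out = [0.0] * len(cols[0])
    for h, col in zip(hist, cols):
        if h:  # skip words absent from the document
            for i, c in enumerate(col):
                out[i] += h * c
    return out


def best_matching_unit(x, codebook):
    """Index of the SOM model vector closest to x (squared Euclidean distance)."""
    def sq_dist(k):
        return sum((a - b) ** 2 for a, b in zip(x, codebook[k]))
    return min(range(len(codebook)), key=sq_dist)


if __name__ == "__main__":
    vocab = ["map", "patent", "neural"]           # hypothetical vocabulary
    cols = random_projection_columns(len(vocab), dim=8, seed=1)
    doc = ["patent", "map", "map"]                # hypothetical tokenized abstract
    x = project(weighted_histogram(doc, vocab, {"map": 1.5}), cols)
    codebook = [[0.0] * 8, x, [1.0] * 8]          # toy 3-node "map"
    print(best_matching_unit(x, codebook))
```

Mapping a document onto the SOM then amounts to calling `best_matching_unit` on its projected vector; at the scale described in the abstract, this winner search is the step that needs to be made fast.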
Pages: 574-585
Page count: 12