Self organization of a massive document collection

Cited by: 521
Authors
Kohonen, T [1 ]
Kaski, S [1 ]
Lagus, K [1 ]
Salojärvi, J [1 ]
Honkela, J [1 ]
Paatero, V [1 ]
Saarela, A [1 ]
Affiliation
[1] Aalto Univ, Neural Networks Res Ctr, FIN-02150 Espoo, Finland
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2000 / Vol. 11 / No. 3
Funding
Academy of Finland;
Keywords
data mining; exploratory data analysis; knowledge discovery; large databases; parallel implementation; random projection; self-organizing map (SOM); textual documents;
DOI
10.1109/72.846729
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the self-organizing map (SOM) algorithm. Statistical representations of the documents' vocabularies are used as the feature vectors. The main goal of our work has been to scale up the SOM algorithm so that it can handle large amounts of high-dimensional data. In a practical experiment we mapped 6 840 568 patent abstracts onto a 1 002 240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
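The feature construction named in the abstract (a weighted word histogram reduced by a random projection) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the word weighting or how the projection matrix is built, so the per-word weights, the Gaussian matrix entries, the unit-length columns, and the function names `weighted_histogram`, `make_projection`, and `project` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def weighted_histogram(tokens, vocab_index, weights):
    """Sparse {word_index: weighted count} histogram for one document.

    `weights` maps a word to an assumed importance weight (default 1.0);
    words outside the vocabulary are ignored.
    """
    counts = Counter(t for t in tokens if t in vocab_index)
    return {vocab_index[w]: c * weights.get(w, 1.0) for w, c in counts.items()}


def make_projection(vocab_size, out_dim, seed=0):
    """One random column of length `out_dim` per vocabulary word.

    Assumption: entries are Gaussian and each column is scaled to unit
    length. The abstract only says "random projections".
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(vocab_size):
        col = [rng.gauss(0.0, 1.0) for _ in range(out_dim)]
        norm = math.sqrt(sum(x * x for x in col))
        columns.append([x / norm for x in col])
    return columns


def project(histogram, columns, out_dim):
    """Project a sparse histogram: sum of (weighted count * word column)."""
    vec = [0.0] * out_dim
    for idx, value in histogram.items():
        col = columns[idx]
        for k in range(out_dim):
            vec[k] += value * col[k]
    return vec


if __name__ == "__main__":
    vocab_index = {"map": 0, "neural": 1, "patent": 2}   # toy vocabulary
    columns = make_projection(len(vocab_index), out_dim=8, seed=1)
    hist = weighted_histogram(["map", "map", "patent"], vocab_index, {"patent": 2.0})
    print(len(project(hist, columns, out_dim=8)))
```

In the experiment described above the output dimension would be 500 rather than the toy value 8. Because only the nonzero histogram entries are visited, the cost per document in this sketch grows with the number of distinct words in the document rather than with the vocabulary size.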
Pages: 574-585
Number of pages: 12