Self organization of a massive document collection

Cited by: 526
Authors
Kohonen, T [1]
Kaski, S [1]
Lagus, K [1]
Salojärvi, J [1]
Honkela, J [1]
Paatero, V [1]
Saarela, A [1]
Affiliations
[1] Aalto Univ, Neural Networks Res Ctr, FIN-02150 Espoo, Finland
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2000 / Vol. 11 / No. 03
Funding
Academy of Finland;
Keywords
data mining; exploratory data analysis; knowledge discovery; large databases; parallel implementation; random projection; self-organizing map (SOM); textual documents;
DOI
10.1109/72.846729
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the self-organizing map (SOM) algorithm. Statistical representations of the documents' vocabularies are used as the feature vectors. The main goal of our work has been to scale up the SOM algorithm so that it can deal with large amounts of high-dimensional data. In a practical experiment we mapped 6 840 568 patent abstracts onto a 1 002 240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
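The feature construction mentioned in the abstract (a weighted word histogram reduced by random projection) can be illustrated with a small sketch. This is not the authors' implementation: the vocabulary, the word weights, the Gaussian projection matrix, and the function names `weighted_histogram`, `random_projection_matrix`, and `project` are all assumptions made for illustration, and the output dimension is 3 rather than the 500 used in the experiment.

```python
import random


def weighted_histogram(tokens, vocabulary, weights):
    """Sum the (assumed) weight of each vocabulary word per occurrence."""
    index = {word: i for i, word in enumerate(vocabulary)}
    hist = [0.0] * len(vocabulary)
    for token in tokens:
        i = index.get(token)
        if i is not None:  # words outside the vocabulary are ignored
            hist[i] += weights[i]
    return hist


def random_projection_matrix(out_dim, in_dim, seed=0):
    """Hypothetical choice: entries drawn from a standard normal distribution."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, histogram):
    """Multiply the projection matrix by the histogram vector."""
    return [sum(r * h for r, h in zip(row, histogram)) for row in matrix]


# Toy data, invented for this example.
vocabulary = ["map", "neural", "patent", "signal", "text", "vector"]
weights = [1.0, 0.8, 0.5, 1.2, 0.9, 1.1]
document = "neural map of patent text text".split()

hist = weighted_histogram(document, vocabulary, weights)
R = random_projection_matrix(3, len(vocabulary), seed=42)
features = project(R, hist)
print(len(hist), "->", len(features))
```

The same matrix `R` would have to be reused for every document so that the projected vectors remain comparable; the resulting low-dimensional vectors are what a SOM would then be trained on.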
Pages: 574-585
Page count: 12