Self organization of a massive document collection

Cited by: 526

Authors:

Kohonen, T [1]

Kaski, S [1]

Lagus, K [1]

Salojärvi, J [1]

Honkela, J [1]

Paatero, V [1]

Saarela, A [1]

Affiliations:

[1] Aalto Univ, Neural Networks Res Ctr, FIN-02150 Espoo, Finland

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS | 2000 / Vol. 11 / No. 03

Funding:

Academy of Finland;

Keywords:

data mining; exploratory data analysis; knowledge discovery; large databases; parallel implementation; random projection; self-organizing map (SOM); textual documents;

DOI:

10.1109/72.846729

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the self-organizing map (SOM) algorithm. As the feature vectors for the documents, statistical representations of their vocabularies are used. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6 840 568 patent abstracts onto a 1 002 240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
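The pipeline named in the abstract (weighted word histogram, random projection, mapping onto SOM nodes) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract only states that the feature vectors are 500-dimensional random projections of weighted word histograms. The Gaussian entries, the unit-norm columns, the squared Euclidean matching rule, and all function names here are assumptions chosen for the sketch.

```python
import math
import random


def random_projection_matrix(vocab_size, target_dim, seed=0):
    """Build one random unit-length vector per vocabulary word.

    Assumption: Gaussian entries with normalized columns. The abstract
    does not say how the random matrix is generated.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(vocab_size):
        col = [rng.gauss(0.0, 1.0) for _ in range(target_dim)]
        norm = math.sqrt(sum(v * v for v in col))
        columns.append([v / norm for v in col])
    return columns  # columns[j] is the random code of word j


def project(histogram, columns):
    """Project a sparse weighted word histogram {word_index: weight}.

    The result is the weighted sum of the random codes of the words
    that occur in the document, so the cost depends on the number of
    distinct words rather than on the vocabulary size.
    """
    dim = len(columns[0])
    out = [0.0] * dim
    for word_index, weight in histogram.items():
        col = columns[word_index]
        for i in range(dim):
            out[i] += weight * col[i]
    return out


def best_matching_unit(x, codebook):
    """Return the index of the model vector closest to x.

    Assumption: squared Euclidean distance as the matching criterion.
    """
    best_index, best_dist = -1, float("inf")
    for k, model in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(x, model))
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index
```

In the experiment described above the target dimension would be 500 and the codebook would hold 1 002 240 model vectors. A linear search like `best_matching_unit` is only practical at toy scale, which is why the abstract stresses scaling up the SOM algorithm.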

Pages: 574-585

Page count: 12
