Internet categorization and search: A self-organizing approach

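Cited by: 136
Authors
Chen, HC
Schuffels, C
Orwig, R
Affiliation
[1] Management Information Systems Department, University of Arizona, Tucson
Funding
National Science Foundation (United States)
DOI
10.1006/jvci.1996.0008
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract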
The problems of information overload and vocabulary differences have become more pressing with the emergence of increasingly popular Internet services. The main information retrieval mechanisms provided by the prevailing Internet WWW software are based on either keyword search (e.g., the Lycos server at CMU, the Yahoo server at Stanford) or hypertext browsing (e.g., Mosaic and Netscape). This research aims to provide an alternative concept-based categorization and search capability for WWW servers based on selected machine learning algorithms. Our proposed approach, which is grounded in automatic textual analysis of Internet documents (homepages), attempts to address the Internet search problem by first categorizing the content of Internet documents. We report results of our recent testing of a multilayered neural network clustering algorithm employing the Kohonen self-organizing feature map to categorize (classify) Internet homepages according to their content. The category hierarchies created could serve to partition the vast Internet services into subject-specific categories and databases and improve Internet keyword searching and/or browsing. © 1996 Academic Press, Inc.
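As a rough illustration of the kind of clustering the abstract refers to, the following is a minimal single-layer Kohonen self-organizing map in pure Python. It is a sketch under stated assumptions, not the authors' multilayered implementation: the function names `train_som` and `best_matching_unit`, the toy six-term vocabulary, the binary term vectors, and the linear decay schedules are all hypothetical choices made for this example.

```python
import math
import random

def best_matching_unit(weights, vector):
    """Return the grid coordinate of the node whose weights are closest to vector."""
    return min(weights, key=lambda node: math.dist(weights[node], vector))

def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rows x cols self-organizing map on equal-length vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per grid node, initialized uniformly in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks over time
        for vector in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, vector)
            for node, w in weights.items():
                grid_dist = math.dist(node, bmu)
                if grid_dist <= radius:
                    # Nodes near the winner on the grid move toward the input.
                    influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                    for i in range(dim):
                        w[i] += lr * influence * (vector[i] - w[i])
    return weights

# Hypothetical vocabulary: music, guitar, band, stock, market, finance.
# Each row marks which terms occur on one (made-up) homepage.
pages = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
]
som = train_som(pages, rows=2, cols=2)
categories = [best_matching_unit(som, page) for page in pages]
```

Each entry of `categories` is the grid coordinate of the node that a page maps to; pages that land on the same or neighboring nodes can be read as belonging to related categories. A hierarchy like the one the abstract describes would presumably require training further maps on the documents assigned to each node, which this sketch does not attempt.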
Pages: 88-102
Page count: 15