The design and implementation of a subject-oriented web information classification system

Times cited: 2
Authors
Huang, YS [1]
Wang, QP [1]
Yang, J [1]
Ding, Q [1]
Affiliation
[1] China Univ Min & Technol, Sch Comp, Xuzhou 221008, Jiangsu, Peoples R China
Source
PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, VOLS 1 AND 2 | 2005
Keywords
search engine; data mining; classification; cluster; word frequency
DOI
10.1109/CSCWD.2005.194294
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With the explosive growth of the World Wide Web, it is becoming increasingly difficult for users to collect and analyze web pages relevant to a particular subject. This paper presents a Subject-oriented Web Information Classification System (WICS), which efficiently collects web pages, classifies them into several subjects, and provides the search results to users. Based on an analysis of ordinary search engines, web text mining is introduced and applied to WICS. Text preprocessing, indexing, inverted files, and a Vector Space Distance algorithm (based on the Vector Space Model, VSM) are employed in the prototype. Initial experiments show that classifying web information with the prototype makes it more convenient for users to find information, and that relevance and precision are improved.
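The abstract names the prototype's building blocks (text preprocessing, an inverted file, word-frequency vectors, and a vector-space similarity used for subject classification) without giving their details. The sketch below shows one minimal way these pieces can fit together, under stated assumptions: the stop-word list, the raw term-frequency weighting, cosine similarity as the vector-space measure, and subject "profiles" built from seed keywords are all illustrative choices, not the authors' method. Every function and variable name (`preprocess`, `build_inverted_index`, `classify`, `search`, `subject_profiles`) is hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; the paper does not specify its preprocessing rules.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on", "with"}


def preprocess(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def term_vector(text):
    """Word-frequency vector (assumed weighting): term -> raw count in the text."""
    return Counter(preprocess(text))


def build_inverted_index(pages):
    """Inverted file: term -> {page_id: term frequency in that page}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for term, freq in term_vector(text).items():
            index[term][page_id] = freq
    return index


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(text, subject_profiles):
    """Assign a page to the subject whose profile vector is most similar."""
    vec = term_vector(text)
    return max(subject_profiles, key=lambda s: cosine(vec, subject_profiles[s]))


def search(query, index, pages, subject_profiles, subject):
    """Return ids of pages in the given subject that contain every query term."""
    terms = preprocess(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], {}))
    for term in terms[1:]:
        hits &= set(index.get(term, {}))
    return sorted(p for p in hits if classify(pages[p], subject_profiles) == subject)


# Illustrative usage with made-up subjects and pages.
subject_profiles = {
    "mining": term_vector("coal mine safety ventilation shaft miners"),
    "computing": term_vector("search engine web index query data mining algorithm"),
}
pages = {
    "p1": "A search engine builds an index of web pages for each query.",
    "p2": "Ventilation in the coal mine shaft keeps miners safe.",
    "p3": "Web data mining applies a clustering algorithm to the index.",
}
index = build_inverted_index(pages)
print(classify(pages["p2"], subject_profiles))  # mining
print(search("web index", index, pages, subject_profiles, "computing"))  # ['p1', 'p3']
```

Looking up candidate pages through the inverted file first and only then filtering by subject keeps the similarity computation limited to pages that already match the query; a real system would likely also weight terms (for example with TF-IDF) rather than use raw counts.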
Pages: 836-840
Number of pages: 5