The design and implementation of a subject-oriented web information classification system

Cited by: 2

Authors:

Huang, YS [1]

Wang, QP [1]

Yang, J [1]

Ding, Q [1]

Affiliation:

[1] China Univ Min & Technol, Sch Comp, Xuzhou 221008, Jiangsu, Peoples R China

Source:

PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, VOLS 1 AND 2 | 2005

Keywords:

search engine; data mining; classification; cluster; word frequency

DOI:

10.1109/CSCWD.2005.194294

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

With the explosive growth of the World Wide Web, it is becoming increasingly difficult for users to collect and analyze web pages that are relevant to a particular subject. This paper presents a Subject-oriented Web Information Classification System (WICS), which efficiently collects web pages, classifies them into several subjects, and provides the search results to users. Based on an analysis of ordinary search engines, web text mining is introduced and applied to the WICS. Text preprocessing, indexing, inverted files, and the Vector Space Distance algorithm (Vector Space Model, VSM) are put forward in the prototype. Initial experiments show that classifying Web information with the prototype makes it more convenient for users to query information, and that relevance and precision are improved.
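
The abstract names an inverted file and a vector-space comparison as the prototype's building blocks but gives no implementation detail. The following is a minimal sketch of how those two pieces commonly fit together, not the authors' method: it assumes raw term-frequency weights and cosine similarity as the vector-space measure, and every name in it (`tokenize`, `build_inverted_index`, `cosine`, `classify`) and all sample data are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency} for the documents containing it."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(page_text, subject_profiles):
    """Return the subject whose term-frequency profile is closest to the page."""
    page_vec = Counter(tokenize(page_text))
    return max(subject_profiles, key=lambda s: cosine(page_vec, subject_profiles[s]))


# Hypothetical toy data, not taken from the paper.
pages = {
    "p1": "coal mining safety and mine ventilation",
    "p2": "search engine index and web crawler",
}
subjects = {
    "mining": Counter(tokenize("coal mine mining safety ventilation")),
    "web": Counter(tokenize("web search engine crawler index page")),
}

index = build_inverted_index(pages)
labels = {doc_id: classify(text, subjects) for doc_id, text in pages.items()}
```

In this sketch the inverted index supports keyword lookup over collected pages, while `classify` assigns each page to the nearest subject profile; a real system would likely add stop-word removal and a weighting scheme such as TF-IDF, which the abstract does not specify.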

Pages: 836-840

Page count: 5

