Task-oriented World Wide Web retrieval by document type classification

Cited by: 7
Authors
Matsuda, K [1]
Fukushima, T [1]
Affiliation
[1] NEC Corp Ltd, Human Media Res Labs, Nara 6300101, Japan
Source
PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON INFORMATION KNOWLEDGE MANAGEMENT, CIKM'99 | 1999
Keywords
WWW; information retrieval; document type; classification
DOI
10.1145/319950.319964
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper proposes a novel approach to accurately searching Web pages for information relevant to problem solving, in which the user specifies a Web document category rather than stating the task directly. Accessing information from World Wide Web pages has become a commonplace approach to problem solving. Such searches are difficult with current search services, however, because these services provide only keyword-based search methods, which amount to narrowing down the target references by domain. Problem solving usually involves both a domain and a task. Accordingly, our approach is based on problem solving tasks. To specify a user's problem solving task, we introduce the concept of document types that relate directly to problem solving tasks; with this approach, users can easily designate their tasks. We implemented the PageTypeSearch system based on this approach. The PageTypeSearch classifier assigns Web pages to document types by comparing each page with the typical structural characteristics of each type. In experiments, we compared PageTypeSearch, which uses document type indices, with a conventional keyword-based search system. The average precision of the document type-based search was 88.9%, while that of the keyword-based search was 31.2%. Moreover, the number of irrelevant references gathered by our system was about one-thirteenth that of traditional keyword-based search systems. By introducing the viewpoint of tasks, our approach offers practical advantages for problem solving and achieves higher performance.
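The abstract describes classifying pages by their structural characteristics and comparing systems by precision. The sketch below illustrates that general idea only; it is not the PageTypeSearch classifier. The document type names ("order-form", "link-list", "catalog", "article"), the features, the thresholds, and the function names are all hypothetical choices made for illustration, since the record does not list the actual types or rules.

```python
import re


def classify_page(html: str) -> str:
    """Assign a page to a hypothetical document type using crude structural cues."""
    lower = html.lower()
    # Structural features: counts of forms, links, and tables (illustrative only).
    form_count = lower.count("<form")
    link_count = len(re.findall(r"<a\s", lower))
    table_count = lower.count("<table")
    # Visible word count after stripping tags with a simple regex.
    word_count = len(re.sub(r"<[^>]+>", " ", html).split())

    # Hypothetical rules; the thresholds are assumptions, not values from the paper.
    if form_count > 0:
        return "order-form"
    if link_count >= 10 and word_count < 20 * link_count:
        return "link-list"
    if table_count > 0:
        return "catalog"
    return "article"


def precision(retrieved: list, relevant: set) -> float:
    """Fraction of retrieved references that are relevant."""
    if not retrieved:
        return 0.0
    hits = sum(1 for ref in retrieved if ref in relevant)
    return hits / len(retrieved)
```

In a type-based search of this kind, a query would presumably be matched only against pages whose assigned type fits the user's task, which is one way irrelevant references could be filtered out before keyword matching.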
Pages: 109-113
Page count: 5