A machine learning approach to building domain-specific search engines

Cited by: 0
Authors
McCallum, A [1]
Nigam, K [1]
Rennie, J [1]
Seymore, K [1]
Affiliation
[1] Just Research, Pittsburgh, PA 15213 USA
Source
IJCAI-99: PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS 1 & 2 | 1999
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Web-wide search engines. Unfortunately, they are also difficult and time-consuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific search engines. We describe new research in reinforcement learning, text classification and information extraction that enables efficient spidering, populates topic hierarchies, and identifies informative text segments. Using these techniques, we have built a demonstration system: a search engine for computer science research papers available at www.cora.justresearch.com.
Pages: 662-667
Page count: 6
References
18 in total
[1] Bikel D, 1997, ANLP 97
[2] Bollacker K, 1998, AGENTS 98
[3] Boyan J, 1996, AAAI WORKSH INT BAS
[4] Cohen W, 1998, AGENTS 98
[5] Craven M, 1998, AAAI 98
[6] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm [J]. Journal of the Royal Statistical Society Series B-Methodological, 1977, 39(1): 1-38
[7] Joachims T, 1997, IJCAI 97
[8] Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: A survey [J]. Journal of Artificial Intelligence Research, 1996, 4: 237-285
[9] Leek T, 1997, Thesis, UCSD
[10] McCallum A, 1998, ICML 98