Relational learning with statistical predicate invention: Better models for hypertext

被引:48
作者
Craven, M [1 ]
Slattery, S
机构
[1] Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI 53706 USA
[2] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
关键词
relational learning; text categorization; predicate invention; Naive Bayes;
D O I
10.1023/A:1007676901476
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present a new approach to learning hypertext classifiers that combines a statistical text-learning method with a relational rule learner. This approach is well suited to learning in hypertext domains because its statistical component allows it to characterize text in terms of word frequencies, whereas its relational component is able to describe how neighboring documents are related to each other by hyperlinks that connect them. We evaluate our approach by applying it to tasks that involve learning definitions for (i) classes of pages, (ii) particular relations that exist between pairs of pages, and (iii) locating a particular class of information in the internal structure of pages. Our experiments demonstrate that this new approach is able to learn more accurate classifiers than either of its constituent methods alone.
引用
收藏
页码:97 / 119
页数:23
相关论文
共 34 条
[1]  
[Anonymous], 1997, Proceedings of the fourteenth international conference on machine learning, DOI DOI 10.1016/J.ESWA.2008.05.026
[2]  
Cestnik B., 1990, P EUR C ART INT, P147
[3]  
COHEN W, 1995, ADV INDUCTIVE LOGIC
[4]  
Cohen W. W., 1995, P 12 INT C MACH LEAR, P115, DOI DOI 10.1016/B978-1-55860-377-6.50023-2
[5]  
Craven M, 1998, FIFTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-98) AND TENTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICAL INTELLIGENCE (IAAI-98) - PROCEEDINGS, P509
[6]  
CRAVEN M, 1998, P 10 EUR C MACH LEAR, P250
[7]  
DIPASQUO D, 1998, THESIS CARNEGIE MELL
[8]   On the optimality of the simple Bayesian classifier under zero-one loss [J].
Domingos, P ;
Pazzani, M .
MACHINE LEARNING, 1997, 29 (2-3) :103-130
[9]  
Dzeroski S., 1992, P 2 INT WORKSH IND L, P109
[10]   A GENERAL LOWER BOUND ON THE NUMBER OF EXAMPLES NEEDED FOR LEARNING [J].
EHRENFEUCHT, A ;
HAUSSLER, D ;
KEARNS, M ;
VALIANT, L .
INFORMATION AND COMPUTATION, 1989, 82 (03) :247-261