Stable Classification of Text Genres

Cited by: 28
Authors
Petrenz, Philipp [1]
Webber, Bonnie [1]
Affiliations
[1] Univ Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland
DOI
10.1162/COLI_a_00052
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Every text has at least one topic and at least one genre. Evidence for a text's topic and genre comes, in part, from its lexical and syntactic features, which are used in both Automatic Topic Classification and Automatic Genre Classification (AGC). Because an ideal AGC system should be stable in the face of changes in topic distribution, we assess five previously published AGC methods with respect to both performance on the same topic-genre distribution on which they were trained and stability of that performance across changes in topic-genre distribution. Our experiments lead us to conclude that (1) stability in the face of changing topical distributions should be added to the evaluation criteria for new approaches to AGC, and (2) part-of-speech features should be considered individually when developing a high-performing, stable AGC system for a particular, possibly changing corpus.
Pages: 385-393
Page count: 9