Extracting significant time varying features from text

Cited by: 37

Authors:

Swan, R. [1]

Allan, J. [1]

Affiliations:

[1] Univ Massachusetts, Dept Comp Sci, Ctr Intelligent Informat Retrieval, Amherst, MA 01003 USA

Source:

PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON INFORMATION KNOWLEDGE MANAGEMENT, CIKM'99 | 1999

Keywords:

DOI:

10.1145/319950.319956

Chinese Library Classification:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We propose a simple statistical model for the frequency of occurrence of features in a stream of text. Adoption of this model allows us to use classical significance tests to filter the stream for interesting events. We tested the model by building a system and running it on a news corpus. By a subjective evaluation, the system worked remarkably well: almost all of the groups of identified tokens corresponded to news stories and were appropriately placed in time. A preliminary objective evaluation was also used to measure the quality of the system and it showed some of the weaknesses and the power of our approach.
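The abstract does not name the test or the model's exact form, so the following is only a minimal sketch under stated assumptions: it assumes a Pearson chi-square test on a 2x2 table (documents with or without a feature, inside or outside a time window) and a 5% significance level. The function names `chi_square_2x2` and `significant_windows` and the example counts are hypothetical and are not taken from the paper.

```python
# Sketch only: assumes a 2x2 chi-square test per time window.
# The record's abstract says "classical significance tests" without
# naming one, so this choice is an assumption for illustration.

CRITICAL_95 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def significant_windows(docs_with_feature, docs_total, threshold=CRITICAL_95):
    """Return indices of windows where the feature is significantly over-represented.

    docs_with_feature[i]: documents in window i that contain the feature
    docs_total[i]:        all documents in window i
    """
    total_with = sum(docs_with_feature)
    total_docs = sum(docs_total)
    flagged = []
    for i, (k, n) in enumerate(zip(docs_with_feature, docs_total)):
        a = k                        # in window, has feature
        b = n - k                    # in window, lacks feature
        c = total_with - k           # outside window, has feature
        d = (total_docs - n) - c     # outside window, lacks feature
        stat = chi_square_2x2(a, b, c, d)
        # Keep only bursts: the in-window rate must exceed the out-of-window rate.
        if stat > threshold and a * (c + d) > c * (a + b):
            flagged.append(i)
    return flagged


# Hypothetical daily counts for one token across four days.
bursts = significant_windows([1, 2, 30, 1], [100, 100, 100, 100])
```

With these made-up counts only the third window (index 2) is flagged. The other windows also differ significantly from the rest of the stream, but in the direction of under-representation, which the direction check filters out.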


Pages: 38-45

Page count: 8
