Evaluation of Topic Segmentation Algorithms on Arabic Texts

Authors
Faycal, Nouar [1]
Hacene, Belhadef [2]
Affiliations
[1] May 8th 1945 Guelma Univ, Management Sci Dept, Guelma, Algeria
[2] Univ Constantine 2 Abdelhamid Mehri, MISC Lab, Constantine, Algeria
Source
2018 2nd International Conference on Natural Language and Speech Processing (ICNLSP)
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we are interested in the topic segmentation of Arabic texts. To this end, we evaluate two algorithms based on lexical cohesion, MinCutSeg and BayesSeg, using the Pk and WindowDiff metrics. To assess how well each algorithm performs, we applied both to three datasets of long texts from two different domains: transcribed multi-party conversations and written texts. After the algorithms were adapted to the Arabic language, the test results show significant differences in their performance depending on the type of document.
Keywords: Topic segmentation, multi-party conversation transcripts, minimum cut criterion, Bayesian model, Arabic language.
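The two evaluation metrics named in the abstract can be illustrated with a short sketch. It follows the standard definitions of Pk (Beeferman et al.) and WindowDiff (Pevzner and Hearst): a window of k sentences slides over the text, Pk counts windows where reference and hypothesis disagree on whether the two window ends fall in the same segment, and WindowDiff counts windows where the number of boundaries inside the window differs. Lower is better for both. This is not the authors' evaluation code; the 0/1 boundary-vector representation, the helper names (`boundaries_from_labels`, `default_k`, `pk`, `window_diff`) and the convention of setting k to half the mean reference segment length are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def boundaries_from_labels(labels: Sequence[int]) -> List[int]:
    """Turn per-sentence segment labels, e.g. [0, 0, 1, 1], into a boundary
    vector with one 0/1 entry per gap between consecutive sentences."""
    return [int(labels[j] != labels[j + 1]) for j in range(len(labels) - 1)]


def default_k(ref: Sequence[int]) -> int:
    """Assumed convention: half the mean reference segment length, at least 1."""
    n_units = len(ref) + 1          # sentences = gaps + 1
    n_segments = sum(ref) + 1       # segments = boundaries + 1
    return max(1, round(n_units / n_segments / 2))


def _check(ref: Sequence[int], hyp: Sequence[int], k: int) -> int:
    """Validate inputs and return the number of sliding windows."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must cover the same number of gaps")
    n_windows = len(ref) + 1 - k
    if k < 1 or n_windows <= 0:
        raise ValueError("window size k is invalid for this text length")
    return n_windows


def pk(ref: Sequence[int], hyp: Sequence[int], k: Optional[int] = None) -> float:
    """Fraction of windows whose two end sentences are judged 'same segment'
    by one segmentation and 'different segments' by the other."""
    k = default_k(ref) if k is None else k
    n_windows = _check(ref, hyp, k)
    errors = 0
    for i in range(n_windows):
        # Sentences i and i+k share a segment iff no boundary lies between them.
        same_ref = sum(ref[i:i + k]) == 0
        same_hyp = sum(hyp[i:i + k]) == 0
        errors += same_ref != same_hyp
    return errors / n_windows


def window_diff(ref: Sequence[int], hyp: Sequence[int], k: Optional[int] = None) -> float:
    """Fraction of windows in which the two segmentations place a
    different number of boundaries."""
    k = default_k(ref) if k is None else k
    n_windows = _check(ref, hyp, k)
    errors = 0
    for i in range(n_windows):
        errors += sum(ref[i:i + k]) != sum(hyp[i:i + k])
    return errors / n_windows


if __name__ == "__main__":
    # Hypothetical 9-sentence text with 3 reference segments; the hypothesis
    # places the first boundary one sentence too early.
    ref = boundaries_from_labels([0, 0, 0, 1, 1, 1, 2, 2, 2])
    hyp = boundaries_from_labels([0, 0, 1, 1, 1, 1, 2, 2, 2])
    print(pk(ref, hyp), window_diff(ref, hyp))
```

In this toy example k is 2 and both metrics penalise 2 of the 7 windows. WindowDiff was proposed because Pk can overlook hypotheses that place the wrong number of boundaries inside a window, which a boundary count comparison detects.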
Pages: 130-135
Page count: 6