AUTOMATIC IDENTIFICATION OF BIBLICAL QUOTATIONS IN HEBREW-ARAMAIC DOCUMENTS

Cited by: 0
Authors
HaCohen-Kerner, Yaakov [1 ]
Schweitzer, Nadav [2 ]
Shoham, Yaakov [1 ]
Affiliations
[1] Jerusalem Coll Technol, Dept Comp Sci, IL-91160 Jerusalem, Israel
[2] Bar Ilan Univ, Dept Comp Sci, IL-52900 Ramat Gan, Israel
Source
KDIR 2010: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL | 2010
Keywords
Hebrew-Aramaic Texts; Information Retrieval; Quotation identification;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Quotations in a text document contain important information about the content, the context, the sources that the author uses, their importance and impact. Therefore, automatic identification of quotations from documents is an important task. Quotations included in rabbinic literature are difficult to identify and to extract for various reasons. The aim of this research is to automatically identify Biblical quotations included in rabbinic documents written in Hebrew-Aramaic. We deal with various kinds of quotations: partial, missing and incorrect. We formulate nineteen features to identify these quotations. These features were divided into seven different feature sets: matches, best matches, sums of weights, weighted averages, weighted medians, common words, and quotation indicators. Several features are novel. Experiments on various combinations of these features were performed using four common machine learning methods. A combination of 17 features using J48 (an improved version of C4.5) achieves an accuracy of 91.2%, which is an improvement of about 8% compared to a baseline result.
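As a rough illustration of what three of the feature families named in the abstract (best matches, common words, quotation indicators) might look like in code, here is a minimal sketch. It is not the authors' implementation: the function names, the feature definitions, and the short indicator list are all assumptions made for this example, and the paper's actual nineteen features and their weights are not reproduced here.

```python
from difflib import SequenceMatcher

# Hypothetical indicator words that often introduce a quotation in rabbinic
# texts. This short list is illustrative only, not the paper's list.
INDICATORS = {"שנאמר", "דכתיב", "ככתוב"}


def longest_word_match(candidate, verse):
    """Length, in words, of the longest run of consecutive words shared by both."""
    a, b = candidate.split(), verse.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def common_word_ratio(candidate, verse):
    """Fraction of the candidate's distinct words that also occur in the verse."""
    a, b = set(candidate.split()), set(verse.split())
    return len(a & b) / len(a) if a else 0.0


def has_indicator(preceding_words):
    """True if any word just before the candidate is a known indicator."""
    return any(word in INDICATORS for word in preceding_words)


def features(candidate, preceding_words, verses):
    """Assumed feature vector for one candidate span against a list of verses."""
    best = max(verses, key=lambda v: longest_word_match(candidate, v))
    return {
        "best_match": longest_word_match(candidate, best),
        "common_words": common_word_ratio(candidate, best),
        "indicator": has_indicator(preceding_words),
    }
```

A feature dictionary of this kind could then be passed to any standard classifier; the abstract reports results with J48, but the sketch above stops at feature extraction. Matching on whole words with a longest-run measure is only one simple way to tolerate partial quotations, and it does not handle spelling variants within a word.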
Pages: 320-325
Number of pages: 6