Wikidition: Automatic lexiconization and linkification of text corpora

Cited by: 3
Authors
Mehler, Alexander [1]
Gleim, Ruediger [1]
vor der Brueck, Tim [2]
Hemati, Wahed [1]
Uslu, Tolga [1]
Eger, Steffen [1]
Affiliations
[1] Goethe University Frankfurt, Robert-Mayer-Str. 10, D-60325 Frankfurt, Germany
[2] Hochschule Luzern, Technikumstrasse 21, CH-6048 Horw, Switzerland
Source
IT-Information Technology | 2016 / Vol. 58 / No. 2
Keywords
Wikidition; linkification; lexiconization; digital edition; text mining
DOI
10.1515/itit-2015-0035
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We introduce a new text technology, called Wikidition, which automatically generates large-scale editions of corpora of natural language texts. Wikidition combines a wide range of text mining tools for automatically linking lexical, sentential and textual units. This includes the extraction of corpus-specific lexica down to the level of syntactic words and their grammatical categories. To this end, we introduce a novel measure of text reuse and exemplify Wikidition by means of the capitularies, that is, a corpus of Medieval Latin texts.
Pages: 70-79
Number of pages: 10