Publishing the Trove Newspaper Corpus

Cited by: 0
Authors
Cassidy, Steve [1]
Affiliations
[1] Macquarie Univ, Dept Comp, Sydney, NSW, Australia
Source
LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2016
Keywords
newspaper; corpus; linked data;
DOI
Not available
Chinese Library Classification (CLC) number
H [Language, Writing];
Discipline classification code
05;
Abstract
The Trove Newspaper Corpus is derived from the National Library of Australia's digital archive of newspaper text. The corpus is a snapshot of the NLA collection taken in 2015 to be made available for language research as part of the Alveo Virtual Laboratory and contains 143 million articles dating from 1806 to 2007. This paper describes the work we have done to make this large corpus available as a research collection, facilitating access to individual documents and enabling large-scale processing of the newspaper text in a cloud-based environment.
Pages: 4520 - 4525
Page count: 6
Related papers
50 in total
  • [21] Corpus REDEWIEDERGABE
    Brunner, Annelen
    Engelberg, Stefan
    Jannidis, Fotis
    Tu, Ngoc Duyen Tanja
    Weimer, Lukas
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 803 - 812
  • [22] Production of Bioethanol from Waste Newspaper
    Byadgi, Shruti A.
    Kalburgi, P. B.
    WASTE MANAGEMENT FOR RESOURCE UTILISATION, 2016, 35 : 555 - 562
  • [23] Pyrolysis Kinetics of Newspaper and Its Gasification
    Bhuiyan, M. N. A.
    Ota, M.
    Murakami, K.
    Yoshida, H.
    ENERGY SOURCES PART A-RECOVERY UTILIZATION AND ENVIRONMENTAL EFFECTS, 2010, 32 (02) : 108 - 118
  • [24] 'Schizophrenia' as a Metaphor in Greek Newspaper Websites
    Athanasopoulou, Christina
    Valimaki, Maritta
    INTEGRATING INFORMATION TECHNOLOGY AND MANAGEMENT FOR QUALITY OF CARE, 2014, 202 : 275 - 278
  • [25] The Bahrain Corpus: A Multi-genre Corpus of Bahraini Arabic
    Abdulrahim, Dana
    Inoue, Go
    Shamsan, Latifa
    Khalifa, Salam
    Habash, Nizar
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 2345 - 2352
  • [26] APPROACH TO TRANSCRIPTION OF ORAL CORPUS: THE TRANSCRIPTION SYMBOLS IN JUDICIAL CORPUS
    Ridao Rodrigo, Susana
    REVISTA DE LLENGUA I DRET-JOURNAL OF LANGUAGE AND LAW, 2022, (77) : 93 - 110
  • [27] IDENTIC Corpus: Morphologically Enriched Indonesian - English Parallel Corpus
    Larasati, Septina Dian
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 902 - 906
  • [28] PLANEO CORPUS: METHODOLOGY AND RESULTS OF A CORPUS OF ANDALUSIAN LINGUISTIC LANDSCAPE
    Rodriguez, Lola Pons
    PHILOLOGIA HISPALENSIS, 2024, 38 (01) : 153 - 166
  • [29] Named Entity Corpus Construction using Wikipedia and DBpedia Ontology
    Hahm, Younggyun
    Park, Jungyeul
    Lim, Kyungtae
    Kim, Youngsik
    Hwang, Dosam
    Choi, Key-Sun
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 2565 - 2569
  • [30] Ubiquitous Usage of a French Large Corpus: Processing the Est Republicain Corpus
    Seddah, Djame
    Candito, Marie
    Crabbe, Benoit
    Anguiano, Enrique Henestroza
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 3249 - 3254