The BETTER Cross-Language Information Retrieval Datasets

Cited by: 3
Authors
Soboroff, Ian [1]
Affiliations
[1] NIST, Gaithersburg, MD 20899 USA
Source
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023 | 2023
Keywords
information retrieval; test collection; information extraction
DOI
10.1145/3539618.3591910
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The IARPA BETTER (Better Extraction from Text Through Enhanced Retrieval) program held three evaluations of information retrieval (IR) and information extraction (IE). For both tasks, the only training data available was in English, but systems had to perform cross-language retrieval and extraction from Arabic, Farsi, Chinese, Russian, and Korean. Pooled assessment and information extraction annotation were used to create reusable IR test collections. These datasets are freely available to researchers working in cross-language retrieval, information extraction, or the conjunction of IR and IE. This paper describes the datasets, how they were constructed, and how they might be used by researchers.
Pages: 3047-3053
Page count: 7
Related papers
50 in total
  • [41] A Non-linear Semantic Mapping Technique for Cross-Language Sentence Matching
    Banchs, Rafael E.
    Costa-Jussa, Marta R.
    ADVANCES IN NATURAL LANGUAGE PROCESSING, 2010, 6233 : 57 - 66
  • [42] Cross-view Embeddings for Information Retrieval
    Gupta, Parth
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2019, (62): 115 - 118
  • [43] Learning to Rank for Information Retrieval and Natural Language Processing
    Li H.
    Synthesis Lectures on Human Language Technologies, 2011, 4 (01): 1 - 115
  • [44] Dependency structure applied to language modeling for information retrieval
    Lee, Changki
    Lee, Gary Geunbae
    Jang, Myung-Gil
    ETRI JOURNAL, 2006, 28 (03) : 337 - 346
  • [45] Information Retrieval in Telugu Language Using Synset Relationships
    Ramakrishna, Kolikipogu
    Rani, B. Padmaja
    Subrahmanyam, D.
    2013 15TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING TECHNOLOGIES (ICACT), 2013
  • [46] Cross language retrieval model based on interlingua semantics
    Wang, Mingwen
    Hao, Ye
    Huang, Guobin
    Bi, Wenxia
    Journal of Computational Information Systems, 2007, 3 (04): 1555 - 1560
  • [47] Personalization Information Retrieval Based on Unigram Language Model
    Yu Yangxin
    MECHATRONICS AND INDUSTRIAL INFORMATICS, PTS 1-4, 2013, 321-324 : 2269 - 2273
  • [48] Problems of Semantics of Words of the Kazakh Language in the Information Retrieval
    Diana, Rakhimova
    Assem, Shormakova
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, PT II, 2019, 11684 : 70 - 81
  • [49] Information Retrieval and Spectrum Based Bug Localization: Better Together
    Le, Tien-Duy B.
    Oentaryo, Richard J.
    Lo, David
    2015 10TH JOINT MEETING OF THE EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND THE ACM SIGSOFT SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE 2015) PROCEEDINGS, 2015: 579 - 590
  • [50] TF-IDF-INSPIRED DETECTION FOR CROSS-LANGUAGE SOURCE CODE PLAGIARISM AND COLLUSION
    Karnalim, Oscar
    COMPUTER SCIENCE-AGH, 2020, 21 (01): 113 - 136