The BETTER Cross-Language Information Retrieval Datasets

Cited by: 3

Authors:

Soboroff, Ian [1]

Affiliations:

[1] NIST, Gaithersburg, MD 20899 USA

Source:

PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023 | 2023

Keywords:

information retrieval; test collection; information extraction

DOI:

10.1145/3539618.3591910

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

The IARPA BETTER (Better Extraction from Text Through Enhanced Retrieval) program held three evaluations of information retrieval (IR) and information extraction (IE). For both tasks, the only training data available was in English, but systems had to perform cross-language retrieval and extraction from Arabic, Farsi, Chinese, Russian, and Korean. Pooled assessment and information extraction annotation were used to create reusable IR test collections. These datasets are freely available to researchers working in cross-language retrieval, information extraction, or the conjunction of IR and IE. This paper describes the datasets, how they were constructed, and how they might be used by researchers.

Citation

Pages: 3047-3053

Page count: 7

