Dense Passage Retrieval for Open-Domain Question Answering

Cited by: 0
Authors
Karpukhin, Vladimir [1]
Oguz, Barlas [1]
Min, Sewon [2]
Lewis, Patrick [1]
Wu, Ledell [1]
Edunov, Sergey [1]
Chen, Danqi [3]
Yih, Wen Tau [1]
Affiliations
[1] Facebook AI, London, England
[2] Univ Washington, Seattle, WA USA
[3] Princeton Univ, Princeton, NJ USA
Source
PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP) | 2020
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system greatly by 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA benchmarks.
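The retrieval step and the top-k accuracy metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the question and passage embeddings have already been produced by the two encoders, it assumes inner-product similarity, and it simplifies the metric by counting a hit when a retrieved passage index is in a hand-labelled gold set. The function names `retrieve_top_k` and `top_k_accuracy` are hypothetical.

```python
from typing import List, Sequence, Set

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors (assumed similarity function)."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(question_vec: Vector, passage_vecs: List[Vector], k: int) -> List[int]:
    """Return the indices of the k passages with the highest inner-product score."""
    scores = [dot(question_vec, p) for p in passage_vecs]
    ranked = sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def top_k_accuracy(
    question_vecs: List[Vector],
    passage_vecs: List[Vector],
    gold: List[Set[int]],
    k: int,
) -> float:
    """Fraction of questions with at least one gold passage among the top k.

    Simplification: gold passages are given as index sets rather than
    being detected by matching answer strings inside retrieved text.
    """
    hits = 0
    for q, gold_ids in zip(question_vecs, gold):
        if any(i in gold_ids for i in retrieve_top_k(q, passage_vecs, k)):
            hits += 1
    return hits / len(question_vecs)


# Toy embeddings standing in for encoder outputs (illustrative values only).
passages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
questions = [[2.0, 1.0], [-1.0, 2.0]]
gold_sets = [{0}, {1}]

top2_for_first_question = retrieve_top_k(questions[0], passages, 2)
accuracy_at_1 = top_k_accuracy(questions, passages, gold_sets, 1)
accuracy_at_2 = top_k_accuracy(questions, passages, gold_sets, 2)
```

The exhaustive scan over all passages is only practical for toy data; at corpus scale the same scoring would normally be delegated to a maximum inner-product search index.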
Pages: 6769-6781
Page count: 13