Hierarchical Transformer-based Query by Multiple Documents

Cited by: 0
Authors
Huang, Zhiqi [1 ]
Naseri, Shahrzad [1 ]
Bonab, Hamed [2 ]
Sarwar, Sheikh Muhammad [2 ]
Allan, James [1 ]
Affiliations
[1] Univ Massachusetts, Amherst, MA 01003 USA
[2] Amazon Inc, Seattle, WA USA
Source
PROCEEDINGS OF THE 2023 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2023 | 2023
Keywords
Query by multiple documents; Hierarchical transformer; Neural re-ranking; RETRIEVAL;
DOI
10.1145/3578337.3605130
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
It is often difficult for users to form keywords to express their information needs, especially when they are not familiar with the domain of the articles of interest. Moreover, in some search scenarios, there is no explicit query for the search engine to work with. Query-By-Multiple-Documents (QBMD), in which the information needs are implicitly represented by a set of relevant documents, addresses these retrieval scenarios. Unlike the keyword-based retrieval task, the query documents are treated as exemplars of a hidden query topic, but it is often the case that they can be relevant to multiple topics. In this paper, we present a Hierarchical Interaction-based (HINT) bi-encoder retrieval architecture that encodes a set of query documents and retrieval documents separately for the QBMD task. We design a hierarchical attention mechanism that allows the model to 1) encode long sequences efficiently and 2) learn the interactions at low-level and high-level semantics (e.g., tokens and paragraphs) across multiple documents. With contextualized representations, the final scoring is calculated based on a stratified late interaction, which ensures each query document contributes equally to the matching against the candidate document. We build a large-scale, weakly supervised QBMD retrieval dataset based on Wikipedia for model training. We evaluate the proposed model on both Query-By-Single-Document (QBSD) and QBMD tasks. For QBSD, we use a benchmark dataset for legal case retrieval. For QBMD, we transform standard keyword-based retrieval datasets into the QBMD setting. Our experimental results show that HINT significantly outperforms all competitive baselines.
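The stratified late-interaction scoring described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it assumes ColBERT-style MaxSim matching between token embeddings, and the function names and shapes are illustrative. The key property it demonstrates is equal contribution — the score against a candidate is the unweighted mean of per-query-document late-interaction scores.

```python
import numpy as np

def late_interaction_score(query_tok: np.ndarray, cand_tok: np.ndarray) -> float:
    """MaxSim late interaction: for each query token, take the max cosine
    similarity over candidate tokens, then sum over query tokens.
    Shapes: query_tok (n_q, dim), cand_tok (n_c, dim)."""
    q = query_tok / np.linalg.norm(query_tok, axis=1, keepdims=True)
    c = cand_tok / np.linalg.norm(cand_tok, axis=1, keepdims=True)
    sim = q @ c.T                       # (n_q, n_c) cosine similarities
    return float(sim.max(axis=1).sum())

def stratified_score(query_docs: list[np.ndarray], cand_tok: np.ndarray) -> float:
    """Stratified aggregation: average the per-document scores so that every
    query document contributes equally, regardless of its length."""
    scores = [late_interaction_score(q, cand_tok) for q in query_docs]
    return float(np.mean(scores))
```

Averaging (rather than pooling all query tokens into one bag) prevents a single long query document from dominating the match against the candidate.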
Pages: 105-115 (11 pages)