Mixed Attention Transformer for Leveraging Word-Level Knowledge to Neural Cross-Lingual Information Retrieval

Cited by: 1
Authors
Huang, Zhiqi [1 ]
Bonab, Hamed [1 ]
Sarwar, Sheikh Muhammad [1 ]
Rahimi, Razieh [1 ]
Allan, James [1 ]
Affiliation
[1] Univ Massachusetts Amherst, Ctr Intelligent Informat Retrieval, Amherst, MA 01003 USA
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021 | 2021
Keywords
Cross-lingual information retrieval; Attention mechanism; Neural network;
DOI
10.1145/3459637.3482452
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Pre-trained contextualized representations have brought great success to many downstream tasks, including document ranking. The multilingual versions of such pre-trained representations make it possible to learn many languages jointly with the same model. Although such joint training is expected to yield large gains, in the case of cross-lingual information retrieval (CLIR), models in a multilingual setting do not achieve the same level of performance as those in a monolingual setting. We hypothesize that the performance drop is due to the translation gap between query and documents. In the monolingual retrieval task, because query and document share the same lexical inputs, it is easier for the model to identify query terms that occur in documents. However, in multilingual pre-trained models, where words in different languages are projected into the same hyperspace, the model tends to "translate" query terms into related terms, i.e., terms that appear in a similar context, in addition to or sometimes rather than synonyms in the target language. This property makes it difficult for the model to connect terms that co-occur in both query and document. To address this issue, we propose a novel Mixed Attention Transformer (MAT) that incorporates external word-level knowledge, such as a dictionary or translation table. We design a sandwich-like architecture to embed MAT into recent transformer-based deep neural models. By encoding the translation knowledge into an attention matrix, the model with MAT is able to focus on mutually translated words in the input sequence. Experimental results demonstrate the effectiveness of the external knowledge and the significant improvement of the MAT-embedded neural re-ranking model on the CLIR task.
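The abstract describes encoding dictionary or translation-table knowledge into an attention matrix so that attention concentrates on mutually translated query and document words. The following minimal PyTorch sketch illustrates one way such a mixed-attention mechanism could be realized; it is not the authors' implementation, and the names MixedAttention, build_translation_mask, and the blending weight alpha are assumptions made for illustration only.

```python
# Hedged sketch (not the paper's code): bias attention toward word pairs that
# an external bilingual dictionary marks as translations of each other.
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_translation_mask(query_tokens, doc_tokens, translation_table):
    """1.0 where a query token and a document token are mutual translations
    according to an external dictionary / translation table, else 0.0."""
    mask = torch.zeros(len(query_tokens), len(doc_tokens))
    for i, q in enumerate(query_tokens):
        for j, d in enumerate(doc_tokens):
            if d in translation_table.get(q, set()):
                mask[i, j] = 1.0
    return mask

class MixedAttention(nn.Module):
    """Blends learned dot-product attention with a fixed translation-based
    attention matrix (a simplified reading of the mixed-attention idea)."""
    def __init__(self, dim, alpha=0.5):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.alpha = alpha  # assumed weight between learned and external attention

    def forward(self, query_states, doc_states, translation_mask):
        q = self.q_proj(query_states)   # (Lq, dim)
        k = self.k_proj(doc_states)     # (Ld, dim)
        v = self.v_proj(doc_states)     # (Ld, dim)
        learned = F.softmax(q @ k.T / q.size(-1) ** 0.5, dim=-1)
        # Normalize the external matrix so each query row is a distribution;
        # rows with no dictionary match fall back to the learned attention.
        row_sums = translation_mask.sum(dim=-1, keepdim=True)
        external = torch.where(row_sums > 0,
                               translation_mask / row_sums.clamp(min=1e-9),
                               learned)
        mixed = (1 - self.alpha) * learned + self.alpha * external
        return mixed @ v                # (Lq, dim)
```

In this simplified reading, a sandwich-like integration would place such a layer between standard transformer layers of a multilingual re-ranker, so that contextual attention and dictionary-based translation attention are combined rather than relying on the pre-trained model's implicit cross-lingual alignment alone.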
Pages: 760-770
Page count: 11