Deep bi-directional interaction network for sentence matching

Cited: 12
Authors
Liu, Mingtong [1 ]
Zhang, Yujie [1 ]
Xu, Jinan [1 ]
Chen, Yufeng [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Beijing Key Lab Traff Data Anal & Min, Sch Comp & Informat Technol, Beijing, Peoples R China
Keywords
Sentence matching; Deep interaction network; Deep fusion; Attention mechanism; Multi-layer neural network; Interpretability study
DOI
10.1007/s10489-020-02156-7
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The goal of sentence matching is to determine the semantic relation between two sentences, which underpins many downstream tasks in natural language processing, such as question answering and information retrieval. Recent studies that use attention mechanisms to align the elements of two sentences have shown promising results in capturing semantic similarity/relevance. Most existing methods focus on the design of multi-layer attention networks; however, several critical issues have not been dealt with well: 1) higher attention layers are easily affected by error propagation because they rely on the alignment results of preceding attention layers; 2) models risk losing low-layer semantic features as network depth increases; and 3) capturing global matching information incurs high computational complexity during model training. To address these issues, we propose a Deep Bi-Directional Interaction Network (DBDIN), which captures semantic relatedness in two directions, with each direction employing multiple attention-based interaction units. Specifically, the attention of each interaction unit repeatedly focuses on the original representation of the other sentence for semantic alignment, which alleviates error propagation by attending to a fixed semantic representation. We then design deep fusion to aggregate and propagate attention information from low layers to high layers, which effectively retains low-layer semantic features for subsequent interactions. Finally, we introduce a self-attention mechanism to enhance global matching information at smaller model complexity. We conduct experiments on natural language inference and paraphrase identification tasks with three benchmark datasets: SNLI, SciTail, and Quora. Experimental results demonstrate that our proposed method achieves significant improvements over baseline systems without using any external knowledge.
Additionally, we conduct an interpretability study to disclose how our deep interaction network with attention benefits sentence matching, which provides a reference for future model design. Ablation studies and visualization analyses further verify that our model better captures interactive information between two sentences, and that the proposed components indeed help model semantic relations more precisely.
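The two mechanisms the abstract highlights can be illustrated with a minimal sketch: each interaction unit attends to the *fixed* original representation of the other sentence (so alignment errors in one unit do not propagate to the next), and a deep-fusion step aggregates the outputs of all units so low-layer features survive to the top. All function names, the averaging-based fusion, and the update rule below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax over the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys):
    """Dot-product attention: align each query vector against the keys."""
    scores = query @ keys.T            # (len_q, len_k) similarity matrix
    weights = softmax(scores, axis=-1) # each query row sums to 1
    return weights @ keys              # (len_q, d) aligned representation

def interaction_stack(a, b_fixed, n_units=3):
    """Stack of interaction units for one direction (a attends to b).

    Every unit attends to b_fixed, the ORIGINAL representation of the
    other sentence, rather than to the previous unit's alignment output,
    which mitigates error propagation. The mean over all unit outputs is
    a simple stand-in for deep fusion: it keeps low-layer features in
    the final representation alongside high-layer ones.
    """
    h = a
    outputs = []
    for _ in range(n_units):
        aligned = attend(h, b_fixed)   # always align against fixed b
        h = 0.5 * (h + aligned)        # illustrative fusion update
        outputs.append(h)
    return np.mean(outputs, axis=0)    # deep fusion across all units
```

Running the stack in both directions (a→b and b→a) would give the bi-directional interaction the paper's title refers to; a learned gate or feed-forward layer would replace the fixed averaging in a real model.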
Pages: 4305-4329
Page count: 25
References: 68