Join-Chain Network: A Logical Reasoning View of the Multi-head Attention in Transformer

Cited by: 1
Authors
Zhang, Jianyi [1 ]
Chen, Yiran [1 ]
Chen, Jianshu
Affiliations
[1] Duke Univ, Durham, NC 27708 USA
Source
2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW | 2022
Keywords
Logical reasoning; multi-head attention; NLP
DOI
10.1109/ICDMW58026.2022.00123
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Developing neural architectures capable of logical reasoning has become increasingly important for a wide range of applications (e.g., natural language processing). Towards this grand objective, we propose a symbolic reasoning architecture that chains many join operators together to model output logical expressions. In particular, we demonstrate that such an ensemble of join chains can express a broad subset of "tree-structured" first-order logical expressions, named FOET, which is particularly useful for modeling natural language. To endow it with differentiable learning capability, we closely examine various neural operators for approximating the symbolic join chains. Interestingly, we find that the widely used multi-head self-attention module in the transformer can be understood as a special neural operator that implements the union bound of the join operator in probabilistic predicate space. Our analysis not only provides a new perspective on the mechanism of pretrained models such as BERT for natural language understanding, but also suggests several important directions for future improvement.
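To make the abstract's key observation concrete, here is a minimal NumPy sketch (not the authors' code; the function names and the toy demo are hypothetical) of how one self-attention head can be read as a soft join: the symbolic join (p ⋈ q)(x) = ∃y. p(x, y) ∧ q(y) over a finite domain, its union-bound relaxation Pr[∃y] ≤ Σ_y Pr[p(x, y)]·Pr[q(y)], and an attention head in which the softmax matrix plays the role of the binary-predicate probabilities and the value vectors play the role of unary-predicate scores.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def symbolic_join(p, q):
    # Exact join over a finite domain: (p JOIN q)(x) = OR_y [p(x, y) AND q(y)],
    # with p a boolean binary predicate (n x n) and q a boolean unary predicate (n,).
    return np.any(p & q[None, :], axis=1)

def union_bound_join(p_prob, q_prob):
    # Union-bound relaxation: Pr[exists y: p(x, y) and q(y)] <= sum_y Pr[p(x, y)] * Pr[q(y)],
    # i.e. a plain weighted sum of "values" q_prob under "attention" p_prob.
    return p_prob @ q_prob

def attention_head(X, Wq, Wk, Wv):
    # One self-attention head: the softmax matrix A supplies Pr[p(x, y)],
    # and each column of V supplies a unary-predicate score vector q.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return A @ V  # one union-bound join per value dimension

# Toy check: in the crisp 0/1 case, the union-bound surrogate reaches 1
# exactly when the symbolic join is true.
rng = np.random.default_rng(0)
p = rng.random((4, 4)) < 0.3
q = rng.random(4) < 0.5
assert np.array_equal(symbolic_join(p, q),
                      union_bound_join(p.astype(float), q.astype(float)) >= 1.0)
```

Under this reading, stacking attention layers chains joins, which matches the join-chain ensemble described in the abstract.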
Pages: 947-957
Number of pages: 11