Evaluating Natural Language Inference Models: A Metamorphic Testing Approach

Cited by: 3
Authors
Jiang, Mingyue [1 ]
Bao, Houzhen [1 ]
Tu, Kaiyi [1 ]
Zhang, Xiao-Yi [2 ]
Ding, Zuohua [1 ]
Affiliations
[1] Zhejiang Sci Tech Univ, Hangzhou, Peoples R China
[2] Natl Inst Informat, Tokyo, Japan
Source
2021 IEEE 32ND INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE 2021) | 2021
Keywords
Natural Language Inference; Metamorphic Testing; Metamorphic Relation; Quality Evaluation; Oracle Problem;
DOI
10.1109/ISSRE52982.2021.00033
Chinese Library Classification
TP31 [Computer Software];
Subject Classification Code
081202; 0835;
Abstract
Natural language inference (NLI) is a fundamental NLP task that forms the cornerstone of deep natural language understanding. Unfortunately, evaluating NLI models is challenging. On the one hand, due to the lack of test oracles, it is difficult to automatically judge the correctness of an NLI model's predictions. On the other hand, beyond knowing how well a model performs, there is a further need to understand the capabilities and characteristics of different NLI models. To mitigate these issues, we propose to apply metamorphic testing (MT) to NLI. We identify six categories of metamorphic relations, covering a wide range of properties that the NLI task is expected to possess. Based on these relations, MT can be conducted on NLI models without test oracles, and the MT results can interpret NLI models' capabilities from various aspects. We further demonstrate the validity and effectiveness of our approach through experiments on five NLI models. Our experiments expose a large number of prediction failures in the subject NLI models and also yield interpretations of common characteristics of NLI models.
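To illustrate the core idea of oracle-free metamorphic testing described in the abstract, the following is a minimal sketch. It assumes a hypothetical `predict(premise, hypothesis)` callable standing in for any NLI model, and uses a generic label-preserving metamorphic relation (a meaning-preserving rewrite of the premise should not change the predicted label); this is an illustrative example, not necessarily one of the paper's six MR categories.

```python
# Minimal sketch of metamorphic testing for NLI (illustrative, not the paper's exact setup).
# `predict` is a hypothetical NLI model interface; only label consistency between
# source and follow-up inputs is checked, so no ground-truth labels are needed.

from typing import Callable, Iterable, List, Tuple

Label = str                 # "entailment" | "neutral" | "contradiction"
Pair = Tuple[str, str]      # (premise, hypothesis)


def check_label_preserving_mr(
    predict: Callable[[str, str], Label],
    source_pairs: Iterable[Pair],
    transform: Callable[[str], str],
) -> List[Tuple[Pair, Label, Label]]:
    """Return the pairs whose source and follow-up predictions disagree.

    Metamorphic relation: if `transform` preserves the meaning of the premise,
    the predicted label should stay the same. A disagreement is a violation
    (a revealed prediction failure), detected without any test oracle.
    """
    violations = []
    for premise, hypothesis in source_pairs:
        src_label = predict(premise, hypothesis)
        fol_label = predict(transform(premise), hypothesis)
        if src_label != fol_label:
            violations.append(((premise, hypothesis), src_label, fol_label))
    return violations


if __name__ == "__main__":
    # Dummy predictor, used only to make the sketch runnable end to end.
    def dummy_predict(premise: str, hypothesis: str) -> Label:
        return "entailment" if hypothesis.lower() in premise.lower() else "neutral"

    pairs = [("A man is playing a guitar on stage.", "a man is playing a guitar")]
    # Hand-written meaning-preserving rewrite standing in for a real paraphraser.
    paraphrase = lambda s: s.replace("is playing a guitar", "is performing with a guitar")
    print(check_label_preserving_mr(dummy_predict, pairs, paraphrase))
```

In this toy run the dummy predictor changes its label after the paraphrase, so the pair is reported as a violation; with a real NLI model, such inconsistencies are the prediction failures that MT surfaces, and grouping them by relation type is what allows the results to be interpreted per capability.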
Pages: 220-230
Page count: 11