Evaluating Commonsense in Pre-Trained Language Models

Cited by: 0
Authors
Zhou, Xuhui [1 ,4 ]
Zhang, Yue [2 ]
Cui, Leyang [2 ,3 ]
Huang, Dandan [2 ]
Affiliations
[1] Univ Washington, Seattle, WA 98195 USA
[2] Westlake Univ, Sch Engn, Hangzhou, Zhejiang, Peoples R China
[3] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
[4] Westlake Univ, Hangzhou, Zhejiang, Peoples R China
Source
Thirty-Fourth AAAI Conference on Artificial Intelligence, the Thirty-Second Innovative Applications of Artificial Intelligence Conference and the Tenth AAAI Symposium on Educational Advances in Artificial Intelligence | 2020 / Vol. 34
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Contextualized representations trained over large raw text data have yielded remarkable improvements on NLP tasks, including question answering and reading comprehension. Prior work has shown that syntactic, semantic, and word-sense knowledge is contained in such representations, which explains why they benefit these tasks. However, relatively little work has investigated the commonsense knowledge contained in contextualized representations, which is crucial for human question answering and reading comprehension. We study the commonsense ability of GPT, BERT, XLNet, and RoBERTa by testing them on seven challenging benchmarks, finding that language modeling and its variants are effective objectives for promoting models' commonsense ability, while bi-directional context and larger training sets bring additional gains. We additionally find that current models do poorly on tasks that require more inference steps. Finally, we test the robustness of the models by constructing dual test cases, which are correlated so that a correct prediction on one sample should lead to a correct prediction on the other. Interestingly, the models are inconsistent on these test cases, which suggests that they learn commonsense at a surface level rather than a deep level. We publicly release a test set, named CATs, for future research.
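As a minimal illustrative sketch (not the authors' released code) of the kind of language-model scoring commonly used for such probes, the snippet below compares two candidate sentences that differ only in a single resolved referent by the total log-probability a pre-trained LM assigns to each. It assumes the HuggingFace transformers library and GPT-2 weights; the example sentences and scoring details are hypothetical and may differ from the paper's exact protocol.

    # Hypothetical sketch: score a dual test pair with a pre-trained causal LM
    # and pick the candidate the model finds more plausible.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def sentence_log_likelihood(sentence: str) -> float:
        """Total log-probability the LM assigns to the sentence's tokens."""
        ids = tokenizer(sentence, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids, labels=ids)
        # out.loss is the mean cross-entropy over the predicted (shifted) tokens;
        # multiply by the number of predicted tokens for a total log-probability.
        return -out.loss.item() * (ids.size(1) - 1)

    # A dual test pair: the two candidates differ only in the resolved referent,
    # so a model with the relevant commonsense should prefer the first.
    a = "The trophy doesn't fit in the suitcase because the trophy is too big."
    b = "The trophy doesn't fit in the suitcase because the suitcase is too big."
    print("prefers:", "a" if sentence_log_likelihood(a) > sentence_log_likelihood(b) else "b")

For masked models such as BERT or RoBERTa, a pseudo-log-likelihood obtained by masking and predicting each token in turn plays the analogous role to the causal LM score above.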
Pages: 9733-9740
Page count: 8