Testing QA Systems' ability in Processing Synonym Commonsense Knowledge

Cited by: 2

Authors:

Sigdel, Bijay [1]

Lin, Gongqi [1]

Miao, Yuan [1]

Ahmed, Khandakar [1]

Affiliation:

[1] Victoria Univ, Coll Engn & Sci, Melbourne, Vic, Australia

Source:

2020 24TH INTERNATIONAL CONFERENCE INFORMATION VISUALISATION (IV 2020) | 2020

Keywords:

Commonsense knowledge; QA Systems; Word Sense Disambiguation; Machine Reading Comprehension;

DOI:

10.1109/IV51561.2020.00059

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104; 0812; 0835; 1405;

Abstract:

Synonymy is an essential instrument of commonsense knowledge that we use to make good sense of, and form sound judgements about, what we read. To investigate how well machine comprehension models handle synonym commonsense knowledge, we developed an approach that automatically generates a dataset from the Stanford Question Answering Dataset (SQuAD 2.0). The new dataset adds distracting sentences or questions generated using synonym commonsense knowledge. We formulated new questions by replacing the noun entities in the original SQuAD 2.0 questions with their synonyms. This approach follows the two fundamental principles of the SQuAD 2.0 dataset, relevancy and plausibility (incorrect answers are more challenging when they are relevant and plausible), and it increases the robustness and abstraction of the question set. To improve synonym selection, which is a Word Sense Disambiguation (WSD) problem, we designed a new algorithm, the Multiple Source Adapted Lesk Algorithm (MSALA). Rather than using only WordNet as the source of glosses for the adapted Lesk algorithm, MSALA draws on both the lexical database WordNet and the commonsense database ConceptNet. This fusion gives MSALA a rich hierarchy of semantic relations. Using this method, we generated 11,000 questions and evaluated the performance of a state-of-the-art question answering system, BERT. The accuracy of the BERT-Base model dropped from 74.98% to 63.24%. This drop of 11.74 percentage points reveals the limitations of BERT in handling synonym commonsense knowledge.
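The abstract describes MSALA only at a high level. The sketch below is a minimal, hypothetical illustration of the general idea of pooling gloss sources for sense selection before synonym substitution. It is not the authors' MSALA: it uses simplified Lesk (overlap between a sense's signature and the question's own words) rather than the full adapted Lesk algorithm, and the sense inventory `SENSES` is a hand-written toy stand-in for WordNet glosses and ConceptNet relations, not data from either resource. All names (`SENSES`, `signature`, `best_sense`, `substitute`) are illustrative assumptions.

```python
"""Simplified-Lesk sense selection over two pooled gloss sources (toy sketch).

ASSUMPTION: the sense inventory below is hand-written for illustration; it is
not real WordNet or ConceptNet data, and this is not the paper's MSALA.
"""
import re

# Hypothetical senses for the noun "bank": a WordNet-style gloss,
# ConceptNet-style related terms, and candidate synonyms for substitution.
SENSES = {
    "bank": [
        {
            "id": "bank.n.01",
            "wordnet_gloss": "sloping land beside a body of water such as a river",
            "conceptnet_terms": ["river", "shore", "water", "fishing"],
            "synonyms": ["shore", "riverbank"],
        },
        {
            "id": "bank.n.02",
            "wordnet_gloss": "a financial institution that accepts deposits and lends money",
            "conceptnet_terms": ["money", "loan", "account", "deposit"],
            "synonyms": ["depository financial institution"],
        },
    ],
}

STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as",
             "to", "in", "on", "at", "did", "what", "who", "where"}


def tokens(text):
    """Return the set of lowercase word tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def signature(sense):
    """Pool both sources (gloss words and related terms) into one bag of words."""
    words = tokens(sense["wordnet_gloss"])
    for term in sense["conceptnet_terms"]:
        words |= tokens(term)
    return words


def best_sense(word, context):
    """Pick the sense whose pooled signature overlaps most with the context.

    Ties go to the first listed sense, because max() keeps the first maximum.
    """
    ctx = tokens(context) - {word}
    return max(SENSES[word], key=lambda s: len(signature(s) & ctx))


def substitute(question, word):
    """Replace the first occurrence of `word` with the first synonym of its chosen sense."""
    sense = best_sense(word, question)
    pattern = rf"\b{re.escape(word)}\b"
    return re.sub(pattern, sense["synonyms"][0], question, count=1)
```

For example, `substitute("How much money did she deposit at the bank?", "bank")` selects the financial sense (its pooled signature shares "money" and "deposit" with the question) and returns "How much money did she deposit at the depository financial institution?". Pooling related terms from a second source enlarges each sense's signature, which gives the overlap score more evidence to work with; this mirrors, in miniature, the motivation the abstract gives for fusing WordNet and ConceptNet.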


Pages: 317-321

Number of pages: 5
