Do People and Neural Nets Pay Attention to the Same Words? Studying Eye-tracking Data for Non-factoid QA Evaluation

Times cited: 13

Authors:

Bolotova, Valeria [1]

Blinov, Vladislav [2]

Zheng, Yukun [3]

Croft, W. Bruce [4]

Scholer, Falk [1]

Sanderson, Mark [1]

Affiliations:

[1] RMIT Univ, Melbourne, Vic, Australia

[2] Ural Fed Univ, Ekaterinburg, Russia

[3] Tsinghua Univ, Beijing, Peoples R China

[4] Univ Massachusetts, Amherst, MA 01003 USA

Source:

CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT | 2020

Keywords:

ONLINE SEARCH; FREQUENCY; MOVEMENTS

DOI:

10.1145/3340531.3412043

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

We investigated how users evaluate passage-length answers to non-factoid questions. We conducted a study in which answers were presented to users, sometimes with automatic word highlighting. Users were tasked with evaluating answer quality, correctness, completeness, and conciseness. Words in the answers were also annotated, both explicitly through user markup and implicitly through user gaze data obtained from eye-tracking. Our results show that the correctness of an answer strongly depends on its completeness, while conciseness is less important. Analysis of the annotated words showed that correct and incorrect answers were assessed differently. Automatic highlighting helped users evaluate answers more quickly while maintaining accuracy, particularly when the highlighting was similar to the annotation. We fine-tuned a BERT model on a non-factoid QA task to examine whether the model attends to words similar to those annotated by users. Similarity was found; consequently, we propose a method to exploit the BERT attention map to generate suggestions that simulate eye gaze during user evaluation.
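The abstract does not say how the BERT attention map is turned into highlight suggestions or how similarity to user annotations is measured. The sketch below is one hypothetical way to do it, not the authors' method. It assumes a single layer's attention as nested lists indexed `attn[head][from_token][to_token]`, averages over heads the attention that the `[CLS]` position pays to each token, suggests the top-k non-special tokens as highlights, and scores agreement with user-annotated positions using Jaccard similarity. The helper names (`cls_attention_scores`, `suggest_highlights`, `jaccard_overlap`), the `[CLS]`-row aggregation, and the Jaccard measure are all illustrative assumptions.

```python
from typing import AbstractSet, List, Sequence

# Hypothetical helpers, NOT the paper's implementation.
# attn[h][i][j] = attention weight from token i to token j in head h of ONE layer
# (e.g., the last layer of a BERT model fine-tuned for non-factoid QA).
Attention = Sequence[Sequence[Sequence[float]]]


def cls_attention_scores(attn: Attention) -> List[float]:
    """Average, over heads, the attention that position 0 ([CLS]) pays to each token."""
    num_heads = len(attn)
    seq_len = len(attn[0][0])
    return [sum(attn[h][0][j] for h in range(num_heads)) / num_heads
            for j in range(seq_len)]


def suggest_highlights(tokens: Sequence[str], scores: Sequence[float], k: int,
                       skip: AbstractSet[str] = frozenset({"[CLS]", "[SEP]"})) -> List[int]:
    """Return positions of the k highest-scoring non-special tokens, in text order."""
    candidates = [i for i, tok in enumerate(tokens) if tok not in skip]
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def jaccard_overlap(highlighted: AbstractSet[int], annotated: AbstractSet[int]) -> float:
    """Jaccard similarity between suggested and user-annotated token positions (assumed metric)."""
    union = highlighted | annotated
    return len(highlighted & annotated) / len(union) if union else 1.0


if __name__ == "__main__":
    tokens = ["[CLS]", "heat", "kills", "bacteria", "[SEP]"]
    uniform = [0.2] * 5
    # Two toy heads; only row 0 ([CLS]) is read by cls_attention_scores.
    attn = [
        [[0.1, 0.2, 0.1, 0.5, 0.1]] + [uniform] * 4,
        [[0.1, 0.4, 0.1, 0.3, 0.1]] + [uniform] * 4,
    ]
    scores = cls_attention_scores(attn)
    picks = suggest_highlights(tokens, scores, k=2)
    print([tokens[i] for i in picks])        # ['heat', 'bacteria']
    print(jaccard_overlap(set(picks), {3}))  # 0.5
```

Reading the `[CLS]` row is only one choice; summing the attention each token receives from all positions, or averaging across several layers, are equally plausible aggregations under the same assumptions.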

Pages: 85-94

Number of pages: 10

References: 43
