SPORTSINTERVIEW: A Large-Scale Sports Interview Benchmark for Entity-centric Dialogues

Cited by: 0
Authors
Sun, Hanfei [1]
Cao, Ziyuan [1]
Yang, Diyi [1]
Affiliation
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022
Keywords
Entity-centric sports interview; conversation dataset; text generation; deep neural network
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203 ; 0835 ;
Abstract
We propose SPORTSINTERVIEW, a novel knowledge-grounded dialogue (interview) dataset set in the domain of sports interviews. The dataset contains two types of external knowledge sources as knowledge grounding, and is rich in content, containing about 150K interview sessions and 34K distinct interviewees. Compared to existing knowledge-grounded dialogue datasets, our interview dataset is larger in size, comprises natural dialogues revolving around real-world sports matches, and has more than one dimension of external knowledge linking. We performed several experiments on SPORTSINTERVIEW and found that models such as BART fine-tuned on our dataset are able to learn a great deal of relevant domain knowledge and generate meaningful sentences (questions or responses). However, their performance is still far from that of humans (as measured against gold sentences in the dataset), which encourages future research utilizing SPORTSINTERVIEW.
Pages: 5821-5828
Page count: 8