PMIndiaSum: Multilingual and Cross-lingual Headline Summarization for Languages in India

Times cited: 0
Authors
Urlana, Ashok [1]
Chen, Pinzhen [2]
Zhao, Zheng [2]
Cohen, Shay B. [2]
Shrivastava, Manish [1]
Haddow, Barry [2]
Affiliations
[1] IIIT Hyderabad, Hyderabad, India
[2] Univ Edinburgh, Edinburgh, Midlothian, Scotland
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023) | 2023
Funding
UK Research and Innovation (UKRI)
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces PMIndiaSum, a multilingual and massively parallel summarization corpus focused on languages in India. Our corpus provides a training and testing ground for 14 languages from four language families and, with 196 language pairs, is the largest to date. We detail our construction workflow, including data acquisition, processing, and quality assurance. Furthermore, we publish benchmarks for monolingual, cross-lingual, and multilingual summarization using fine-tuning, prompting, and translate-and-summarize. Experimental results confirm the crucial role of our data in aiding summarization between Indian languages. Our dataset is publicly available and can be freely modified and redistributed.
Pages: 11606-11628
Page count: 23