CATAMARAN: A Cross-lingual Long Text Abstractive Summarization Dataset

Times cited: 0
Authors
Chen, Zheng [1 ]
Lin, Hongyu [1 ]
Affiliation
[1] Univ Elect Sci & Technol China, 4,Sect 2,North Jianshe Rd, Chengdu, Peoples R China
Source
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022
Keywords
Abstractive summarization; Cross-lingual summarization; Long text summarization
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Cross-lingual summarization, which produces a summary in one language from a source document in another language, can help people access information from around the world. However, the task remains little explored because of a lack of datasets. Recent studies rely primarily on pseudo-cross-lingual datasets obtained by translation. This approach inevitably loses information from the original document and introduces noise into the summary, which hurts overall performance. In this paper, we present CATAMARAN, the first high-quality cross-lingual long text abstractive summarization dataset. It contains about 20,000 parallel news articles and corresponding summaries, all written by humans. The average article length is 1133.65 for English and 2035.33 for Chinese, and the average summary length is 26.59 and 70.05, respectively. We train and evaluate an mBART-based cross-lingual abstractive summarization model on our dataset. The results show that, compared with monolingual systems, the cross-lingual abstractive summarization system also achieves solid performance.
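Because every story in the dataset has a human-written English and Chinese article plus summaries in both languages, cross-lingual training pairs can be formed by pairing an article in one language with the summary in the other. The sketch below is a minimal illustration under stated assumptions, not the authors' data format or code: the record layout (`ParallelExample`), the helper names (`cross_lingual_pairs`, `length_en`, `length_zh`, `average_lengths`), and the length units are all hypothetical. The abstract does not state how length is measured, so this sketch assumes whitespace tokens for English and non-whitespace characters for Chinese.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ParallelExample:
    # Hypothetical record layout (assumption): one news story with
    # human-written English and Chinese articles and summaries.
    article_en: str
    article_zh: str
    summary_en: str
    summary_zh: str


def cross_lingual_pairs(ex):
    """Return (document, summary, direction) tuples in which the summary
    language differs from the document language."""
    return [
        (ex.article_en, ex.summary_zh, "en->zh"),
        (ex.article_zh, ex.summary_en, "zh->en"),
    ]


def length_en(text):
    # Assumption: English length = number of whitespace-separated tokens.
    return len(text.split())


def length_zh(text):
    # Assumption: Chinese length = number of non-whitespace characters.
    return sum(1 for ch in text if not ch.isspace())


def average_lengths(examples):
    """Average article and summary lengths per language over a corpus."""
    return {
        "article_en": mean(length_en(e.article_en) for e in examples),
        "article_zh": mean(length_zh(e.article_zh) for e in examples),
        "summary_en": mean(length_en(e.summary_en) for e in examples),
        "summary_zh": mean(length_zh(e.summary_zh) for e in examples),
    }


if __name__ == "__main__":
    ex = ParallelExample(
        article_en="The river flooded the town .",
        article_zh="河水淹没了小镇。",
        summary_en="Town flooded .",
        summary_zh="小镇被淹。",
    )
    for doc, summ, direction in cross_lingual_pairs(ex):
        print(direction, "|", doc, "=>", summ)
    print(average_lengths([ex]))
```

Under a different tokenization (for example, subword tokens from the mBART tokenizer), the same averaging code applies but the reported numbers would change, which is why the unit matters when comparing against the figures quoted above.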
Pages: 6932-6937
Number of pages: 6