TASP: Topic-based abstractive summarization of Facebook text posts

Cited by: 2
Authors
Benedetto, Irene [1 ,4 ]
La Quatra, Moreno [2 ]
Cagliero, Luca [1 ]
Vassio, Luca [1 ]
Trevisan, Martino [3 ]
Affiliations
[1] Politecn Torino, Corso Duca Abruzzi 24, I-10129 Turin, Italy
[2] Kore Univ Enna, Piazza Univ, I-94100 Enna, Italy
[3] Univ Trieste, Piazzale Europa 1, I-34127 Trieste, Italy
[4] MAIZE, Via San Quintino 31, I-10121 Turin, Italy
Keywords
Social network mining; Abstractive summarization; Natural language understanding
DOI
10.1016/j.eswa.2024.124567
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Summarizing trending topics in large collections of Facebook posts is particularly relevant for profiling social users' activities and interests. However, automatically generating these summaries poses significant challenges due to the high heterogeneity of the input data, the limited fluency of extractive summaries, and the absence of abstractive summarization methods capable of handling multiple posts simultaneously. Existing abstractive models are either unsuited to large post collections or disregard topic-level text relations. In this work, we present TASP, a novel tool for trending topic detection and summarization from English-written Facebook posts. It trains abstractive summarization models on multi-post collections by leveraging a shortlist of authoritative posts published by renowned newspapers. At inference time, TASP first creates clusters of semantically similar social posts, each representing a distinct topic, using pre-trained transformer-based language models. Then, it generates abstractive summaries of the clusters for which authoritative information is missing. To the best of our knowledge, TASP is the first available tool suited to abstractive multi-post summarization. We test our approach on a large-scale dataset of real Facebook posts. The results show (1) the higher effectiveness of transformer-based approaches in generating topic-specific post clusters compared with traditional methods, and (2) the importance of attending to long pieces of text in multi-post abstractive summary generation.
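As a rough illustration of the two-stage pipeline the abstract describes (embedding-based topic clustering of posts, followed by abstractive summarization of each cluster), the sketch below wires together off-the-shelf components: a Sentence-Transformers encoder, HDBSCAN clustering, and a pre-trained BART summarizer. The model checkpoints, the choice of clustering algorithm, and all parameters are illustrative assumptions, not the configuration used by TASP; in particular, the paper emphasizes models that attend to long inputs, which this short sketch does not reproduce.

```python
from collections import defaultdict

import hdbscan
from sentence_transformers import SentenceTransformer
from transformers import pipeline

# Toy input: a list of post texts (placeholder data).
posts = [
    "Breaking: heavy floods hit the northern region overnight.",
    "Rescue teams deployed as floods continue in the north.",
    "New smartphone model announced with record battery life.",
    "Tech giant unveils its latest smartphone at annual event.",
]

# 1) Embed posts with a pre-trained sentence encoder (checkpoint is an assumption).
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(posts)

# 2) Cluster semantically similar posts; each cluster stands in for one topic.
clusterer = hdbscan.HDBSCAN(min_cluster_size=2, metric="euclidean")
labels = clusterer.fit_predict(embeddings)

clusters = defaultdict(list)
for post, label in zip(posts, labels):
    if label != -1:  # -1 marks HDBSCAN noise points
        clusters[label].append(post)

# 3) Abstractively summarize each cluster with a generic pre-trained model
#    (BART here is a stand-in; TASP's actual models differ).
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

for label, cluster_posts in clusters.items():
    joined = " ".join(cluster_posts)  # concatenate posts into one input document
    summary = summarizer(joined, max_length=60, min_length=10, truncation=True)
    print(f"Topic {label}: {summary[0]['summary_text']}")
```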
Pages: 13