Sentence Pair Embeddings Based Evaluation Metric for Abstractive and Extractive Summarization

Cited by: 0
Authors
Akula, Ramya [1 ]
Garibay, Ivan [1 ]
Affiliations
[1] Univ Cent Florida, Orlando, FL 32816 USA
Source
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022
Keywords
Evaluation Metric; Abstractive Summarization; Extractive Summarization; Semantic Similarity;
DOI
Not available
CLC Classification
TP39 [Computer Applications];
Discipline Classification Codes
081203; 0835;
Abstract
The development of an automatic evaluation metric remains an open problem in text generation. Widely used evaluation metrics such as ROUGE and BLEU rely on exact word matching and fail to capture semantic similarity. Recent works such as BERTScore, MoverScore, and Sentence Mover's Similarity improve over these standard metrics by using contextualized word or sentence embeddings to capture semantic similarity. In this work, we propose a novel evaluation metric for text generation, the Sentence Pair EmbEDdings (SPEED) Score, which, unlike earlier approaches, is based on the semantic similarity between sentence pairs. To measure the semantic similarity between a pair of sentences, we obtain sentence-level embeddings from multiple transformer models pre-trained specifically on sentence pair tasks such as Paraphrase Detection (PD), Semantic Text Similarity (STS), and Natural Language Inference (NLI). Because these tasks involve capturing the semantic similarity between a pair of input texts, we leverage such models in our metric computation. Our proposed metric performs strongly in evaluating both abstractive and extractive summarization models and achieves state-of-the-art results on the SummEval dataset, demonstrating the effectiveness of our approach. We also perform a run-time analysis showing that our metric is faster than the current state-of-the-art.
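The abstract describes the metric only at a high level. As a rough illustration, the following is a minimal sketch of how a sentence-pair-embedding score of this kind could be computed; it assumes the sentence-transformers library, and the specific encoder, the naive sentence splitter, and the BERTScore-style F1 aggregation are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of a SPEED-style score, not the authors' released implementation.
# Assumptions: the sentence-transformers library, "all-MiniLM-L6-v2" as a stand-in for
# the paper's PD/STS/NLI-tuned encoders, and a best-match F1 aggregation.
import re
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # an STS-style sentence encoder

def split_sentences(text: str) -> list[str]:
    # Naive sentence splitter; a real metric would use a proper segmenter.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def speed_like_score(candidate: str, reference: str) -> float:
    """Score a candidate summary against a reference via sentence-pair similarity."""
    cand = model.encode(split_sentences(candidate), normalize_embeddings=True)
    ref = model.encode(split_sentences(reference), normalize_embeddings=True)
    sim = cand @ ref.T  # pairwise cosine similarities (embeddings are unit-normalized)
    precision = sim.max(axis=1).mean()  # each candidate sentence vs. best reference match
    recall = sim.max(axis=0).mean()     # each reference sentence vs. best candidate match
    return float(2 * precision * recall / (precision + recall))

print(speed_like_score(
    "A new metric scores summaries with sentence embeddings. It avoids exact word matching.",
    "The authors introduce an embedding-based summary evaluation metric.",
))
```

Unlike ROUGE's exact n-gram overlap, a score of this shape rewards a candidate sentence that paraphrases a reference sentence, which is the behavior the abstract motivates.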
Pages: 6009-6017
Page count: 9
References
23 entries in total
  • [1] Boski M, 2017, 2017 10TH INTERNATIONAL WORKSHOP ON MULTIDIMENSIONAL (ND) SYSTEMS (NDS)
  • [2] Bowman S., 2015, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, P632, DOI 10.18653/v1/D15-1075
  • [3] Cer D., 2017, SemEval-2017 Task 1: Semantic Textual Similarity
  • [4] Clark E, 2019, 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), P2748
  • [5] Dolan W.B., 2005, Proceedings of the International Workshop on Paraphrasing
  • [6] Fabbri A.R., Kryscinski W., McCann B., Xiong C., Socher R., Radev D., SummEval: Re-evaluating Summarization Evaluation, TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2021, V9, P391-409
  • [7] Gao Y, 2020, 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), P1347
  • [8] Hermann K.M., 2015, Advances in Neural Information Processing Systems, V28
  • [9] Iyer S., 2017, Quora Question Pairs dataset, Quora
  • [10] Kilickaya M, 2017, 15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, P199