MeaningBERT: assessing meaning preservation between sentences

Cited by: 2
Authors
Beauchemin, David [1 ]
Saggion, Horacio [2 ]
Khoury, Richard [1 ]
Affiliations
[1] Université Laval, Department of Computer Science and Software Engineering, Group for Research in Artificial Intelligence, Quebec City, QC, Canada
[2] Universitat Pompeu Fabra, Department of Information and Communication Technologies, Large Scale Text Understanding Systems Lab, Natural Language Processing Group, Barcelona, Spain
Source
FRONTIERS IN ARTIFICIAL INTELLIGENCE | 2023, Vol. 6
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
evaluation of text simplification systems; meaning preservation; automatic text simplification; lexical simplification; syntactic simplification; few-shot evaluation of text simplification systems;
DOI
10.3389/frai.2023.1223924
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the field of automatic text simplification, assessing whether the meaning of the original text has been preserved during simplification is of paramount importance. Metrics relying on n-gram overlap may struggle with simplifications that replace complex phrases with simpler paraphrases. Evaluation metrics for meaning preservation based on large language models (LLMs) have been proposed, such as BertScore in machine translation or QuestEval in summarization. However, none correlates strongly with human judgments of meaning preservation, and such metrics have not been assessed in the context of text simplification research. In this study, we present a meta-evaluation of several metrics we apply to measure content similarity in text simplification. We also show that these metrics fail two trivial, inexpensive content preservation tests. Another contribution of this study is MeaningBERT (https://github.com/GRAAL-Research/MeaningBERT), a new trainable metric designed to assess meaning preservation between two sentences in text simplification, and we show how it correlates with human judgment. To demonstrate its quality and versatility, we also present a compilation of datasets used to assess meaning preservation and benchmark our study against a large selection of popular metrics.
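Usage sketch
The abstract describes MeaningBERT as a trainable sentence-pair metric and mentions two trivial content preservation tests; in the paper, these amount to scoring a sentence against itself (a sound metric should return close to 100%) and against an unrelated sentence (close to 0%). The Python sketch below shows how such a metric could be queried and sanity-checked. The Hugging Face model id "davebulaval/MeaningBERT" and the assumption that the model emits its 0-100 score as a single regression logit are drawn from the project's GitHub repository linked above, not from this record.

    # Minimal sketch: query a trainable meaning-preservation metric and run
    # the two sanity checks described in the abstract.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    MODEL_ID = "davebulaval/MeaningBERT"  # assumed Hub id (see the GitHub repo)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    def meaning_score(source: str, simplification: str) -> float:
        """Return a 0-100 score for how well `simplification` preserves the
        meaning of `source` (assumed single-logit regression head)."""
        inputs = tokenizer(source, simplification,
                           return_tensors="pt", truncation=True)
        with torch.no_grad():
            return model(**inputs).logits.squeeze().item()

    sentence = "He wore a red shirt to the party."
    unrelated = "Photosynthesis converts sunlight into chemical energy."
    print(meaning_score(sentence, sentence))   # sanity check 1: expect ~100
    print(meaning_score(sentence, unrelated))  # sanity check 2: expect ~0

Unlike n-gram overlap metrics, which the abstract notes can penalize legitimate paraphrases, a learned sentence-pair regressor of this kind can also be fine-tuned on human ratings of meaning preservation.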
Pages: 10
相关论文
共 36 条
  • [1] Alva-Manchego F., 2020, Annual Meeting of the Association for Computational Linguistics, DOI [10.18653/v1/2020.acl-main.424, DOI 10.18653/V1/2020.ACL-MAIN.424]
  • [2] Alva-Manchego F, 2021, COMPUT LINGUIST, V47, P861, DOI [10.1162/COLI_a_00418, 10.1162/coli_a_00418]
  • [3] [Anonymous], 2005, Global Autonomous Language Exploitation (GALE)
  • [4] Banerjee S., 2005, P ACL WORKSH INTR EX, P65
  • [5] Brown T, 2020, Adv Neural Inf Process Syst, V33, P1877
  • [6] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
  • [7] SummEval: Re-evaluating Summarization Evaluation
    Fabbri, Alexander R.
    Kryscinski, Wojciech
    McCann, Bryan
    Xiong, Caiming
    Socher, Richard
    Radev, Dragomir
    [J]. TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2021, 9 : 391 - 409
  • [8] Flesh Rudolph., 1948, Elementary English, V25, P344
  • [9] Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
    Gatt, Albert
    Krahmer, Emiel
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2018, 61 : 65 - 170
  • [10] Gunning Robert, 1969, The Journal of Business Communication, V6, P3, DOI [10.1177/002194366900600202, DOI 10.1177/002194366900600202]