Natural Language Generation, its Evaluation and Metrics

Cited by: 0

Authors:

Gehrmann, Sebastian ^{[9]}

Adewumi, Tosin ^{[19,20]}

Aggarwal, Karmanya ^{[14]}

Ammanamanchi, Pawan Sasanka ^{[15]}

Anuoluwapo, Aremu ^{[20,37]}

Bosselut, Antoine ^{[27]}

Chandu, Khyathi Raghavi ^{[2]}

Clinciu, Miruna ^{[7,11,34]}

Das, Dipanjan ^{[9]}

Dhole, Kaustubh D. ^{[1]}

Du, Wanyu ^{[41]}

Durmus, Esin ^{[5]}

Gangal, Varun ^{[2]}

Garbacea, Cristina ^{[38]}

Hashimoto, Tatsunori ^{[27]}

Hou, Yufang ^{[13]}

Jernite, Yacine ^{[12]}

Jhamtani, Harsh ^{[2]}

Ji, Yangfeng ^{[41]}

Jolly, Shailza ^{[6,28]}

Kale, Mihir ^{[9]}

Kumar, Dhruv ^{[43]}

Ladhak, Faisal ^{[4]}

Madaan, Aman ^{[2]}

Maddela, Mounica ^{[8]}

Mahajan, Khyati ^{[33]}

Mahamood, Saad ^{[31]}

Majumder, Bodhisattwa Prasad ^{[36]}

Martins, Pedro Henrique ^{[16]}

McMillan-Major, Angelina ^{[42]}

Mille, Simon ^{[25]}

van Miltenburg, Emiel ^{[30]}

Nadeem, Moin ^{[21]}

Narayan, Shashi ^{[9]}

Nikolaev, Vitaly ^{[9]}

Niyongabo, Rubungo Andre ^{[20,35]}

Osei, Salomey ^{[18,20]}

Parikh, Ankur ^{[9]}

Perez-Beltrachini, Laura ^{[34]}

Rao, Niranjan Ramesh ^{[23]}

Raunak, Vikas ^{[22]}

Rodriguez, Juan Diego ^{[40]}

Santhanam, Sashank ^{[33]}

Sedoc, Joao ^{[24]}

Sellam, Thibault ^{[9]}

Shaikh, Samira ^{[33]}

Shimorina, Anastasia ^{[32]}

Sobrevilla Cabezudo, Marco Antonio ^{[39]}

Strobelt, Hendrik ^{[13]}

Subramani, Nishant ^{[17,20]}

Affiliations:

[1] Amelia R&D, New York, NY USA

[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA

[3] Charles Univ Prague, Prague, Czech Republic

[4] Columbia Univ, New York, NY 10027 USA

[5] Cornell Univ, Ithaca, NY 14853 USA

[6] DFKI, Berlin, Germany

[7] Edinburgh Ctr Robot, Edinburgh, Midlothian, Scotland

[8] Georgia Tech, Atlanta, GA USA

[9] Google Res, Mountain View, CA 94043 USA

[10] Harvard Univ, Cambridge, MA 02138 USA

[11] Heriot Watt Univ, Edinburgh, Midlothian, Scotland

[12] Hugging Face, New York, NY USA

[13] IBM Res, Armonk, NY USA

[14] IIIT Delhi, Delhi, India

[15] IIIT Hyderabad, Hyderabad, Telangana, India

[16] Inst Telecomunicacoes, Aveiro, Portugal

[17] Intel, Intelligent Syst Lab, Santa Clara, CA USA

[18] Kwame Nkrumah Univ Sci & Technol, Kumasi, Ghana

[19] Lulea Univ Technol, Lulea, Sweden

[20] Masakhane, Gauteng, South Africa

[21] MIT, Cambridge, MA 02139 USA

[22] Microsoft, Redmond, WA USA

[23] Natl Inst Technol Karnataka, Mangalore, India

[24] NYU, New York, NY 10003 USA

[25] Pompeu Fabra Univ, Barcelona, Spain

[26] Samsung Res, Suwon, South Korea

[27] Stanford Univ, Stanford, CA 94305 USA

[28] Tech Univ Kaiserslautern, Kaiserslautern, Germany

[29] Tech Univ Munich, Munich, Germany

[30] Tilburg Univ, Tilburg, Netherlands

[31] Trivago, Dusseldorf, Germany

[32] Univ Lorraine, Nancy, France

[33] Univ North Carolina Charlotte, Charlotte, NC USA

[34] Univ Edinburgh, Edinburgh, Midlothian, Scotland

[35] Univ Elect Sci & Technol China, Chengdu, Sichuan, Peoples R China

[36] Univ Calif San Diego, La Jolla, CA 92093 USA

[37] Univ Lagos, Lagos, Nigeria

[38] Univ Michigan, Ann Arbor, MI 48109 USA

[39] Univ Sao Paulo, Sao Paulo, Brazil

[40] Univ Texas Austin, Austin, TX 78712 USA

[41] Univ Virginia, Charlottesville, VA 22903 USA

[42] Univ Washington, Seattle, WA 98195 USA

[43] Univ Waterloo, Waterloo, ON, Canada

Source:

1ST WORKSHOP ON NATURAL LANGUAGE GENERATION, EVALUATION, AND METRICS (GEM 2021) | 2021

Keywords:

OF-THE-ART;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent Anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for which we are organizing a shared task at our ACL 2021 Workshop and in which we invite the entire NLG community to participate.


Pages: 96-120

Page count: 25
