Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank

Times cited: 0
Authors
Briakou, Eleftheria [1]
Carpuat, Marine [1]
Affiliation
[1] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
Source
PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP) | 2020
Funding
National Science Foundation (USA)
Keywords
None listed
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Detecting fine-grained differences in content conveyed in different languages matters for cross-lingual NLP and multilingual corpora analysis, but it is a challenging machine learning problem since annotation is expensive and hard to scale. This work improves the prediction and annotation of fine-grained semantic divergences. We introduce a training strategy for multilingual BERT models by learning to rank synthetic divergent examples of varying granularity. We evaluate our models on the Rationalized English-French Semantic Divergences, a new dataset released with this work, consisting of English-French sentence pairs annotated with semantic divergence classes and token-level rationales. Learning to rank helps detect fine-grained sentence-level divergences more accurately than a strong sentence-level similarity model, while token-level predictions have the potential of further distinguishing between coarse and fine-grained divergences.
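The abstract names "learning to rank synthetic divergent examples of varying granularity" but does not give the loss. A minimal sketch, assuming a pairwise hinge (margin) ranking loss, which is one common way to express such a ranking objective: a model (not shown) assigns each English-French pair a scalar equivalence score, the seed equivalent pair comes first, and synthetic variants follow in order of increasing divergence. The function names, the `margin` value, and the ordering convention are hypothetical illustrations, not the authors' implementation.

```python
"""Sketch of a pairwise margin ranking objective over synthetic divergences.

Assumption (hypothetical): a model gives each sentence pair a scalar
equivalence score, where higher means more semantically equivalent.
"""
from itertools import combinations
from typing import Sequence


def margin_ranking_loss(score_better: float, score_worse: float, margin: float = 1.0) -> float:
    """Hinge loss that is zero once score_better exceeds score_worse by at least `margin`."""
    return max(0.0, margin - (score_better - score_worse))


def ranking_loss_over_granularity(scores: Sequence[float], margin: float = 1.0) -> float:
    """Average pairwise hinge loss for scores ordered from least to most divergent.

    scores[0] is the seed (equivalent) pair; scores[i] for i > 0 are
    synthetic variants with increasingly large edits, so every earlier
    item should outrank every later one by at least `margin`.
    """
    pairs = list(combinations(range(len(scores)), 2))
    if not pairs:
        return 0.0
    total = sum(margin_ranking_loss(scores[i], scores[j], margin) for i, j in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Correctly ordered scores with gaps of at least 1.0 incur no loss.
    print(ranking_loss_over_granularity([3.0, 1.5, 0.0]))  # 0.0
    # Reversed ordering is penalized on every pair: (2 + 3 + 2) / 3.
    print(ranking_loss_over_granularity([0.0, 1.0, 2.0]))
```

Ranking every earlier item above every later one, rather than only the seed above each variant, is what lets a single objective encode graded granularity; a scheme that compares only seed-versus-divergent pairs would be a simpler alternative.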
Pages: 1563-1580
Number of pages: 18