MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Cited by: 0
Authors
Dou, Longxu [1]
Gao, Yan [2]
Pan, Mingyang [1]
Wang, Dingzirui [1]
Che, Wanxiang [1]
Zhan, Dechen [1]
Lou, Jian-Guang [2]
Affiliations
[1] Harbin Inst Technol, Harbin, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Source
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11 | 2023
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Text-to-SQL semantic parsing is an important NLP task, which greatly facilitates the interaction between users and the database and becomes the key component in many human-computer interaction systems. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MULTISPIDER, the largest multilingual text-to-SQL dataset which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Upon MULTISPIDER, we further identify the lexical and structural challenges of text-to-SQL (caused by specific language properties and dialect sayings) and their intensity across different languages. Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reason for the performance drop of each language. Besides the dataset, we also propose a simple schema augmentation framework SAVE (Schema-Augmentation-with-Verification), which significantly boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.
Pages: 12745-12753
Number of pages: 9