Benchmarking and Improving Text-to-SQL Generation under Ambiguity

Cited by: 0
Authors
Bhaskar, Adithya [1, 2]
Tomar, Tushar [1]
Sathe, Ashutosh [1]
Sarawagi, Sunita [1]
Affiliations
[1] Indian Inst Technol, Mumbai, Maharashtra, India
[2] Princeton Univ, Princeton, NJ 08544 USA
Source
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023 | 2023
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Research in Text-to-SQL conversion has been largely benchmarked against datasets where each text query corresponds to one correct SQL. However, natural language queries over real-life databases frequently involve significant ambiguity about the intended SQL due to overlapping schema names and multiple confusing relationship paths. To bridge this gap, we develop a novel benchmark called AmbiQT with over 3000 examples where each text is interpretable as two plausible SQLs due to lexical and/or structural ambiguity. When faced with ambiguity, an ideal top-k decoder should generate all valid interpretations for possible disambiguation by the user (Elgohary et al., 2021; Zhong et al., 2022). We evaluate several Text-to-SQL systems and decoding algorithms, including those employing state-of-the-art LLMs, and find them to be far from this ideal. The primary reason is that the prevalent beam search algorithm and its variants treat SQL queries as strings and produce unhelpful token-level diversity in the top-k. We propose LogicalBeam, a new decoding algorithm that navigates the SQL logic space using a blend of plan-based template generation and constrained infilling. Counterfactually generated plans diversify templates, while infilling with a beam search that branches solely on schema names provides value diversity. LogicalBeam is up to 2.5x more effective than state-of-the-art models at generating all candidate SQLs in the top-k ranked outputs. It also enhances the top-5 Exact and Execution Match Accuracies on SPIDER and Kaggle DBQA.
Pages: 7053-7074
Page count: 22
Related papers
47 in total
[1] [Anonymous], 2022, OpenAI
[2] Arcadinho Samuel, 2022, T5QL: Taming language models for SQL generation
[3] Awasthi Abhijeet, 2022, P 2022 C EMPIRICAL M
[4] Cafarella, Michael J.; Halevy, Alon; Wang, Daisy Zhe; Wu, Eugene; Zhang, Yang. WebTables: Exploring the Power of Tables on the Web [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2008, 1 (01): 538-549
[5] Chen M., 2021, EVALUATING LARGE LAN
[6] Chung H W., 2022, SCALING INSTRUCTION
[7] Data Commons, 2009, US
[8] Elgohary A, 2021, 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), P5599
[9] Elgohary A, 2020, 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), P2065
[10] Finkel JennyRose., 2006, EMNLP 06, P618