Benchmarking and Improving Text-to-SQL Generation under Ambiguity

Cited by: 0

Authors:

Bhaskar, Adithya [1,2]

Tomar, Tushar [1]

Sathe, Ashutosh [1]

Sarawagi, Sunita [1]

Affiliations:

[1] Indian Inst Technol, Mumbai, Maharashtra, India

[2] Princeton Univ, Princeton, NJ 08544 USA

Source:

2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Research in Text-to-SQL conversion has been largely benchmarked against datasets where each text query corresponds to one correct SQL. However, natural language queries over real-life databases frequently involve significant ambiguity about the intended SQL due to overlapping schema names and multiple confusing relationship paths. To bridge this gap, we develop a novel benchmark called AmbiQT with over 3000 examples where each text is interpretable as two plausible SQLs due to lexical and/or structural ambiguity. When faced with ambiguity, an ideal top-k decoder should generate all valid interpretations for possible disambiguation by the user (Elgohary et al., 2021; Zhong et al., 2022). We evaluate several Text-to-SQL systems and decoding algorithms, including those employing state-of-the-art LLMs, and find them to be far from this ideal. The primary reason is that the prevalent beam search algorithm and its variants treat SQL queries as a string and produce unhelpful token-level diversity in the top-k. We propose LogicalBeam, a new decoding algorithm that navigates the SQL logic space using a blend of plan-based template generation and constrained infilling. Counterfactually generated plans diversify templates, while infilling with a beam search that branches solely on schema names provides value diversity. LogicalBeam is up to 2.5x more effective than state-of-the-art models at generating all candidate SQLs in the top-k ranked outputs. It also enhances the top-5 Exact and Execution Match Accuracies on SPIDER and Kaggle DBQA.


Pages: 7053-7074

Page count: 22

References (47 total):

[1] [Anonymous], 2022, OpenAI

[2] Arcadinho Samuel, 2022, T5QL: Taming language models for SQL generation

[3] Awasthi Abhijeet, 2022, P 2022 C EMPIRICAL M

[4] Cafarella, Michael J.; Halevy, Alon; Wang, Daisy Zhe; Wu, Eugene; Zhang, Yang. WebTables: Exploring the Power of Tables on the Web [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2008, 1 (01): 538-549

[5] Chen M., 2021, EVALUATING LARGE LAN

[6] Chung H W., 2022, SCALING INSTRUCTION

[7] Data Commons, 2009, US

[8] Elgohary A, 2021, 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), P5599

[9] Elgohary A, 2020, 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), P2065

[10] Finkel JennyRose., 2006, EMNLP 06, P618
