TRANSQL: A Transformer-based Model for Classifying SQL Queries

Cited by: 0
Authors
Tahmasebi, Shirin [1]
Payberah, Amir H. [1]
Soylu, Ahmet [2]
Roman, Dumitru [3]
Matskin, Mihhail [1]
Affiliations
[1] KTH Royal Institute of Technology, Stockholm, Sweden
[2] Oslo Metropolitan University, Oslo, Norway
[3] SINTEF AS, Trondheim, Norway
Source
2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), 2022
Keywords
SQL classification; BERT; GPT
DOI
10.1109/ICMLA55696.2022.00131
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Domain-Specific Languages (DSLs) are becoming popular in various fields because they let domain experts focus on domain-specific concepts rather than software-specific ones. Domain experts often reuse their previously written scripts when writing new ones; to make this process straightforward, they need techniques that help them find relevant existing scripts easily. A fundamental component of such a technique is a model for identifying similar DSL scripts. However, the inherent nature of DSLs and the lack of data make building such a model challenging. Hence, in this work, we propose TRANSQL, a transformer-based model for classifying DSL scripts based on their similarities in a few-shot setting. We build TRANSQL using BERT and GPT-3, two high-performing language models. Our experiments focus on SQL as one of the most commonly used DSLs. The results show that the BERT-based TRANSQL does not perform well on DSLs, since BERT needs extensive data for the fine-tuning phase, whereas the GPT-based TRANSQL gives markedly better and more promising results.
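To illustrate what few-shot classification of SQL queries with a GPT-style model can look like, here is a minimal sketch that assembles a prompt from a handful of labeled queries and maps the model's completion back to a known label. It is not the authors' prompt or label set: the "Query:/Category:" format, the categories filter, aggregation and join, and the helper names build_few_shot_prompt and parse_label are all hypothetical, and the actual call to a language model is omitted.

```python
from typing import List, Optional, Tuple

# Hypothetical labeled support set of (SQL query, category) pairs.
# The categories are illustrative assumptions, not the paper's classes.
EXAMPLES: List[Tuple[str, str]] = [
    ("SELECT name FROM users WHERE age > 30;", "filter"),
    ("SELECT dept, COUNT(*) FROM staff GROUP BY dept;", "aggregation"),
    ("SELECT * FROM orders o JOIN users u ON o.uid = u.id;", "join"),
]


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt: labeled examples first, then the new query."""
    lines = ["Classify each SQL query into a category.", ""]
    for sql, label in examples:
        lines.append(f"Query: {sql}")
        lines.append(f"Category: {label}")
        lines.append("")
    # The unlabeled query goes last; the model is expected to complete the label.
    lines.append(f"Query: {query}")
    lines.append("Category:")
    return "\n".join(lines)


def parse_label(completion: str, labels: List[str]) -> Optional[str]:
    """Map a raw completion to a known label (case-insensitive), else None."""
    stripped = completion.strip()
    if not stripped:
        return None
    first_line = stripped.splitlines()[0].strip().lower()
    for label in labels:
        if first_line == label.lower():
            return label
    return None


prompt = build_few_shot_prompt(
    EXAMPLES, "SELECT city, AVG(price) FROM listings GROUP BY city;"
)
labels = sorted({label for _, label in EXAMPLES})
print(prompt)
# A completion such as " aggregation\n" would be mapped back to a known label.
print(parse_label(" aggregation\n", labels))  # → aggregation
```

Keeping the labeled examples inside the prompt is what makes the setup few-shot: no fine-tuning step is needed, which is the data requirement the abstract identifies as a weakness of the BERT-based variant.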
Pages: 788-793
Number of pages: 6