TRANSQL: A Transformer-based Model for Classifying SQL Queries

Times cited: 0
Authors
Tahmasebi, Shirin [1]
Payberah, Amir H. [1]
Soylu, Ahmet [2]
Roman, Dumitru [3]
Matskin, Mihhail [1]
Affiliations
[1] KTH Royal Inst Technol, Stockholm, Sweden
[2] Oslo Metropolitan Univ, Oslo, Norway
[3] SINTEF AS, Trondheim, Norway
Source
2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA | 2022
Keywords
SQL Classification; BERT; GPT
DOI
10.1109/ICMLA55696.2022.00131
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Domain-Specific Languages (DSLs) are becoming popular in various fields because they let domain experts focus on domain-specific concepts rather than software-specific ones. Domain experts often reuse their previously written scripts when writing new ones; to make this process straightforward, they need techniques that help them find relevant existing scripts easily. One fundamental component of such a technique is a model for identifying similar DSL scripts. However, the inherent nature of DSLs and the lack of data make building such a model challenging. Hence, in this work, we propose TRANSQL, a transformer-based model that classifies DSL scripts based on their similarities in a few-shot setting. We build TRANSQL using BERT and GPT-3, two high-performing language models. Our experiments focus on SQL as one of the most commonly used DSLs. The results show that the BERT-based TRANSQL performs poorly on DSLs, since BERT-based models need extensive data for the fine-tuning phase. In contrast, the GPT-based TRANSQL gives markedly better and more promising results.
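The abstract does not describe how the GPT-based variant is prompted. The following is a minimal sketch, assuming a plain-text few-shot prompt for a GPT-style completion model: a handful of labeled SQL queries followed by the unlabeled target query. The category names ("aggregation", "lookup"), the prompt wording, and the helpers `build_few_shot_prompt` and `parse_label` are hypothetical and are not taken from the paper. The sketch only builds the prompt and maps a completion back to a label; it does not call any model API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabeledQuery:
    sql: str
    label: str  # hypothetical category name, not from the paper


def build_few_shot_prompt(examples: List[LabeledQuery], target_sql: str) -> str:
    """Assemble a few-shot classification prompt (hypothetical format)."""
    labels = sorted({ex.label for ex in examples})
    lines = [
        "Classify each SQL query into one of the given categories.",
        "Categories: " + ", ".join(labels),
        "",
    ]
    # Each labeled example becomes one "Query/Category" demonstration.
    for ex in examples:
        lines.append(f"Query: {ex.sql.strip()}")
        lines.append(f"Category: {ex.label}")
        lines.append("")
    # The target query is left unlabeled so the model completes the category.
    lines.append(f"Query: {target_sql.strip()}")
    lines.append("Category:")
    return "\n".join(lines)


def parse_label(completion: str, labels: List[str]) -> Optional[str]:
    """Return the known label matching the first line of the completion, else None."""
    text = completion.strip()
    first = text.splitlines()[0].strip() if text else ""
    for label in labels:
        if first.lower() == label.lower():
            return label
    return None


# Usage with made-up example queries (illustrative only).
EXAMPLES = [
    LabeledQuery("SELECT COUNT(*) FROM orders", "aggregation"),
    LabeledQuery("SELECT name FROM users WHERE id = 1", "lookup"),
]
prompt = build_few_shot_prompt(EXAMPLES, "SELECT AVG(price) FROM products")
print(prompt)
```

Keeping the target query as the last, unlabeled demonstration is the usual in-context pattern for completion models; restricting the parsed answer to the known label set means an off-format completion yields `None` instead of an invented class.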
Pages: 788-793
Number of pages: 6