A BERT-Based Generation Model to Transform Medical Texts to SQL Queries for Electronic Medical Records: Model Development and Validation

Cited by: 7
Authors
Pan, Youcheng [1]
Wang, Chenghao [1]
Hu, Baotian [1]
Xiang, Yang [2]
Wang, Xiaolong [1]
Chen, Qingcai [1, 2]
Chen, Junjie [1]
Du, Jingcheng [3]
Affiliations
[1] Harbin Inst Technol, Intelligent Comp Res Ctr, 6, Pingshan 1st Rd, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] Univ Texas Hlth Sci Ctr Houston, Houston, TX 77030 USA
Keywords
electronic medical record; text-to-SQL generation; BERT; grammar-based decoding; tree-structured intermediate representation
DOI
10.2196/32698
Chinese Library Classification number
R-058
Abstract
Background: Electronic medical records (EMRs) are usually stored in relational databases, so retrieving information of interest requires SQL queries. Writing such queries is challenging for medical experts who lack database expertise, and existing text-to-SQL generation methods have not been widely adopted in the medical domain.
Objective: This study aimed to propose a neural generation model that jointly considers the characteristics of medical text and the structure of SQL to automatically transform medical texts into SQL queries for EMRs.
Methods: We proposed a medical text-to-SQL model (MedTS) that uses a pretrained Bidirectional Encoder Representations From Transformers (BERT) model as the encoder and a grammar-based long short-term memory (LSTM) network as the decoder. The decoder predicts an intermediate representation that can easily be transformed into the final SQL query. Rather than treating the SQL query as an ordinary word sequence, we adopted the syntax tree as the intermediate representation, which better matches the tree-structured nature of SQL and reduces the search space during generation. Experiments were conducted on the MIMICSQL dataset, and MedTS was compared with 5 competitor methods.
Results: On the test set, MedTS achieved an accuracy of 0.784 in terms of logic form and 0.899 in terms of execution, significantly outperforming the existing state-of-the-art methods. Further analyses showed that performance on each component of the generated SQL was relatively balanced and substantially improved.
Conclusions: MedTS was effective and robust in improving medical text-to-SQL generation, indicating strong potential for application in real medical scenarios.
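To make the idea of a tree-structured intermediate representation more concrete, the following is a minimal sketch in Python of how a small syntax tree might be rendered into a SQL string. It is not the MedTS grammar or implementation: the `Select` and `Condition` node types, the `to_sql` and `render_value` helpers, and the table and column names in the example are all hypothetical and chosen only for illustration. The sketch assumes a single-table query with conditions joined by AND.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Condition:
    """One WHERE predicate (hypothetical node type, not from the paper)."""
    column: str
    op: str
    value: Union[str, int, float]


@dataclass
class Select:
    """Root of a tiny query tree (hypothetical node type, not from the paper)."""
    agg: str      # "" for no aggregation, otherwise e.g. "COUNT" or "MAX"
    column: str
    table: str
    conditions: List[Condition] = field(default_factory=list)


def render_value(value: Union[str, int, float]) -> str:
    """Quote string literals and leave numbers unquoted."""
    if isinstance(value, str):
        return '"' + value.replace('"', '""') + '"'
    return str(value)


def to_sql(node: Select) -> str:
    """Walk the tree and emit the corresponding SQL string."""
    target = f"{node.agg}({node.column})" if node.agg else node.column
    sql = f"SELECT {target} FROM {node.table}"
    if node.conditions:
        clauses = [
            f"{c.column} {c.op} {render_value(c.value)}" for c in node.conditions
        ]
        sql += " WHERE " + " AND ".join(clauses)
    return sql


# Illustrative tree for a question such as
# "How many female patients are younger than 50?"
# The table and column names below are assumptions made for this example.
tree = Select(
    agg="COUNT",
    column="SUBJECT_ID",
    table="DEMOGRAPHIC",
    conditions=[Condition("AGE", "<", 50), Condition("GENDER", "=", "F")],
)
sql = to_sql(tree)
print(sql)
```

Because a decoder that targets such a tree only has to choose among the node types and fields the grammar allows at each step, it cannot emit an arbitrary token sequence, which is one way to read the abstract's remark about a reduced search space.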
Pages: 14